engineer at Quantcast
accelerating data processing
some development tips,
a few minutes talking
how we use Spark SQL
some of the modeling problems
and we're going to discuss
order to make this data
fast with Pandas UDFs,
and very aggressively optimize
SQL intermediate rows.
in memory as possible.
try to aggregate keys.
our specific data problem
very well with sparse data
for this example problem,
talk about Python libraries.
now that you have written
case, by a factor of about three.
with our first section.
is a user defined function
working with structured data.
a great way of writing
that isn't supported
way to do all of this.
types of Pandas UDFs.
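Below is a minimal sketch of the idea behind a Series-to-Series Pandas UDF. It assumes pandas and NumPy are installed. The function receives a whole batch of values as a `pandas.Series` and returns a `Series` of the same length. The work is therefore done by vectorized NumPy operations rather than by one Python call per row. The function names and the log-odds transformation are hypothetical and were chosen only for illustration. They are not taken from the talk.

```python
import numpy as np
import pandas as pd


def log_odds(p: pd.Series) -> pd.Series:
    """Hypothetical vectorized body of a Series-to-Series Pandas UDF.

    The whole batch is transformed with NumPy operations on the Series,
    instead of one Python function call per row.
    """
    # Clip so that probabilities of exactly 0 or 1 do not produce infinities.
    clipped = p.clip(1e-6, 1 - 1e-6)
    return np.log(clipped / (1 - clipped))


def log_odds_row(p: float) -> float:
    """Hypothetical row-at-a-time equivalent, as a plain Python UDF would compute it."""
    q = min(max(p, 1e-6), 1 - 1e-6)
    return float(np.log(q / (1 - q)))


if __name__ == "__main__":
    batch = pd.Series([0.1, 0.5, 0.9])
    vectorized = log_odds(batch)
    row_by_row = [log_odds_row(x) for x in batch]
    # Both paths compute the same values; only the calling pattern differs.
    print(vectorized.round(4).tolist())
    print([round(v, 4) for v in row_by_row])
```

In PySpark 3.x, a function with this `pd.Series -> pd.Series` signature can be registered with the `pandas_udf` decorator from `pyspark.sql.functions`, using a return type such as `"double"`. That wiring is omitted here so the sketch runs with pandas alone.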
struct types in Spark SQL,