Accelerating Data Processing in Spark SQL with Pandas UDFs

engineer at Quantcast accelerating data processing some development tips, a few minutes talking how we use Spark SQL some of the modeling problems
and we're going to discuss order to make this data fast with Pandas UDFs, and very aggressively optimize
SQL intermediate rows. in memory as possible. try to aggregate keys. our specific data problem
very well with sparse data for this example problem, talk about Python libraries. now that you have written case, the factor of about three. feedback is very important.
review these sessions with our first section. is a user defined function working with structured data. a great way of writing
that isn't supported way to do all of this. types of Pandas UDFs. struct types in Spark SQL,