2021#03 Readings
4 minutes read | 803 words by Ruben Berenguel
This is also a video-heavy edition: I keep chugging along my watch list with Glancer.
I have also created a small video of how GuillotineJS works (below). I created another one for Guillotine, but it didn’t look as good (yeah).
📯 Down memory lane: the Hive paper
I have summarised the original (Apache) Hive paper. It is a very readable paper, and the summary is short. Of course I recommend you give it a look 😉.
🍿 Using Super Glue and Baking Soda to Repair a Plastic Switch Plunger
Worth knowing for repairing stuff that breaks.
🍿 Airflow 2.0
I don’t care that much about the HA scheduler (we only schedule batch jobs, and Airflow being down would be a critical issue… that anyone in infrastructure/devops can fix in 15 minutes), but smart sensors and a better way to do XCom seem a godsend. Also, a better REST API makes it easier to trigger DAGs from DAGs (which can be better than SubDAGs for certain operations, like cleanups or vacuums).
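As a rough sketch of the DAG-from-DAG idea, the snippet below builds (but does not send) a request against the Airflow 2.0 stable REST API endpoint `POST /api/v1/dags/{dag_id}/dagRuns`. It assumes the webserver is reachable at a hypothetical `base_url` and that the basic-auth backend (`airflow.api.auth.backend.basic_auth`) is enabled; the DAG id, credentials and `conf` payload are made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_trigger_request(base_url, dag_id, conf, user, password):
    """Build a POST request that creates a new DAG run for `dag_id`.

    Assumes basic auth is enabled on the Airflow webserver; `conf` is passed
    through to the triggered DAG run as its configuration dictionary.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{urllib.parse.quote(dag_id, safe='')}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical usage: trigger a cleanup DAG from a task in another DAG.
req = build_trigger_request(
    "http://localhost:8080/", "cleanup_dag", {"table": "events"}, "admin", "admin"
)
```

Actually sending it is then `urllib.request.urlopen(req)`, typically from a task in the upstream DAG. Keeping the request construction separate makes it easy to inspect without a running Airflow.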
🍿 On Improving Broadcast Joins in Apache Spark SQL
I wasn’t aware that broadcast joins passed through the driver (it makes sense in hindsight). Recently I had to un-broadcast what used to be an easy “map side join” because it had grown past 1GB, at which point the benefit of broadcasting no longer seems to make up for the network transfer cost. AQE should help in this case, though, and the ideas here are interesting to see implemented.
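To make the “map side join” shape concrete, here is a single-machine sketch of a broadcast hash join in plain Python, not Spark’s implementation. The small table becomes an in-memory lookup (in Spark it is collected through the driver and shipped to every executor, chosen automatically below `spark.sql.autoBroadcastJoinThreshold`, 10 MB by default, or forced with a `broadcast()` hint), and each partition of the large table is probed without any shuffle. The row layout and column names are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of a large table against a small table on `key`.

    Build side: the small table is turned into a hash map once. This is the piece
    that gets broadcast, so its size is what makes or breaks the strategy.
    Probe side: each large partition is streamed through the map independently,
    so the large table never has to be shuffled.
    """
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:
        for row in partition:
            # .get avoids inserting empty entries for keys with no match.
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


orders = [
    [{"user": 1, "amount": 10}, {"user": 2, "amount": 5}],  # partition 0
    [{"user": 1, "amount": 7}, {"user": 3, "amount": 1}],   # partition 1
]
users = [{"user": 1, "country": "ES"}, {"user": 2, "country": "NL"}]
result = broadcast_hash_join(orders, users, "user")
```

Once the small side grows to gigabytes, every executor has to receive and hold that whole `lookup`, which is where a shuffle-based join starts to win.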
🍿 Pandas UDF and Python Type Hint in Apache Spark 3.0
Using type hints is a massive usability improvement over Spark 2.4. This talk is an excellent walkthrough of the different types that can be used.
🍿 Poetic Computation
This is a set of inspiring ideas on generative art. This year’s GitHub Universe had a really excellent Play track.
An overview of end-to-end entity resolution for big data
Interesting. Technically, I implemented entity resolution for our mapping system, although in our case it is fully deterministic (same cookie -> same user). It’s basically a very large (incremental) connected-components computation in Spark. Our graph has 3 billion nodes!
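For a feel of the deterministic variant, here is a small union-find (disjoint set) sketch that labels connected components over identifier edges. It is a single-machine illustration, not the Spark job described above; the identifiers are hypothetical, and keeping the smallest id as each component’s root is just one way to get stable labels.

```python
def connected_components(edges, parent=None):
    """Label each node with the smallest node id in its connected component.

    `parent` can be a previous result's internal map, so new edges can be
    merged in incrementally without recomputing from scratch.
    """
    parent = {} if parent is None else parent

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one: roots stay component minima.
            parent[max(ra, rb)] = min(ra, rb)

    for a, b in edges:
        union(a, b)
    return {node: find(node) for node in list(parent)}, parent


edges = [("cookie1", "user_a"), ("cookie2", "user_a"), ("cookie3", "user_b")]
labels, state = connected_components(edges)
```

Passing `state` back in with a new batch of edges is what makes union-find attractive for an incremental setting; at billions of nodes a distributed algorithm is needed instead.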
🍿 GeoShred Tutorial
I have said many times that I suck at music but love making sounds. I recently got this iOS app (it is more or less a physical simulator of real instruments), and it is pretty impressive. Here’s a sample of someone playing it (not me, of course). The tutorial is a great showcase of its features and guitar modelling.
Ditherpunk — The article I wish I had about monochrome image dithering
This is a fascinating rabbit hole on dithering algorithms, inspired by the relatively famous game Return of the Obra Dinn, by Lucas Pope (of Papers, Please fame). I have looked at a few references on dithering to tweak images for my 7-color eInk display, and this one was the icing on the cake.
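As a taste of what error-diffusion dithering does, here is a minimal Floyd–Steinberg sketch for a grayscale image given as rows of floats in [0, 1]. It is one classic algorithm among the several the article covers, written in plain Python for readability rather than speed; the list-of-lists image format is an assumption.

```python
def floyd_steinberg(image):
    """Dither a grayscale image (values in [0, 1]) to pure black/white (0/1).

    Each pixel is thresholded at 0.5 and the quantisation error is pushed onto
    not-yet-visited neighbours with the classic 7/16, 3/16, 5/16, 1/16 weights,
    so local average brightness is roughly preserved.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    buf = [[float(v) for v in row] for row in image]  # working copy that accumulates error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


gray = [[0.5] * 32 for _ in range(32)]
dithered = floyd_steinberg(gray)
```

For a multi-colour eInk panel the same idea applies, with “nearest palette colour” replacing the 0.5 threshold and the error tracked per channel.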
Metaballs and Marching Squares
This is a brilliant reference for marching squares (marching cubes, but in 2D), which can be used to render metaballs, to boot. When I find the time, I will use it to speed up Blot/Painting.
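A compact sketch of the pair, assuming the common metaball field f(x, y) = Σ r²/d² with the surface at f = 1. Instead of the usual 16-entry lookup table, this version finds the cell edges whose endpoints straddle the threshold and linearly interpolates the crossing; the ambiguous saddle cells (four crossings) are resolved arbitrarily by pairing crossings in edge order. Function names and the grid layout are illustrative, not taken from the article.

```python
def metaball_field(balls, x, y):
    """Sum of r^2 / d^2 over all balls; points with a value >= 1 are 'inside'."""
    total = 0.0
    for cx, cy, r in balls:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        total += r * r / max(d2, 1e-12)  # clamp to avoid dividing by zero at a centre
    return total


def interp(p1, p2, v1, v2, iso):
    """Linearly interpolate where the field crosses `iso` between two corners."""
    t = (iso - v1) / (v2 - v1)
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def marching_squares(field, xs, ys, iso=1.0):
    """Return contour line segments of `field` at level `iso` over the grid xs × ys."""
    segments = []
    for j in range(len(ys) - 1):
        for i in range(len(xs) - 1):
            # Corners in loop order: top-left, top-right, bottom-right, bottom-left.
            corners = [(xs[i], ys[j]), (xs[i + 1], ys[j]),
                       (xs[i + 1], ys[j + 1]), (xs[i], ys[j + 1])]
            vals = [field(x, y) for x, y in corners]
            points = []
            for a in range(4):
                b = (a + 1) % 4
                if (vals[a] >= iso) != (vals[b] >= iso):
                    points.append(interp(corners[a], corners[b], vals[a], vals[b], iso))
            # Going around a loop the inside/outside state flips an even number of
            # times: 0, 2 or 4 crossings. Pair them up into segments.
            for s in range(0, len(points), 2):
                segments.append((points[s], points[s + 1]))
    return segments


balls = [(0.0, 0.0, 1.0)]
grid = [-2 + 0.25 * k for k in range(17)]
segs = marching_squares(lambda x, y: metaball_field(balls, x, y), grid, grid)
```

Each corner is evaluated once per adjacent cell here; caching the field values on the grid is the obvious first speed-up.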
Smooth Voxel Terrain (Part 2)
Another one, this time on marching cubes.
New Yorkers Can Now Legally Nunchuck
So, New Yorkers could not own nunchucks until now… Well, have a look at Bruce Lee playing table tennis with nunchucks just in case.
Feature Store
I’m not sure I get the feature store craze. Most examples I have heard of are glorified databases. After some comments from Uwe Korn, I think the idea of a feature store would make sense if it were a store of functions (or lambdas): you would define how a feature is extracted from a dataset, and then serve that feature against the target dataset.
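A toy sketch of that “store of functions” reading: features are registered as named extraction functions and only materialised when served against whatever rows you hand in. Everything here (`feature`, `serve`, the feature names and row fields) is hypothetical and not the API of any existing feature store product.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# The "store": feature name -> function that extracts it from a row.
FEATURES: Dict[str, Callable[[Row], Any]] = {}


def feature(name: str):
    """Decorator registering an extraction function under `name`."""
    def register(fn: Callable[[Row], Any]) -> Callable[[Row], Any]:
        FEATURES[name] = fn
        return fn
    return register


@feature("basket_size")
def basket_size(row: Row) -> int:
    return len(row["items"])


@feature("is_weekend")
def is_weekend(row: Row) -> bool:
    return row["weekday"] >= 5  # assumes Monday == 0, as in datetime.weekday()


def serve(rows: List[Row], names: List[str]) -> List[Row]:
    """Apply the requested feature definitions to a target dataset.

    Returns new rows; the input rows are left untouched.
    """
    return [{**row, **{n: FEATURES[n](row) for n in names}} for row in rows]


rows = [{"items": ["a", "b"], "weekday": 6}, {"items": [], "weekday": 2}]
served = serve(rows, ["basket_size", "is_weekend"])
```

The point of the design is that the definition is shared and versioned, while the data it runs against can be training data one day and live requests the next.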
The Geometric Landscapes of Lorenz Stoer (1567)
They look pretty good as an iPhone background. They are also pretty cool-looking on their own.
Permutate parsers, don’t validate
You won’t go to bed without learning something new about parser combinators. This will come in handy for a parser I have (a project I may post in February or March).
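The core trick is parsing a fixed set of items that may appear in any order, returning them in a fixed order. Below is a hand-rolled sketch in Python with tiny parser combinators (a parser is a function `(text, pos) -> (value, new_pos)` or `None`); it is an illustration of the idea, not the API of any parser-combinator library, and `key_int`/`permutation` are hypothetical names. It picks greedily, without backtracking over which item to try first, which is fine when the items start with distinct keys.

```python
from typing import Any, Callable, List, Optional, Tuple

Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def key_int(key: str) -> Parser:
    """Parse 'key=<digits>;' and return the integer."""
    def parse(text: str, pos: int):
        prefix = key + "="
        if not text.startswith(prefix, pos):
            return None
        start = end = pos + len(prefix)
        while end < len(text) and text[end].isdigit():
            end += 1
        if end == start or end >= len(text) or text[end] != ";":
            return None
        return int(text[start:end]), end + 1
    return parse


def permutation(*parsers: Parser) -> Parser:
    """Run every parser exactly once, in whatever order the input presents them.

    Results are returned in declaration order, not input order.
    """
    def parse(text: str, pos: int):
        results: List[Any] = [None] * len(parsers)
        remaining = set(range(len(parsers)))
        while remaining:
            for idx in sorted(remaining):
                r = parsers[idx](text, pos)
                if r is not None:
                    results[idx], pos = r
                    remaining.remove(idx)
                    break
            else:
                return None  # none of the remaining items matches here
        return tuple(results), pos
    return parse


size = permutation(key_int("w"), key_int("h"))
```

Compared with parsing “anything, in any order” and validating afterwards, duplicates and missing items are rejected by the parser itself.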