2020#41 Readings of the Week
4 minutes read | 713 words by Ruben Berenguel
No specific theme this week, but feels more data engineering heavy than lately. As it should. Oh, and beware door knobs, they can bring evil.
NOTE: The themes are varied, and some links below are affiliate links. Apache Spark, category theory, Scala, Python, data engineering, engineering management. Expect a similarly wide range in the future as well. You can find all my weekly readings under the tag here. You can also get these as a weekly newsletter by subscribing here.
Coronavirus: the second-weirdest solution?
Maybe some kind of mental training can work.
Enhance Your Databricks Workflow
We’ve been using Databricks for some time now. I’m more on the Scala side than the Python side, but if we decide to start writing jobs mostly in Python, I’ll keep this post in mind.
When Bloom filters don’t bloom
I think a Bloom filter didn’t fit the problem too well from the start, but I guess I’m biased because I have read the article. Hindsight is so 2020.
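For anyone who has not met one: a Bloom filter is a bit array plus a few hash functions. It can answer "definitely not seen" or "possibly seen", never "definitely seen". Here is a minimal sketch of the idea, not the code from the linked article. The class name, the default sizes, and the use of SHA-256 with a seed prefix are all my own assumptions for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter (hypothetical helper, sizes chosen arbitrarily)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by prefixing a seed to the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for word in ["spark", "scala", "dask"]:
    seen.add(word)
```

Anything added is always reported as present, and an empty filter reports nothing as present. False positives become more likely as the bit array fills up, which is the trade-off to keep in mind when deciding whether the structure fits a problem.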
Loading NumPy arrays from disk: mmap() vs. Zarr/HDF5
I don’t think I have ever been in a position where I have needed such speed, but I’ll keep this in mind.
AWS unit tests with localstack and scalatest
This is a code gist. Worth starring if you understand what the subject is about.
Prevent Repetition with Assignment Expressions
The infamous walrus from Python. Oh, Python simplicity of yore, where thou hidst?
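As a reminder of what the walrus buys you, here is a small sketch (mine, not taken from the linked post) that assumes Python 3.8 or later. The function names `first_number` and `read_chunks` are made up for the example.

```python
import io
import re
from typing import List, Optional


def first_number(line: str) -> Optional[int]:
    # The match is assigned and tested in one expression,
    # so re.search is neither repeated nor hoisted onto its own line.
    if (match := re.search(r"\d+", line)) is not None:
        return int(match.group())
    return None


def read_chunks(stream: io.BytesIO, size: int) -> List[bytes]:
    chunks = []
    # The loop stops when read() returns an empty (falsy) bytes object.
    while chunk := stream.read(size):
        chunks.append(chunk)
    return chunks
```

Both functions can be written without `:=`, at the cost of one extra assignment line each. Whether that saving is worth the new syntax is the whole debate.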
Hit the mute button: why everyone is trying to silence the outside world
Active Noise Cancelling (or Control) is a rising business. Typed while wearing Bowers & Wilkins' PX7s set at high.
A Vaccine Won’t Stop the New Coronavirus - The Atlantic
We can only expect to slow down the spread so that health systems can cope with most people getting ill.
Typed functional Dependency Injection in Python
I hate dependency injection. Or at least, magic DI frameworks that don’t state explicitly what is going on.
BFG & Muddy Waters - Billy F Gibbons of ZZ Top
I don’t remember how I found this article. Probably the harmonica roots of Muddy Waters and the fun looks of ZZ Top made me read it, and it wasn’t wasted.
The prisoner’s dilemma at 70 – and what we get wrong about it
From Tim Harford. I found it slightly weird that he mentioned climate change as an example of the Prisoner’s Dilemma. For me it’s one of the best examples of the Tragedy of the Commons. Maybe there is something I can’t see there.
Reorder JOIN optimizer - cost-based optimization
An overview (with code and examples) of how the join optimiser in Apache Spark uses CBO.
Down on the Farm That Harvests Metal From Plants
Beware the NY Times Great Paywall. The article is super interesting though.
Product vs. Feature Teams
There is something vaguely depressing about this post and I can’t put my finger on what or where.
Five principles that will keep your data warehouse organized
From the creators of DBT. Good rules.
On editing text
A categorical approach to text editing. It’s unexpected.
🔊 Matt Rocklin - Parallel Computing & Founding OSS Companies (PyData Deep Dive podcast)
I’ve been interested in Dask for some time now (wow, closing in on… 3 years?). This is a good chat about it.
Newsletter?
These weekly posts are also available as a newsletter. These days (since RSS went into limbo) most of my regular information comes from several newsletters I’m subscribed to, instead of me going directly to a blog. If this is also your case, subscribe by clicking here.