2020#62 Readings
3 minutes read | 602 words by Ruben Berenguel
This is a short edition.
📯 D3 v6 Chord Diagram
This is a direct link to my sketches repository, where I post my generative fun code. This is a bit borderline, since it’s not that fun. Some months ago I was looking for a chord diagram in D3 that worked fine in version 6 and was more or less straightforward to understand. Since I could find none, I created one from older pieces and a Python notebook to generate the dataset. I will be writing up some more details in a blog post, but you can take a look already.
Building an analytical data lake with Apache Spark and Apache Hudi - Part 1
Hudi is (kind of) the fully open-source alternative to Delta Lake.
📚 The dip
A Seth Godin classic that for some reason I hadn’t read. It’s ok. Since it’s a very short read, it doesn’t matter that much if it’s good or bad.
Technical debt is overhyped; let’s talk about product debt
Product debt can be orders of magnitude larger than technical debt, in terms of cost for the company. But nobody gives a s**t about it until it’s too late.
How DAGs grow: Deeper, wider, and thicker
So true! To be fair to Great Expectations, they help in making DAGs appear in the deeper section. But for a good reason.
Chaos Data Engineering
This looks like a very interesting approach (although its overlap with Delta Lake is large). When writing new jobs and pipelines, we’d use any failures (since jobs will eventually fail in one way or another: I call it the Jurassic Park effect) as a model for additional automatic recoverability. In the next iteration, all jobs should be able to survive that kind of failure.
The Rise and Fall of Getting Things Done
The title is quite clickbaity (which I suspect is a feature; I'm not a fan of many of Cal Newport's approaches), but there are some interesting viewpoints in here.
How do Spotify Codes work?
Gray codes are always a good subject to write about. Note that they can be pretty efficient when paired with delta encoding.
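For anyone who has not met Gray codes before, here is a minimal sketch of the standard binary-reflected construction. It is not taken from the linked article and makes no claim about the exact variant Spotify uses; the function names `to_gray` and `from_gray` are made up for illustration.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


if __name__ == "__main__":
    # Consecutive Gray codes differ in exactly one bit.
    for i in range(8):
        print(i, format(to_gray(i), "03b"))
```

Because neighbouring values differ in a single bit, the XOR between two consecutive codes always has exactly one bit set, which is the kind of small, regular difference that plays well with delta-style encodings.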
awk-raycaster: Pseudo-3D shooter written completely in gawk using raycasting technique
All hail the AWK.
🔊 Billion dollar loser
It was OK. I enjoyed the schadenfreude of Bad Blood more, which I listened to in one weekend.
How the Seahawks are using a data lake to improve their game
Moneyball goes to the cloud.
Test for divisibility by 13 (and 7 and 11)
It’s very well presented, but not that groundbreaking: in high school I was assigned to find a rule for 7 or 11 (I'm not sure which now) and I arrived at this one. After all, these rules are just based on the base-10 expansion, so with a little bit of patience and basic algebra you can get there.
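As a rough sketch of the kind of rule involved, assuming it is the well-known one built on 1001 = 7 × 11 × 13: since 1000 ≡ -1 (mod 1001), a number is congruent modulo 7, 11 and 13 to the alternating sum of its three-digit blocks, taken from the right. The helper names below are hypothetical.

```python
def alternating_block_sum(n: int) -> int:
    """Alternating sum of the 3-digit blocks of n, starting from the right."""
    n = abs(n)
    total, sign = 0, 1
    while n > 0:
        n, block = divmod(n, 1000)  # peel off the lowest three digits
        total += sign * block
        sign = -sign
    return total


def divisible_by(n: int, d: int) -> bool:
    """Divisibility test for d in {7, 11, 13} via 1000 ≡ -1 (mod 1001)."""
    if d not in (7, 11, 13):
        raise ValueError("this trick only covers 7, 11 and 13")
    return alternating_block_sum(n) % d == 0
```

For example, 1,002,001 splits into the blocks 001, 002, 001, giving 1 - 2 + 1 = 0, so it is divisible by 7, 11 and 13 (it is 1001 squared).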