2022#13 Readings 🇺🇦🌻

May 7, 2022 4 minutes read | 690 words

I was at J on the Beach, so I skipped last weekend.

2022#12 Readings 🇺🇦🌻

Apr 18, 2022 3 minutes read | 564 words

I took some days off for Easter, and I definitely needed them.

2022#11 Readings 🇺🇦🌻

Apr 12, 2022 3 minutes read | 600 words

A middle-of-the-week one because it’s Easter and I may not write this during days off.

2022#09 Readings 🇺🇦🌻

Mar 27, 2022 5 minutes read | 911 words

This turned out to be a long one.

2022#08 Readings 🇺🇦🌻

Mar 6, 2022 3 minutes read | 573 words

Meetings galore.

2022#04 Readings

Jan 23, 2022 5 minutes read | 926 words

RIP Michael (originally Marvin) Lee Aday, Meat Loaf 🤘

2022#03 Readings

Jan 15, 2022 4 minutes read | 682 words

Haven’t read much these days, but luckily I have not added much to the list either.

2022#01 Readings

Jan 2, 2022 4 minutes read | 825 words

Happy new year!

2021#22 Readings

Sep 26, 2021 4 minutes read | 674 words

Two weeks’ worth of readings means a longer-than-usual list, as usual.

2021#21 Readings

Sep 11, 2021 4 minutes read | 772 words

This past week we were on holidays 🎉.

Setting up Kafka with SSL and accessing it with Go

Sep 3, 2021 5 minutes read | 948 words

Days of ~~fire~~ Kafka and ~~thunder~~ SSL.

JSON woes in Apache Spark

Aug 25, 2021 4 minutes read | 667 words

This Apache Spark feature has made us scratch our heads way too much.

2021#20 Readings

Aug 21, 2021 5 minutes read | 1046 words

I had a very entertaining week.

2021#15 Readings

Jun 19, 2021 5 minutes read | 860 words

2020#14 Readings

May 30, 2021 5 minutes read | 906 words

This past week has been Data+AI Summit, so there are several new product announcements from Databricks.

2020#13 Readings

May 24, 2021 4 minutes read | 678 words

This week is Data+AI summit week.

2021#12 Readings

May 9, 2021 4 minutes read | 796 words

Not much to report. I’m still in kind of an article reading slump (my backlog is larger than 50 right now).

UTF-8 Issues between AWS Redshift and Apache Spark when COPY PARQUET

Mar 31, 2021 2 minutes read | 343 words

Timezones and UTF are rocks you repeatedly hit in your data journey.

2021#07 Readings

Feb 28, 2021 3 minutes read | 572 words

My days are consolidating into piano, work, VR, piano, sleep, loop.

2021#06 Readings

Feb 15, 2021 3 minutes read | 603 words

This is not an overly long list, but it covers a surprisingly large number of topics.

2021#05 Readings

Feb 8, 2021 4 minutes read | 823 words

Sweet, sweet holidays.

2020#04 Readings

Jan 25, 2021 4 minutes read | 672 words

Not sure what I did this past week aside from finishing a post: I read very little.

Lakehouse: It's like Delta Lake, but not really

Jan 19, 2021 5 minutes read | 1041 words

Lakehouse is the brand name for the underlying architecture of Databricks' Delta Lake: A data lake that is as performant as a data warehouse.

2021#03 Readings

Jan 17, 2021 4 minutes read | 803 words

This is also a video-heavy edition: I keep chugging along through my watch list with Glancer.

Down memory lane: the Hive paper

Jan 12, 2021 6 minutes read | 1124 words

Hive is arguably old. It is also undoubtedly useful, even now, 10 years after it was introduced.

2021#02 Readings

Jan 9, 2021 5 minutes read | 915 words

The video edition

2021#01 Readings

Jan 3, 2021 4 minutes read | 690 words

As promised, the numbering of these posts is now year-indexed.

Configuring log4j properties in Databricks (and EMR)

Jan 1, 2021 4 minutes read | 648 words

Managing logging in Spark ain’t easy, and is even harder in managed clouds like Databricks or EMR.

2020#66 Readings

Dec 26, 2020 3 minutes read | 619 words

The last of 2020.

2020#64 Readings

Dec 12, 2020 4 minutes read | 774 words

I have been on holidays this week, playing VR and preparing videos for an online Python event I co-organise.

2020#62 Readings

Nov 22, 2020 3 minutes read | 602 words

This is a short edition.

2020#61 Readings

Nov 13, 2020 4 minutes read | 774 words

Finished that post, now started the next. And having all Fridays off is awesome.

The RDD paper: introducing the Spark general purpose framework

Nov 8, 2020 9 minutes read | 1909 words

This is the next instalment on my quest to read and help understand interesting papers in the data space.

2020#60 Readings

Oct 31, 2020 4 minutes read | 796 words

I am having a very hard time finishing my summary of the RDD paper.

2020#59 Readings

Oct 24, 2020 4 minutes read | 764 words

I have played a ton of virtual table tennis this week.

Databricks' Delta Lake: high on ACID

Oct 12, 2020 15 minutes read | 3024 words

After reading the Snowflake paper, I got curious about how similar engines work. Also, as I mentioned in that article, I like knowing how the data sausage is made. So, here I will summarise the Delta Lake paper by Databricks.

2020#57 Readings

Oct 11, 2020 5 minutes read | 976 words

I have dropped the Weekly from the title. It was about time.

Running SparkSQL on Databricks via Airflow's JDBC operator

Oct 5, 2020 4 minutes read | 682 words

The one where Airflow messes with you.

2020#55 Readings of the Week

Sep 19, 2020 4 minutes read | 845 words

I have been on holidays, which has resulted in a lot of reading, mostly books.

2020#54 Readings of the Week

Sep 5, 2020 5 minutes read | 896 words

I am on holidays (starting yesterday)! Two weeks of probably more programming than usual 🤣

2020#51 Readings of the Week

Aug 14, 2020 6 minutes read | 1130 words

I should remove the Weekly moniker of these posts and emails. They are done when they are done. Enjoy!

2020#50 Readings of the Week

Jul 12, 2020 6 minutes read | 1198 words

I was off for a week, which delayed this post by a week. So, this is a long one: have fun with the 50th edition!

2020#49 Readings of the Week

Jun 26, 2020 5 minutes read | 942 words

An early one: for the first time in… not sure how long, I’m going to be off for a whole week. Which implies no computer.

2020#48 Readings of the Week

Jun 20, 2020 8 minutes read | 1548 words

Spark 3 is here! Rejoice!

2020#47 Readings of the Week

Jun 13, 2020 7 minutes read | 1366 words

Another hard push at reducing my reading list. At the current pace I may not write many more of these posts.

2020#46 Readings of the Week

Jun 7, 2020 5 minutes read | 900 words

I made a hard push to clean up my reading list, deleting a lot and reading another lot. It went from 369 items to 99 🎉

2020#45 Readings of the Week

Jun 1, 2020 5 minutes read | 882 words

Writing generative stuff is eating away at my free time, reducing reading significantly.

2020#44 Readings of the Week

May 10, 2020 5 minutes read | 915 words

The lack of commute is very hard on my reading, and I have also been working on several projects that have eaten into my reading/writing time.

2020#43 Readings of the Week

Mar 28, 2020 4 minutes read | 682 words

The full lockdown edition. Almost no engineering ¯\_(ツ)_/¯. The lack of commute is hard on reading articles.

2020#42 Readings of the Week

Mar 14, 2020 3 minutes read | 598 words

The stay at home edition. Stay safe these days, and remember to wash your hands and keep your distance.

2020#41 Readings of the Week

Mar 7, 2020 4 minutes read | 713 words

No specific theme this week, but it feels more data-engineering heavy than lately. As it should. Oh, and beware of door knobs, they can bring evil.

2020#40 Readings of the Week

Feb 29, 2020 3 minutes read | 566 words

An update on the books I have read this year, which I had forgotten in my previous posts.

2020#37 Readings of the Week

Jan 11, 2020 6 minutes read | 1097 words

First edition of the New Year. As eclectic as usual, I hope. The audio-based monitoring of servers and the weird uses of the GPT-2 neural network could be two highlights.

2019#36 Readings of the Week

Dec 22, 2019 4 minutes read | 769 words

On time this week. Nothing remarkable: I’m winding down my reading a bit (both articles and books) in preparation for the yearly review and to have a cooldown period.

A (section) of a map of the data engineering space

Jul 7, 2019 11 minutes read | 2151 words

The map and problem described here were part of my presentation Mapping as a tool for thought, and were mentioned in my interview with John Grant and Ben Mosior (to appear sometime soon in the Wardley Maps community YouTube channel). I’m looking for ideas on how to make this map more useful and easier to understand, so I posted it to the Wardley Maps Community forums requesting comments.

2019#16 Readings of the week

Jun 10, 2019 3 minutes read | 435 words

Wardley mapping, data engineering and big data, maths. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#15 Readings of the week

Jun 4, 2019 3 minutes read | 439 words

Data engineering, adtech, ZIO, Rust, writing, some miscellaneous stuff. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#14 Readings of the week

May 26, 2019 4 minutes read | 648 words

Data engineering, adtech, functional programming, formal specification. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#13 Readings of the week

May 19, 2019 3 minutes read | 550 words

Functional programming, adtech, history. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#11 Readings of the week

Apr 28, 2019 2 minutes read | 394 words

Software engineering, Spark, history, Python. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here. Testing and debugging Apache Airflow: We’ve been using Airflow for almost a year (on my suggestion). I’m not super-happy with it, and testing it is one of the pain points.

2019#9 Readings of the week (x4)

Apr 7, 2019 9 minutes read | 1891 words

You know how you slip once on a habit and everything goes crazy? Well, I’ve gone 4 weeks without writing these, so here’s the accumulated reading from 4 weeks. Because, even if I don’t write it, I read a lot anyway. Also, there’s a lot of interesting content this “week”.

2019#4 Readings of the week

Feb 5, 2019 3 minutes read | 570 words

Sorry for the delay: Sunday was my birthday (also Elmo’s, and The Day The Music Died as well) and I spent the day without access to a computer.

2019#3 Readings of the week

Jan 27, 2019 3 minutes read | 527 words

Software/data engineering, history, formal systems. Expect a similar wide range in the future as well. You can check all weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2018: Year in Review

Jan 1, 2019 6 minutes read | 1154 words

The year has ended, what has been going on?

Notifications from Spark on an Apple Watch (via IFTTT)

May 5, 2018 3 minutes read | 428 words

This week I have been working a lot with a relatively large dataset in a Spark shell. It was a graph with 1 billion nodes and 2 billion edges that I wanted to analyse with GraphFrames (the successor of GraphX on Spark).

Scala eXchange 2017

Feb 7, 2018 7 minutes read | 1447 words

Almost two months ago (time sure flies) I attended the Scala eXchange conference for the second time. It is one of the largest Scala conferences in the world, and it happens to be one tube stop from the office in London where you can find me from time to time.

2017: Year in Review

Jan 6, 2018 9 minutes read | 1743 words

I am trying to make these posts a tradition (even if a few days late). I thought 2016 had been a really weird and fun year, but 2017 has beaten it easily. And I only hope 2018 will be even better in every way. For the record, when I say we, it means Laia and me unless explicitly changed.

Shading dependencies with sbt-assembly (in particular, shapeless in Spark 2.1.0)

May 7, 2017 1 minute read | 141 words

A few weeks ago I needed to parse configuration files in Scala for a Spark project and decided to use PureConfig. It is incredibly lean to use, needing minimal boilerplate. I recommend you check it out (also take a look at CaseClassy, which I haven’t had time to test yet). Everything seemed straightforward enough, and I got it working pretty quickly (as in, it compiled properly). The surprise? spark-submit failed with a conflict with Shapeless (lacking a witness).