Below you will find pages that utilize the taxonomy term “Apache Spark”
I was at J on the Beach, so I skipped last weekend.
I took some days off for Easter, and I definitely needed them.
A middle-of-the-week one because it’s Easter and I may not write this during days off.
This turned out to be a long one.
Meetings galore.
RIP Michael (originally Marvin) Lee Aday, Meat Loaf 🤘
Haven’t read much these days, but luckily I have not added much to the list either.
Happy new year!
Two weeks’ worth of readings means a longer than usual list, as usual.
This past week we were on holidays 🎉.
Days of fire Kafka and thunder SSL.
This Apache Spark feature has made us scratch our heads way too much.
I had a very entertaining week.
This past week has been Data+AI Summit, so there are several new product announcements from Databricks.
This week is Data+AI Summit week.
Not much to report. I’m still in kind of an article reading slump (my backlog is larger than 50 right now).
Timezones and UTF are rocks you repeatedly hit in your data journey.
My days are consolidating into piano, work, VR, piano, sleep, loop
This is not an overly long list, but it covers a surprisingly large number of topics.
Sweet, sweet holidays.
Not sure what I did this past week aside from finishing a post: I read very little.
Lakehouse is the brand name for the underlying architecture of Databricks' Delta Lake: A data lake that is as performant as a data warehouse.
This is also a video-heavy edition: I keep chugging along my watch list with Glancer.
Hive is arguably old. It is also undoubtedly useful, even now: 10 years after it was introduced.
The video edition
As promised, the numbering of these posts is now year-indexed.
Managing logging in Spark ain’t easy, and is even harder in managed clouds like Databricks or EMR.
The last of 2020.
I have been on holidays this week, playing VR and preparing videos for an online Python event I co-organise.
This is a short edition.
Finished that post, now started the next. And having all Fridays off is awesome.
This is the next instalment on my quest to read and help understand interesting papers in the data space.
I am having a very hard time finishing my summary of the RDD paper.
I have played a ton of virtual table tennis this week.
After reading the Snowflake paper, I got curious about how similar engines work. Also, as I mentioned in that article, I like knowing how the data sausage is made. So, here I will summarise the Delta Lake paper by Databricks.
I have dropped the Weekly from the title. It was about time.
The one where Airflow messes with you.
I have been on holidays, which has resulted in a lot of reading, mostly books.
I am on holidays (starting yesterday)! Two weeks of probably more programming than usual 🤣
I should remove the Weekly moniker of these posts and emails. They are done when they are done. Enjoy!
I was off for a week, which delayed this post by a week. So, this is a long one: have fun with the 50th edition!
An early one: For the first time in… not sure how long, I’m going to be off for a whole week. Which implies no computer.
Spark 3 is here! Rejoice!
Another hard push at reducing my reading list. At the current pace I may not write many more of these posts.
I made a hard push to clean up my reading list, deleting a lot and reading another lot. It went from 369 items to 99 🎉
Writing generative stuff is eating away at my free time, reducing reading significantly.
The lack of commute is very hard on my reading, and I have also been working on several projects that have eaten into my reading/writing time.
The full lockdown edition. Almost no engineering ¯\_(ツ)_/¯
The lack of commute is hard on reading articles.
The stay at home edition. Stay safe these days, and remember to wash your hands and keep your distance.
No specific theme this week, but feels more data engineering heavy than lately. As it should. Oh, and beware door knobs, they can bring evil.
An update on the books I have read this year, which I had forgotten in my previous posts.
First edition of the New Year. As eclectic as usual, I hope. The audio-based monitoring of servers and the weird uses of the GPT-2 neural network could be two highlights.
On time this week. Nothing remarkable: I’m winding down my reading a bit (both articles and books) in preparation for the yearly review and to have a cooldown period.
The map and problem described here were part of my presentation Mapping as a tool for thought, and were mentioned in my interview with John Grant and Ben Mosior (to appear sometime soon in the Wardley Maps community YouTube channel). I’m looking for ideas on how to make this map easier to understand and more useful, so I posted it to the Wardley Maps Community forums requesting comments.
You know how you slip once on a habit and everything goes crazy? Well, I’ve been 4 weeks without writing these, so here’s the accumulated reading from 4 weeks. Because, even if I don’t write it, I read a lot anyway. Also, there’s a lot of interesting content this “week”.
Sorry for the delay, Sunday was my birthday (also, Elmo’s, and The Day The Music Died as well) and I spent the day without access to a computer.
The year has ended, what has been going on?
This week I have been working a lot with a relatively large dataset in a Spark shell. It was a graph with 1 billion nodes and 2 billion edges that I wanted to analyse with GraphFrames (the successor of GraphX on Spark).
Almost two months ago (time sure flies) I attended Scala eXchange for the second time. It is one of the largest Scala conferences in the world, and it happens to be one tube stop from the London office where you can find me from time to time.
I am trying to make these posts a tradition (even if a few days late). I thought 2016 had been a really weird and fun year, but 2017 has beaten it easily. And I only hope 2018 will be even better in every way. For the record, when I say we, it means Laia and me unless explicitly changed.