Concept Maps helper

Jul 29, 2023 6 minutes read | 1222 words

I am a big fan of Concept maps, but writing them in Graphviz is annoying. So I wrote a helper.

2023#01 Readings

Mar 19, 2023 4 minutes read | 769 words

I know, I have been silent for quite a while.

2022#25 Readings 🇺🇦🌻

Nov 13, 2022 5 minutes read | 970 words

Had some stuff going on that ate all my available time.

2022#22 Readings 🇺🇦🌻

Sep 18, 2022 2 minutes read | 266 words

Feeling less tired this weekend.

2022#21 Readings 🇺🇦🌻

Sep 3, 2022 3 minutes read | 503 words

Playing with stable diffusion on your own machine is great.

2022#19 Readings 🇺🇦🌻

Aug 14, 2022 3 minutes read | 567 words

Slowly, very slowly, cleaning up my reading list.

2022#17 Readings 🇺🇦🌻

Jul 9, 2022 2 minutes read | 383 words

Slightly shorter because I’ll be on holidays.

The Presto paper

Jun 12, 2022 4 minutes read | 843 words

This is the next installment on my quest to read and help understand interesting papers in the data space.

2022#16 Readings 🇺🇦🌻

Jun 11, 2022 4 minutes read | 722 words

Spent a week in Switzerland… and on the flight back caught COVID.

Ray: Another way to distribute work in a cluster

May 23, 2022 8 minutes read | 1548 words

A new entry on the data papers series. Ray is a distributed framework for next generation AI applications. What does this mean? A scam? Blockchain on AI? Nah, it’s actually pretty cool, it has actors.

2022#15 Readings 🇺🇦🌻

May 22, 2022 3 minutes read | 519 words

Shit ain’t gettin' better.

2022#14 Readings 🇺🇦🌻

May 15, 2022 3 minutes read | 538 words

This has been a really tough week.

2022#13 Readings 🇺🇦🌻

May 7, 2022 4 minutes read | 690 words

I was at J on the Beach, so I skipped last weekend.

2022#11 Readings 🇺🇦🌻

Apr 12, 2022 3 minutes read | 600 words

A middle-of-the-week one because it’s Easter and I may not write this during days off.

Apache Druid: analytical queries powered by magic

Apr 9, 2022 6 minutes read | 1085 words

It has been a while since my previous data paper. This time I tackle a less known one.

2022#10 Readings 🇺🇦🌻

Apr 3, 2022 4 minutes read | 726 words

Lorem ipsum dolor sit amet

2022#09 Readings 🇺🇦🌻

Mar 27, 2022 5 minutes read | 911 words

This turned out a long one

Winning stakeholders' trust

Mar 19, 2022 4 minutes read | 751 words

Trust between business stakeholders and engineering (and data, analytics, operations…) teams is a tricky matter.

2022#08 Readings 🇺🇦🌻

Mar 6, 2022 3 minutes read | 573 words

Meetings galore.

2022#07 Readings 🇺🇦🌻

Feb 27, 2022 4 minutes read | 647 words

Слава Україні! Героям слава!

2022#06 Readings

Feb 19, 2022 4 minutes read | 665 words

The dbt issue

2022#04 Readings

Jan 23, 2022 5 minutes read | 926 words

RIP Michael (originally Marvin) Lee Aday, Meat Loaf 🤘

2022#03 Readings

Jan 15, 2022 4 minutes read | 682 words

Haven’t read much these days, but luckily I have not added much to the list either.

2021#02 Readings

Jan 8, 2022 5 minutes read | 1041 words

Another week gone by, with a long list of readings passing by.

2022#01 Readings

Jan 2, 2022 4 minutes read | 825 words

Happy new year!

2021#27 Readings

Dec 25, 2021 6 minutes read | 1074 words

Christmas edition!

2021#26 Readings

Dec 20, 2021 7 minutes read | 1444 words

End of year cleanup, so a lot of goodies this time

Data pipelines with Alloy, Take 2

Dec 12, 2021 7 minutes read | 1407 words

In which I write some easy Alloy code for a data model, with change over time.

Docker replacements (particularly in Mac M1)

Nov 21, 2021 1 minute read | 210 words

An unusual collection of links.

2021#23 Readings

Oct 24, 2021 5 minutes read | 974 words

As usual, skipping an edition means a longer collection later on.

Concept Maps

Oct 16, 2021 4 minutes read | 852 words

No, they are not mind maps.

2021#22 Readings

Sep 26, 2021 4 minutes read | 674 words

Two weeks’ worth of readings means a longer than usual list, as usual.

2021#21 Readings

Sep 11, 2021 4 minutes read | 772 words

This past week we were on holidays 🎉.

Setting up Kafka with SSL and accessing it with Go

Sep 3, 2021 5 minutes read | 948 words

Days of ~~fire~~ Kafka and ~~thunder~~ SSL.

JSON woes in Apache Spark

Aug 25, 2021 4 minutes read | 667 words

This Apache Spark feature has made us scratch our heads way too much.

2021#20 Readings

Aug 21, 2021 5 minutes read | 1046 words

I had a very entertaining week.

2020#19 Readings

Aug 8, 2021 4 minutes read | 771 words

A week on holidays (in-between jobs), where I read more books than articles.

2021#18 Readings

Aug 1, 2021 4 minutes read | 694 words

Next week I start a new job 😮

2021#16 Readings

Jul 3, 2021 3 minutes read | 498 words

This past week I’ve been on holidays in Cordoba. I put on 2.5kg in 4 days. Recommended.

2020#14 Readings

May 30, 2021 5 minutes read | 906 words

This past week has been Data+AI Summit, so there are several new product announcements from Databricks.

Modelling data pipelines with Alloy

May 15, 2021 9 minutes read | 1783 words

In which I write some easy Alloy code for a data model.

2021#10 Readings

Apr 4, 2021 4 minutes read | 709 words

I have spent a great deal of these weeks moving my notes from Bear to Obsidian. I may write the reasons at some point, stay 🐟.

UTF-8 Issues between AWS Redshift and Apache Spark when COPY PARQUET

Mar 31, 2021 2 minutes read | 343 words

Timezones and UTF are rocks you repeatedly hit in your data journey.

2021#09 Readings

Mar 14, 2021 4 minutes read | 733 words

Looks like my mojo is coming back.

2021#08 Readings

Mar 7, 2021 3 minutes read | 538 words

This edition is kind of strange: there’s more management than “code”.

2020#04 Readings

Jan 25, 2021 4 minutes read | 672 words

Not sure what I did this past week aside from finishing a post: I read very little.

Lakehouse: It's like Delta Lake, but not really

Jan 19, 2021 5 minutes read | 1041 words

Lakehouse is the brand name for the underlying architecture of Databricks' Delta Lake: A data lake that is as performant as a data warehouse.

2021#03 Readings

Jan 17, 2021 4 minutes read | 803 words

This is also a video-heavy edition, I keep chugging along my watch list with Glancer

Down memory lane: the Hive paper

Jan 12, 2021 6 minutes read | 1124 words

Hive is arguably old. It is also undoubtedly useful, even now: 10 years after it was introduced.

2021#02 Readings

Jan 9, 2021 5 minutes read | 915 words

The video edition

2021#01 Readings

Jan 3, 2021 4 minutes read | 690 words

As promised, the numbering of these posts is now year-indexed.

Configuring log4j properties in Databricks (and EMR)

Jan 1, 2021 4 minutes read | 648 words

Managing logging in Spark ain’t easy, and is even harder in managed clouds like Databricks or EMR.

Programmatic adtech industry: where to?

Dec 28, 2020 9 minutes read | 1871 words

Finishing and posting this got lost in a task manager reorganisation, it was due in June-July.

Adtech
Data

2020#65 Readings

Dec 20, 2020 4 minutes read | 851 words

My reading list is at less than 10 items, so now my readings posts will hopefully look closer to watchings. My watch-later list is at more than 90.

2020#64 Readings

Dec 12, 2020 4 minutes read | 774 words

I have been on holidays this week, playing VR and preparing videos for an online Python event I co-organise.

2020#63 Readings

Dec 5, 2020 4 minutes read | 649 words

This looks like a less hard technical edition than usual.

Find-the-gap with SQL in AWS Redshift

Nov 29, 2020 3 minutes read | 632 words

A relatively common type of query for time-based SQL tables is a find the gap query. How can you do this in AWS Redshift, which does not have the SQL function generate_series?

2020#62 Readings

Nov 22, 2020 3 minutes read | 602 words

This is a short edition.

2020#61 Readings

Nov 13, 2020 4 minutes read | 774 words

Finished that post, now started the next. And having all Fridays off is awesome.

The RDD paper: introducing the Spark general purpose framework

Nov 8, 2020 9 minutes read | 1909 words

This is the next instalment on my quest to read and help understand interesting papers in the data space.

2020#60 Readings

Oct 31, 2020 4 minutes read | 796 words

I am having a very hard time finishing my summary of the RDD paper.

2020#59 Readings

Oct 24, 2020 4 minutes read | 764 words

I have played a ton of virtual table tennis this week.

2020#58 Readings

Oct 18, 2020 4 minutes read | 739 words

I have read quite a bit this week, I’m also preparing a summary of the RDD paper.

Databricks' Delta Lake: high on ACID

Oct 12, 2020 15 minutes read | 3024 words

After reading the Snowflake paper, I got curious about how similar engines work. Also, as I mentioned in that article, I like knowing how the data sausage is made. So, here I will summarise the Delta Lake paper by Databricks.

2020#57 Readings

Oct 11, 2020 5 minutes read | 976 words

I have dropped the Weekly from the title. It was about time.

Running SparkSQL on Databricks via Airflow's JDBC operator

Oct 5, 2020 4 minutes read | 682 words

The one where Airflow messes with you.

Does Snowflake have a technical moat worth 60 billion?

Oct 2, 2020 15 minutes read | 3032 words

I didn’t know much about Snowflake, so I decided to have a look at its SIGMOD (ACM Special Interest Group on Management of Data) paper and investigate a bit more what special capabilities they offer, and how they compare to others.

2020#56 Readings of the Week

Sep 28, 2020 4 minutes read | 832 words

This is a bit late because I have automated something.

2019#24 Readings of the Week

Sep 14, 2019 3 minutes read | 463 words

Although this week I have been reading mostly Apache Cassandra documentation, I have tried to avoid an onslaught of tips, tricks and readings on it. Just one article.

2019#23 Readings of the Week

Sep 1, 2019 4 minutes read | 714 words

I have been on quite the hiatus, making this more of a readings of the month edition. Sorry!

2019#20,21,22 Readings of the week

Jul 28, 2019 3 minutes read | 625 words

I have been pretty busy lately, and although reading doesn’t stop, my writing sometimes takes a hiatus.

Data engineering, adtech, history, Apple. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#19 Readings of the week

Jul 8, 2019 4 minutes read | 660 words

History, Haskell, Wardley mapping, functional programming. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

A (section) of a map of the data engineering space

Jul 7, 2019 11 minutes read | 2151 words

The map and problem described here were part of my presentation Mapping as a tool for thought, and mentioned in my interview with John Grant and Ben Mosior (to appear sometime soon in the Wardley Maps community YouTube channel). I’m looking for ideas on how to make this map easier to understand and more useful, so I posted it to the Wardley Maps Community forums requesting comments.

2019#18 Readings of the week

Jun 25, 2019 2 minutes read | 355 words

Software engineering, history, planning, data engineering. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#17 Readings of the week

Jun 17, 2019 2 minutes read | 334 words

_This week is a bit light on technical content because I was attending Scala Days 2019 in Lausanne and I had enough with the talks._

Software engineering, psychology, history. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#16 Readings of the week

Jun 10, 2019 3 minutes read | 435 words

Wardley mapping, data engineering and big data, maths. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#15 Readings of the week

Jun 4, 2019 3 minutes read | 439 words

Data engineering, adtech, ZIO, Rust, writing, some miscellaneous stuff. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#14 Readings of the week

May 26, 2019 4 minutes read | 648 words

Data engineering, adtech, functional programming, formal specification. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#9 Readings of the week (x4)

Apr 7, 2019 9 minutes read | 1891 words

You know how you slip once on a habit and everything goes crazy? Well, I’ve been 4 weeks without writing these, so here’s the accumulated reading from 4 weeks. Because, even if I don’t write it, I read a lot anyway. Also, there’s a lot of interesting content this “week”.

2019#8 Readings of the week

Mar 4, 2019 3 minutes read | 427 words

This edition has software engineering, formal methods, fraud, Scala. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#7 Readings of the week

Feb 26, 2019 3 minutes read | 448 words

Formal methods, Scala, productivity. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

Apache Hive and java.lang.ClassCastException on start

Feb 19, 2019 2 minutes read | 247 words

A couple of days ago I installed Hive from Homebrew on my Mac. Sadly, when I tried to run the hive command, I got the weird-looking error

2019#6 Readings of the week

Feb 17, 2019 3 minutes read | 550 words

Software engineering, adtech, psychology, Python. Expect a similar wide range in the future as well. You can check all my weekly readings by checking the tag here. You can also get these as a weekly newsletter by subscribing here.

2019#1 Readings of the week

Jan 14, 2019 4 minutes read | 648 words

If you know me, you’ll know I have quite an extensive reading list. I keep it in Pocket, and it is part of my to-do list stored in Things3. It used to be large (hovering around 230 items since August) but during Christmas it got out of control, reaching almost 300 items.

2016 in Review

Dec 31, 2016 6 minutes read | 1230 words

Some of the links are affiliate links to Amazon. I only recommend what I use. At last. February: Finished my PhD dissertation, so I can add Dr. in front of my name when ordering a Gatwick Express train ticket. It also makes for a cooler email signature. So far these have been the only differences I’ve seen in the year. Work. February-December: Started working as a consultant in London, almost the day after delivering the PhD presentation.

Ruben Berenguel, PhD

Apr 30, 2016 1 minute read | 147 words

Started a long time ago. It was supposed to be about a phenomenon leading to chaos: separatrix splitting. I got a research grant. I worked on holomorphic dynamics. Travelled. Presented. Too many roadblocks with the separatrix problem. Switched topics. Welcome to a different new world, infinite dimensional dynamical systems. I read the literature. Researched, proved some things. My grant ran out. I worked. A lot. Too many times I considered giving up.

Data
Maths

Find Search Engine Rankings... via the Command Line

May 22, 2013 3 minutes read | 523 words

Beware! The software described here is just for personal and very light use. Its use beyond purely recreational value is against Google Search terms of service, and I don’t want you or anyone to cross that line. Any use of this code is at your own risk. Well, after this scary paragraph, let’s get to the real meat. Which boils down to just a few lines of bash.

Using Gephi with Google Analytics to visualize keywords and landing pages

Aug 18, 2011 6 minutes read | 1087 words

As of late, I’ve been playing a lot with data analysis and visualization tools. Recently I’ve read two interesting books (Statistical Analysis with R and Visualize This: The FlowingData Guide to Design, Visualization, and Statistics) and I’m on my way to another two to refresh my statistics knowledge. But this post is only mildly related to these books, since it started way before: the day I read about Gephi. Gephi is an open source graph visualization tool for working with huge (or at least big) datasets and graphs.