I’ve been in the position of being pushed (unsuccessfully, I should add) to accept nonsensical metrics for a data platform. The list in this post is quite the opposite, although getting your systems to cover all of them is substantial (but worthwhile) work.
AWS is speeding up development of Athena, keeping it on par with PrestoDB/Trino. The new engine version improves query planning performance (kind of a corner case, but it can be significant for small ad-hoc queries) and adds better handling of Apache Iceberg data.
Automatic partitioning in Postgres has been possible for some time, but each release brings improvements. Here you will find what is now available and what you can do with it. I’m itching to try some of these.
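To make the feature concrete, here is a minimal sketch of declarative range partitioning (available since PostgreSQL 10), driven from Python with psycopg2. The connection string, table and column names are made up for illustration; the linked article covers what the newer releases add on top of this.

```python
import psycopg2  # assumes psycopg2 is installed and a Postgres instance is reachable

# Hypothetical connection string; adjust for your own setup.
conn = psycopg2.connect("dbname=demo user=postgres host=localhost")
cur = conn.cursor()

# Declarative range partitioning: the parent table holds no rows itself,
# each partition covers one month of data.
cur.execute("""
    CREATE TABLE measurements (
        ts  timestamptz NOT NULL,
        val double precision
    ) PARTITION BY RANGE (ts);
""")
cur.execute("""
    CREATE TABLE measurements_2024_01 PARTITION OF measurements
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
""")

# Rows inserted into the parent are routed to the matching partition
# automatically, and queries filtering on ts only scan relevant partitions.
cur.execute("INSERT INTO measurements VALUES ('2024-01-15', 1.5);")
conn.commit()
cur.close()
conn.close()
```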
This is an excellent way of dividing data people. I’m partial to system building, although I enjoy storytelling from time to time. But I could not spend all my time in storytelling mode: creating, designing and implementing systems is how I get my “kick”.
A summary of a paper evaluating many anomaly detection algorithms. A long, long time ago I was interested in anomaly detection, and I remember facing exactly this problem: too many options, and no clarity on which to choose. The paper’s conclusion, though, is not brilliant: for multivariate anomalies you need a multivariate algorithm, and for univariate data you should choose a univariate algorithm. Duh.
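Obvious as the conclusion sounds, a tiny sketch shows why it holds. Below, a point that looks unremarkable feature by feature is clearly anomalous once the correlation between features is taken into account; the data and the point are made up for illustration, and only numpy is used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly correlated 2-D data: x2 is x1 plus a little noise.
x1 = rng.normal(0, 1, 1000)
x2 = x1 + rng.normal(0, 0.1, 1000)
data = np.column_stack([x1, x2])

# An anomaly that is unremarkable in each coordinate separately,
# but breaks the correlation between the two features.
point = np.array([2.0, -2.0])

# Univariate view: per-feature z-scores, both well under 3.
z = np.abs((point - data.mean(axis=0)) / data.std(axis=0))

# Multivariate view: Mahalanobis distance, which accounts for the
# covariance structure and flags the point as extreme.
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = point - data.mean(axis=0)
mahalanobis = np.sqrt(diff @ cov_inv @ diff)

print("z-scores:", z)               # roughly [2, 2] -> looks normal
print("Mahalanobis:", mahalanobis)  # very large -> clearly anomalous
```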
These two are related. A compiler flag triggers an optimisation that cascades through the whole (or most of the) Python scientific stack and affects numbers that are very close to zero. You can find the details in the two posts above; it’s the kind of obscure IEEE-754 behaviour that is easy to miss.
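If you want to see the kind of numbers involved before digging into the posts, here is a small, self-contained sketch of the effect as I understand it: the product below is a subnormal (denormal) double, exactly the sort of value that a process-wide flush-to-zero setting turns into plain zero.

```python
import numpy as np

# 1e-300 * 1e-10 = 1e-310, which is below the smallest normal double
# (~2.2e-308), so the exact result can only be represented as a
# subnormal ("denormal") number.
x = np.float64(1e-300) * np.float64(1e-10)

# Under default IEEE-754 semantics this prints a tiny non-zero value
# and False; in a process where a fast-math-compiled extension has set
# the CPU's flush-to-zero/denormals-are-zero flags, the same
# multiplication yields exactly 0.0.
print(x, x == 0.0)
```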