
If you’ve ever asked “Why is my PySpark job slow on EMR?” the honest answer is usually: it’s not one thing. It’s a handful of small decisions that compound—cluster sizing, file layout, shuffle tuning, join strategy, and the never-ending battle with small files on S3.

I don't know about you, but tax season is a busy time of year for my teams. Because of that, I have jumped into the mix to help with code reviews, PR approvals, and branch merging to free up some of my Senior Data Engineers to do more...

I recently started reading Tomasz Tunguz and Frank Bien’s Winning with Data: Transform Your Culture, Empower Your People, and Shape the Future. For many of us in the data management field, whether in Data Engineering, Business Intelligence, Data Architecture, Database Administration, or even Software Engineering, understanding and extending the usage of...