By now, if you have been following this series, you know what lineage is, where teams go wrong, and how to wire it up on EMR. This final post is about the bigger picture: how data lineage, ontology, and data contracts work together as a system instead of three separate...

Quick apology up front: this post is late. I meant to ship it last week, but a production incident reminded me (again) why lineage matters. So here it is: a practical guide to implementing data lineage from scratch in a Spark on AWS EMR environment.

I wrote the first post in this series after a messy incident. I am writing this second post after watching the same lineage mistakes repeat for the third time, now at a different company. The pattern is always the same: lineage is treated as a dashboard, not as part of the pipeline...