Mental Models for Data Platform Engineers (Inspired by Poor Charlie's Almanack)


The Dagster Almanack: From Complexity to Composability


A complete guide with all the insights, tips, and some predictions for the data platform engineer, just like an Almanack provides, with practical information for daily life.

I read "Poor Charlie's Almanack" by Charlie Munger and wondered what it would take to write one for Dagster: a complete guide with insights, tips, and some predictions for the data platform engineer, offering practical information for daily life, just as an Almanack does.<br>My goal is to offer a collection of wisdom, insights, and principles gathered over the years, giving you an outside view from someone who has used Dagster since 2019, both at enterprise scale and for hobby projects (e.g. a real-estate project). The piece should give you a holistic view of Dagster's place in the data ecosystem, of how to deal with the complexity of data architecture and enterprises, and of how to scale your data jobs.<br>This article shows how orchestrators such as Dagster are built for an open data platform that integrates the full data ecosystem, and how the shift from DAGs to data assets reduces complexity and helps apply data engineering best practices.<br>Almanack (also spelled "almanac") refers to a publication containing a variety of information on a dedicated topic. The modern usage of Almanack, particularly in the context of books like those by Charlie Munger or Naval Ravikant, is often metaphorical. It suggests a collection of wisdom, insights, or principles gathered over time.<br>What is Dagster<br>In late 2018, on a co-working and co-living sabbatical in Bali, I was searching for something to bring the data warehouse out of the drag-and-drop world of SSIS and Oracle reporting and into a code-first, developer-friendly workflow. I looked at ODE, BiGenius, TimeXtender, and WhereScape, but none of them quite fit my open-source and programmatic preferences, so I tried to build something myself, without success. A year later, back at my 9-to-5 in Copenhagen, I heard Nick Schrock on the Data Engineering Podcast describing the motivation and story behind a Python framework called Dagster that did exactly that. 
I was hooked, and have used Dagster ever since.<br>Early Focus on Developer Friendliness<br>To put 2019 in context: back then, most ETL jobs were triggered with cron or bash scripts, and if there was an error, the only option was to re-run in the next nightly window, when production wasn't touched. Dagster, as Nick explained in the podcast, focused on developer-friendliness, in particular for the ETL developers of that time, and that focus remains unchanged for today's data engineers.<br>So what is Dagster? The original idea, started in 2018 during a sabbatical after Nick worked at Facebook, came with this definition:<br>One of the goals of Dagster has been to provide a tool that removes the barrier between pipeline development and pipeline operation, but during this journey, he came to link the world of data processing with business processes.<br>Today the definition hasn't changed much and reads like this in the Dagster Docs:<br>Dagster is a data orchestrator built for data engineers, with integrated lineage, observability, a declarative programming model, and best-in-class testability.<br>The initial aim to "link data processing with business" was the key reason that brought me to it, along with the quality with which the components were implemented. Even more compelling was Nick's visionary outline for the next 3-5 years: to make the work of data engineers similar to that of software engineers, and to make their daily life easier.<br>Biggest Shift Early On<br>This vision led Dagster to create many concepts that we now take for granted in data work, and it gave data engineers a more reliable and useful toolset.<br>Data-aware Orchestration Shift<br>One of the biggest shifts compared to previous tools and orchestrators was that orchestration was fully data-aware from the very beginning. 
Dagster tried to understand the heterogeneous complexity that exists at companies of every size, from small firms to large enterprises, and to thrive in it, supporting the full data engineering lifecycle with its built-in platform and data pipeline capabilities.<br>This gave me a toolkit for building reliable data pipelines out of the box early on, with features battle-tested by its open-source users and a quality and thoughtfulness I hadn't seen before. This was personified by Nick and could be felt vividly in the early interviews, as well as in the code the team produced openly in the repo.<br>For example, backfilling, restartability,...
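
To make the data-aware idea concrete, here is a minimal sketch in plain Python. It is not Dagster's API: the `asset` decorator, the `ASSETS` registry, the `materialize` function, and the sample listings are all hypothetical names and made-up data chosen for illustration. The point it tries to show is that each asset declares which data it depends on, and the run order is derived from those declarations rather than from a hand-written task sequence.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical registry: asset name -> (upstream asset names, compute function).
ASSETS = {}


def asset(*upstream):
    """Register a function as a data asset that depends on the named upstream assets."""
    def register(fn):
        ASSETS[fn.__name__] = (upstream, fn)
        return fn
    return register


# Declared first on purpose: run order comes from data dependencies,
# not from the order in which the functions are written.
@asset("raw_listings")
def average_price(raw_listings):
    return sum(row["price"] for row in raw_listings) / len(raw_listings)


@asset()
def raw_listings():
    # Made-up sample rows standing in for a real source system.
    return [{"city": "Bern", "price": 900_000}, {"city": "Bern", "price": 700_000}]


def materialize(cache=None):
    """Compute every asset in dependency order, skipping assets already in `cache`."""
    cache = {} if cache is None else cache
    graph = {name: set(deps) for name, (deps, _) in ASSETS.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        if name in cache:
            continue  # already materialized, e.g. by an earlier partial run
        deps, fn = ASSETS[name]
        cache[name] = fn(*(cache[dep] for dep in deps))
    return order, cache


order, values = materialize()
print(order)                    # ['raw_listings', 'average_price']
print(values["average_price"])  # 800000.0
```

Passing a `cache` that already holds `raw_listings` recomputes only `average_price`, which is a toy version of restarting a run from where it left off instead of waiting for the next nightly window.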
