The Postmodern Build System


The postmodern build system (updated 2025) - jade's www site

2025 Update

This post has aged remarkably well from its original release on 2023-12-11. That being said, I have found myself working on build systems full-time in the year and a half since then and have gotten more familiar with the industrial state of affairs.

I have recently been working on a large scale deployment of Facebook's Buck2 at my job (not at Facebook). The opinions in this post do not necessarily represent those of my employer; I do not wish to speak for them.

Some of the things this post describes only vaguely have since been given actual names and been implemented. Mentions of these have been annotated with the name of the thing.

If you want to see the old version of the post, see it on archive.org.

Rest of the post

This is a post about an idea. I don't think it exists, and I am unsure if I will be the one to build it. However, something like it should be made to exist.

What we want

Trustworthy incremental builds: move the incrementalism to a distrusting layer such that the existence of incremental-build bugs requires hash collisions.

Such a goal implies sandboxing the jobs and making them into pure functions, which are rerun when the inputs change at all.

This is inherently wasteful of computation because it ignores semantic equivalence of results in favour of only binary equivalence, and we want to reduce the wasted computation. However, for production use cases, computation is cheaper than the possibility of incremental bugs.

This can be equivalently phrased in terms of lacking "build identity": is there any way that the system knows what the "previous version" of the "same build" is? A postmodern build system doesn't have build identity, because it causes problems for multitenancy among other things: who decides what the previous build is?
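To make the "no build identity" point concrete, here is a minimal Haskell sketch. It assumes a job is fully described by its command string and the contents of its inputs. `Job`, `jobKey` and `runCached` are hypothetical names, and the FNV-1a-style hash is only a stand-in for a cryptographic hash such as SHA-256. The cache is keyed by nothing but that hash, so there is no notion of a "previous build", and any binary change to an input produces a new key even when the result would be semantically identical.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.IORef
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- FNV-1a-style hash over characters. Stand-in only (assumption): a real system
-- would use a cryptographic hash so that a collision is practically impossible.
contentHash :: String -> Word64
contentHash = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hypothetical job description: everything the sandboxed job can observe.
data Job = Job {jobCommand :: String, jobInputs :: [String]}

-- The key is derived only from what the job reads; there is no job name,
-- project name or "previous build" anywhere in it.
jobKey :: Job -> Word64
jobKey (Job cmd ins) = contentHash (show (cmd, ins))

-- Run a pure job through a shared cache. Identical inputs from any project hit
-- the same entry; any binary change to an input yields a new key and a rerun.
runCached :: IORef (Map.Map Word64 String) -> (Job -> String) -> Job -> IO String
runCached cache execute job = do
  let key = jobKey job
  hit <- Map.lookup key <$> readIORef cache
  case hit of
    Just out -> pure out
    Nothing -> do
      let out = execute job
      modifyIORef' cache (Map.insert key out)
      pure out
```

Because the key carries no project name, two unrelated projects that happen to run the same job share one cache entry, which is what makes this kind of cache safe to multitenant.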

Maximize reuse of computation across builds. Changing one source file should rebuild as little as absolutely necessary.

Distributed builds: We live in a world where software can be compiled much faster by using multiple machines. Fortunately, turning the build into pure computations almost inherently allows distributing it.

Review

Build systems à la carte

This post uses language from "Build systems à la carte":

Monadic: a build that has to run some builds before it knows its full set of targets. It's so called because of the definition of the central operation on monads:

bind :: Monad m => m a -> (a -> m b) -> m b

This means that, given a not-yet-executed action returning a value of type a and a function taking the resolved result of that action, you get a new action whose shape can depend arbitrarily on the result of m a. This is a dynamic build plan, since knowing the full plan requires executing m a.
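A minimal Haskell sketch of such a monadic task, loosely modelled on the `Task` abstraction from "Build systems à la carte" but simplified; `MonadicTask`, `bundleTask` and `traceDeps` are hypothetical names. The task first fetches a manifest and only then learns which other keys it needs, so the only way to discover its dependencies is to supply real values.

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.Map as Map

-- A monadic task: given a way to fetch the value of any key in some monad m,
-- produce this key's value. Because m is a Monad, later fetches may depend
-- on the results of earlier ones.
newtype MonadicTask k v = MonadicTask (forall m. Monad m => (k -> m v) -> m v)

-- Hypothetical target whose dependencies are listed inside another target.
bundleTask :: MonadicTask String String
bundleTask = MonadicTask fetchAll
  where
    fetchAll :: Monad m => (String -> m String) -> m String
    fetchAll fetch = do
      manifest <- fetch "manifest"          -- this must be built first...
      parts <- mapM fetch (words manifest)  -- ...to learn what else to build
      pure (concat parts)

-- Discover dependencies by actually running the task against known values,
-- using the ((,) [k]) writer monad to log every key that gets fetched.
traceDeps :: Map.Map String String -> MonadicTask String String -> ([String], String)
traceDeps store (MonadicTask run) =
  run (\k -> ([k], Map.findWithDefault "" k store))
```

Changing only the value of "manifest" changes which keys `traceDeps` reports, which is exactly why no static build plan exists for this task.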

Applicative: a build for which the plan is statically known. Generally this implies a strictly two-phase build where the targets are evaluated, a build plan made, and then the build executed. This is so named because of the central operation on applicative types:

apply :: Applicative f => f (a -> b) -> f a -> f b

This means that, given a pure function wrapped in a build (f (a -> b)) and a build producing its argument (f a), the function can be applied to perform the build. However, the shape of the build plan is known ahead of time, since the function cannot cause other builds to run.
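By contrast, restricting a task to `Applicative` lets its dependencies be read off without building anything. The sketch below is again hypothetical and loosely modelled on "Build systems à la carte": it runs the task in the `Const` functor, where every fetch just records its key, and then runs the same task for real with an ordinary fetch function. `ApplicativeTask`, `linkTask`, `dependencies` and `runWith` are illustrative names.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))

-- An applicative task: it may fetch keys and combine their values, but it
-- cannot inspect a fetched value to decide what to fetch next.
newtype ApplicativeTask k v =
  ApplicativeTask (forall f. Applicative f => (k -> f v) -> f v)

-- Hypothetical target: the sum of two other targets' values.
linkTask :: ApplicativeTask String Int
linkTask = ApplicativeTask link
  where
    link :: Applicative f => (String -> f Int) -> f Int
    link fetch = (+) <$> fetch "a.o" <*> fetch "b.o"

-- Static analysis: in Const, each fetch records its key and produces no value,
-- so the build plan is known without building anything.
dependencies :: ApplicativeTask k v -> [k]
dependencies (ApplicativeTask run) = getConst (run (\k -> Const [k]))

-- Execution: run the very same task with a real fetch function.
runWith :: Applicative f => (k -> f v) -> ApplicativeTask k v -> f v
runWith fetch (ApplicativeTask run) = run fetch
```

For example, `runWith (\k -> lookup k [("a.o", 1), ("b.o", 2)]) linkTask` evaluates to `Just 3`, while `dependencies linkTask` gives `["a.o","b.o"]` without consulting any values. Trying the `Const` trick on `MonadicTask` does not typecheck, since `Const` has no `Monad` instance.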

Nix

As much of a Nix shill as I am, Nix is not the postmodern build system. It has some design flaws that are very hard to rectify. Still, let's write about the things it does well that are worth adopting as concepts elsewhere.

Nix is a build system based on the idea of a "derivation". A derivation is simply a specification of an execution of execve. Its output is then stored in the Nix store (/nix/store/*) based on a name determined by hashing inputs or outputs. Memoization is achieved by skipping builds for which the output path already exists.

This mechanism lacks build identity, and is multitenant: you can dump a whole bunch of different Nix projects of various versions on the same build machine and they will not interfere with each other because of the lack of build identity; the only thing to go off of is the hash.

The store path is named in one of three ways:

Named based on the hash of the contents of the derivation: input-addressed

This is the case for building software, typically.

Named based on the hash of the output: fixed-output

This is the case for downloading things, and in practice has a relaxed sandbox allowing network access. However, the output is then hashed and verified against a hardcoded value.

Named based on the hash of the output, which is not declared in advance: content-addressed.

Note that ca-derivations have had a rocky deployment timeline and have been removed from Lix.

See ca-derivations.
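The three naming schemes differ mainly in when the path becomes known. A simplified Haskell sketch, assuming a toy in-memory store and a non-cryptographic stand-in hash (real Nix derives store paths from SHA-256 over its own serialization and encoding, which is not modelled here). `Drv`, `Mode`, `storePath` and `realise` are hypothetical names. The sketch also shows the memoization rule described above: if the output path already exists, the build is skipped.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Word (Word64)
import Numeric (showHex)

-- Stand-in hash rendered as hex (assumption: not Nix's real hashing scheme).
hashStr :: String -> String
hashStr s = showHex (foldl' step (14695981039346656037 :: Word64) s) ""
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Toy store: store path -> output contents.
type Store = Map.Map FilePath String

-- Hypothetical model of the three naming schemes.
data Mode
  = InputAddressed        -- path from the derivation, known before building
  | FixedOutput String    -- path from an output hash declared up front
  | ContentAddressed      -- path from the output hash, known only afterwards

data Drv = Drv {drvName :: String, drvText :: String, drvMode :: Mode}

storePath :: String -> String -> FilePath
storePath h name = "/nix/store/" ++ h ++ "-" ++ name

-- Produce the output path, skipping the build when that path already exists.
realise :: (Drv -> String) -> Drv -> Store -> Either String (FilePath, Store)
realise build drv store = case drvMode drv of
  InputAddressed -> Right (memo (storePath (hashStr (drvText drv)) (drvName drv)))
  FixedOutput expected ->
    let path = storePath expected (drvName drv)
        out = build drv
     in if Map.member path store
          then Right (path, store)
          else if hashStr out == expected
            then Right (path, Map.insert path out store)
            else Left ("hash mismatch for " ++ drvName drv)
  ContentAddressed ->
    -- The path depends on the output, so this sketch cannot skip the build.
    let out = build drv
        path = storePath (hashStr out) (drvName drv)
     in Right (path, Map.insert path out store)
  where
    memo path
      | Map.member path store = (path, store)
      | otherwise = (path, Map.insert path (build drv) store)
```

Because a content-addressed path is only known after building, this sketch always rebuilds such derivations; avoiding that would require separately remembering which derivation produced which output.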

The derivation for GNU hello

"/nix/store/nvl9ic0pj1fpyln3zaqrf4cclbqdfn1j-hello-2.12.1.drv": {
  "args": [
    "-e",
    "/nix/store/v6x3cs394jgqfbi0a42pam708flxaphh-default-builder.sh"
  ],
  "builder":...
