The postmodern build system (updated 2025) - jade's www site
2025 Update
This post has aged remarkably well since its original release on 2023-12-11. That being said, I have found myself working on build systems full-time in the year and a half since then and have gotten more familiar with the industrial state of affairs.
I have recently been working on a large-scale deployment of Facebook's Buck2 at my job (not at Facebook). The opinions in this post do not necessarily represent those of my employer; I do not wish to speak for them.
Some of the things mentioned vaguely in this post now have an actual name and have been implemented. Mentions of these have been annotated with the name of the thing.
If you want to see the old version of the post, see it on archive.org.
Rest of the post
This is a post about an idea. I don't think it exists, and I am unsure if I will be the one to build it. However, something like it should be made to exist.
What we want
Trustworthy incremental builds: move the incrementalism to a distrusting layer such that the existence of incremental-build bugs requires hash collisions.
Such a goal implies sandboxing the jobs and making them into pure functions, which are rerun whenever any input changes at all.
This is inherently wasteful of computation because it ignores semantic equivalence of results in favour of only binary equivalence, and we want to reduce the wasted computation. However, for production use cases, computation is cheaper than the possibility of incremental bugs.
This can be equivalently phrased in terms of lacking "build identity": is there any way that the system knows what the "previous version" of the "same build" is? A postmodern build system doesn't have build identity, because build identity causes problems for multitenancy, among other things: who decides what the previous build is?
Maximize reuse of computation across builds. Changing one source file should rebuild as little as absolutely necessary.
Distributed builds: we live in a world where software can be compiled much faster by using multiple machines. Fortunately, turning the build into pure computations almost inherently allows distributing it.
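To make the first goal concrete, here is a minimal sketch of a hash-keyed action cache. It is an illustration under stated assumptions, not how any particular build system does it: `ActionCache`, `digest`, and `fake_compile` are hypothetical names, and the "compiler" is a stand-in pure function rather than a sandboxed process.

```python
import hashlib
import json


class ActionCache:
    """Memoizes pure build steps by a digest of everything they can observe."""

    def __init__(self):
        self._results = {}   # digest -> output bytes
        self.executions = 0  # how many times a step actually ran

    @staticmethod
    def digest(command, inputs):
        # Hash the command line plus the *contents* of every declared input,
        # in a canonical order. Timestamps and undeclared files play no part.
        h = hashlib.sha256()
        h.update(json.dumps(command).encode())
        for name in sorted(inputs):
            h.update(name.encode())
            h.update(b"\0")
            h.update(hashlib.sha256(inputs[name]).digest())
        return h.hexdigest()

    def run(self, command, inputs, step):
        key = self.digest(command, inputs)
        if key not in self._results:
            self.executions += 1
            self._results[key] = step(command, inputs)
        return self._results[key]


def fake_compile(command, inputs):
    # Hypothetical stand-in for a sandboxed compiler: a pure function of inputs.
    return b"obj:" + inputs["main.c"].upper()


cache = ActionCache()
cmd = ["cc", "-c", "main.c"]
first = cache.run(cmd, {"main.c": b"int main(){}"}, fake_compile)
again = cache.run(cmd, {"main.c": b"int main(){}"}, fake_compile)
# A whitespace-only edit is semantically equivalent but still reruns the step.
spaced = cache.run(cmd, {"main.c": b"int main() {}"}, fake_compile)
```

There is no notion of a "previous build" anywhere in this sketch: a lookup either hits by digest or it does not. The last call also shows the waste mentioned above, since a change that cannot affect the result still misses the cache.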
Review
Build systems à la carte
This post uses language from "Build systems à la carte":
Monadic: a build that needs to run builds in order to know the full set of targets. It's so called because of the definition of the central operation on monads:
bind :: Monad m => m a -> (a -> m b) -> m b
This means that, given a not-yet-executed action returning a and a function taking the resolved result of that action, you get a new action whose shape can depend arbitrarily on the result of m a. This is a dynamic build plan, since full knowledge of the build plan requires executing m a.
Applicative: a build for which the plan is statically known. Generally this implies a strictly two-phase build where the targets are evaluated, a build plan made, and then the build executed. This is so named because of the central operation on applicative types:
apply :: Applicative f => f (a -> b) -> f a -> f b
This means that, given a predefined pure function inside a build, the function can be executed to perform the build. However, the shape of the build plan is known ahead of time, since the function cannot itself execute other builds.
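The distinction can be sketched outside Haskell too. The following Python is a rough illustration only: `static_plan` and `build_dynamic` are hypothetical names, and the `fetch` callback is borrowed loosely from the style of the paper rather than from any real build tool.

```python
def static_plan(deps, target):
    """Applicative style: dependencies are plain data, so the whole plan
    can be computed before any build step runs."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps[name]:
            visit(dep)
        order.append(name)  # post-order: dependencies come first

    visit(target)
    return order


def build_dynamic(tasks, target, done):
    """Monadic style: a task is handed a `fetch` callback and chooses what to
    request only after seeing earlier results, so the plan emerges at runtime."""
    if target not in done:
        done[target] = tasks[target](lambda dep: build_dynamic(tasks, dep, done))
    return done[target]


# Static: the plan is known without running anything.
deps = {"app": ["a.o", "b.o"], "a.o": [], "b.o": []}
plan = static_plan(deps, "app")

# Dynamic: which object file "app" needs depends on the *result* of "config".
tasks = {
    "config": lambda fetch: "use-b",
    "a.o": lambda fetch: "A",
    "b.o": lambda fetch: "B",
    "app": lambda fetch: fetch("b.o") if fetch("config") == "use-b" else fetch("a.o"),
}
built = {}
result = build_dynamic(tasks, "app", built)
```

In the dynamic case there is no way to list the dependencies of `app` without first running `config`, which is exactly what makes the plan unknowable ahead of time.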
Nix
As much of a Nix shill as I am, Nix is not the postmodern build system. It has some design flaws that are very hard to rectify. Let's go over the things it does well that are useful to adopt as concepts elsewhere.
Nix is a build system based on the idea of a "derivation". A derivation is simply a specification of an execution of execve. Its output is then stored in the Nix store (/nix/store/*) based on a name determined by hashing inputs or outputs. Memoization is achieved by skipping builds for which the output path already exists.
This mechanism lacks build identity, and is multitenant: you can dump a whole bunch of different Nix projects of various versions on the same build machine and they will not interfere with each other because of the lack of build identity; the only thing to go off of is the hash.
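As a toy model of "skip the build if the output path exists", consider the sketch below. It is emphatically not Nix's real store path computation: `store_path`, `realise`, and `toy_builder` are hypothetical, the "derivations" are made-up dictionaries, and the store is a temporary directory.

```python
import hashlib
import json
import os
import tempfile


def store_path(store_dir, derivation):
    # Toy input-addressing: hash a canonical encoding of the whole derivation.
    # Nix's actual algorithm differs; only the idea is the same.
    blob = json.dumps(derivation, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()[:32]
    return os.path.join(store_dir, f"{digest}-{derivation['name']}")


def realise(store_dir, derivation, builder):
    """Return (path, built); `built` is False when an existing output was reused."""
    out = store_path(store_dir, derivation)
    if os.path.exists(out):
        return out, False  # the path existing *is* the cache hit
    tmp = out + ".tmp"
    with open(tmp, "wb") as f:
        f.write(builder(derivation))
    os.replace(tmp, out)  # publish atomically so partial outputs never appear
    return out, True


def toy_builder(drv):
    # Hypothetical builder: the output is a pure function of the derivation.
    return " ".join(drv["args"]).encode()


demo = {"name": "demo-1.0", "builder": "toy-builder", "args": ["-e", "build.sh"]}
demo_patched = dict(demo, args=["-e", "build-patched.sh"])

with tempfile.TemporaryDirectory() as store:
    path1, built1 = realise(store, demo, toy_builder)
    path2, built2 = realise(store, demo, toy_builder)          # reused
    path3, built3 = realise(store, demo_patched, toy_builder)  # different hash
    stored = sorted(os.listdir(store))
```

Two "projects" that happen to describe the same derivation share one output, and two that differ in any input get separate paths, without either knowing the other exists.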
The store path is either:
Named based on the hash of the contents of the derivation: input-addressed
This is the case for building software, typically.
Named based on the hash of the output: fixed-output
This is the case for downloading things, and in practice has a relaxed sandbox allowing network access. However, the output is then hashed and verified against a hardcoded value.
Named based on the output hash of the derivation, which is not fixed: content-addressed.
Note that ca-derivations have had a rocky deployment timeline and have been removed from Lix.
See ca-derivations.
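The two output-hashed cases in the list above differ only in whether the hash is known before the build. A small sketch of that difference follows, with the caveat that every name here (`fixed_output_name`, `content_addressed_name`, `HashMismatch`) is hypothetical and the naming scheme is a simplification, not what Nix computes.

```python
import hashlib


class HashMismatch(Exception):
    pass


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def fixed_output_name(fetch, expected_hash, name):
    # The fetcher may be impure (for example, it may use the network), so the
    # result is trusted only after it matches the hash pinned ahead of time.
    data = fetch()
    actual = sha256_hex(data)
    if actual != expected_hash:
        raise HashMismatch(f"{name}: expected {expected_hash}, got {actual}")
    return f"{expected_hash[:32]}-{name}"


def content_addressed_name(build, name):
    # No hash is known up front; the name comes from whatever was produced.
    data = build()
    return f"{sha256_hex(data)[:32]}-{name}"


tarball = b"pretend source tarball"
pinned = sha256_hex(tarball)  # in practice, a literal written in the build file

ok = fixed_output_name(lambda: tarball, pinned, "src.tar.gz")
try:
    fixed_output_name(lambda: b"tampered", pinned, "src.tar.gz")
    tampered_rejected = False
except HashMismatch:
    tampered_rejected = True

same_a = content_addressed_name(lambda: b"same bytes", "out")
same_b = content_addressed_name(lambda: b"same bytes", "out")
```

In the fixed-output case the name can be computed before fetching anything, while in the content-addressed case it is only known once the build has finished.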
The derivation for GNU hello
"/nix/store/nvl9ic0pj1fpyln3zaqrf4cclbqdfn1j-hello-2.12.1.drv": {
"args": [
"-e",
"/nix/store/v6x3cs394jgqfbi0a42pam708flxaphh-default-builder.sh"
],
"builder":...