Nix's Substituter List Is Not a Routing Table


Published May 24, 2026 · 14 min read

Tags: software, programming, nix

Nix’s substituter model is one of those designs that is almost right, but isn’t exactly there. It is simple in itself: you list a few binary caches in nix.conf, the daemon walks them in order, and if the path you want is anywhere on the internet, the build doesn’t have to happen on your laptop. The binary cache is often listed as a strength of NixOS, but it’s actually a strength of Nix and a bare minimum for something like NixOS to work for users. Which is to say that it’s perfectly fine until you need a large enough multi-cache setup for your configuration’s dependencies or, in rarer cases, your project’s dependencies. By that I mean multiple binary cache instances, because you decided to fetch massive C++ and Rust projects from the internet.

In such a case the first cache in your list is almost always https://cache.nixos.org. It is fast, it is global, and it does not have your overlay’s packages. The second tends to be something like the nix-community cache, because you usually pull something really useful[1] from the nix-community organization. In less common cases you also add a third-party project’s Cachix cache and occasionally, if you’re technical enough, your homelab’s private cache. In such a setup every narinfo lookup walks that list. Every nix-shell -p hello becomes a serialized scan across four hosts on three continents, because Nix has no concept of which substituter is most likely to answer for this path. It just asks them all, in the order you wrote them down, one after the other.

The Shape of the Problem

To explain why a proxy is the right answer, you have to be honest about what Nix’s substituter logic is:

A loop over substituters, in order.

For each: HEAD /<hash>.narinfo. If 200, fetch and use it. If 404, continue.

No concurrency. No latency tracking. No memory of which cache won last time you asked for this hash.
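The loop described above can be modeled in a few lines. This is a minimal sketch of the behavior as this post describes it, not Nix's actual code: the `Substituter` struct, the round-trip times, and the two `example` hosts are all made up for illustration. The thing to notice is that the time spent waiting is the sum of every round trip made before the first hit.

```rust
use std::time::Duration;

/// A hypothetical substituter: a URL, a simulated round-trip time,
/// and the store path hashes it can answer for.
struct Substituter {
    url: &'static str,
    rtt: Duration,
    has: &'static [&'static str],
}

/// Walk the list in order and stop at the first hit.
/// Returns the winning URL (if any) and the total time spent waiting,
/// which is the sum of every round trip made along the way.
fn sequential_lookup(list: &[Substituter], hash: &str) -> (Option<&'static str>, Duration) {
    let mut waited = Duration::ZERO;
    for s in list {
        waited += s.rtt; // one blocking request per substituter, no overlap
        if s.has.iter().any(|h| *h == hash) {
            return (Some(s.url), waited);
        }
    }
    (None, waited)
}

/// An illustrative four-cache setup. The latencies and the last two
/// hosts are assumptions, not measurements.
fn example_list() -> Vec<Substituter> {
    vec![
        Substituter { url: "https://cache.nixos.org", rtt: Duration::from_millis(40), has: &[] },
        Substituter { url: "https://nix-community.cachix.org", rtt: Duration::from_millis(120), has: &[] },
        Substituter { url: "https://some-project.example", rtt: Duration::from_millis(200), has: &[] },
        Substituter { url: "https://cache.homelab.example", rtt: Duration::from_millis(15), has: &["abc123"] },
    ]
}
```

With this list, a path that only the homelab cache holds costs 375 ms of waiting, even though the cache that has it answers in 15 ms.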

The substituter list is a preference, not a routing table. There is a priority field, but it is a static integer chosen at config time. It does not know that cache.nixos.org is 40 ms away on your home connection and 800 ms away from the office VPN. It does not know that your private cache is the only one that has the path. It does not know anything, because there is nothing to know. The daemon is stateless on every request. For a tool that prides itself on being a model of declarative purity, the network layer underneath is still static and request-local.

ncro, Briefly

ncro—Nix Cache Route Optimizer, pronounced Necro—is a small HTTP proxy that sits between nix-daemon and your substituters. It is about three thousand lines of clean, performant Rust code. It does three things:

On a narinfo lookup, it races all configured upstreams in parallel with HEAD and remembers which one won.

On a NAR fetch, it streams the body straight through to the client. No disk. No buffer. No NAR ever lives on the proxy.

It keeps a small, bounded SQLite table of route decisions so a restart doesn’t force it to relearn the entire world.
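The streaming item in the list above is worth a sketch, because "no buffer" is easy to misread. The function below is an assumption about the general shape, written against std's blocking `Read` and `Write` traits rather than whatever async types ncro actually uses. It relays a body through one fixed-size chunk, so the proxy never holds more than that chunk at a time, whatever the size of the NAR. The name `relay` and the 8 KiB chunk size are illustrative.

```rust
use std::io::{self, Read, Write};

/// Copy `upstream` to `client` one chunk at a time.
/// Memory use is bounded by the chunk size, not by the body size,
/// and nothing is written to disk.
fn relay<R: Read, W: Write>(mut upstream: R, mut client: W) -> io::Result<u64> {
    let mut chunk = [0u8; 8192]; // illustrative chunk size
    let mut total = 0u64;
    loop {
        let n = match upstream.read(&mut chunk) {
            Ok(0) => break, // end of body
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        client.write_all(&chunk[..n])?;
        total += n as u64;
    }
    client.flush()?;
    Ok(total)
}
```

Any `Read` and `Write` pair works here, so a byte slice and a `Vec<u8>` are enough to try it out without a network.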

That is the whole product. It is deliberately not another ncps, which mirrors caches to disk and gives you all the cache-invalidation grief that comes with mirroring caches to disk. ncro does not retain payload data once it is streamed through.

What’s Actually In It

The interesting parts are the ones the architecture diagram (which you might or might not have paid attention to) doesn’t show:

The race. ncro’s router groups candidates by priority, then for each priority tier spawns a FuturesUnordered of HEAD requests and breaks on the first success. The tier loop is what lets you say “prefer my private cache, but only if it answers—otherwise fall through to cache.nixos.org” without writing any of that logic into Nix itself. There’s a deadline pinned to the select loop so a single hung upstream can’t stall the entire lookup. Failures are classified as not found, network error, and timeout because “every upstream returned 404” and “every upstream’s TCP handshake died” deserve different answers to the client.
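To make the tier loop concrete, here is a rough sketch using only the standard library. It swaps in OS threads and an mpsc channel where ncro uses FuturesUnordered, and it simulates each HEAD request with a sleep, so treat it as a model of the control flow rather than the real router. `Upstream`, `Probe`, `Outcome`, and `race` are hypothetical names, and the rule that any network error wins over a plain 404 is my assumption about how mixed failures might be reported.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Probe { Found, NotFound, NetworkError }

#[derive(Debug, PartialEq)]
enum Outcome { Hit(&'static str), NotFound, NetworkError, Timeout }

#[derive(Clone, Copy)]
struct Upstream {
    url: &'static str,
    priority: u32,   // lower is preferred, as with Nix substituters
    delay: Duration, // simulated response time
    answer: Probe,   // simulated result of the HEAD request
}

fn race(upstreams: &[Upstream], deadline: Duration) -> Outcome {
    let start = Instant::now();
    // Group candidates by priority; BTreeMap iterates in ascending key order.
    let mut tiers: BTreeMap<u32, Vec<Upstream>> = BTreeMap::new();
    for u in upstreams {
        tiers.entry(u.priority).or_default().push(*u);
    }
    let mut saw_network_error = false;
    for (_priority, tier) in tiers {
        let (tx, rx) = mpsc::channel();
        for u in tier {
            let tx = tx.clone();
            thread::spawn(move || {
                thread::sleep(u.delay); // stand-in for the HEAD request
                let _ = tx.send((u.url, u.answer));
            });
        }
        drop(tx); // the channel disconnects once every probe has reported
        loop {
            let left = deadline.saturating_sub(start.elapsed());
            match rx.recv_timeout(left) {
                Ok((url, Probe::Found)) => return Outcome::Hit(url),
                Ok((_, Probe::NetworkError)) => saw_network_error = true,
                Ok((_, Probe::NotFound)) => {}
                Err(mpsc::RecvTimeoutError::Timeout) => return Outcome::Timeout,
                Err(mpsc::RecvTimeoutError::Disconnected) => break, // tier exhausted
            }
        }
    }
    if saw_network_error { Outcome::NetworkError } else { Outcome::NotFound }
}
```

The deadline is shared across all tiers, so a hung upstream in an early tier eats into the budget for later ones. That is a design choice in this sketch; a per-tier deadline would be just as reasonable.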

The cache, in two layers. A moka Cache sits in front of SQLite, with a 1024-entry capacity and a TTL bound to the route TTL. SQLite sits underneath, with narinfo_bytes stored alongside the route so a hot path doesn’t even need a second upstream fetch. Eviction is throttled to fire every hundred writes by leaning on the fact that AtomicU64::fetch_add returns the value from before the increment. This detail bit me in review, because count % 100 == 0 fires on the very first write, when the counter is still zero. The fix was one character, but the impact was real: latency metrics were skewed until this edge case was corrected.
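The off-by-one is easy to reproduce in isolation. The snippet below is a reconstruction under assumptions, not ncro's code: the function names are invented, and comparing against 99 is just one plausible fix, since the post does not say which character actually changed. `fetch_add` returns the previous value, so the first write observes 0.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Buggy variant: the first write sees 0, and 0 % 100 == 0,
/// so eviction fires immediately and then on writes 101, 201, ...
fn should_evict_buggy(writes: &AtomicU64) -> bool {
    writes.fetch_add(1, Ordering::Relaxed) % 100 == 0
}

/// One possible fix (an assumption): fire when the previous value is
/// the last slot of each window, that is on writes 100, 200, ...
fn should_evict(writes: &AtomicU64) -> bool {
    writes.fetch_add(1, Ordering::Relaxed) % 100 == 99
}

/// Helper for experimenting: the 1-based write numbers, out of `n`
/// writes, on which the given predicate fired.
fn firing_writes(n: u64, predicate: fn(&AtomicU64) -> bool) -> Vec<u64> {
    let counter = AtomicU64::new(0);
    (1..=n).filter(|_| predicate(&counter)).collect()
}
```

Over 250 writes, the buggy predicate fires on writes 1, 101, and 201, while the adjusted one fires on writes 100 and 200.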

Health, with EMA. The first sample bypasses the smoothing. Otherwise, the first probe permanently anchors to whatever junk was in ema_latency at startup. Consecutive...
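The first-sample bypass is simple enough to show directly. This is a generic exponential moving average, not ncro's implementation. The smoothing factor of 0.3, the `Health` struct, and the `record` method are assumptions for illustration; only the `ema_latency` name echoes the text above.

```rust
/// Smoothing factor; 0.3 is an illustrative value, not ncro's.
const ALPHA: f64 = 0.3;

/// Hypothetical per-upstream health record.
struct Health {
    ema_latency_ms: f64,
    samples: u64,
}

impl Health {
    fn new() -> Self {
        // The initial value is a placeholder and must never be blended in.
        Health { ema_latency_ms: 0.0, samples: 0 }
    }

    fn record(&mut self, sample_ms: f64) {
        if self.samples == 0 {
            // First sample: take it as-is instead of smoothing against
            // the placeholder.
            self.ema_latency_ms = sample_ms;
        } else {
            self.ema_latency_ms = ALPHA * sample_ms + (1.0 - ALPHA) * self.ema_latency_ms;
        }
        self.samples += 1;
    }
}
```

Without the bypass, a first probe of 100 ms against a zero placeholder would be recorded as 30 ms under this smoothing factor, and later samples would have to work that bias off.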
