Nix's Substituter List Is Not a Routing Table | Blog
Published May 24, 2026, 14 min read
Tags: software, programming, nix
Nix’s substituter model is one of those designs that is almost right, but isn’t quite there. The idea is simple: you list a few binary caches in nix.conf, the daemon walks them in order, and if the path you want is anywhere on the internet, the build doesn’t have to happen on your laptop. The binary cache is often listed as a strength of NixOS, but it’s really a strength of Nix, and a bare minimum for something like NixOS to work for its users. All of which is perfectly fine until your configuration’s dependencies (or, in rarer cases, your project’s dependencies) need a large enough multi-cache setup. By that I mean several binary cache instances, because you decided to fetch massive C++ and Rust projects from the internet.
In such a case, the first cache in your list is almost always https://cache.nixos.org. It is fast, it is global, and it does not have your overlay’s packages. The second tends to be something like the nix-community cache, because you usually pull something really useful [1] from the nix-community organization. In less common cases you also add a third-party project’s Cachix and occasionally, if you’re technical enough, your homelab’s private cache. In such a setup, every narinfo lookup walks that list. Every nix-shell -p hello becomes a serialized scan across four hosts on three continents, because Nix has no concept of which substituter is most likely to answer for this path. It just asks them in the order you wrote them down, one after the other, until one answers.
The Shape of the Problem
To explain why a proxy is the right answer, you have to be honest about what<br>Nix’s substituter logic is:
A loop over substituters, in order.
For each: HEAD /<hash>.narinfo. If 200, fetch and use it. If 404, continue.
No concurrency. No latency tracking. No memory of which cache won last time<br>you asked for this hash.
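The loop described above can be sketched as a small simulation. This is not Nix's actual code: the `Upstream` struct, the `example_upstreams` helper, the two `.example` hosts, and every latency figure are made up for illustration. Only the shape (walk in order, stop at the first hit) comes from the text.

```rust
use std::time::Duration;

/// A simulated upstream: how long one round trip takes and whether it
/// holds the path. Illustrative only, not taken from Nix's source.
struct Upstream {
    url: &'static str,
    round_trip: Duration,
    has_path: bool,
}

/// Walk the list in order, paying for every miss before the first hit.
/// Returns the answering URL (if any) and the total time spent waiting.
fn sequential_lookup(upstreams: &[Upstream]) -> (Option<&'static str>, Duration) {
    let mut waited = Duration::ZERO;
    for u in upstreams {
        waited += u.round_trip; // one blocking request per substituter
        if u.has_path {
            return (Some(u.url), waited); // "200": stop here
        }
        // "404": fall through to the next entry
    }
    (None, waited)
}

/// A four-cache list like the one in the text, with made-up latencies.
/// The last two hosts are hypothetical placeholders.
fn example_upstreams() -> Vec<Upstream> {
    let ms = Duration::from_millis;
    vec![
        Upstream { url: "https://cache.nixos.org", round_trip: ms(40), has_path: false },
        Upstream { url: "https://nix-community.cachix.org", round_trip: ms(120), has_path: false },
        Upstream { url: "https://some-project.example", round_trip: ms(150), has_path: false },
        Upstream { url: "https://cache.homelab.example", round_trip: ms(15), has_path: true },
    ]
}
```

With these invented numbers, `sequential_lookup(&example_upstreams())` waits 325 ms for a path that the last cache in the list could have served in 15 ms.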
The substituter list is a preference, not a routing table. There is a<br>priority field, but it is a static integer chosen at config time. It does not<br>know that cache.nixos.org is 40 ms away on your home connection and 800 ms<br>away from the office VPN. It does not know that your private cache is the only<br>one that has the path. It does not know anything, because there is nothing to<br>know. The daemon is stateless on every request. For a tool that prides itself on<br>being a model of declarative purity, the network layer underneath is still<br>static and request-local.
ncro, Briefly
ncro—Nix Cache Route Optimizer, pronounced Necro—is a small HTTP proxy<br>that sits between nix-daemon and your substituters. It is about three thousand<br>lines of clean, performant Rust code. It does three things:
On a narinfo lookup, it races all configured upstreams in parallel with<br>HEAD and remembers which one won.
On a NAR fetch, it streams the body straight through to the client. No disk.<br>No buffer. No NAR ever lives on the proxy.
It keeps a small, bounded SQLite table of route decisions so a restart<br>doesn’t force it to relearn the entire world.
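To make "remembers which one won" concrete, here is a minimal in-memory sketch of a bounded route table with a TTL. It is a stand-in, not ncro's storage layer: the real thing is described as a SQLite table, and the `RouteTable` type, its method names, and the evict-the-soonest-to-expire policy are all assumptions made for this example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One remembered decision: which upstream answered, and until when we trust it.
struct Route {
    upstream: String,
    expires_at: Instant,
}

/// Hypothetical bounded route memory keyed by store-path hash.
struct RouteTable {
    ttl: Duration,
    capacity: usize,
    routes: HashMap<String, Route>,
}

impl RouteTable {
    fn new(ttl: Duration, capacity: usize) -> Self {
        RouteTable { ttl, capacity, routes: HashMap::new() }
    }

    /// Record the winner of a race. `now` is passed in so the behaviour is testable.
    fn remember(&mut self, hash: &str, upstream: &str, now: Instant) {
        // Drop anything that has already expired.
        self.routes.retain(|_, r| r.expires_at > now);
        // Still full and this is a new key: evict the entry closest to expiry.
        if self.routes.len() >= self.capacity && !self.routes.contains_key(hash) {
            let soonest = self
                .routes
                .iter()
                .min_by_key(|(_, r)| r.expires_at)
                .map(|(k, _)| k.clone());
            if let Some(key) = soonest {
                self.routes.remove(&key);
            }
        }
        let route = Route { upstream: upstream.to_string(), expires_at: now + self.ttl };
        self.routes.insert(hash.to_string(), route);
    }

    /// Return the remembered upstream, ignoring entries past their TTL.
    fn lookup(&self, hash: &str, now: Instant) -> Option<&str> {
        self.routes
            .get(hash)
            .filter(|r| r.expires_at > now)
            .map(|r| r.upstream.as_str())
    }
}
```

A hit in a table like this lets the proxy skip the race entirely for a hash it has seen recently, and the capacity bound keeps the memory from growing with every path ever requested.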
That is the whole product. It is deliberately not another<br>ncps, which mirrors caches to disk and<br>gives you all the cache-invalidation grief that comes with mirroring caches to<br>disk. ncro does not retain payload data once it is streamed through.
What’s Actually In It
The interesting parts are the ones the architecture diagram (which you might<br>or might not have paid attention to) doesn’t show:
The race. ncro’s router groups candidates by priority, then for each<br>priority tier spawns a FuturesUnordered of HEAD requests and breaks on<br>the first success. The tier loop is what lets you say “prefer my private<br>cache, but only if it answers—otherwise fall through to cache.nixos.org”<br>without writing any of that logic into Nix itself. There’s a deadline pinned<br>to the select loop so a single hung upstream can’t stall the entire lookup.<br>Failures are classified as not found, network error, and timeout<br>because “every upstream returned 404” and “every upstream’s TCP handshake<br>died” deserve different answers to the client.
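The tiered race can be sketched with nothing but the standard library. To keep the example dependency-free, it swaps the FuturesUnordered of HEAD requests for OS threads and an mpsc channel, and it fakes the network with a sleep. The `Candidate` type, the `race_tiers` name, the per-tier deadline, the "lower priority value goes first" ordering, and the rule for picking a final error class are all assumptions of this sketch, not ncro's actual code.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// What a single simulated HEAD probe reports back.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Probe {
    Found,
    NotFound,
    NetworkError,
}

/// Why a whole lookup failed, mirroring the three classes named in the text.
#[derive(Debug, PartialEq)]
enum LookupError {
    NotFound,
    NetworkError,
    Timeout,
}

/// Hypothetical upstream; `latency` and `outcome` stand in for the network.
#[derive(Clone)]
struct Candidate {
    name: &'static str,
    priority: u32, // assumed: lower value is tried first
    latency: Duration,
    outcome: Probe,
}

fn race_tiers(candidates: &[Candidate], tier_deadline: Duration) -> Result<&'static str, LookupError> {
    // Group by priority; BTreeMap iterates tiers in ascending order.
    let mut tiers: BTreeMap<u32, Vec<Candidate>> = BTreeMap::new();
    for c in candidates {
        tiers.entry(c.priority).or_default().push(c.clone());
    }

    let mut saw_network_error = false;
    let mut saw_timeout = false;

    for tier in tiers.values() {
        let (tx, rx) = mpsc::channel();
        for c in tier.iter().cloned() {
            let tx = tx.clone();
            thread::spawn(move || {
                thread::sleep(c.latency); // stand-in for the HEAD round trip
                let _ = tx.send((c.name, c.outcome)); // receiver may be gone already
            });
        }
        drop(tx); // rx disconnects once every probe in the tier has reported

        let give_up_at = Instant::now() + tier_deadline;
        loop {
            let remaining = give_up_at.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok((name, Probe::Found)) => return Ok(name), // first success wins
                Ok((_, Probe::NotFound)) => {}
                Ok((_, Probe::NetworkError)) => saw_network_error = true,
                Err(RecvTimeoutError::Timeout) => {
                    saw_timeout = true; // a hung upstream: move on to the next tier
                    break;
                }
                Err(RecvTimeoutError::Disconnected) => break, // whole tier answered
            }
        }
    }

    // Only report "not found" when every probe actually answered 404.
    if saw_timeout {
        Err(LookupError::Timeout)
    } else if saw_network_error {
        Err(LookupError::NetworkError)
    } else {
        Err(LookupError::NotFound)
    }
}
```

Given a private cache at priority 10 that hangs and cache.nixos.org at priority 40 that answers quickly, this version gives up on the first tier when its deadline passes and returns the second tier's winner. Threads left sleeping in an abandoned tier simply fail their send and exit. An async runtime would cancel the pending requests instead.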
The cache, in two layers. A moka Cache sits in front of SQLite, with a 1024-entry capacity and a TTL bound to the route TTL. SQLite sits underneath, with narinfo_bytes stored alongside the route so a hot path doesn’t even need a second upstream fetch. Eviction is throttled to fire every hundred writes by leaning on the fact that AtomicU64::fetch_add returns the value from before the increment. This is a detail that bit me in review, because count % 100 == 0 fires on the first write, when the counter is zero. The fix was one character, but the impact was real: latency metrics were skewed until this edge case was corrected.
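The off-by-one in the eviction throttle is easy to reproduce in isolation. The sketch below is hypothetical: the function names and the `EVICT_EVERY` constant are invented here, and the corrected version shows one possible fix rather than the author's actual one-character change. What it relies on is documented behaviour: `AtomicU64::fetch_add` returns the previous value.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const EVICT_EVERY: u64 = 100; // assumed throttle interval from the text

/// Buggy variant: `fetch_add` returns the value from before the add,
/// so the very first write sees 0, and 0 % 100 == 0.
fn should_evict_buggy(writes: &AtomicU64) -> bool {
    let previous = writes.fetch_add(1, Ordering::Relaxed);
    previous % EVICT_EVERY == 0
}

/// One possible fix: test the count that includes the current write.
fn should_evict(writes: &AtomicU64) -> bool {
    let previous = writes.fetch_add(1, Ordering::Relaxed);
    previous.wrapping_add(1) % EVICT_EVERY == 0
}

/// Helper for illustration: which write numbers (1-based) trigger eviction?
fn eviction_writes(check: fn(&AtomicU64) -> bool, total: u64) -> Vec<u64> {
    let counter = AtomicU64::new(0);
    (1..=total).filter(|_| check(&counter)).collect()
}
```

Over 250 writes, the buggy check fires on writes 1, 101, and 201, while the corrected one fires on writes 100 and 200.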
Health, with EMA. The first sample bypasses the smoothing. Otherwise, the first probe permanently anchors to whatever junk was in ema_latency at startup. Consecutive...