Signing Is for the Bad Days


By Andrew Nesbitt

I have had roughly the same conversation four or five times in the last month. I’m explaining why a registry should adopt Sigstore, or why a build pipeline should emit in-toto attestations, and the person across the table says some version of: we already use TLS to the registry, the registry already hashes the tarballs, the lockfile already pins the hash, what does a signature add? And on a Tuesday afternoon when nothing has gone wrong, the honest answer is that it adds a bit of CPU on publish and a bit of YAML in the workflow and not much else you can see.

That answer is also why we keep having supply chain incidents: a hash in a lockfile tells you the bytes haven’t changed since you locked them, but it can’t tell you whether they were ever the right bytes, and the whole point of this class of tooling is to answer that second question. Signing does nothing visible until the day someone compromises a build server or swaps a tarball on a mirror, and on that day it’s the difference between “the client refused to install it” and a blog post that starts with the word “incident”. I want to walk through the three projects that most of this work sits on and what each one actually defends against, because the benefits are nearly impossible to see from the happy path.
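To make the lockfile point concrete, here is a minimal Python sketch using only the standard library. The lockfile_check function and the tarball contents are hypothetical, not any real package manager’s code. It shows that a pinned hash catches bytes that change after lock time, but accepts bad bytes that were already bad when the lockfile was written.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, the kind of value a lockfile records."""
    return hashlib.sha256(data).hexdigest()

def lockfile_check(pinned_hash: str, downloaded: bytes) -> bool:
    """Hypothetical install-time check: do the bytes match the hash pinned at lock time?"""
    return sha256_hex(downloaded) == pinned_hash

good_tarball = b"legit release contents"
evil_tarball = b"release contents with a backdoor"

# Locked against the good tarball, later served the evil one: caught.
print(lockfile_check(sha256_hex(good_tarball), evil_tarball))  # False

# The mirror was already serving the evil tarball when the lockfile was
# generated, so the pin records the wrong bytes and every check passes.
print(lockfile_check(sha256_hex(evil_tarball), evil_tarball))  # True
```

Nothing in the second case looks wrong from the hash’s point of view; that is the gap signatures and provenance attestations are meant to close.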

The thread connecting all three is Santiago Torres-Arias. His name kept turning up as I was reading the papers and design docs behind each of these, often enough that I went and did a bit of digging into how that happened. He did his PhD at NYU under Justin Cappos, who had already built The Update Framework. Santiago contributed to TUF, published in-toto at USENIX Security ‘19 as the piece TUF was missing, and went on to be one of Sigstore’s creators. In 2021 he was one of the five keyholders who signed its root of trust.

He’s now at Purdue running a lab that keeps producing this stuff. GUAC, which turns a pile of SBOMs and attestations into a graph you can query when a CVE lands and you need to know what’s affected, came out of that lab. Before Purdue there was his 2016 paper on git metadata tampering, which showed a hostile server could reorder branch pointers to feed you a vulnerable tree even when every commit was signed. There’s a single design assumption running through all of it, and it’s one most package managers were not built with: some part of your infrastructure is already compromised, and the job of the tooling is to make that survivable.

TUF: assume the registry gets owned

The thing TUF protects is the last hop, from the repository to the machine doing the install. The naive version of that is “sign the packages”, which PGP-based schemes have done for decades, and which falls apart the moment the signing key leaks. If there’s one key, kept online so the CI can sign releases automatically, then whoever pops the CI box can sign whatever they like and every client will trust it. The original TUF research came directly out of cataloguing those failure modes across the Linux distro updaters of the late 2000s.

TUF’s answer is to split signing into roles with separate keys and separate exposure. A root role, signed by keys that live offline in hardware and get touched once a year, says which keys are valid for the other roles. A targets role says which package files exist and what their hashes are. A snapshot role says which version of all that metadata is current, and a timestamp role, the only key that has to be online and hot, signs a short-lived statement that the snapshot is fresh. The hot key can only say “the metadata you already have is still current”; it can’t add a package or change a hash, and it can’t bless a new key for any other role.
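The role split can be sketched as a threshold check against what root says. This is a minimal Python sketch under loud assumptions: HMAC-SHA256 stands in for real public-key signatures (real TUF uses asymmetric schemes such as Ed25519, so clients never hold signing keys), the key IDs and key material are made up, and verify_role is a hypothetical helper, not the API of python-tuf or any other TUF implementation. Root also carries a per-role threshold, so changing which keys are trusted takes several root signatures.

```python
import hashlib
import hmac
import json

# ASSUMPTION: HMAC-SHA256 is a stand-in for public-key signatures. In real TUF
# clients hold only public keys; here one dict holds (hypothetical) secrets.
KEYS = {
    "root-1": b"offline hsm key 1",
    "root-2": b"offline hsm key 2",
    "root-3": b"offline hsm key 3",
    "targets-1": b"offline targets key",
    "snapshot-1": b"snapshot key",
    "timestamp-1": b"online hot key",
}

# What root metadata says: which key IDs may sign each role, and how many must.
ROOT = {
    "root": {"keyids": ["root-1", "root-2", "root-3"], "threshold": 2},
    "targets": {"keyids": ["targets-1"], "threshold": 1},
    "snapshot": {"keyids": ["snapshot-1"], "threshold": 1},
    "timestamp": {"keyids": ["timestamp-1"], "threshold": 1},
}

def sign(keyid: str, payload: dict) -> str:
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(KEYS[keyid], msg, hashlib.sha256).hexdigest()

def verify_role(role: str, payload: dict, signatures: dict) -> bool:
    """Accept metadata for a role only if enough of that role's keys signed it."""
    allowed = ROOT[role]
    valid = 0
    for keyid, sig in signatures.items():
        # Signatures from keys not listed for this role count for nothing.
        if keyid in allowed["keyids"] and hmac.compare_digest(sign(keyid, payload), sig):
            valid += 1
    return valid >= allowed["threshold"]

# An attacker holding only the hot timestamp key tries to publish a new target.
evil_targets = {"version": 8, "targets": {"pkg-1.0.tar.gz": "deadbeef"}}
forged = {"timestamp-1": sign("timestamp-1", evil_targets)}
print(verify_role("targets", evil_targets, forged))  # False

# Changing trusted keys needs a threshold of root keys: one is not enough.
new_root = {"version": 2, "targets_keyids": ["targets-2"]}
one_sig = {"root-1": sign("root-1", new_root)}
two_sigs = {k: sign(k, new_root) for k in ("root-1", "root-2")}
print(verify_role("root", new_root, one_sig))   # False
print(verify_role("root", new_root, two_sigs))  # True
```

The forged signature is cryptographically genuine; it is rejected only because root does not list the timestamp key for the targets role.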

Walk an attacker through that: they compromise the registry’s online infrastructure and get the timestamp key, which is the one that’s actually reachable. They can now sign fresh timestamps, and that’s it. Tampering with a package needs a targets signature they don’t have the key for. Serving an old vulnerable version, a rollback attack, is blocked because snapshot version numbers only go up. Even sitting on the box and serving stale metadata indefinitely, a freeze attack, stops working once the timestamp signatures expire and clients notice they’ve stopped advancing. To actually ship malware they need the offline targets key, and to change which keys are trusted they need a threshold of the offline root keys, which are in safes in different buildings.
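The rollback and freeze defences come down to two client-side checks once signatures have been verified: new metadata must not carry a lower version than what the client already trusts, and it must not be past its expiry. Here is a minimal Python sketch; Metadata, check_update and UpdateRejected are hypothetical names, and it leaves out most of what the TUF specification’s client workflow actually does (signature checks, cross-checking snapshot against timestamp, hashing targets).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Metadata:
    """Hypothetical, simplified view of metadata whose signatures already checked out."""
    role: str
    version: int
    expires: datetime

class UpdateRejected(Exception):
    pass

def check_update(trusted: Metadata, new: Metadata, now: datetime) -> Metadata:
    """Apply TUF-style rollback and freeze checks before accepting new metadata."""
    if new.version < trusted.version:
        # Rollback: the server is offering older metadata than we already have.
        raise UpdateRejected(f"{new.role}: version {new.version} < trusted {trusted.version}")
    if new.expires <= now:
        # Freeze: expired metadata can no longer vouch that it is current.
        raise UpdateRejected(f"{new.role}: expired at {new.expires.isoformat()}")
    return new

now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
trusted = Metadata("snapshot", 42, now + timedelta(days=7))

# Rollback: the server offers snapshot 41 after the client already saw 42.
try:
    check_update(trusted, Metadata("snapshot", 41, now + timedelta(days=7)), now)
except UpdateRejected as err:
    print(err)  # snapshot: version 41 < trusted 42

# Freeze: the server keeps replaying a timestamp that expired an hour ago.
try:
    check_update(Metadata("timestamp", 99, now - timedelta(days=1)),
                 Metadata("timestamp", 99, now - timedelta(hours=1)), now)
except UpdateRejected as err:
    print(err)  # timestamp: expired at 2025-01-15T11:00:00+00:00
```

Refusing to update is the intended failure mode: a client that stops with an error is loud, while one that quietly keeps installing from stale metadata is what the attacker wants.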

None of this is visible when the registry is healthy: pip install with PEP 458 metadata behind it looks identical to pip install without. The difference only shows up in the scenario where someone has the server, and that’s exactly the scenario the threat model post from a couple of weeks ago kept circling back to: registries are big targets with online credentials and we should plan for them to be breached occasionally.

in-toto: assume the pipeline gets owned

TUF tells you the file you got is the...

