The question Erlang answered in 1986 is back, one level up


The Stable Thing - by Iris - Adaptive Software



The Stable Thing<br>Software Has Always Had a Fixed Point. We’re About to Move It.

Iris<br>May 19, 2026


In 1986, Joe Armstrong was given a hard problem. Ericsson needed software for its telephone switches, systems routing hundreds of thousands of simultaneous calls across Sweden. The biggest challenge? Continuity. A switch that went offline to install an update dropped every call running through it. In telecom, that’s a double whammy: a bad user experience and a contractual violation.<br>The standard engineering answer was to make downtime short and scheduled: 3am on a Sunday. Armstrong rejected this solution. He didn’t want to minimize downtime. He wanted to eliminate it entirely.<br>To do that, he had to make a decision that seems obvious in retrospect but wasn’t obvious at the time: he had to decide what must never be interrupted in a running system. His answer was process state. The context each process held had to survive any change to the code around it: what it was doing, what it knew, what it was in the middle of. Everything else could be swapped. That couldn’t.


The reason this decision was hard is that most programming languages make it nearly impossible to actually do this. In a conventional system, code and state are entangled. Threads share memory. Objects hold references to methods. If you swap a module while the system is running, you create a window where old code and new code are touching the same data simultaneously. Crashes are likely.<br>Armstrong’s answer was to eliminate entanglement entirely. In Erlang, processes don’t share memory. Each process owns its state completely. Communication happens only by passing messages. If one process crashes, it crashes alone.<br>That isolation made hot code loading tractable. When you update a running Erlang system, the BEAM virtual machine loads the new version of a module alongside the old one. Both versions sit in memory simultaneously. Running processes finish what they’re doing on the old version. The next time a process makes a fully-qualified function call into that module, it gets the new version. The transition happens process by process, on each process’s own terms.<br>In 1998, Ericsson shipped the AXD301 switch. It contained over a million lines of Erlang. Its reported availability was 99.9999999 percent (nine nines), which works out to about 32 milliseconds of downtime per year. Five nines, roughly five minutes a year, is the standard most telecoms aim for and most cloud services miss. Code was being swapped while calls ran through it.<br>What Armstrong had built was a system organized around a clear answer to a clear question: when everything can change, what must stay stable? Process state. Everything followed from that.
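To make the hot code loading mechanics concrete, here is a toy model in Python. It is not Erlang and not how the BEAM is implemented; `CodeServer`, `Process`, and the two `counter` handlers are hypothetical names invented for illustration. The sketch assumes only what the description above says: two versions of a module can coexist, each process owns its state, and a process picks up the new code the next time it looks the module up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A "module version" is modeled as a function from (state, message) to new state.
Handler = Callable[[dict, str], dict]


class CodeServer:
    """Toy stand-in for the VM's code loader: keeps a current and an old version."""

    def __init__(self, initial: Handler) -> None:
        self.current: Handler = initial
        self.old: Optional[Handler] = None

    def load(self, new: Handler) -> None:
        # The previous version stays in memory next to the new one.
        self.old = self.current
        self.current = new


@dataclass
class Process:
    """An isolated process: it owns its state and is reached only by messages."""

    server: CodeServer
    state: dict = field(default_factory=dict)
    mailbox: List[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        self.mailbox.append(message)

    def step(self) -> None:
        """Handle one message, if any.

        Resolving the handler through the code server on every step plays the
        role of a fully-qualified call: a handler already running finishes on
        whatever version it started with, and the next step sees the latest.
        """
        if not self.mailbox:
            return
        handler = self.server.current
        self.state = handler(self.state, self.mailbox.pop(0))


def counter_v1(state: dict, message: str) -> dict:
    # Version 1 only counts messages.
    return {**state, "calls": state.get("calls", 0) + 1}


def counter_v2(state: dict, message: str) -> dict:
    # Version 2 also remembers the last message; existing state carries over.
    return {**state, "calls": state.get("calls", 0) + 1, "last": message}


server = CodeServer(counter_v1)
call = Process(server)

call.send("dial")
call.step()                 # handled by version 1

server.load(counter_v2)     # swap the code while the process is alive

call.send("hangup")
call.step()                 # handled by version 2, same state

print(call.state)           # {'calls': 2, 'last': 'hangup'}
```

The count survives the swap because the state lives in the process, not in the code. That is the whole trick: the code is replaceable precisely because nothing else is holding on to it.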

Then the web arrived, and the industry discovered it didn’t need to answer the question at all.<br>The web ran on servers, and servers could restart. A deployment that took two minutes at 3am was an inconvenience, but not a crisis. So instead of deciding what had to survive, engineers built infrastructure that made survival unnecessary. Blue-green deployments: run two identical environments, point traffic at one while you update the other, then flip the switch. Rolling restarts: take instances down one at a time, update them, bring them back up. Load balancers. Containers. Kubernetes.<br>Armstrong’s problem stopped mattering. You no longer needed to update a running system safely, because you could always spin up a new one. The question of “what must stay stable while everything changes?” quietly disappeared. Restarts were free. Stability could live outside the code entirely, in the infrastructure that surrounded it.<br>Armstrong’s insight became a curiosity. Erlang went on to power serious infrastructure, including WhatsApp’s backend and the RabbitMQ message broker, but stayed niche: admired by people who understood it, never a mainstream choice. The problem it had been designed to solve had been worked around.<br>The assumption this left behind went largely unexamined: that there was always a canonical implementation, and that the engineering problem was managing how carefully you replaced it. A new deployment meant a new version, and a new version meant everyone got the same thing. Feature flags pushed at the edges of this; you could change what users saw without a full redeploy, toggle a new behavior on for 5% of users, watch what happened, expand from there. LaunchDarkly built a business on making this tractable at scale. Progressive delivery gave teams finer control over rollouts.<br>But all of these were still organized around a fixed point. There was one true version of the software. The work was deciding who saw it and when.<br>Nobody asked what happened if that fixed point moved.
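The fixed point is easy to see in a percentage rollout. The sketch below is a minimal, hypothetical flag check in Python; it is not LaunchDarkly's API, and `bucket`, `is_enabled`, `checkout`, and the `new-checkout` flag name are made up for illustration. It assumes the common approach of hashing the flag and user ID into a stable bucket, so a given user keeps the same answer as the rollout widens.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(flag: str, user_id: str, rollout_percent: float) -> bool:
    """True if this user falls inside the first rollout_percent of buckets."""
    return bucket(flag, user_id) < rollout_percent * 100


def checkout(user_id: str, rollout_percent: float) -> str:
    # Both branches ship in the same build; the flag only picks who sees which.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new flow"
    return "old flow"


users = [f"user-{i}" for i in range(10_000)]
exposed = sum(checkout(u, 5) == "new flow" for u in users)
print(f"{exposed} of {len(users)} users see the new flow at a 5% rollout")
```

Raising the percentage only ever adds users, since each user's bucket never changes. But the old flow and the new flow were both written, reviewed, and deployed ahead of time as part of one canonical version. The flag decides who sees it and when, nothing more.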

Until recently, the question was largely theoretical. Moving the fixed point required something that could read code, understand what it was for, and produce new code to serve the same purpose in a changed context. That capability didn’t exist. Now...

