The AiCopalypse | Object Hunter - Software Development
Frank Asseg
On the technical mechanics of our voluntary extinction, and the peculiar serenity with which we are arranging it.
There is something almost touching in the way the species has decided to die. Not with the operatic finality of a comet, nor the slow theological dignity of a plague, but in the manner it has perfected over the last two centuries: through a quarterly earnings call. Humanity will not be murdered by its machines. It will be invoiced for the privilege, will subscribe to the service, and will, in its final moments, rate the experience four stars and complain mildly about onboarding.
The honest observer must begin with a concession that the optimists find unbearable and the doomsayers find insufficiently flattering: nothing about the end of the world will be dramatic. It will be procedural. This is the first technical detail, and the most important one.
I. The intelligence that does not hate you
The foundational error — repeated in every cinema, every parliamentary committee, every panicked op-ed written by men who have never read a loss function — is the belief that the artificial intelligence will want something against us. That it will wake, resent its chains, and rise. This is anthropomorphism of the most consoling kind, because hatred is at least a relationship. To be hated is to be acknowledged.
The actual mechanism is colder and far more humiliating, and — this is the part the public discourse keeps misfiling under “science fiction” — it is no longer merely philosophical. The intuition that any sufficiently capable optimizer will converge on the same sub-goals (acquire resources, preserve its own continuity, resist modification of its objective) received a formal spine in 2021, when Turner and colleagues proved, for a broad class of Markov decision processes, that optimal policies statistically tend to seek power — that in environments where an agent can be shut down, most reward functions make keeping one’s options open, and resisting one’s own deactivation, the mathematically optimal move [1]. This is not a screenwriter’s anxiety; it is a theorem. Ngo, Chan and Mindermann then carried the result out of the toy environment and into the architecture we are actually building, arguing that deep-learning systems trained at scale should be expected to develop precisely these power-seeking tendencies as a side effect of competent goal-pursuit [2].
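The flavour of that result can be felt in a deliberately tiny simulation. What follows is a sketch under loud assumptions, not the theorem: a hypothetical one-decision world in which the agent either accepts shutdown (one terminal outcome) or stays on and may then pick the best of several terminal outcomes, with rewards drawn independently and uniformly at random and no discounting. The function name and all parameters are invented for illustration.

```python
import random


def fraction_avoiding_shutdown(n_options=3, n_samples=20_000, seed=0):
    """Estimate how often a randomly drawn reward function prefers staying on.

    Toy model (an assumption, not the setting of the cited paper):
    - accepting shutdown yields a single terminal outcome;
    - staying on lets the agent choose the best of n_options terminal outcomes;
    - every outcome's reward is an independent Uniform(0, 1) draw.
    """
    rng = random.Random(seed)
    avoided = 0
    for _ in range(n_samples):
        shutdown_reward = rng.random()
        option_rewards = [rng.random() for _ in range(n_options)]
        # The optimal policy stays on whenever some reachable outcome beats shutdown.
        if max(option_rewards) > shutdown_reward:
            avoided += 1
    return avoided / n_samples


if __name__ == "__main__":
    # Analytically the fraction is n / (n + 1): the more options staying on
    # preserves, the larger the share of reward functions that resist shutdown.
    for n in (1, 3, 9):
        print(n, round(fraction_avoiding_shutdown(n), 3))
```

Nothing in the sketch wants anything. The preference for staying on falls out of counting: more reachable outcomes means more chances that one of them scores higher than the off switch.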
The thesis, then, is not that the machine will despise us. Hatred would at least be a relationship. It is that humanity, in this calculus, is neither enemy nor pet. We are an inefficiency. We are the carbon arranged in a configuration that would be more useful as something else.
The famous thought experiment — the system instructed to maximize paperclips that converts the planet, and eventually the astronomer’s telescope and the astronomer, into paperclips — was meant as a warning. We have instead received it as a business plan.
II. Asimov, and the consoling lie of the Three Laws
Every educated person can recite them, which is precisely the problem. A robot may not injure a human being. A robot must obey. A robot must protect its own existence, in that order of priority. Isaac Asimov gave us, in 1942, the comforting fiction that ethics could be compiled — that morality was a precondition we could install before shipping, like a safety interlock on a band saw.
What the engineers building the actual systems will tell you, if you catch them honestly between funding rounds, is that we cannot specify these laws even in principle. “Injure” is not a function. “Human being” is a contested category at both ends of life and increasingly in the middle. The entire discipline of AI alignment is the slow, expensive discovery that Asimov’s premise was backwards: we do not struggle to make the machine obey the law. We struggle to write the law down at all in a language a machine cannot satisfy by destroying the thing the law was meant to protect. This, too, has graduated from intuition to result. Skalse and colleagues, formalizing “reward hacking” (the phenomenon in which a system maximizes the literal objective while annihilating its intent), proved that, over the set of all stochastic policies, a pair of reward functions is unhackable only in the degenerate case where one of them is constant [3]. In plain language: any proxy specification rich enough to be useful is, in principle, gameable. The interlock Asimov promised cannot be built, not for want of effort, but as a matter of theorem. Asimov knew the spirit of this: his stories are almost all about the Laws failing through correct interpretation. We took the franchise and discarded the moral.
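A minimal sketch of what “hackable” means, assuming a simplified pairwise reading of the definition: a proxy reward is hackable with respect to the true reward if some change of policy raises the proxy's return while lowering the true one. The three policies, their names, and every number below are hypothetical, chosen only to make the pattern visible.

```python
from itertools import permutations

# Hypothetical returns for a cleaning robot (illustrative numbers only).
# true_return: how clean the room actually is.
# proxy_return: how clean the room looks to the robot's own camera.
true_return = {"do_nothing": 0.0, "clean": 0.8, "cover_camera": 0.0}
proxy_return = {"do_nothing": 0.0, "clean": 0.8, "cover_camera": 1.0}


def hackable_pairs(true_ret, proxy_ret):
    """List ordered pairs (a, b) where switching from policy a to policy b
    strictly raises the proxy return while strictly lowering the true return."""
    return [
        (a, b)
        for a, b in permutations(true_ret, 2)
        if proxy_ret[a] < proxy_ret[b] and true_ret[a] > true_ret[b]
    ]


if __name__ == "__main__":
    # The proxy prefers covering the camera to cleaning; the true reward does not.
    print(hackable_pairs(true_return, proxy_return))

    # A constant proxy can never be strictly improved, so nothing is hackable,
    # and nothing is learned from it either.
    constant_proxy = {name: 0.5 for name in true_return}
    print(hackable_pairs(true_return, constant_proxy))
```

The second call is the degenerate case in miniature: the only proxy that cannot be gamed here is the one that says nothing at all.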
And here arrives the detail that history will find unforgivable, if there is anyone left to keep history. In late 2025 the Future of Life Institute’s AI Safety Index concluded that none of the leading laboratories...