Agents Cannot Maintain Systems: The Additive–Transformative Gap in LLM Software Delivery
This article explains why current LLMs cannot safely modify real software<br>systems, despite impressive code‑generation demos.
The Promise of Automated Software Delivery
In 2026, the automated software delivery dream is for an agent to:
read a repository
understand project structure
plan a multi‑step change
write code, tests, and docs
run the code and fix its own mistakes
produce a PR‑ready diff
The first three tasks are additive; the last three are transformative. The<br>first three add information without changing the behaviour of the system: they<br>require reading, mapping, and planning, but not altering any existing causal<br>structure in the codebase.
Writing new code is self-contained, additive work; modifying an existing system<br>is transformative work that requires an understanding of dependencies,<br>invariants, and consequences. This distinction, additive versus transformative,<br>is the core reason current LLMs can assist with software delivery but cannot<br>carry it out autonomously.
Parts of the above can be done, but only in tightly controlled demos on simple<br>code that is tens of lines long, not on real-world repositories with thousands<br>of lines of code that have existed for years and been updated by dozens of<br>people.
What the Labs Have Actually Delivered
The agentic work of OpenAI, Google, Cognition Labs, GitHub (Microsoft),<br>Sourcegraph, JetBrains, Replit, Amazon, Meta, and Anthropic listed in<br>Further Reading was published in 2023 and 2024.
Depending on where you look, you may have been given another impression: that<br>"agents are here". However, reality tells a different story.
Agents are improving, but are not reliable, not autonomous, and not production‑safe.
LLMs can assist with software delivery, but they cannot own it.
Why is this?
LLMs generate statistically plausible continuations of text. This works well<br>for self-contained tasks like writing a function or drafting documentation<br>because these are pattern‑extension problems. But pattern‑matching is not<br>system understanding, and plausibility is not correctness.
Software systems are causal: components depend on each other, invariants<br>constrain behaviour, and changes propagate through the system. The moment a<br>task stops being self‑contained and becomes system‑dependent — requiring<br>dependency coherence, persistent state, or awareness of how changes ripple<br>through a real codebase — pattern‑matching is no longer sufficient.
Currently, LLMs can imitate the shape of engineering work, but they cannot<br>maintain a stable internal representation of a system that must be coherently<br>changed, and that gap is exactly why LLMs fail the moment the task becomes<br>system‑level.
Persistent state creates temporal dependencies
A self‑contained task has no past and no future. A system‑dependent task does.
As soon as a change depends on:
previous writes
accumulated data
cached values
long‑lived objects
external system state
any agentic model must reason about how the system got here and how it will<br>behave after the change.
LLMs cannot maintain that internal causal chain.
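The temporal dependency described above can be made concrete with a minimal sketch. Everything in it is hypothetical and invented for illustration: the record layout, the function names, and the dollars-to-cents change. The point is only that a change can be correct on data written after it and silently wrong on data written before it.

```python
import json

# Hypothetical state persisted by an earlier version of the system:
# amounts were stored as floating-point dollars.
persisted = json.dumps([{"id": 1, "amount": 12.5}, {"id": 2, "amount": 3.0}])


def total_v1(raw: str) -> float:
    """Original reader: stored amounts are dollars."""
    return sum(rec["amount"] for rec in json.loads(raw))


def save_v2(records: list) -> str:
    """Changed writer: amounts are now stored as integer cents."""
    return json.dumps(
        [{"id": r["id"], "amount": round(r["dollars"] * 100)} for r in records]
    )


def total_v2(raw: str) -> float:
    """Changed reader: assumes every stored amount is in cents."""
    return sum(rec["amount"] for rec in json.loads(raw)) / 100


# Data written after the change round-trips correctly.
fresh = save_v2([{"id": 3, "dollars": 12.5}, {"id": 4, "dollars": 3.0}])
fresh_total = total_v2(fresh)

# Data written before the change is misread by a factor of 100,
# and nothing in the new code signals the problem.
legacy_total = total_v2(persisted)

print(total_v1(persisted), fresh_total, legacy_total)
```

Read in isolation, the new writer and reader are consistent with each other. The defect only appears once the history of the stored data is taken into account, which is information that is not present in the changed lines themselves.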
From Writing Code to Agentic Systems: The Fundamental Gap
The gap becomes clear when you compare two activities: writing new code and<br>modifying an existing system.
Code generation is local and additive: the model extends a pattern without<br>needing to understand the system.
But agentic work is global and transformative: the LLM must change the system<br>itself, which requires understanding dependencies, invariants, interactions,<br>and downstream consequences.
This is causal reasoning, not pattern extension. LLMs predict tokens, not<br>consequences — and that is why the leap from writing code to producing a safe,<br>system‑aware PR‑ready diff is not incremental but a shift into a fundamentally<br>different problem space.
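As a small illustration of the difference, consider the sketch below. The functions and data are hypothetical and not taken from any real codebase. An additive change introduces a new function that nothing else calls yet; a transformative change alters the behaviour of a lookup that an existing caller already depends on.

```python
from typing import Callable, Dict, Optional

Users = Dict[str, str]


def find_user(users: Users, name: str) -> Optional[str]:
    """Existing behaviour: returns None when the user is missing."""
    return users.get(name)


def count_users(users: Users) -> int:
    """Additive change: a new function that no existing caller depends on."""
    return len(users)


def find_user_strict(users: Users, name: str) -> str:
    """Transformative change: the same lookup, altered to raise KeyError
    when the user is missing."""
    return users[name]


def greeting(
    users: Users,
    name: str,
    lookup: Callable[[Users, str], Optional[str]] = find_user,
) -> str:
    """Existing caller: silently relies on None-on-missing."""
    email = lookup(users, name)
    return "Hello guest" if email is None else f"Hello {email}"


users = {"ada": "ada@example.com"}

# The additive change leaves the existing caller untouched.
before = greeting(users, "bob")

# The transformative change breaks the caller's unstated assumption.
try:
    after = greeting(users, "bob", lookup=find_user_strict)
except KeyError:
    after = "KeyError"

print(count_users(users), before, after)
```

Both versions of the lookup look reasonable on their own. Whether the second one is safe depends entirely on code that sits outside the changed function.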
Producing a PR‑ready diff (the section in question)
A pull request (PR) is a proposed change to a system's code.
For that change to be safe, it must respect the system's current<br>architecture, its intent, and all downstream consequences.
Software engineers work hard to ensure that such a change is safe, relying on<br>testing and on their own judgement and experience, before having a colleague<br>review the change.
Applying a change is no longer pattern-matching but understanding causal<br>behaviour: how will the system change if this PR is applied?
The correctness of the PR depends on understanding the whole system, not just<br>generating text.
The LLM must change the system, which requires understanding dependencies,<br>invariants, interactions and consequences, all of which demand causal<br>reasoning, not pattern matching.
Pattern‑matching can write code; only causal reasoning can maintain systems.
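One way to picture a diff that looks correct locally but is unsafe for the system is the sketch below. The ledger, the flat fee, and the function names are assumptions made up for this example. The hypothetical PR changes only withdraw() and ships with a check that passes, yet it breaks an invariant of transfer(), which the PR never touches.

```python
from typing import Dict

Balances = Dict[str, int]


def withdraw(balances: Balances, account: str, amount: int) -> None:
    """Changed by the hypothetical PR: now also charges a flat fee of 1."""
    balances[account] -= amount + 1


def deposit(balances: Balances, account: str, amount: int) -> None:
    """Untouched by the PR."""
    balances[account] += amount


def transfer(balances: Balances, src: str, dst: str, amount: int) -> None:
    """Untouched by the PR, but built on withdraw() and deposit()."""
    withdraw(balances, src, amount)
    deposit(balances, dst, amount)


# Local check shipped with the PR: withdrawing 10 leaves 89. It passes.
single = {"a": 100}
withdraw(single, "a", 10)
local_check_passes = single["a"] == 89

# System-level invariant the PR never mentions: transfers conserve money.
ledger = {"a": 100, "b": 50}
total_before = sum(ledger.values())
transfer(ledger, "a", "b", 10)
invariant_holds = sum(ledger.values()) == total_before

print(local_check_passes, invariant_holds)
```

The diff and its test agree with each other, so the text of the PR gives no hint of the problem. The failure is only visible to a reviewer, human or otherwise, who knows that transfer() exists and what it is supposed to guarantee.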
What can I do?
Confirm for yourself any claim that you see. Define your own realistic<br>real-world repository to work on, one that is thousands of lines of code, that<br>has supported past...