Worried about Recursive Self-Improvement (RSI)? The answer might be CDE

EGreg · 1 point · 0 comments

Directed Evolution — Why Safebox Is Safer Than RSI

Safebox has no recursive self-improvement. It undergoes continuous directed evolution.

Recursive self-improvement is the dividing line of the whole safety debate. An RSI system improves its own ability to improve: the optimizer rewrites the optimizer, the model makes the next model smarter, and the loop compounds in a direction no one outside it chose. That is exactly why it terrifies people, and exactly why no one can tell you how to secure it — a thing that rewrites the rules of its own improvement has, by construction, no fixed surface for a defender to reason about. You cannot write a static check for a system whose definition of "better" is a moving target it sets itself.

Safebox does something that looks, from a distance, almost as powerful: capability that grows on its own, continuously, without a human authoring each new behavior. Up close it is a completely different machine. The model never changes. What grows is a library of vetted tools, recombined into workflows, steered by judgments that humans set. The intelligence is fixed. The inventory compounds. Borrowing the term from the directed evolution of enzymes, work honored with the 2018 Nobel Prize in Chemistry, the right name for it is continuous directed evolution (CDE): variation and selection on a fixed substrate, steered toward function by a hand that is always human.

This page makes one claim and defends it: CDE reaches roughly ninety-nine percent of what RSI promises, and unlike RSI it leaves behind a structure a static analyzer can check — which means defense, for the first time in this domain, becomes as tractable as compiling a program rather than the adversarial mess it is today.

I · The two machines

One rewrites itself. The other accumulates under a gate.

The companion page, What Agents Can Do, shows that Safebots already cover about ninety-nine percent of what open-ended agents do, because most useful work is recombination of patterns someone already worked out. CDE is what happens when you let that recombination run continuously. A human approves a tool once, through M-of-N governance. From then on the tool is a primitive the system may compose — and the compositions, not the model, are where capability grows.
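The approval gate described above can be sketched in a few lines. This is a minimal illustration, not Safebox's implementation: the `Registry` and `ToolProposal` names, the reviewer identities, and the 2-of-3 threshold are all assumptions made for the example. The only idea taken from the text is that a tool becomes composable after M of N reviewers approve it, and not before.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolProposal:
    """A tool awaiting approval (hypothetical structure)."""
    name: str
    approvals: set[str] = field(default_factory=set)


class Registry:
    """Admits a tool once M distinct reviewers out of N have approved it."""

    def __init__(self, reviewers: set[str], threshold: int):
        if not 1 <= threshold <= len(reviewers):
            raise ValueError("threshold must be between 1 and the number of reviewers")
        self.reviewers = set(reviewers)
        self.threshold = threshold
        self.approved: set[str] = set()

    def approve(self, proposal: ToolProposal, reviewer: str) -> bool:
        """Record one approval; return True once the tool is admitted."""
        if reviewer not in self.reviewers:
            raise PermissionError(f"{reviewer} is not a recognized reviewer")
        proposal.approvals.add(reviewer)  # a set, so repeat votes do not count twice
        if len(proposal.approvals) >= self.threshold:
            self.approved.add(proposal.name)
        return proposal.name in self.approved

    def can_compose(self, tool_names: list[str]) -> bool:
        """A workflow may only use tools that already passed the gate."""
        return all(name in self.approved for name in tool_names)


registry = Registry(reviewers={"alice", "bob", "carol"}, threshold=2)
proposal = ToolProposal("fetch_invoice")
first = registry.approve(proposal, "alice")
repeat = registry.approve(proposal, "alice")
second = registry.approve(proposal, "bob")
```

In this sketch the tool is admitted only on the second distinct approval. After that, `can_compose` treats it as a primitive with no further human step, which is the "approve once, compose freely" shape the paragraph describes.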

◆ Recursive self-improvement

the feared machine

The model improves; the optimizer rewrites the optimizer.

The target (what counts as "better") is set by the system, and it moves.

New primitive powers are acquired, not just recombined.

No fixed surface to analyze; the rules change under you.

Defense is undecidable in the limit — you cannot check a moving definition.

◆ Continuous directed evolution

the Safebox machine

The model is fixed; only the library of vetted tools grows.

The target is set by humans — judgments encode what "useful" means.

No new primitive powers, only combinations of approved ones.

A fixed surface: typed tools, declarative workflows, signed manifests.

Defense is decidable for an explicit class — you can check the composition.

The distance between these two is the whole safety argument. RSI's power and RSI's danger are the same property: it acquires capabilities no one approved, toward goals no one set. CDE gives up that one thing — the acquisition of genuinely new primitive power — and keeps almost everything else, because the space of combinations of approved tools is already vast. You approve a few dozen primitives; you get a combinatorial closure of governed workflows over them. The capability grows like a Cambrian diversification of body plans from a small set of parts — except every part was vetted before it entered the pool, and every new organism is checked before it is allowed to live.
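To give "combinatorial closure" a rough size, here is a back-of-the-envelope count. It assumes the simplest possible workflow shape, a linear chain of steps with repetition allowed, and it ignores type compatibility. That makes it an upper bound on chains, not a measurement of anything in Safebox. The figure of 36 primitives is an illustrative stand-in for "a few dozen".

```python
def count_linear_workflows(n_primitives: int, max_steps: int) -> int:
    """Count chains of 1..max_steps steps drawn from n_primitives tools.

    Upper bound only: real workflows are constrained by types and effects,
    and may branch, so the true count differs in both directions.
    """
    return sum(n_primitives ** k for k in range(1, max_steps + 1))


chains = count_linear_workflows(36, 5)
print(f"{chains:,}")  # 62,193,780
```

Even under these crude assumptions, 36 approved tools give tens of millions of five-step chains, each built entirely from parts that passed the gate.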

The hand on the wheel is always human. The system composes; it does not acquire. That single renunciation is what turns an ungovernable loop into a checkable one.

II · The compiler argument

Make the workflow a language, and defense becomes a compiler pass.

Here is the part that matters most, and the part the rest of the industry does not have. Today, defending an AI system is an adversarial, probabilistic, never-finished affair: you watch behavior, you train classifiers, you add monitoring, and you hope the deterministic boundary catches what the probabilistic layer misses. Anthropic's own engineering team reached this conclusion publicly — that the model layer cannot catch egress through a permitted path, and the deterministic boundary is the one that holds. Safebox starts from that boundary and adds the thing that makes it analyzable: the workflow is a simple declarative language, and every tool carries enough metadata that a static analyzer can reason about a composition before it ever runs.

A workflow is not free-form code and not an open-ended action loop. It is a declared graph of typed steps. Each tool declares its inputs, its outputs, its network surface, and its effect class in a signed manifest. Because the language is restricted and the primitives are typed, the...
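A minimal sketch of what such a check could look like follows. The manifest fields mirror the four the text names (inputs, outputs, network surface, effect class), but the field names, the type strings, and the `check_workflow` function are assumptions made for illustration, not Safebox's schema or API. Signature verification is assumed to have happened before the manifests reach this function.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical tool manifest, assumed already signature-verified."""
    name: str
    inputs: dict[str, str]      # parameter name -> type name
    outputs: dict[str, str]     # result name -> type name
    network: frozenset[str]     # hosts the tool may contact
    effect: str                 # e.g. "pure", "read", "write"


@dataclass
class Step:
    tool: str
    wiring: dict[str, tuple[int, str]]  # input -> (earlier step index, output name)


def check_workflow(steps, manifests, allowed_hosts, allowed_effects) -> list[str]:
    """Statically check a linear workflow; return a list of violations."""
    errors = []
    for i, step in enumerate(steps):
        m = manifests.get(step.tool)
        if m is None:
            errors.append(f"step {i}: tool {step.tool!r} has no approved manifest")
            continue
        if m.effect not in allowed_effects:
            errors.append(f"step {i}: effect {m.effect!r} is not permitted")
        extra = m.network - allowed_hosts
        if extra:
            errors.append(f"step {i}: network access to {sorted(extra)} is not permitted")
        for param, type_name in m.inputs.items():
            if param not in step.wiring:
                errors.append(f"step {i}: input {param!r} is not wired")
                continue
            src, out = step.wiring[param]
            if not 0 <= src < i:
                errors.append(f"step {i}: input {param!r} must come from an earlier step")
                continue
            src_manifest = manifests.get(steps[src].tool)
            if src_manifest is None:
                continue  # already reported when the source step was visited
            if src_manifest.outputs.get(out) != type_name:
                errors.append(f"step {i}: input {param!r} expects {type_name}")
    return errors


manifests = {
    "load": Manifest("load", {}, {"doc": "Text"}, frozenset(), "read"),
    "summarize": Manifest("summarize", {"text": "Text"}, {"summary": "Text"}, frozenset(), "pure"),
    "post": Manifest("post", {"body": "Text"}, {}, frozenset({"example.com"}), "write"),
}
safe = [Step("load", {}), Step("summarize", {"text": (0, "doc")})]
risky = safe + [Step("post", {"body": (1, "summary")})]
safe_errors = check_workflow(safe, manifests, frozenset(), {"read", "pure"})
risky_errors = check_workflow(risky, manifests, frozenset(), {"read", "pure"})
```

Under a policy that allows no network hosts and only `read` and `pure` effects, the two-step workflow passes and the three-step one is rejected twice, once for its `write` effect and once for contacting `example.com`. The check is a single pass over a finite list of declared steps, so it always terminates. Nothing is executed to reach the verdict.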
