Sandboxes Are Not Security

Jelmer Snoeck

The industry has settled on an answer to the question of how to run AI agents safely: put them in a sandbox. AWS, Daytona, Cloudflare, LangChain, pick one: they all sell it. The category has a shape, a name, a TAM, and a comparison-table format. “Secure code execution for AI agents.” “Zero risk to your infrastructure.” “Real isolation, not just sandbox features.”

It is not security. Not on its own. A sandboxed agent with valid Postgres credentials can still drop the production table, send every customer an email through a working OAuth token, and spin up a hundred GPUs in eu-west-1 on an attached AWS role. The sandbox does not gate any of these outcomes, because the sandbox is not what stands between the agent and the resource. The credential is. And the credential is sitting inside the sandbox, ready to be used.

Sandboxes are part of the answer. They are not the whole answer, and the industry keeps selling them as if they were.

What a sandbox actually does

A sandbox protects the host from the agent’s own process. Filesystem isolation. Network namespacing. A bounded blast radius for whatever the agent writes to whatever directory the agent has. These are real properties and they solve real problems. State pollution between sessions. Reproducibility across runs. Containment of an agent that runs rm -rf / on what it thinks is its own scratch directory. None of that is fake. It is also not what attackers are after.

An attacker who compromises an agent, whether through prompt injection, a poisoned tool description, or a malicious document the agent was asked to summarise, does not want the agent’s scratch directory. They want what the agent is authorised to do: the Postgres connection string the agent uses to answer questions about customers, the GitHub token the agent uses to open pull requests, the AWS role the agent uses to read from S3. That last one probably also lets it write to S3, and write is where the damage lives.
None of these are stored in the agent’s filesystem in any meaningful sense. They are loaded into the agent’s process at runtime, used by the agent on the attacker’s behalf, and the sandbox watches the whole thing happen without interrupting once. The sandbox cannot interrupt. That is not its job. That has never been its job.

How we got here

Code-execution sandboxes are a real and useful primitive, and the industry knew how to build them before agents existed. They came from Jupyter, from Repl.it, from the lineage of “run untrusted code from untrusted users in a multi-tenant environment without letting them escape onto our host.” That threat model is coherent. The code is the attacker. The sandbox is the answer.

What happened next is that the agent platforms borrowed the primitive whole and kept the marketing. Agents look like code that needs to be run, so the sandbox vendors pivoted, and the new agent platforms shipped sandboxes as their first security feature. The pitch decks updated, the comparison tables updated, the buyer’s mental model updated. “How do we secure our agent” got the same answer “how do we run untrusted code” got, and the answer was wrong, because the threat model changed under everyone’s feet and nobody changed the slide.

Agents are not untrusted code. They are trusted code holding untrusted instructions and authorised credentials. The danger lives in what the agent is allowed to do on behalf of the human who deployed it, not in what the agent might run that escapes onto the host. Sandboxes were designed to stop the second thing. They were never designed to stop the first.

Two controls, different jobs

Isolation gates what a process can touch on its own machine. Authorisation gates what a process can do on someone else’s behalf. They are different security properties solving different threats. The industry collapsed them and called the result agent security.

A useful test.
If your only safety claim about an agent is that it runs in a sandbox, ask what happens when the agent calls the API it was given a token for. If the answer is “the call goes through,” the sandbox is irrelevant to the threat. Whether the sandbox is a microVM, a container, a WASM runtime, or a managed service with SOC 2 compliance: irrelevant. The token left the building the moment the agent decided to use it. The sandbox watched.

A sandbox can lock down egress and refuse to let the agent talk to anything except an allow-listed set of endpoints. That helps, until the endpoint is one the agent is allowed to reach and the compromise becomes a credential leak instead of an exfil. The right answer to that shape of problem is a gateway sitting between the agent and the resource, brokering the call and watching what the credential is used for; another post.

The first move past the consensus

The first move has already been made in public,...
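As a rough illustration of the gateway idea raised earlier (a broker that sits between the agent and the resource), the sketch below may help make the isolation/authorisation split concrete. It is not any vendor's design. The names `Gateway`, `Request`, `policy`, and `support-bot`, and the placeholder credential, are all hypothetical, and the upstream call is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    agent_id: str  # which agent is asking
    resource: str  # e.g. "postgres"
    action: str    # e.g. "select", "drop_table"


@dataclass
class Gateway:
    # Hypothetical policy: the (resource, action) pairs each agent may perform.
    policy: dict[str, set[tuple[str, str]]]
    # Credentials are held by the gateway, outside the agent's sandbox.
    credentials: dict[str, str]
    # Every decision is recorded: (agent_id, resource, action, allowed).
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def call(self, req: Request) -> str:
        allowed = (req.resource, req.action) in self.policy.get(req.agent_id, set())
        self.audit_log.append((req.agent_id, req.resource, req.action, allowed))
        if not allowed:
            raise PermissionError(
                f"{req.agent_id} may not {req.action} on {req.resource}"
            )
        return self._forward(req, self.credentials[req.resource])

    def _forward(self, req: Request, secret: str) -> str:
        # Stand-in for the real upstream call. The secret would be used here
        # and is never returned to the agent.
        return f"forwarded {req.action} to {req.resource}"


gateway = Gateway(
    policy={"support-bot": {("postgres", "select")}},
    credentials={"postgres": "example-secret"},  # placeholder value
)

print(gateway.call(Request("support-bot", "postgres", "select")))
try:
    gateway.call(Request("support-bot", "postgres", "drop_table"))
except PermissionError as exc:
    print("blocked:", exc)
```

In this sketch the decision is made per action rather than per network endpoint, which is the difference from an egress allow-list: both requests target the same resource, and only one goes through.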
