The Sandbox Shift – sandboxes are the new containers, for AI-written code

Submitted by zozo123-IB

The Sandbox Shift — a field manual for running untrusted code


Container era · 2013: a trusted human writes the code.

Goal: portability & reproducibility. Code is reviewed and vouched for. Environments are long-lived pets.

Sandbox era · now: a model writes the code.

Goal: isolation, instant startup & disposability. Code is guilty until proven safe. Environments are ephemeral cattle, by the thousand.

The author is untrusted. Model-written code can carry an injected payload, a hallucinated rm -rf /, or a typo'd dependency that resolves to malware. You can't read every run.

Blast radius is bounded. The box decides what the code may touch — filesystem, network egress, secrets, and (if isolation is weak) the host kernel itself.

Reproducible, at scale. Identical clean state every run, thousands in parallel, cheap to spawn and kill. Without it, an eval's reward signal is just noise.

Containment is only half the story. A sandbox is also the whole computer an agent works inside — one place to dev, test, build and deploy, where it takes a task end to end with no human in the seat.

edit → run → test → build → deploy → observe ↺
The agent's loop, repeated autonomously thousands of times over.

Docker packaged the artifact: build, ship, run. The sandbox hands the agent the entire workflow — and the keys. Give it a box that can only run code and you get a calculator; give it one that can edit, build and ship, and you get a developer.

Four jobs people actually use them for:

dev: the harness's workshop. Give the coding agent (Claude Code & friends) a box to develop in: edit, install, run, break things, all without touching your laptop or your main branch.

test · eval: prove it works. Run code you or your AI just wrote on a clean slate and grade it before you trust a single line.

deploy: ship AI-written code. Run model-authored code in production inside a contained runtime, where its blast radius is bounded by design.

parallel: ten at once. Fork one environment ten ways and let agents chase ten features and bugs at the same time. Keep what passes; bin the rest.
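The "parallel" job can be sketched locally under loud assumptions: "forking an environment" is approximated here by copying a base workspace directory into a fresh temporary directory per attempt, and each attempt is a hypothetical candidate script whose own checks decide pass or fail. Real sandbox platforms fork by snapshotting a container or microVM, and a directory copy provides no isolation at all; the sketch only shows the fan-out-and-keep-what-passes shape. The helper names `try_candidate` and `fan_out` are illustrative.

```python
import shutil
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def try_candidate(base: Path, name: str, source: str,
                  timeout: float = 10.0) -> tuple[str, bool]:
    """Run one candidate in a private copy of the base workspace (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "ws"
        shutil.copytree(base, work)              # "fork": one private copy per attempt
        (work / "candidate.py").write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, "candidate.py"],
                cwd=work, capture_output=True, text=True, timeout=timeout,
            )
            passed = proc.returncode == 0        # the candidate's own checks decide
        except subprocess.TimeoutExpired:
            passed = False
    return name, passed                          # the copy is deleted on leaving the with-block


def fan_out(base: Path, candidates: dict[str, str], workers: int = 10) -> list[str]:
    """Try every candidate concurrently; keep what passes, bin the rest."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: try_candidate(base, *item), candidates.items())
        return sorted(name for name, ok in results if ok)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base_dir:
        base = Path(base_dir)
        (base / "data.txt").write_text("42")
        candidates = {
            "reads-data": "assert open('data.txt').read() == '42'",
            "wrong-answer": "assert open('data.txt').read() == '7'",
            "crashes": "raise SystemExit(1)",
        }
        print(fan_out(base, candidates))  # ['reads-data']
```

Because every attempt gets its own copy, a candidate that corrupts its workspace cannot affect its siblings or the base.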

And the same primitive, by role — pick yours:

SWE · MLE · AI Researcher

SWE: agents now open PRs and run shell commands. You ship code no human fully reviewed.

Why: an autonomous coding agent's diff is untrusted input the moment it executes.
Where: code-interpreter tools · agent dev loops (a worktree per task) · CI · production tool calls serving real users.
The call: untrusted code with nothing private to reach → ephemeral container or microVM, egress off.
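One concrete reading of "ephemeral container, egress off" with Docker is sketched below. The flags are real `docker run` options (`--rm`, `--network none`, `--read-only`, `--tmpfs`, `--cap-drop`, `--security-opt no-new-privileges`, `--pids-limit`, `--memory`, `--cpus`, `--user`); the image name, the limit values, and the `sandbox_argv` helper are illustrative assumptions, and the snippet only builds the command rather than running it. A container still shares the host kernel, so this sits on the container rung, not the microVM rung.

```python
import shlex


def sandbox_argv(image: str, command: list[str], *, memory: str = "512m",
                 cpus: str = "1", pids: int = 256) -> list[str]:
    """Build a `docker run` command for a throwaway, network-less container (illustrative)."""
    return [
        "docker", "run",
        "--rm",                                   # delete the container when it exits
        "--network", "none",                      # egress off: only a loopback interface
        "--read-only",                            # root filesystem is read-only...
        "--tmpfs", "/tmp",                        # ...except a scratch tmpfs
        "--cap-drop", "ALL",                      # drop every Linux capability
        "--security-opt", "no-new-privileges",    # block setuid privilege escalation
        "--pids-limit", str(pids),                # cap process count (fork bombs)
        "--memory", memory,                       # cgroup memory limit
        "--cpus", cpus,                           # cgroup CPU quota
        "--user", "65534:65534",                  # unprivileged numeric UID (conventionally "nobody")
        image, *command,
    ]


if __name__ == "__main__":
    argv = sandbox_argv("python:3.12-slim", ["python", "-c", "print('hello')"])
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, capture_output=True, text=True, timeout=30)
```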

MLE: model-generated code and SQL run against real datasets and pipelines.

Why: the generated step touches data you actually care about, and it might exfiltrate or corrupt it.
Where: feature pipelines · notebook / analysis agents · batch scoring · data-cleaning tools.
The call: untrusted code that needs private data → inside the VPC, scoped credentials, deny-by-default egress.
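Deny-by-default egress means a connection is refused unless its destination is explicitly listed. In practice that rule is enforced outside the sandbox (an egress proxy, firewall, or security-group rules), never by the untrusted code itself; the sketch below shows only the policy logic. The `EgressPolicy` class, its fields, and the hostnames are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EgressPolicy:
    """Deny-by-default allowlist (hypothetical): anything not listed is blocked."""
    exact_hosts: frozenset[str] = field(default_factory=frozenset)
    suffixes: tuple[str, ...] = ()             # ".svc.example" allows its subdomains only
    allowed_ports: frozenset[int] = frozenset({443})

    def allows(self, host: str, port: int) -> bool:
        host = host.lower().rstrip(".")        # normalise case and a trailing root dot
        if port not in self.allowed_ports:
            return False
        if host in self.exact_hosts:
            return True
        # The leading dot in each suffix stops "evilsvc.example" matching ".svc.example".
        return any(host.endswith(s) for s in self.suffixes)   # otherwise: deny


if __name__ == "__main__":
    policy = EgressPolicy(
        exact_hosts=frozenset({"warehouse.internal.example"}),
        suffixes=(".feature-store.internal.example",),
    )
    print(policy.allows("warehouse.internal.example", 443))          # True
    print(policy.allows("eu.feature-store.internal.example", 443))   # True
    print(policy.allows("pastebin.com", 443))                        # False
    print(policy.allows("warehouse.internal.example", 22))           # False
```

Listing what is allowed, rather than what is blocked, means a destination nobody thought of is refused by default.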

AI Researcher: every rollout and every eval runs untrusted code, thousands at once, and the reward must be reproducible.

Why: RL and evals are untrusted execution at scale; dirty state silently poisons the signal.
Where: RL environments · verifiable evals · reasoning sandboxes that run harnesses (spawn → set up task → run the agent's code → score → destroy).
The call: disposability is the whole game → microVMs: VM-grade isolation, ~100 ms boot, thousands per host.
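The harness lifecycle (spawn → set up task → run the agent's code → score → destroy) can be sketched on a single POSIX machine. Assumptions: a fresh temporary directory stands in for the sandbox, and the "run" step uses the bottom rung of the isolation ladder, a subprocess with a timeout and `resource.setrlimit` limits, so it is suitable for trusted code only; a real harness would swap in a container or microVM and keep the same five steps. The task format and the `run_episode` name are hypothetical.

```python
import resource
import subprocess
import sys
import tempfile
from pathlib import Path


def _apply_limits() -> None:
    # Runs in the child between fork and exec (POSIX only): cap CPU time and memory.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))              # 5 CPU-seconds
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))   # 1 GiB address space


def run_episode(agent_code: str, task_input: str, expected: str) -> float:
    """spawn → set up task → run the agent's code → score → destroy; returns 0.0 or 1.0."""
    # spawn: a fresh, empty directory; leaving the with-block deletes it (destroy),
    # so no state leaks from one run into the next.
    with tempfile.TemporaryDirectory() as box:
        work = Path(box)
        (work / "input.txt").write_text(task_input)              # set up task
        (work / "solution.py").write_text(agent_code)
        try:
            proc = subprocess.run(                               # run the agent's code
                [sys.executable, "-I", "solution.py"],
                cwd=work, capture_output=True, text=True,
                timeout=10, preexec_fn=_apply_limits,
            )
        except subprocess.TimeoutExpired:
            return 0.0                                           # a hang scores zero
        passed = proc.returncode == 0 and proc.stdout.strip() == expected
        return 1.0 if passed else 0.0                            # score


if __name__ == "__main__":
    print(run_episode("print(open('input.txt').read().upper())", "abc", "ABC"))
```

A runaway loop is stopped by the CPU limit or the wall-clock timeout and simply scores 0.0. Nothing here stops the code from reading files outside the directory, which is exactly why this rung is for trusted code and why evals at scale move up to microVMs.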

The boundary decision:
Untrusted code with nothing to steal belongs outside, in a public, ephemeral box with the lowest blast radius.
Untrusted code that needs your data belongs inside the VPC, isolated hard: now you're fencing something with real network reach.
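The boundary decision turns on two inputs: whether the code is trusted, and whether it needs private data. The sketch below encodes those cases, plus the trusted-code case that the ladder assigns to its bottom rung; the `Placement` enum, the `place` function, and the wording of the values are hypothetical, and a real policy would weigh more of the questions than these two.

```python
from enum import Enum


class Placement(Enum):
    TRUSTED_SUBPROCESS = "trusted code: bottom rung, subprocess + limits"
    OUTSIDE_EPHEMERAL = "outside: public, ephemeral box, lowest blast radius"
    INSIDE_VPC_ISOLATED = "inside the VPC: isolated hard, scoped credentials, deny-by-default egress"


def place(untrusted: bool, needs_private_data: bool) -> Placement:
    """Hypothetical encoding of the boundary decision."""
    if not untrusted:
        return Placement.TRUSTED_SUBPROCESS
    if needs_private_data:
        # The box now has real network reach, so fence it hard and scope what it can see.
        return Placement.INSIDE_VPC_ISOLATED
    # Nothing to steal: keep it outside, where a compromise costs the least.
    return Placement.OUTSIDE_EPHEMERAL


if __name__ == "__main__":
    print(place(untrusted=True, needs_private_data=False).value)
```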

An isolation ladder, listed here from the weakest and fastest rung to the strongest and heaviest. Pick the lowest rung that holds your threat model.

subprocess + limits: same kernel, same user. A timeout and a ulimit. Trusted code only.

namespaces · cgroups · seccomp: nsjail, bubblewrap. Cheap kernel-level fencing for semi-trusted code.

container: runc / containerd. Shared kernel, so one kernel CVE is an escape.

gVisor: a user-space kernel intercepts syscalls. Container ergonomics, smaller attack surface.

microVM · Firecracker: a real VM boundary that boots in ~100 ms, thousands per host. The agent sweet spot.

full VM · air-gapped: maximum isolation, maximum cost. For the genuinely hostile.

Trade-offs along the ladder: isolation strength ↔ boot latency ↔ density / cost.
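"Pick the lowest rung that holds your threat model" is a first-fit search over an ordered list. The sketch below gives each rung a hypothetical integer strength (the ordering comes from the ladder; the numbers themselves are made up) and returns the cheapest rung that meets a required level. How a threat model maps to a required level is an assumption left to the caller.

```python
from typing import NamedTuple


class Rung(NamedTuple):
    name: str
    strength: int   # hypothetical score; only the ordering is meaningful


# Ordered weakest/fastest to strongest/heaviest, as on the ladder.
LADDER = [
    Rung("subprocess + limits", 0),
    Rung("namespaces · cgroups · seccomp", 1),
    Rung("container", 2),
    Rung("gVisor", 3),
    Rung("microVM · Firecracker", 4),
    Rung("full VM · air-gapped", 5),
]


def lowest_rung(required: int) -> Rung:
    """Return the first (cheapest) rung whose strength meets the requirement."""
    for rung in LADDER:
        if rung.strength >= required:
            return rung
    raise ValueError(f"no rung is strong enough for level {required}")


if __name__ == "__main__":
    print(lowest_rung(4).name)   # microVM · Firecracker
```

Scanning from the bottom guarantees you never pay for more boot latency or density loss than the threat model demands.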

microVMs broke the old rule that VM-grade isolation must be slow and expensive — which is what makes per-run disposability economically real.

The real lesson: Docker didn't beat LXC on isolation; it won on developer experience. Sandboxes will be won the same way, by the best API and the fastest boot, not the thickest wall.

Four questions yield a verdict, a recommended rung on the ladder, and a placement:

Who wrote the code? I did, or I've read it · A teammate reviewed it · A model wrote it, unreviewed

What can it reach? Nothing sensitive · Public data only · Private data / internal services · Production secrets

How many runs? Once /...
