Give your agent its own computer



Amy Ru

June 5, 2026


LLMs can reason. But reasoning alone doesn't get much done. Executing code from an AI agent is harder than it looks. Your agent needs a real computer (filesystem, shell, package manager, persistent state), but handing it access to your infrastructure is dangerous.

Think about it this way: you use one laptop. You are n of one. But agents are going to run millions of tasks, and each one needs its own computer to work from. That's the infrastructure shift happening right now. Satya Nadella put it plainly: "Every agent needs a computer." The question is what that computer looks like, and how you give it to agents safely.

LangSmith Sandboxes are our answer. Here's why it matters, and why doing it yourself is harder than it sounds.

What becomes possible when an agent has a computer

Think about what Cursor, Claude Code, or ChatGPT's code interpreter can do that a plain chat interface can't. They don't just answer questions: they run the code, see the error, fix it, run it again, and hand you something that works. That feedback loop is what makes them useful.

That same loop is what separates a demo agent from a production agent. Once your agent can execute, a whole category of work opens up:

- A coding assistant that doesn't just suggest a fix: it applies the fix, runs your tests, and confirms nothing broke
- A data analyst that pulls a CSV, runs Python against it, and hands you a formatted report
- A CI agent that clones your repo, installs dependencies, runs the full test suite, and opens a PR (like OpenSWE)
- A research agent that browses, scrapes, synthesizes, and writes, rather than just searching
- A content pipeline that generates, renders, and exports finished artifacts
- An RL or eval harness that needs to spin up environments in parallel, run episodes at burst scale, and tear them down immediately: zero to thousands of sandboxes, then back to zero

The common thread: these agents need more than a token stream. They need a place to work.
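The run, see the error, fix, run again loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular product implements it: `propose_fix` is a hypothetical stand-in for a model call, and the script runs on the local interpreter purely to keep the example self-contained. In a real agent the execution step would happen inside an isolated environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(source: str, workdir: Path) -> subprocess.CompletedProcess:
    """Write the candidate script to disk and execute it, capturing output."""
    script = workdir / "attempt.py"
    script.write_text(source)
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=30,
    )


def propose_fix(source: str, stderr: str) -> str:
    """Hypothetical stand-in for a model call that reads the error and edits the code."""
    if "ZeroDivisionError" in stderr:
        return source.replace("/ 0", "/ 2")
    return source


def execute_until_green(source: str, max_attempts: int = 3) -> tuple[str, int]:
    """Run the script, feed any error back into propose_fix, and retry."""
    with tempfile.TemporaryDirectory() as tmp:
        for attempt in range(1, max_attempts + 1):
            result = run_script(source, Path(tmp))
            if result.returncode == 0:
                return result.stdout, attempt
            source = propose_fix(source, result.stderr)
    raise RuntimeError(f"still failing after {max_attempts} attempts")


if __name__ == "__main__":
    output, attempts = execute_until_green("print(10 / 0)")
    print(f"succeeded on attempt {attempts}: {output.strip()}")
```

The point of the sketch is the shape of the loop: the error text from one run becomes the input to the next edit, which is only possible when the agent can actually execute what it writes.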
Why you can't just hand your agent your laptop

The obvious next question is: why not just let the agent run code locally? Or in a Docker container? Teams do this in early prototypes. It stops working in production for two reasons.

First: agents run untrusted code by definition. The code your agent executes might come from a model, a user prompt, a cloned repo, or an installed package. You didn't write it. You can't fully vet it. In September 2025, a self-replicating npm worm called Shai-Hulud backdoored 500+ packages with code that ran from install-time lifecycle scripts, before any validation could run. A second wave in November hit 796 more packages and 25,000+ GitHub repos in hours. An agent that installs npm packages as part of its workflow is exposed to exactly this.

Second: containers aren't enough. The common instinct is "just run it in Docker." Containers are great for isolating known, vetted application code (e.g. a web server or a background job). They're not designed for an agent that's installing arbitrary dependencies, running model-generated scripts, and persisting state across a long-running session. And critically: containers share a kernel with the host. A kernel exploit reaches through them. Copy Fail (CVE-2026-31431) is a 732-byte Python script that roots every major Linux distribution back to 2017 via the kernel crypto API. AI tooling found it in about an hour.

A container boundary is not an isolation boundary. For untrusted, model-generated code, you need hardware-level separation.

LangSmith Sandboxes: a computer for every agent

The mental model that helps here: a sandbox needs to be two things at once. It needs the instant startup of a serverless function, because you can't make an agent wait two minutes for a VM to boot. And it needs the statefulness of a full machine, because agents aren't stateless request handlers; they're mid-session workers that install dependencies, edit files, and pick up where they left off. LangSmith Sandboxes are built for that model.
Each sandbox is a hardware-virtualized microVM. Not a container: a full machine with its own kernel. The agent gets:

    Agent
    └── its own computer
        ├── filesystem
        ├── shell
        ├── package manager
        ├── network access
        ├── code execution
        └── persistent state

It can install packages, run scripts, edit files, spin up a local server, and keep working across a long session, all without touching your production infrastructure or any other agent's sandbox. When the work is done, the sandbox disappears.

You access it through the same LangSmith SDK and API key you already use:

    from langsmith import Client

    client = Client()
    sandbox = client.create_sandbox()

    # Give the agent a shell
    result = sandbox.run("pip install pandas && python analysis.py")
    print(result.stdout)

It takes just one call, and your agent has a computer. There's...
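One way to write agent code against this interface without a live sandbox is to depend only on the two members shown above, `run(command)` and `result.stdout`, and swap in a local stand-in for tests. The sketch below assumes nothing else about the SDK: `SandboxLike`, `LocalStandIn`, and `run_session` are hypothetical names introduced for illustration, and the stand-in runs commands on the host in a temporary directory, so it offers none of the isolation a microVM does.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from typing import Protocol


class RunResult(Protocol):
    stdout: str


class SandboxLike(Protocol):
    """Only the members the snippet above uses; anything more is an assumption."""

    def run(self, command: str) -> RunResult: ...


@dataclass
class LocalResult:
    stdout: str


class LocalStandIn:
    """Test double: NOT isolated, it executes on the host inside a temp directory."""

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory()

    def run(self, command: str) -> LocalResult:
        proc = subprocess.run(
            command,
            shell=True,
            cwd=self._dir.name,
            capture_output=True,
            text=True,
        )
        return LocalResult(stdout=proc.stdout)

    def close(self) -> None:
        # Mirrors "when the work is done, the sandbox disappears".
        self._dir.cleanup()


def run_session(sandbox: SandboxLike, commands: list[str]) -> list[str]:
    """Run commands in order against one sandbox so later steps see earlier state."""
    return [sandbox.run(command).stdout for command in commands]


if __name__ == "__main__":
    box = LocalStandIn()
    outputs = run_session(box, ["echo hello > note.txt", "cat note.txt"])
    print(outputs[1].strip())
    box.close()
```

If the real sandbox object exposes the same two members, it could be passed to `run_session` in place of the stand-in. The second command reading a file written by the first is the persistent-state property in miniature.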
