We run sandboxes for agents at scale


Product

Jul 3, 2026 How we run sandboxes for agents at scale

Sean Smith CTO & Co-Founder

Table of contents: Full Control · From gVisor to Firecracker · The rootfs is just an image · Executing at Scale

We took a bet early on to give the LLM the power to run arbitrary code. This post is about why we made that bet, and what it takes to run thousands of these sandboxes at once, spinning up and down as fast as people start and finish chats with the agent.

Every conversation a user has with the Adapt agent is backed by its own computer. Not just some locked-down container on a shared server, but an isolated VM the model can do whatever it wants with: install software, write and run programs, browse the web, talk to APIs. We call these sandboxes, and they're one of the core primitives Adapt is built on.

Full Control

LLMs are coding geniuses, and my job has largely been about building the perfect developer environment for them to work in. The usual way to connect an AI to the outside world is to hand-build integrations (a bespoke connector for GitHub, another for HubSpot, another for Stripe) or to wait for each service to ship an MCP server.
This just doesn't scale, and I'm not really into writing integration code day in, day out. So instead of doing that work ourselves, we let the model do it. Any service that exposes an API can be accessed from Adapt, because we give the LLM everything it needs to write the script or program that talks to that API. That's a big part of what we mean when we call Adapt a "horizontal intelligence": it isn't wired to a fixed list of tools; it can build the tool it needs on the spot.

Foundational to this is giving the LLM full access to the sandbox. Instead of handing the model a static set of languages and CLI tools with limited access to the filesystem, we give it complete access to everything. It runs as root. Our sandboxes ship with common runtimes like Node and Python, but what if the best SDK for some service's API is written in Go, or the LLM needs to write a Go program? The model can just go ahead and install Go and run it.

So if we're allowing the model to install whatever it wants and execute code no human has verified, how do we secure it? Fortunately, we're not the first people who've needed to run untrusted code. There are two very popular secure runtimes for exactly this: gVisor and Firecracker. Our journey so far has made us very well acquainted with both.

From gVisor to Firecracker

Our first foray into secure sandboxes for LLMs was the "easy" approach: run each sandbox with gVisor on top of GKE (Google Kubernetes Engine), using GKE Sandbox. We were already running all of our other services on GKE, so this was the natural step for us.

gVisor sits between a container and the host kernel. Instead of letting a program make system calls straight to the real Linux kernel (the thing you really don't want untrusted code poking at), gVisor intercepts those calls in a user-space kernel of its own and services them itself. You get most of the convenience of a normal container with a much smaller attack surface.
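To make the GKE Sandbox approach concrete, here is a minimal sketch of what opting a single Pod into gVisor can look like. On GKE Sandbox the gVisor runtime is selected through the `gvisor` RuntimeClass on the Pod spec, and the Pod has to land on a node pool that has GKE Sandbox enabled. Everything else here is an assumption for illustration: the helper `sandbox_pod_manifest`, the Pod name, the label, and the image reference are hypothetical and not taken from Adapt's actual setup.

```python
import json


def sandbox_pod_manifest(name: str, image: str) -> dict:
    """Build a Pod manifest that asks Kubernetes to run the container under gVisor.

    The name, label, and image are placeholders; only the runtime class
    reflects how GKE Sandbox selects gVisor.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "sandbox"}},
        "spec": {
            # GKE Sandbox exposes gVisor through the "gvisor" RuntimeClass.
            "runtimeClassName": "gvisor",
            # Assumption: a sandbox is a one-shot workload, so never restart it.
            "restartPolicy": "Never",
            "containers": [{"name": "sandbox", "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference standing in for the "base" sandbox image.
    manifest = sandbox_pod_manifest("sandbox-demo", "example.com/sandbox-base:1.0.0")
    # Kubernetes accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. The only line that differs from an ordinary Pod is `runtimeClassName`, which is why this route needs so little extra configuration.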
GKE Sandbox packages all of this up. You deploy Pods (groups of one or more containers) and they transparently run under gVisor, with very little infrastructure configuration needed on our side.

This worked really well to start. We defined the "base" sandbox as a Docker image and let GKE scale it out to the number of sandboxes we needed at any given time. Updating the software the sandboxes shipped with meant a simple Dockerfile change and a version bump in a manifest.

[Figure: hundreds of sandbox Pods running under GKE Sandbox.]

But the same abstraction that made gVisor easy is the one we kept fighting. Because gVisor reimplements...
