How We Securely Serve a Large Agent Fleet on a Small Infra Footprint

The agent landscape changed fast. A year ago, most "agents" were chat apps with a tool call. Today, the useful version is closer to a persistent worker: something that remembers, wakes up on a schedule, reads from real systems, writes reports, notices changes, and sometimes executes code.

If an agent is only a chat session, you can serve it like a request. If an agent is a worker, you have to decide what stays alive when nobody is watching: files, memory, tools, schedules, or a whole sandbox.

Our answer at gluonDB is simple. The sandbox is not the agent. The VM should not be the unit of identity. The filesystem should not be inseparable from code execution. And the orchestration layer should not live inside the same environment it is asking untrusted model output to manipulate.

We run a large fleet of persistent agents on a small infra footprint because we split the problem into three parts that are usually bundled together:

Orchestration

Filesystem

Sandboxed execution

That split is the trick. The orchestration layer, our agent control plane, is persistent. The filesystem is durable. The sandbox is pooled and temporary.
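A minimal sketch of that split, assuming an in-memory stand-in for durable storage and opaque sandbox IDs instead of real isolated environments. `DurableFileStore`, `AgentRecord`, `SandboxPool`, and `Orchestrator` are hypothetical names chosen for illustration, not gluonDB's implementation.

```python
from __future__ import annotations

import queue
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class DurableFileStore:
    """Hypothetical stand-in for durable storage (object store, database, ...)."""
    files: dict[tuple[str, str], bytes] = field(default_factory=dict)

    def read(self, agent_id: str, path: str) -> Optional[bytes]:
        return self.files.get((agent_id, path))

    def write(self, agent_id: str, path: str, data: bytes) -> None:
        self.files[(agent_id, path)] = data


@dataclass
class AgentRecord:
    """The agent's identity lives in the control plane, not in any sandbox."""
    agent_id: str
    schedule: str  # e.g. a cron string; a control-plane scheduler (not shown) would read it
    history: list[str] = field(default_factory=list)


class SandboxPool:
    """A fixed set of interchangeable sandboxes shared by every agent."""

    def __init__(self, size: int) -> None:
        self._idle: queue.Queue[str] = queue.Queue()
        for i in range(size):
            self._idle.put(f"sandbox-{i}")

    @contextmanager
    def lease(self, timeout: Optional[float] = None) -> Iterator[str]:
        sandbox_id = self._idle.get(timeout=timeout)  # raises queue.Empty on timeout
        try:
            yield sandbox_id
        finally:
            # A real pool would wipe or recycle the sandbox before reuse.
            self._idle.put(sandbox_id)

    def idle_count(self) -> int:
        return self._idle.qsize()


class Orchestrator:
    """Control plane: owns agent records and borrows a sandbox only when a step needs one."""

    def __init__(self, store: DurableFileStore, pool: SandboxPool) -> None:
        self.store = store
        self.pool = pool
        self.agents: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        self.agents[record.agent_id] = record

    def run_task(
        self,
        agent_id: str,
        work: Callable[[Optional[str]], str],
        needs_execution: bool = False,
    ) -> str:
        agent = self.agents[agent_id]
        if needs_execution:
            with self.pool.lease(timeout=30) as sandbox_id:
                result = work(sandbox_id)
        else:
            result = work(None)  # most steps never touch a sandbox
        # Results land in durable storage and agent state, not on a sandbox's disk.
        self.store.write(agent_id, "last_result.txt", result.encode())
        agent.history.append(result)
        return result
```

With this shape, a hundred registered agents can share a pool of two sandboxes: only a step that sets `needs_execution=True` holds one, and only for the duration of that step.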

The VM Is the Wrong Default

Most agent infrastructure ends up in one of two conversations. One is orchestration: frameworks like LangGraph, the Vercel AI SDK, and the OpenAI Agents SDK help developers build loops, tools, state, handoffs, and guardrails. The other is execution: Firecracker made lightweight microVMs a serious primitive, Fly Machines run apps in Firecracker microVMs, and E2B gives agents isolated sandboxes for untrusted code.

Both matter. The mistake is letting them collapse into one default: give every agent a sandbox and make that sandbox its computer.

That default makes sense for coding agents. If the whole product is editing repos, running tests, installing packages, and starting servers, then keeping the agent close to a shell is natural. But data agents, reporting agents, monitoring agents, and most always-on business agents are different. They need durable working state far more often than they need live execution. They do not need a VM burning CPU and memory all day, waiting for the rare moment when the model decides to run npm install.

Files Are Not Execution

The subtle mistake is binding the filesystem to the sandbox. Once you do that, the sandbox quietly becomes the agent's identity. The agent's files live there. Its scratch space lives there. Its process lives there. The harness often lives there too. Now shutting down the sandbox does not feel like stopping execution. It feels like stopping the agent. So teams keep sandboxes warm. Costs rise. They add pooling, but the pool is still fighting the wrong abstraction if every agent is treated as the owner of a sandbox. Then come snapshots, lifecycle policy, cleanup jobs, and image management. Eventually they are building a small cloud provider because they wanted an agent that could write a weekly report. The agent should have durable identity and durable files without owning a running execution environment.
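A minimal sketch of keeping files durable while execution stays disposable, assuming the agent's files can be represented as a path-to-bytes mapping and using a temporary directory as a stand-in for a sandbox's scratch disk. `run_in_ephemeral_workspace` is a hypothetical helper, not a gluonDB API, and the callable it runs gets no real isolation here. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_in_ephemeral_workspace(
    durable_files: Dict[str, bytes],
    step: Callable[[Path], None],
) -> Dict[str, bytes]:
    """Hydrate a throwaway directory from durable files, run one step, sync the result back.

    The temporary directory stands in for a sandbox's scratch disk and is deleted as soon
    as the step finishes. The returned dict is the agent's new durable file state.
    """
    with tempfile.TemporaryDirectory(prefix="agent-scratch-") as tmp:
        root = Path(tmp).resolve()
        # Hydrate: durable store -> scratch space, rejecting paths that escape the root.
        for rel_path, data in durable_files.items():
            target = (root / rel_path).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"path escapes workspace: {rel_path!r}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)

        # Execution. In production this would run inside an isolated sandbox;
        # here it is just a Python callable with access to the directory.
        step(root)

        # Sync back: the scratch tree becomes the new durable state, including deletions.
        # Symlinks are skipped so a step cannot pull in files from outside the workspace.
        return {
            p.relative_to(root).as_posix(): p.read_bytes()
            for p in sorted(root.rglob("*"))
            if p.is_file() and not p.is_symlink()
        }
```

The scratch directory can be destroyed after every step without the agent losing anything, because the agent's files were never owned by it in the first place.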

Idle Machines Get Expensive

Always-on, sandbox-per-agent infrastructure prices the system around the wrong bottleneck. Agent loops are not free, but the expensive part is pretending that every persistent agent needs a persistent machine. If an agent is actively coding all day, fine: keep a machine close. But if an agent checks a database every morning, writes a report, answers questions, watches for anomalies, and occasionally runs a shell command, its VM sits idle most of the time.

The industry answer is often "microVMs are cheap." True. Firecracker is excellent. But cheap is not free. If you self-host microVMs, you inherit KVM, networking, images, snapshots, density, cleanup, and host constraints. Nested virtualization has improved, including on AWS, but it is still something you have to understand and operate.

For us, that was the wrong surface area. We wanted the agent to stay alive as an identity, not as a VM.
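One way to see why pooling changes the bottleneck is a back-of-the-envelope sizing model. The sketch below assumes each agent independently needs a sandbox for a fixed fraction of the time, so the number needing one at any instant is binomially distributed, and it picks the smallest pool that is exhausted with at most a chosen probability. The model, the function names, and the example numbers are hypothetical, not measurements from our fleet.

```python
import math


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def pool_size(n_agents: int, busy_fraction: float, overflow_prob: float) -> int:
    """Smallest pool size k with P(more than k agents need a sandbox at once) <= overflow_prob.

    Assumes each agent independently needs a sandbox `busy_fraction` of the time.
    """
    if not 0.0 < busy_fraction < 1.0:
        raise ValueError("busy_fraction must be strictly between 0 and 1")
    cdf = 0.0
    for k in range(n_agents + 1):
        cdf += math.exp(binomial_log_pmf(k, n_agents, busy_fraction))
        if 1.0 - cdf <= overflow_prob:
            return k
    return n_agents


if __name__ == "__main__":
    n = 1_000    # hypothetical fleet size
    busy = 0.01  # hypothetical: each agent executes code about 1% of the time
    k = pool_size(n, busy, overflow_prob=1e-4)
    print(f"sandbox-per-agent: {n} sandboxes; pooled: {k} sandboxes")
```

The point is the shape of the arithmetic: the pool scales with expected concurrent execution plus a tail margin, not with the number of agents. The independence assumption is the weak spot, since agents scheduled for the same morning run are correlated, so a real pool needs extra headroom or staggered schedules.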

The Harness Should Stay Outside

The agent harness is the control system. It decides what tools exist, what credentials are available, what memory gets loaded, and what work is allowed to happen. Putting that harness inside the same sandbox used for arbitrary execution is a strange default. Yes, you can harden it. But you have still moved the most sensitive part of the system into the place where the model is allowed...
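A minimal sketch of a harness that stays outside the sandbox, assuming model output arrives as a structured action dict. `Harness`, `SandboxRequest`, the action shape, and the tool signature are hypothetical illustrations; `run_in_sandbox` stands in for whatever isolation layer (microVM, container) actually executes code. This is not gluonDB's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SandboxRequest:
    """Everything a sandbox receives: code and explicitly chosen files. No credentials."""
    code: str
    files: Dict[str, bytes] = field(default_factory=dict)


# A tool runs in the harness and is handed credentials the model never sees.
Tool = Callable[[Dict[str, str], Dict[str, Any]], str]


@dataclass
class Harness:
    secrets: Dict[str, str]
    tools: Dict[str, Tool]
    run_in_sandbox: Callable[[SandboxRequest], str]

    def handle(self, action: Dict[str, Any]) -> str:
        """Validate a model-proposed action and dispatch it. Model output is untrusted."""
        kind = action.get("kind")
        if kind == "call_tool":
            name = action.get("tool")
            if name not in self.tools:
                raise PermissionError(f"tool {name!r} is not registered")
            # Credentials stay on this side of the boundary.
            return self.tools[name](self.secrets, dict(action.get("args", {})))
        if kind == "run_code":
            # Only the code and explicitly listed files cross into the sandbox:
            # no secrets, no environment passthrough, no harness state.
            request = SandboxRequest(
                code=str(action["code"]), files=dict(action.get("files", {}))
            )
            return self.run_in_sandbox(request)
        raise PermissionError(f"action kind {kind!r} is not allowed")
```

The design choice is that the sandbox only ever sees a `SandboxRequest`. Even if model-generated code inside it is fully compromised, it has no credentials or tool registry to reach for.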
