OpenAI Agents SDK Sandboxes: Which one should you choose?


“Supported” isn’t “best.” OpenAI ships SandboxClient classes for seven hosted providers (plus two local clients) in the Agents SDK. That’s an integration shim, not an endorsement - OpenAI doesn’t benchmark, rank, or vouch for any of them. The rest of this post is what we found when we actually looked.

A working developer's guide to picking the right execution backend - without the marketing.

Disclosure: this post is written by the team at superserve.ai. We build persistent sandboxes for AI agents, so we have a horse in this race - specifically the “persistent-by-default workspace” gap we discuss below. We don’t ship a SandboxClient in the OpenAI Agents SDK yet; this post is about the seven providers that do.

TL;DR

The Agents SDK ships with two local sandbox clients (Unix-local, Docker) and seven hosted ones: E2B, Modal, Daytona, Runloop, Vercel, Cloudflare, and Blaxel. This is the natively integrated list as of May 2026 (OpenAI sandbox clients documentation), and it can change between SDK minor releases.

All seven hosted providers do roughly the same job (spin up a Linux workspace, run commands, mount storage), but they’re optimized for different things: cold start, persistence, GPU, edge proximity, or developer-environment fidelity.

The native list has gaps: persistent-by-default workspaces and EU data residency aren’t universally covered, and pricing models are wildly inconsistent.

If you’re picking once and never re-evaluating, don’t. The Agents SDK is designed so the SandboxAgent definition stays stable and only the client swaps.
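To make the "definition stays stable, only the client swaps" point concrete, here is a minimal sketch of the pattern in plain Python. It deliberately does not use the Agents SDK: `SandboxBackend`, `FakeLocalBackend`, `FakeHostedBackend`, `AgentDefinition`, and `start` are hypothetical names invented for illustration, and the real SandboxClient interface may look quite different.

```python
from dataclasses import dataclass
from typing import Protocol


class SandboxBackend(Protocol):
    """Hypothetical stand-in for a sandbox client; NOT the SDK's interface."""

    name: str

    def run(self, command: str) -> str: ...


@dataclass
class FakeLocalBackend:
    """Pretend local backend; it only records what it was asked to run."""

    name: str = "local"

    def run(self, command: str) -> str:
        return f"[{self.name}] ran: {command}"


@dataclass
class FakeHostedBackend:
    """Pretend hosted backend with the same surface as the local one."""

    name: str = "hosted"

    def run(self, command: str) -> str:
        return f"[{self.name}] ran: {command}"


@dataclass(frozen=True)
class AgentDefinition:
    """The part that should not change when the backend changes."""

    instructions: str
    setup_commands: tuple[str, ...]


def start(agent: AgentDefinition, backend: SandboxBackend) -> list[str]:
    """Run the agent's setup steps on whichever backend was passed in."""
    return [backend.run(cmd) for cmd in agent.setup_commands]


# One definition, written once.
AGENT = AgentDefinition(
    instructions="Fix the failing tests.",
    setup_commands=("git clone repo", "pip install -e ."),
)

# Swapping providers touches only this argument.
local_log = start(AGENT, FakeLocalBackend())
hosted_log = start(AGENT, FakeHostedBackend())
```

If your own code keeps the provider behind a seam like this, re-evaluating later is a one-line change rather than a rewrite.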

Why sandbox choice matters for the Agents SDK

“Native” ≠ “endorsed.” Being on the OpenAI Agents SDK’s sandbox client list means one engineer wrote and shipped a SandboxClient wrapper. It does not mean OpenAI tested, benchmarked, or recommends that provider. Treat the list as “compatible,” not “curated.”

The Agents SDK’s sandbox-agents feature is built on a clean split: the harness (model calls, tool routing, approvals, tracing, recovery) stays in your trusted infrastructure, and the sandbox is the execution plane where the model writes files, runs commands, and exposes ports.

That means your sandbox provider is the only thing standing between an LLM and arbitrary code execution on real infrastructure. Get it wrong and you get one of three failure modes:

Security: the agent writes a curl-pipe-bash and your sandbox isn’t isolated enough.

Cost: your sandbox bills for idle wall-clock, your agent thinks for 90 seconds between turns, and your AWS bill triples.

Latency: cold starts on every turn can dominate end-to-end task time - if your sandbox takes seconds to spin up and your task only runs for seconds, the cold start is the task.
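The cost and latency failure modes are easy to put numbers on. The sketch below is back-of-the-envelope arithmetic only: the 90-second think time comes from the example above, while the 45 seconds of execution per turn and the 3-second cold start and task durations are assumed values chosen for illustration, not measurements of any provider.

```python
def idle_billing_multiplier(active_s: float, idle_s: float) -> float:
    """How much more you pay under wall-clock billing than for execution alone."""
    if active_s <= 0:
        raise ValueError("active_s must be positive")
    return (active_s + idle_s) / active_s


def cold_start_share(cold_start_s: float, task_s: float) -> float:
    """Fraction of end-to-end time spent waiting for the sandbox to start."""
    total = cold_start_s + task_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return cold_start_s / total


# Assumed: 45 s of real execution per turn, 90 s of model think time.
multiplier = idle_billing_multiplier(active_s=45, idle_s=90)

# Assumed: a 3 s cold start in front of a 3 s task.
share = cold_start_share(cold_start_s=3, task_s=3)

print(f"wall-clock billing costs {multiplier:.1f}x the execution time")
print(f"cold start is {share:.0%} of end-to-end time")
```

Under these assumptions the bill is three times the execution cost and half of each turn is spent waiting on startup. Plug in your own agent's timings before comparing pricing pages.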

The natively supported list is a useful shortlist precisely because the integration is done for you. But “supported” ≠ “best.” Here’s the actual rundown.

At a glance

If you only read one section, read this. (Full side-by-side table below; per-provider deep-dives after that.)

Already on Vercel or Cloudflare? Use the matching native client. The cheapest decision is the one that doesn't add a vendor.

Need GPUs? Modal first, Daytona second.

Running untrusted code at scale? E2B (Firecracker microVM) is the safest starting bet. Vercel and Runloop also use Firecracker.

Coding agent that needs the same workspace tomorrow? Runloop, Daytona, or Blaxel - picked for maturity, raw spin-up speed, and perpetual-standby economics, respectively.

Just prototyping? UnixLocalSandboxClient. Stop overthinking it.
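The rules of thumb above can be written down as a small lookup. This is a sketch of this post's shortlist, not an SDK feature: the function name `shortlist`, its parameters, and the first-match-wins precedence (rules checked in the order listed above) are assumptions made for illustration.

```python
from typing import Optional


def shortlist(
    platform: Optional[str] = None,
    needs_gpu: bool = False,
    untrusted_at_scale: bool = False,
    persistent_workspace: bool = False,
) -> list[str]:
    """Return candidate providers, checking the rules in the order listed above."""
    # Already on Vercel or Cloudflare: don't add a vendor.
    if platform is not None and platform.lower() in ("vercel", "cloudflare"):
        return [platform.capitalize()]
    # GPUs: Modal first, Daytona second.
    if needs_gpu:
        return ["Modal", "Daytona"]
    # Untrusted code at scale: the providers this post lists as using Firecracker.
    if untrusted_at_scale:
        return ["E2B", "Vercel", "Runloop"]
    # Same workspace tomorrow.
    if persistent_workspace:
        return ["Runloop", "Daytona", "Blaxel"]
    # Assumed default: no special requirement means you are prototyping.
    return ["UnixLocalSandboxClient"]


print(shortlist(needs_gpu=True))
print(shortlist(platform="Cloudflare", needs_gpu=True))
print(shortlist())
```

Real requirements overlap (a GPU workload already on Vercel, say), so treat the result as a starting shortlist and check the comparison table before committing.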

Side-by-side comparison

| Provider | Isolation | Persistence | GPU | Mounts | Pricing model | Best fit |
|---|---|---|---|---|---|---|
| UnixLocal | None (host process) | Local FS | Host’s | Local bind | Free | Dev loop |
| Docker | Container | Volumes | Host’s | S3/R2/GCS/Azure/Box/S3 Files via volumes | Free | Self-hosted prod |
| E2B | Firecracker microVM | Up to 24h session | No | S3/GCS/R2/Azure/Box (rclone) | Per-second + tiered plan | Untrusted code at scale |
| Modal | gVisor container | Ephemeral + Modal volumes | Yes | S3/R2/GCS (Modal cloud bucket) | Pure per-second | GPU-bearing agents |
| Daytona | Containers | Snapshots + clone/archive | H100, RTX PRO 6000 | S3/GCS/R2/Azure/Box (rclone) | Per-hour, per-second metered | Stateful workspaces |
| Runloop | microVM | Suspend/Resume | No | S3/GCS/R2/Azure/Box (rclone) | $250/mo Pro + usage | Coding agents |
| Vercel | Firecracker microVM | Snapshots + persistent (beta) | No | None via SDK integration | Itemized: CPU/mem/creations/net/storage | Node-heavy code-gen |
| Cloudflare | Container (DO-backed) | Container lifetime | No | S3/R2/HMAC GCS | Workers Paid plan + container usage | Edge-native agents |
| Blaxel | Unikraft microVM | Perpetual (auto-standby) | No | S3/R2/GCS + Blaxel Drives | Per-second + tiered plan | Long-lived sessions |

The natively supported list

UnixLocalSandboxClient (built-in)

What it is: Runs the sandbox as a local process tree on the host machine, no container. Ships with openai-agents.

Strengths:

Zero install. Ships with openai-agents.

Fastest local iteration.

PTY support via a small Python 3...
