Unpacking sandbox startup latency: why started ≠ ready
June 19, 2026 • 6 minute read
Rebecka Storm, Product
Greta Workman (@gretaworkman), Product Marketing
Container scheduling and boot time is the metric that gets quoted most often when people talk about sandbox startup, and it's what benchmarks usually measure. But in production, it’s usually the smallest part of what your users actually wait through. The work that happens after the container boots, like cloning a repo or installing dependencies, is usually many times larger.<br>We’re working to make the whole startup chain faster, investing both in the parts of startup we own as a provider and in the tools you need to optimize the parts you own.<br>Here, we’ll cover what we mean by the full startup lifecycle, what to look for when optimizing, and the knobs we give you to build healthy production systems, including Readiness Probes, now generally available, which tell you exactly when a sandbox has finished initializing.<br>The full startup lifecycle<br>It's tempting to think of a sandbox as "running" the moment its container is scheduled and booted. Many benchmarks, like ComputeSDK's sandbox leaderboard, report this as time to interactive (TTI). It is often measured from the moment you call create() to the first successful command running inside the container. It's useful for understanding the floor of what a provider can do, but it’s often just the start of the full startup chain.<br>Modal treats these pieces as several distinct events:<br>Created: The sandbox has been requested but no compute resources are allocated yet. On Modal, Sandbox.create is asynchronous and returns immediately. The latency added here is negligible.<br>Scheduled: The sandbox has been assigned to a worker, which is provisioning the resources it needs (CPU, memory, GPU, volumes) and preparing the container environment. The latency of getting to this point depends on how quickly your provider can find the capacity you need.<br>Started: The container is live, the entrypoint process is running, and network tunnels and Volume mounts are active. 
You can now run commands inside it with exec(...). The latency of this step depends on actual container boot time. This is usually what benchmarks measure.<br>Ready: Your application-level initialization has finished and the sandbox can actually do the work your user wanted. The latency here is entirely use-case specific, but is often multiple seconds.<br>In use: The sandbox is handling real work.
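One way to see where the time goes is to timestamp each event separately rather than stopping at the first exec. Below is a minimal sketch in plain Python. The `LifecycleTimer` helper and its method names are hypothetical and not part of Modal's SDK; it assumes your own code calls `mark(...)` whenever it observes a stage, and the timestamps in the demo are made up for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

# Stage names taken from the lifecycle above, in order.
STAGES = ("created", "scheduled", "started", "ready")


@dataclass
class LifecycleTimer:
    """Hypothetical helper: records when a sandbox reaches each stage."""

    clock: Callable[[], float] = time.monotonic
    marks: Dict[str, float] = field(default_factory=dict)

    def mark(self, stage: str) -> None:
        """Record the current clock reading for a known stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.marks[stage] = self.clock()

    def durations(self) -> Dict[str, float]:
        """Seconds between each pair of consecutive recorded stages."""
        recorded = [s for s in STAGES if s in self.marks]
        return {
            f"{a}->{b}": self.marks[b] - self.marks[a]
            for a, b in zip(recorded, recorded[1:])
        }

    def time_to_first_exec(self) -> float:
        """Roughly what a TTI-style benchmark reports: created -> started."""
        return self.marks["started"] - self.marks["created"]

    def time_to_ready(self) -> float:
        """What a user actually waits through: created -> ready."""
        return self.marks["ready"] - self.marks["created"]


if __name__ == "__main__":
    # Fake clock with made-up readings, so the demo is deterministic.
    fake_readings = iter([0.0, 0.25, 0.5, 12.5])
    timer = LifecycleTimer(clock=lambda: next(fake_readings))
    for stage in STAGES:
        timer.mark(stage)
    print(timer.durations())
    print("time to first exec:", timer.time_to_first_exec())
    print("time to ready:", timer.time_to_ready())
```

In a real system you would call `mark("ready")` only after your application-level check passes (for example, once your server responds), which is the point the Started-only measurement never sees.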
The gap between Started and Ready is left out of most benchmarks, even though in many realistic settings significant work still has to happen after the container starts but before the sandbox is useful. This could be a git pull to fetch the latest remote state, a bun install or npm install to pull dependencies, or a server that needs to come up and start listening. The time before the Sandbox is started is the smallest variable in the chain. The application-level setup takes longer, and it's your code's responsibility once the container is running.<br>This pattern is consistent across pretty much every real sandbox workload. For background coding agents, you need the repo cloned, the right branch checked out, and a working dev environment with services running before the agent can do anything. For vibe coding platforms, you need a running application, typically a server up and listening, before the user can interact with anything. For computer use RL training, you want to maximize your rollout throughput, so you need a browser loaded, and often an HTTP server that handles tool calls, up before each rollout can run. In each case, the expensive part is application setup, not container scheduling, and it's invisible to a benchmark that stops at the first exec.<br>Decreasing perceived latency<br>What we’ve seen in the field is that very few applications actually care about container startup time on its own. What everyone cares about is perceived user latency: how long an end user has to wait between asking for a Sandbox and being able to use it.<br>If you start a Sandbox on demand and make users wait through the whole startup lifecycle, a 30-second setup directly translates into a 30-second wait. That’s not acceptable for most products, and shaving a few hundred milliseconds off container boot wouldn’t do much to change that, because container boot was a tiny piece of that 30-second wait. 
The real answer for production systems is to get rid of that perceived startup time altogether.<br>Optimizing for production systems: Sandbox pools<br>The solution is a warm pool to pull from: pre-initialize Sandboxes in the background, before anyone (or any agent) asks for them. When a request comes in, you hand out a sandbox that's already gone through the startup process. The setup cost is paid ahead of time, invisibly, so perceived latency drops to roughly the time it takes to fetch a...