Scaling OpenComputer from 1 VM to 1 million sandboxes




Written by Mohamed Habib · June 17, 2026

We started OpenComputer with a single virtual machine in one Azure region. The company grew quickly, but Azure couldn't raise our compute quota in that region any further, and we found ourselves growing against a fixed pool of CPUs.<br>So we had to find another way to scale.<br>This post covers what we did to reach a point where we can keep adding capacity more or less indefinitely. We'll go through how we split the system into cells, how a global registry at the edge decides where every sandbox lives, and how four cloud providers add up to a million CPUs.

Tweet by Mohamed Habib (@motatoeshq), June 7, 2026: “How we're scaling opencomputer.dev to 1M sandboxes”. View at https://twitter.com/motatoeshq/status/2063679701873492299.

Interactive architecture diagram with four stages. Stage 1 (1–100 sandboxes): one control plane connected to one worker. Stage 2 (100–1K): the same control plane scheduling across three workers in a single region. Stage 3 (1K–10K): the control-plane-plus-workers unit becomes a cell, with two cells sitting under an edge termination layer (Cloudflare Workers + D1 registry) that routes each create to the cell with the most room and receives heartbeats back. Stage 4 (10K–100K+): four cells under the edge layer, repeating across regions and clouds. Adding capacity is now a deployment step.

Our entire capacity lived inside one Azure region's quota

Sandboxes are full VMs, VMs need physical hardware, and cloud providers ration that hardware per region. Our problem started with the rationing, so it helps to spell out how it works first.

Behind cloud quotas

Every cloud provider bounds you by regional capacity: each region is a physical data center with finite hardware and a long queue of customers who want it.<br>So providers hand out capacity as a quota, measured in total CPU count. Quotas start small and grow only after you build usage history. The working protocol is to run your existing allocation at around 50% utilization for a week or two, request an increase, and let the usage data justify it. Request 10,000 CPUs on day one and the answer is no.
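The ladder can be sketched as a simple readiness check. This is a minimal illustration, not any provider's API: the function name `ready_to_request_increase`, the daily sampling, and the exact 0.5 threshold and 7-day window are assumptions derived from the "around 50% for a week or two" rule of thumb above.

```python
from typing import Sequence


def ready_to_request_increase(
    daily_cpus_used: Sequence[int],
    quota_cpus: int,
    min_utilization: float = 0.5,  # "around 50%" (assumed exact threshold)
    min_days: int = 7,             # "a week or two"; 7 is the assumed lower bound
) -> bool:
    """Return True once the most recent `min_days` daily samples all sit at
    or above `min_utilization` of the current quota."""
    if quota_cpus <= 0:
        raise ValueError("quota_cpus must be positive")
    if len(daily_cpus_used) < min_days:
        return False  # not enough usage history to justify a request yet
    recent = daily_cpus_used[-min_days:]
    return all(used / quota_cpus >= min_utilization for used in recent)


# Hypothetical usage history (CPUs in use per day) against a 300-CPU quota.
history = [90, 120, 150, 160, 170, 180, 190, 200, 210]
print(ready_to_request_increase(history, quota_cpus=300))
```

The check only says when a request is likely to be taken seriously. Whether the provider grants it still depends on how much hardware the region has left.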

We hit the 300-CPU ceiling early

Our first region was Azure East US 2, with a starting quota of about 300 CPUs. The plan was to consume it and request more, the way the ladder normally works.<br>Unfortunately, we had picked one of the busiest data centers in the world, and 300 CPUs was the ceiling there regardless of our usage history. Meanwhile, new users kept signing up against a fixed pool of compute.

Why migrating to another region wouldn't have fixed the scale issue

The obvious move was to relocate to a quieter region or a different cloud and collect a bigger quota there. We also held Azure credits we wanted to use. But the deeper problem is that migrating regions or cloud providers only resets the clock.<br>Every region has an upper limit. Providers raise it gradually, but a single-region architecture that scales fast enough will hit the ceiling at some point.<br>On top of that, sandbox demand at 10K, 100K, or 1M concurrent VMs can't physically be served from one data center. We needed to rethink the architecture so that adding a region of any cloud is a deployment step rather than a migration project. And we had to take our existing architecture apart to make that happen.
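A rough back-of-the-envelope calculation shows how far a single quota is from those targets. The sketch below assumes one CPU per sandbox and that every region is capped at the same 300 CPUs. Both are simplifications for illustration only, and `regions_needed` is a hypothetical helper.

```python
PER_REGION_QUOTA_CPUS = 300  # the 300-CPU ceiling mentioned in the text
CPUS_PER_SANDBOX = 1         # assumption for illustration only


def regions_needed(
    concurrent_sandboxes: int,
    quota_cpus: int = PER_REGION_QUOTA_CPUS,
    cpus_per_sandbox: int = CPUS_PER_SANDBOX,
) -> int:
    """Smallest number of equally capped regions that covers the demand."""
    total_cpus = concurrent_sandboxes * cpus_per_sandbox
    return -(-total_cpus // quota_cpus)  # integer ceiling division


for target in (10_000, 100_000, 1_000_000):
    print(f"{target:>9} sandboxes -> {regions_needed(target)} regions")
```

Real quotas differ by region and grow over time, so the exact counts matter less than the shape: demand at these scales has to be spread across many regions and clouds.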

Starting with a single control plane

The first version of OpenComputer was a single VM handling everything, with one control plane.<br>In the second version, we scaled to multiple VMs in one region, coordinated by one control plane that did everything: the web UI, the dashboard, billing logic, and the actual orchestration of VMs, all in one place.

A single control plane (running web UI, dashboard, billing, and orchestration) connected to three workers, all inside one Azure region shown as a dashed boundary box.

That design served roughly a thousand sandboxes comfortably. It assumed that all compute lived in one region and that the component that orchestrates VMs could also be the component that runs the product around them. We had to split those two jobs apart.

Cells made capacity a deployable unit

We started by cutting the control plane down to a single job (orchestrating VMs), and we packaged it with the workers it manages into a unit that deploys into any cloud region. That unit is a "cell."
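To make the cell idea concrete, here is a minimal sketch of a cell record together with the routing rule from the architecture overview: each create goes to the cell with the most room, and cells report back via heartbeats. The in-memory `Registry` is a hypothetical stand-in for the Cloudflare Workers + D1 registry, and the field names, the 30-second staleness cutoff, and the CPU-based notion of "room" are all assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cell:
    """One control plane plus the workers it manages, in one cloud region."""
    cell_id: str
    cloud: str
    region: str
    capacity_cpus: int
    used_cpus: int = 0
    last_heartbeat: float = 0.0

    @property
    def free_cpus(self) -> int:
        return self.capacity_cpus - self.used_cpus


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the edge registry."""
    heartbeat_ttl_s: float = 30.0  # assumed staleness cutoff
    cells: Dict[str, Cell] = field(default_factory=dict)

    def heartbeat(self, cell: Cell, now: Optional[float] = None) -> None:
        """Record the cell's latest state and the time it reported in."""
        cell.last_heartbeat = time.time() if now is None else now
        self.cells[cell.cell_id] = cell

    def pick_cell(self, cpus_needed: int, now: Optional[float] = None) -> Optional[Cell]:
        """Return the live cell with the most free CPUs, or None if none fits."""
        now = time.time() if now is None else now
        live = [
            c for c in self.cells.values()
            if now - c.last_heartbeat <= self.heartbeat_ttl_s
            and c.free_cpus >= cpus_needed
        ]
        return max(live, key=lambda c: c.free_cpus, default=None)


# Two hypothetical cells in different clouds report in at t=100.
registry = Registry()
registry.heartbeat(Cell("cell-a", "azure", "eastus2", capacity_cpus=300, used_cpus=250), now=100.0)
registry.heartbeat(Cell("cell-b", "other-cloud", "region-1", capacity_cpus=300, used_cpus=100), now=100.0)

chosen = registry.pick_cell(cpus_needed=2, now=110.0)
print(chosen.cell_id if chosen else None)
```

Because the registry only sees cells as records with capacity and a heartbeat, adding a region in any cloud amounts to deploying one more cell and letting it start reporting in.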

The control plane...

