Recreating a Two Million Particle World at 30 Hz over WebSocket with Centrifugo

David Gerrells wrote a blog post, "How fast is Go - simulating millions of particles on a smart TV", describing a Go server that simulates two million particles in a 2200 × 2200 world at 60 Hz, ships frames to clients at 30 Hz over WebSocket, and lets anyone connected pull particles around with their cursor. The transport is hand-written for speed: bit-packed binary frames, a manual protocol, and a raw WebSocket library. The live demo runs at howfastisgo.dev; try it before reading on.

David's goal was to explore Go performance. Once we saw the demo we immediately wanted to try reproducing it on top of Centrifugo — to see whether a generic real-time transport like Centrifugo could carry this kind of payload. At first glance it looked straightforward: Centrifugo provides a binary WebSocket transport, and the simulation already runs on the server. But along the way we ran into design differences that meant we couldn't quite match the original's per-viewer bytes on the wire. We'll show what we built, what it cost, and why the overhead is worth it for UX and scalability.

Source code of our final demo: v6/millions_of_particles.

Recap the original

The "two million particles" live entirely on the server. What goes to clients is a density map: one bit per world cell, answering "is there any particle in this cell?". Several particles in the same cell collapse to one bit. Bytes per frame scale with viewport pixels, not particle count: bumping the simulation to 4M particles wouldn't change the wire size at all; the cells would just get fuller.
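To make the density map concrete, here is a minimal sketch in Go of collapsing particles into one bit per cell. The `Particle` type, the `PackDensity` name, and the row-major, most-significant-bit-first layout are assumptions for illustration; the original's actual packing may differ.

```go
package main

import "fmt"

// Particle is a hypothetical particle type; only its position matters here.
type Particle struct{ X, Y float32 }

// PackDensity sets one bit for every world cell that holds at least one
// particle. Bits are stored row-major, most significant bit first (an
// assumed layout, not necessarily the one the original demo uses).
func PackDensity(ps []Particle, w, h int) []byte {
	out := make([]byte, (w*h+7)/8)
	for _, p := range ps {
		if p.X < 0 || p.Y < 0 {
			continue // outside the world
		}
		x, y := int(p.X), int(p.Y)
		if x >= w || y >= h {
			continue // outside the world
		}
		i := y*w + x
		// OR-ing means any number of particles in a cell still yields one bit.
		out[i/8] |= byte(0x80) >> uint(i%8)
	}
	return out
}

func main() {
	// Two particles share cell (0,0); a third sits in cell (3,1).
	ps := []Particle{{0, 0}, {0.5, 0.2}, {3, 1}}
	frame := PackDensity(ps, 2200, 2200)
	fmt.Println(len(frame)) // 605000 bytes, regardless of particle count
}
```

The output length depends only on the grid dimensions, which is why adding particles does not grow the frame.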

The server runs everything in one Go process: the simulation, the WebSocket connections, and per-client camera state. On every tick it walks the connected clients, reads each one's camera (x, y, width, height) shipped up from the browser, crops that rectangle from the world buffer, and writes the bit-packed bytes straight to that client's WebSocket. A typical desktop window of 1410 × 730 pixels packs to 1410 × 730 / 8 ≈ 129 KB per tick — that's all a viewer ever receives. The client sees about 21% of the world and pans by changing the camera; cursor input flows back up the same WebSocket and pulls particles around in the simulation.
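The per-client crop described above can be sketched roughly as follows. This assumes the world buffer holds one byte per cell (non-zero meaning occupied) and uses a hypothetical `Camera` struct and `CropPacked` function; the original's buffer layout is not described here, so treat this as an illustration of the idea only.

```go
package main

import "fmt"

// Camera is the viewport rectangle in world cells (hypothetical layout).
type Camera struct{ X, Y, W, H int }

// CropPacked copies the camera rectangle out of a world grid that stores one
// byte per cell (assumed) and packs it to one bit per cell, row-major,
// most significant bit first. Cells outside the world stay zero.
func CropPacked(world []byte, worldW, worldH int, c Camera) []byte {
	out := make([]byte, (c.W*c.H+7)/8)
	for row := 0; row < c.H; row++ {
		wy := c.Y + row
		if wy < 0 || wy >= worldH {
			continue
		}
		for col := 0; col < c.W; col++ {
			wx := c.X + col
			if wx < 0 || wx >= worldW {
				continue
			}
			if world[wy*worldW+wx] != 0 {
				i := row*c.W + col
				out[i/8] |= byte(0x80) >> uint(i%8)
			}
		}
	}
	return out
}

func main() {
	world := make([]byte, 2200*2200)
	world[100*2200+50] = 1 // one occupied cell at (50, 100)
	frame := CropPacked(world, 2200, 2200, Camera{X: 0, Y: 0, W: 1410, H: 730})
	fmt.Println(len(frame)) // 128663 bytes, i.e. about 129 KB
}
```

The server would run something like this once per client per tick, which is cheap only because the camera state and the world buffer live in the same process.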

That tight coupling — sim, sockets, and per-client cameras all in the same process — is what keeps the original lean per viewer: the server cuts a custom message for each connection because everything it needs is right there. Now let's see what changes when we put a generic broker in the middle.

How it fits with Centrifugo

┌────────────────────────────────┐
│   Go backend (particle sim)    │
│   60 Hz sim · 30 Hz publish    │
└────┬──────────────────────▲────┘
     │                      │
     │ POST /api/publish    │ HTTP RPC proxy
     │ (frame bytes)        │ (cursor input)
     ▼                      │
┌────────────────────────────────┐
│           Centrifugo           │
│  single channel · WS fan-out   │
└────┬──────────────────────▲────┘
     │ binary WS frames     │ WS RPC (cursor)
     ▼                      │
   ┌───┐ ┌───┐ ┌───┐ ┌───┐  │
   │ B │ │ B │ │ B │ │ B │──┘  browsers
   └───┘ └───┘ └───┘ └───┘

The backend publishes one binary payload per tick to a single Centrifugo channel; every subscribed browser receives the same WebSocket frame and slides a local camera over the bitmap to render its slice. Cursor input flows back via RPC over WebSocket, proxied to the backend's HTTP endpoint.

The publisher's job is constant: pack the whole 2200 × 2200 world at 1 bit per pixel and send it to a channel. That's it: Centrifugo handles the fan-out. Panning works locally too: each browser has the whole world in memory and just slides a camera over the bitmap.
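As a rough sketch of what the publish step could look like, the Go snippet below builds a request for Centrifugo's HTTP API using the `b64data` field, which lets binary payloads travel inside a JSON body. The channel name `world`, the base URL, the API key, and the helper names are placeholders, and the actual demo may use a different client or encoding; note that base64 adds about a third to the payload on the backend-to-Centrifugo hop only.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
)

// publishBody mirrors the channel and b64data fields of a publish request.
type publishBody struct {
	Channel string `json:"channel"`
	B64Data string `json:"b64data"`
}

// EncodePublish wraps a binary frame in a JSON publish body (hypothetical helper).
func EncodePublish(channel string, frame []byte) ([]byte, error) {
	return json.Marshal(publishBody{
		Channel: channel,
		B64Data: base64.StdEncoding.EncodeToString(frame),
	})
}

// NewPublishRequest builds, but does not send, a POST to /api/publish.
func NewPublishRequest(baseURL, apiKey, channel string, frame []byte) (*http.Request, error) {
	body, err := EncodePublish(channel, frame)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/publish", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

func main() {
	frame := make([]byte, 2200*2200/8) // one packed world frame
	// URL, key, and channel name below are placeholders.
	req, err := NewPublishRequest("http://localhost:8000", "my-api-key", "world", frame)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

A real publisher would send this with an `http.Client` once per tick and check the response for errors.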

It works — but at roughly 5× the bytes the original sends to each viewer (~605 KB vs. ~129 KB). It's by design: Centrifugo is a standalone broker. It doesn't know about user cameras, viewports, or which slice each viewer cares about, so a single channel has to ship bytes useful for any subscriber, and the simplest "useful" is the whole world.

And the gap widens with world size. Bump the world to 10000 × 10000 and the naive port ships ~12.5 MB per viewer per tick, while the original would still send ~129 KB — each viewer only pays for their viewport. The naive approach scales with world size; the original scales with viewport size. So we have a real reason to find a better fit.
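The scaling argument is simple arithmetic, and a short Go sketch makes it easy to check. The helper names are ours; the sizes are the ones quoted in the text, and the per-second figures assume the 30 Hz publish rate.

```go
package main

import "fmt"

// PackedBytes returns the size of a w × h bitmap at one bit per cell,
// rounded up to whole bytes.
func PackedBytes(w, h int) int { return (w*h + 7) / 8 }

// BytesPerSecond is the per-viewer bandwidth at the given publish rate.
func BytesPerSecond(frameBytes, hz int) int { return frameBytes * hz }

func main() {
	const hz = 30
	sizes := []struct {
		name string
		w, h int
	}{
		{"viewport 1410x730", 1410, 730},
		{"world 2200x2200", 2200, 2200},
		{"world 10000x10000", 10000, 10000},
	}
	for _, s := range sizes {
		b := PackedBytes(s.w, s.h)
		fmt.Println(s.name, b, BytesPerSecond(b, hz))
	}
}
```

At 30 Hz this works out to roughly 3.9 MB/s per viewer for the viewport crop, about 18 MB/s for the full 2200 × 2200 world, and 375 MB/s for a 10000 × 10000 world.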

We could try one channel per viewer, with the backend tracking each camera and packing an individual crop per tick. That would get us closer to the original's ~129 KB per viewer, but it gives up fan-out — the thing Centrifugo is built for — and turns every viewport change into RPC traffic up plus a per-client publish down. The backend would also need to track camera state per connection — easy in the original's single-process design, awkward when Centrifugo sits in between.

So we want fewer bytes per viewer while keeping fan-out.

Splitting the world into tiles

The idea: split the world into tiles, and let each viewer subscribe only...
