Deterministic Simulation Testing in Go with synctest // Guido Battiston
Last updated 2026.07.05
Concurrency bugs are miserable to test for. A test passes on your machine, fails once in CI, and you can’t reproduce it because it lived in some interleaving the scheduler picked that one time. Deterministic Simulation Testing (DST) is one way out of that. The idea is to take the things in your program that vary between runs and funnel them through a single bus, one operation at a time, all driven by a seed. Fix the seed and you get the same run back, so a failure stops being a fluke and becomes something you can replay and debug properly.
In practice that means getting a handle on four things: time, randomness, network and the scheduling of concurrent work. The first three are just regular interfaces you wrap and route through the bus, more or less the same in any language. Scheduling is the hard one, and it is where Go has historically been awkward because of a few inherent language constraints. There are both commercial and open source solutions for implementing or retrofitting DST onto a project. However, since they depend heavily on the language and tech stack, their implications for overall architecture and coding discipline can be significant. A recent addition to the standard library, testing/synctest (experimental in Go 1.24, generally available since Go 1.25), changes that.
What do we want to achieve?
Only one goroutine performs work at any given point in time
A single scheduler picks ops deterministically using a fixed seed
The delta between prod and test code is as small as possible
The easy part: interface injection
At its core DST is just wrapping the interfaces whose behavior differs across runs. Randomness is the trivial case: you hand around a rand.New(rand.NewSource(seed)) instead of calling the global functions. Network is the one worth showing, since hijacking it without touching call sites is the non-obvious part:
resp, err := http.Get("http://server/users/42")
To make it deterministic you swap the transport’s dialer. Every call site stays the same:
client := &http.Client{
	Transport: &http.Transport{
		// Every new connection is dialed through the bus instead of the OS.
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return bus.Dial(ctx, addr)
		},
		DisableKeepAlives: true, // optional
	},
}
resp, err := client.Get("http://server/users/42")
The net.Conn returned by bus.Dial is our own implementation, which routes ops through the scheduler. Write turns the bytes into an op, pushes it onto the bus, and parks until the scheduler delivers it; Read wakes when the scheduler hands over an op addressed to this conn. As a sketch:
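func (c *busConn) Write(b []byte) (int, error) {
	// Copy b: io.Writer implementations must not retain the caller's slice.
	body := append([]byte(nil), b...)
	c.bus.Push(&Op{From: c.from, To: c.to, Body: body}) // parks until the scheduler delivers it
	return len(b), nil
}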
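func (c *busConn) Read(b []byte) (int, error) {
	op := <-c.inbox // hypothetical field: wakes when the scheduler hands over an op for this conn
	return copy(b, op.Body), nil // sketch: assumes b can hold the whole op
}
On the server side you hand the matching listener to a normal http.Server: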
srv := &http.Server{Handler: mux}
srv.Serve(bus.Listener("server"))
Because everything rides on net.Conn, the standard library still serializes the HTTP request for you; each Write it makes becomes an op the scheduler orders. Any code using this client, including dependencies, is captured with no call-site changes.
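To check that net/http really runs unchanged over a connection you control, here is a runnable stand-in for the bus. It is a sketch under simplifying assumptions: memListener, its Dial method and get are hypothetical names, net.Pipe provides the in-memory conn, and there is no scheduler yet, so bytes flow in whatever order the goroutines happen to run. It only demonstrates the wiring on both sides: the client's DialContext and the server's Serve(listener).

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
	"sync"
)

// memAddr is a minimal net.Addr for the in-memory listener.
type memAddr string

func (a memAddr) Network() string { return "mem" }
func (a memAddr) String() string  { return string(a) }

// memListener is a hypothetical in-memory net.Listener: Dial hands the server
// end of a net.Pipe to Accept and returns the client end to the caller.
type memListener struct {
	conns     chan net.Conn
	done      chan struct{}
	closeOnce sync.Once
}

func newMemListener() *memListener {
	return &memListener{conns: make(chan net.Conn), done: make(chan struct{})}
}

func (l *memListener) Accept() (net.Conn, error) {
	select {
	case c := <-l.conns:
		return c, nil
	case <-l.done:
		return nil, net.ErrClosed
	}
}

func (l *memListener) Close() error {
	l.closeOnce.Do(func() { close(l.done) })
	return nil
}

func (l *memListener) Addr() net.Addr { return memAddr("server") }

// Dial plays the role of bus.Dial for this sketch.
func (l *memListener) Dial(ctx context.Context, addr string) (net.Conn, error) {
	client, server := net.Pipe()
	select {
	case l.conns <- server:
		return client, nil
	case <-l.done:
		client.Close()
		server.Close()
		return nil, net.ErrClosed
	case <-ctx.Done():
		client.Close()
		server.Close()
		return nil, ctx.Err()
	}
}

// get serves a tiny mux over the in-memory listener and fetches path with a
// client whose DialContext routes through that same listener.
func get(path string) (string, error) {
	ln := newMemListener()
	mux := http.NewServeMux()
	mux.HandleFunc("/users/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "user %s", strings.TrimPrefix(r.URL.Path, "/users/"))
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln) // returns once srv.Close closes the listener
	defer srv.Close()

	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				return ln.Dial(ctx, addr)
			},
			DisableKeepAlives: true,
		},
	}
	resp, err := client.Get("http://server" + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := get("/users/42")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // user 42
}
```

Replacing net.Pipe with a conn whose Write parks until a scheduler releases it is what turns this wiring into the deterministic bus described above.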
[Interactive visualization: SCHEDULER, BUS, and a CLIENT and SERVER each running net/http]
This visualization sums up the simplicity of DST at its core.
The network op is identified by the tuple (from, to), which the bus uses to keep a deterministic ordering of the pending ops. The scheduler then draws from that ordered set with the seed to decide which op runs next. Depending on your architecture, from and to can literally be service names, or something more fine-grained. If services open multiple connections to the same endpoint, you must distinguish them somehow. I recommend doing that through an explicit marker in the production code. A natural carrier for this marker is context.Context, since it should already be threaded through cancellable calls, including the DialContext call that opens each connection. Keep in mind that an ambiguous ordering of ops silently breaks determinism, so panic whenever two pending ops compare as equal. That way an ambiguity fails loudly, and you can add deterministic tests iteratively while making sure the premises don't break.
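The ordering rule can be sketched as follows, assuming a hypothetical Op type and Scheduler: pending ops are sorted by (From, To, Tag), any two ops with the same key trigger a panic, and a seeded RNG picks which op goes next. Tag is a hypothetical stand-in for the connection marker you would carry in context.Context. For brevity the ops are pushed up front and Push does not park the caller, which the real bus would do.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Op is a hypothetical bus operation. Tag distinguishes multiple conns
// between the same pair of services (e.g. a marker read from the dial context).
type Op struct {
	From, To, Tag string
	Body          []byte
}

// less orders ops by (From, To, Tag): the bus's deterministic key.
func less(a, b *Op) bool {
	if a.From != b.From {
		return a.From < b.From
	}
	if a.To != b.To {
		return a.To < b.To
	}
	return a.Tag < b.Tag
}

// Scheduler holds the pending ops and picks the next one with a seeded RNG.
type Scheduler struct {
	rng     *rand.Rand
	pending []*Op
}

func NewScheduler(seed int64) *Scheduler {
	return &Scheduler{rng: rand.New(rand.NewSource(seed))}
}

func (s *Scheduler) Push(op *Op) { s.pending = append(s.pending, op) }

// Next sorts the pending ops, panics if two share a key (the ordering would be
// ambiguous), then removes and returns one chosen by the seed. It returns nil
// once nothing is pending.
func (s *Scheduler) Next() *Op {
	if len(s.pending) == 0 {
		return nil
	}
	sort.Slice(s.pending, func(i, j int) bool { return less(s.pending[i], s.pending[j]) })
	for i := 1; i < len(s.pending); i++ {
		// The slice is sorted, so "not strictly less" means the keys are equal.
		if !less(s.pending[i-1], s.pending[i]) {
			p := s.pending[i]
			panic(fmt.Sprintf("ambiguous op ordering: %s -> %s (tag %q)", p.From, p.To, p.Tag))
		}
	}
	i := s.rng.Intn(len(s.pending))
	op := s.pending[i]
	s.pending = append(s.pending[:i], s.pending[i+1:]...)
	return op
}

// Drain pushes ops in the given order and returns the delivery order.
func Drain(seed int64, ops []*Op) []string {
	s := NewScheduler(seed)
	for _, op := range ops {
		s.Push(op)
	}
	var order []string
	for op := s.Next(); op != nil; op = s.Next() {
		key := op.From + "->" + op.To
		if op.Tag != "" {
			key += "#" + op.Tag
		}
		order = append(order, key)
	}
	return order
}

func main() {
	ops := []*Op{
		{From: "client", To: "server"},
		{From: "server", To: "db"},
		{From: "db", To: "server"},
	}
	fmt.Println(Drain(1, ops))
}
```

Because the set is sorted before each draw, the delivery order depends only on the seed and on which ops are pending, never on the order in which goroutines happened to push them.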
Network and randomness are straightforward injections, which leaves the one part Go genuinely makes hard.
The historically hard part: scheduling
Rust, for what it’s worth, gets this part almost for free. Its async executor is a library rather than part of the language, and its async model suits DST from the start, so instead of forcing the scheduler to behave, you swap the executor for one you control. madsim works like that. In Rust, scheduling is just another interface you control.
Go gives you no such seam: you can’t swap out the runtime’s scheduler, so the existing attempts have all had to work around it....