Not Every Byte Gets a Vote — mitander@xyz
In a deterministic game engine, replay starts simple: record inputs, run the same ticks again, and compare the result.
When I started wiring replay into the sim, my first instinct was blunt:
Easy. Hash everything.
For the first few fields, that feels right. Actor health, projectile position, RNG state: if any of those differ, the run has probably diverged.
Then the less obvious fields pile up. The AI has a trace explaining why it turned left. The renderer has interpolation state from the previous frame. Pathfinding has a cache full of directions. A struct has padding bytes because memory is memory.
The naive checksum was heading toward this:
hash(&world.entities);<br>hash(&world.projectiles);<br>hash(&world.rng);<br>hash(&world.ai_trace); // uh oh<br>hash(&world.render_helpers); // definitely uh oh
That kind of checksum treats every field as equally meaningful. It catches real divergence, but it also turns harmless implementation changes into replay failures.
The bug that made this obvious was small. I changed a helper field used for inspection, and the replay checksum moved. The sim still behaved the same. The player ended in the same place. The same enemies died. But the checksum said no.
Replay was failing because debug data had a different layout.
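A minimal sketch of why that happens, assuming a hypothetical NaiveWorld that is hashed as raw bytes the way the naive checksum does. The layout and the FNV-1a hash are stand-ins, not the engine's real World; the point is only that a debug-only field sits inside the hashed bytes.

```zig
const std = @import("std");

// Hypothetical world layout, hashed as raw bytes like the naive checksum.
// `extern struct` pins the field order and, with these field sizes, leaves no padding;
// a plain Zig struct makes no such promise about layout.
const NaiveWorld = extern struct {
    rng_state: u64 = 0,
    player_health: i32 = 100,
    projectile_x: i32 = 0,
    ai_trace_len: u32 = 0, // debug-only
    render_alpha: f32 = 0, // presentation-only
};

// Every byte gets a vote: gameplay, debug, and presentation fields alike.
fn naive_checksum(world: *const NaiveWorld) u64 {
    return std.hash.Fnv1a_64.hash(std.mem.asBytes(world));
}
```

Changing only ai_trace_len is enough to move this checksum, which is the failure from the bug above in miniature.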
So the question became narrower:
Which state can change future gameplay?
player health          yes<br>projectile position    yes<br>RNG stream             yes<br>debug events           probably not: they are observations<br>render interpolation   no: useful, but not gameplay truth<br>pathfinding cache      maybe: save it or rebuild it, but name which one
For the pathfinding cache, I want an explicit decision. Either it is rebuilt before AI can read it, or it is persisted and treated as part of the runtime state that matters. What I want to avoid is a cache drifting into replay just because the checksum happened to traverse it.
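One way to make the "rebuild it" choice explicit, sketched under assumptions: the cache remembers which tick it was built for, and every read asserts that it is current. PathCache, rebuild, and step_from are hypothetical names, and the "pathfinding" is a stand-in pure function.

```zig
const std = @import("std");

// Hypothetical derived cache: never hashed, always rebuilt before AI reads it.
const PathCache = struct {
    built_for_tick: ?u64 = null,
    next_step: [4]u8 = .{ 0, 0, 0, 0 },

    // Called during the derive phase, from authoritative state only.
    pub fn rebuild(self: *PathCache, tick: u64, goal: u8) void {
        for (&self.next_step, 0..) |*step, i| {
            // Stand-in for real pathfinding: a pure function of its inputs.
            step.* = goal ^ @as(u8, @intCast(i));
        }
        self.built_for_tick = tick;
    }

    // AI reads go through here, so a stale cache trips an assert in safe builds.
    pub fn step_from(self: *const PathCache, tick: u64, cell: usize) u8 {
        std.debug.assert(self.built_for_tick != null and self.built_for_tick.? == tick);
        return self.next_step[cell];
    }
};
```

The other option is to move the cache into authoritative state and hash it. Either way, the decision is written down instead of being left to whatever the checksum happens to walk.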
This post walks through the split I ended up using in a Zig ARPG engine: authoritative gameplay state, derived caches, observation/debug output, and presentation state. The names are specific to this engine, but making every field pick one of those roles has been useful.
Determinism still needs the usual work: fixed ticks, explicit RNG, stable iteration order, initialized state, and no hidden dependency on render timing or local machine state. The checksum only tells me whether two runs arrived at the same authoritative state.
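For the "explicit RNG" item, a minimal sketch: the generator's entire state is one integer that lives in authoritative state, is seeded from the replay seed, and is advanced only by sim code. SimRng is a hypothetical name, and SplitMix64 is used here only because it is short, not because the engine uses it.

```zig
const std = @import("std");

// Hypothetical sim RNG: all of its state is one u64, so it can be stored and hashed.
const SimRng = struct {
    state: u64,

    pub fn init(seed: u64) SimRng {
        return .{ .state = seed };
    }

    // SplitMix64 step: advance the state, then scramble it into an output.
    pub fn next(self: *SimRng) u64 {
        self.state +%= 0x9E3779B97F4A7C15;
        var z = self.state;
        z = (z ^ (z >> 30)) *% 0xBF58476D1CE4E5B9;
        z = (z ^ (z >> 27)) *% 0x94D049BB133111EB;
        return z ^ (z >> 31);
    }
};
```

Because the state is a plain field, the checksum can hash it like any other piece of truth, and a stray extra roll changes that hashed state.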
◆NOTE<br>In this post, "truth" means state that is allowed to affect future gameplay. It is a replay term here, not a claim that other state is unimportant.
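A minimal sketch of that split, assuming each role gets its own struct and only the authoritative one feeds the checksum. The field names, the FNV-1a hash, and the hash_int helper are assumptions for illustration, not the engine's real World.

```zig
const std = @import("std");

// Hypothetical role split: only `Truth` is hashed.
const Truth = struct { // may affect future gameplay
    tick_count: u64 = 0,
    rng_state: u64 = 0,
    player_health: i32 = 100,
    projectile_x: i32 = 0,
};
const Derived = struct { // rebuilt from Truth before anything reads it
    path_cache_valid_tick: ?u64 = null,
};
const Observation = struct { // debug and inspection output
    ai_trace_len: u32 = 0,
};
const Presentation = struct { // rendering helpers such as interpolation
    interp_alpha: f32 = 0,
};

const World = struct {
    truth: Truth = .{},
    derived: Derived = .{},
    observation: Observation = .{},
    presentation: Presentation = .{},

    // Hash authoritative fields one at a time, in a fixed order.
    // Going field by field keeps struct padding out of the checksum.
    pub fn checksum(self: *const World) u64 {
        var h = std.hash.Fnv1a_64.init();
        hash_int(&h, self.truth.tick_count);
        hash_int(&h, self.truth.rng_state);
        hash_int(&h, self.truth.player_health);
        hash_int(&h, self.truth.projectile_x);
        return h.final();
    }
};

// Feed an integer as fixed little-endian bytes so host byte order does not matter.
fn hash_int(h: *std.hash.Fnv1a_64, value: anytype) void {
    const T = @TypeOf(value);
    var buf: [@sizeOf(T)]u8 = undefined;
    std.mem.writeInt(T, &buf, value, .little);
    h.update(&buf);
}
```

With this shape, renaming a debug field or adding an interpolation helper cannot move the checksum, while a change in health still does.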
The tick gives replay a fixed checkpoint
The sim advances in fixed ticks, not render frames.
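The usual way to keep render frames out of the sim is a fixed-step accumulator. This is a generic sketch, not code from the engine: FixedStep, tick_ns, and the 60 Hz rate are assumptions. Frame time decides how many ticks run; it is never passed into a tick.

```zig
const std = @import("std");

// Assumed 60 Hz sim rate, in nanoseconds per tick.
const tick_ns: u64 = @as(u64, std.time.ns_per_s) / 60;

const FixedStep = struct {
    accumulator_ns: u64 = 0,

    // Banks real frame time and returns how many whole sim ticks to run now.
    pub fn ticks_for_frame(self: *FixedStep, frame_ns: u64) u32 {
        self.accumulator_ns += frame_ns;
        var ticks: u32 = 0;
        while (self.accumulator_ns >= tick_ns) {
            self.accumulator_ns -= tick_ns;
            ticks += 1;
        }
        return ticks;
    }

    // Leftover fraction of a tick: for render interpolation only, never fed into the sim.
    pub fn alpha(self: *const FixedStep) f32 {
        const left: f32 = @floatFromInt(self.accumulator_ns);
        const step: f32 = @floatFromInt(tick_ns);
        return left / step;
    }
};
```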
The outer tick function mostly schedules phases:
simulation.zig
const std = @import("std");<br>const builtin = @import("builtin");<br>const assert = std.debug.assert;
pub fn tick(<br>self: *Simulation,<br>sim_input: Input,<br>maybe_tick_events: ?*TickEventQueue,<br>) void {<br>assert(self.world.phase == .idle);<br>self.world.assert_idle_phase_queues_drained();<br>self.world.ai_trace.clear(); // observation data, reset every tick
self.run_ingress();<br>self.run_control();<br>self.run_derive();<br>self.run_plan(sim_input, maybe_tick_events);<br>self.run_apply(maybe_tick_events);<br>self.run_cleanup(maybe_tick_events);
self.world.tick_count += 1;<br>self.world.transition_to(.idle);<br>self.world.assert_idle_phase_queues_drained();
// Extra invariant checks run in Debug builds only.<br>if (builtin.mode == .Debug) {<br>self.world.validate();<br>}<br>}
I like this function because it has boring edges:
idle<br>-> ingress // admit queued world/session changes<br>-> control // update control state<br>-> derive // rebuild derived facts before decisions<br>-> plan // turn input and AI into planned work<br>-> apply // commit movement, physics, combat<br>-> cleanup // retire per-tick leftovers<br>-> idle
Replay needs that kind of boring order. Every tick starts from idle, drains the queues it expects to drain, runs systems in a fixed order, increments tick_count exactly once, and returns to idle. That gives the checksum a specific point in the loop to measure.
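A sketch of how a transition_to could enforce that order, assuming phases are declared in tick order and each transition may only step forward by one, with cleanup wrapping back to idle. PhaseGuard and is_next are hypothetical; the engine's real transition_to is not shown here.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical phase enum, declared in tick order.
const Phase = enum(u8) { idle, ingress, control, derive, plan, apply, cleanup };

const PhaseGuard = struct {
    phase: Phase = .idle,

    // Only the next phase in order is legal; cleanup wraps back to idle.
    pub fn transition_to(self: *PhaseGuard, next: Phase) void {
        assert(is_next(self.phase, next));
        self.phase = next;
    }

    fn is_next(from: Phase, to: Phase) bool {
        if (from == .cleanup) return to == .idle;
        return @intFromEnum(to) == @intFromEnum(from) + 1;
    }
};
```

Skipping a phase or running one twice then fails at the transition itself, close to the code that broke the order.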
If a queue leaks between phases, the next phase can read a command that belonged to the previous one. If a system mutates state in the wrong phase, it becomes harder to explain which tick caused which result. If the world does not return to idle, the next tick starts with unfinished work already loaded.
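A sketch of the queue side, assuming each phase queue is a fixed-capacity buffer that its consumer drains in one step, so a boundary check reduces to "is it empty". PhaseQueue, drain, and is_drained are hypothetical names.

```zig
const std = @import("std");

// Hypothetical fixed-capacity queue owned by one phase.
fn PhaseQueue(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        items: [capacity]T = undefined,
        len: usize = 0,

        pub fn push(self: *Self, item: T) void {
            if (self.len >= capacity) @panic("phase queue overflow");
            self.items[self.len] = item;
            self.len += 1;
        }

        // Hands out pending items and empties the queue in one step.
        // The returned slice stays valid until the next push.
        pub fn drain(self: *Self) []const T {
            const pending = self.items[0..self.len];
            self.len = 0;
            return pending;
        }

        // What a phase boundary would assert before moving on.
        pub fn is_drained(self: *const Self) bool {
            return self.len == 0;
        }
    };
}
```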
The tick boundary says where work is allowed to happen.
Replay records inputs
In this replay setup, the file records what went in, not what came out.
If the replay file stores "the fireball hit for 18," replay is checking a recorded answer instead of checking the sim.
The contract is:
seed + input tape -> ticks -> same authoritative result
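A toy illustration of that contract, under obvious assumptions: ToySim, its Input, the xorshift64 roll, and run are all hypothetical and much smaller than the engine's sim. Replay re-runs the ticks from the seed and the tape, then compares checksums; the tape never holds damage numbers.

```zig
const std = @import("std");

// Hypothetical input: the only thing the tape stores.
const Input = struct { move_x: i8 = 0, attack: bool = false };

const ToySim = struct {
    rng: u64, // xorshift64 state; the seed must be nonzero
    player_x: i32 = 0,
    enemy_hp: i32 = 50,
    tick_count: u64 = 0,

    fn tick(self: *ToySim, input: Input) void {
        self.player_x += input.move_x;
        if (input.attack) {
            // Damage is rolled from sim-owned RNG state, not read from the tape.
            self.rng ^= self.rng << 13;
            self.rng ^= self.rng >> 7;
            self.rng ^= self.rng << 17;
            self.enemy_hp -= @as(i32, @intCast(10 + self.rng % 9));
        }
        self.tick_count += 1;
    }

    // Hash authoritative fields in a fixed order (host byte order is fine for this sketch).
    fn checksum(self: *const ToySim) u64 {
        var h = std.hash.Fnv1a_64.init();
        h.update(std.mem.asBytes(&self.rng));
        h.update(std.mem.asBytes(&self.player_x));
        h.update(std.mem.asBytes(&self.enemy_hp));
        h.update(std.mem.asBytes(&self.tick_count));
        return h.final();
    }
};

// seed + input tape -> ticks -> checksum of authoritative state
fn run(seed: u64, tape: []const Input) u64 {
    var sim = ToySim{ .rng = seed };
    for (tape) |input| sim.tick(input);
    return sim.checksum();
}
```

The damage numbers are recomputed on every replay, which is what makes a matching checksum a statement about the sim rather than about the file.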
The recorder is deliberately small:
replay.zig
pub const Recorder = struct {<br>inputs: [recording_ticks_max]Input = undefined,<br>count: u32 = 0,<br>seed: u64 = 0,
pub fn push(self: *Recorder, input: Input) void {<br>if (self.count >= recording_ticks_max) {<br>// Fail loudly instead of silently truncating the tape.<br>@panic("replay input buffer overflow");<br>}
self.inputs[self.count] = input;<br>self.count += 1;<br>}<br>};
▲WARN<br>This is a bounded test recorder, not a production replay format. Overflow panics on purpose: better a loud failure than a silently truncated tape. Longer runs should be split...