Not Every Byte Gets a Vote


Not Every Byte Gets a Vote — mitander@xyz

In a deterministic game engine, replay looks like a hashing problem until the first harmless field breaks it.

When I started wiring replay for the sim in my game engine, my first instinct was simple:

Easy. Hash everything.

For the first few fields, that even feels correct. Actor health, projectile position, RNG state: obviously those belong in the replay check.

Then the edge cases start showing up. The AI has a trace explaining why it turned left. The renderer has interpolation state from the previous frame. Pathfinding has a cache full of directions. A struct has padding bytes because memory is memory.

The tempting checksum has appetite:

    hash(&world.entities);
    hash(&world.projectiles);
    hash(&world.rng);
    hash(&world.ai_trace); // uh oh
    hash(&world.render_helpers); // definitely uh oh

No byte left behind. Also not much of a design boundary.

A checksum like that treats everything as evidence. It does not know the difference between "the boss chose a different target" and "I renamed a debug breadcrumb."

That was the first real design pressure: not how to hash more bytes, but how to stop irrelevant bytes from becoming replay authority.

They were all "state." They were not all authority.

The bug that made this obvious was not dramatic. I changed a helper field used for inspection, and the replay checksum moved. The sim still behaved the same. The player ended in the same place. The same enemies died. But the checksum said no.

That was the wrong failure. Replay was no longer testing gameplay divergence. It was testing whether my debug furniture had the same shape.

So the question I ended up needing was narrower:

Which state is allowed to change what happens next?

    player health          yes
    projectile position    yes
    RNG stream             yes
    debug events           probably not. they are observations
    render interpolation   no. useful, but not gameplay truth
    pathfinding cache      maybe. save it or rebuild it, but name which one

The cache shouldn't wander into the authority path because the checksum was hungry.
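A minimal sketch of that boundary, not the engine's actual checksum: the `World` and `Actor` types, their fields, and the `hash_field` and `authority_checksum` helpers below are hypothetical stand-ins. The idea is that the checksum names each authoritative field explicitly, so observation and presentation state never reach the hasher, and neither do struct padding bytes. It hashes native-endian bytes, which assumes recording and replay run on the same architecture.

```zig
const std = @import("std");

// Hypothetical, simplified stand-ins for the engine's types.
const Actor = struct {
    health: i32,
    x: i32, // fixed-point position
    y: i32,
};

const World = struct {
    // Truth: allowed to change what happens next.
    actors: [2]Actor,
    rng_state: u64,
    tick_count: u64,
    // Observation: describes what happened, never read by gameplay.
    ai_trace_len: u32 = 0,
    // Presentation: renderer-only helper.
    render_prev_x: f32 = 0,
};

// Hash one field's own bytes (native byte order), never the enclosing
// struct's memory, so padding bytes do not get a vote.
fn hash_field(hasher: *std.hash.Wyhash, value: anytype) void {
    hasher.update(std.mem.asBytes(&value));
}

fn authority_checksum(world: *const World) u64 {
    var hasher = std.hash.Wyhash.init(0);
    for (world.actors) |actor| {
        hash_field(&hasher, actor.health);
        hash_field(&hasher, actor.x);
        hash_field(&hasher, actor.y);
    }
    hash_field(&hasher, world.rng_state);
    hash_field(&hasher, world.tick_count);
    // ai_trace_len and render_prev_x are deliberately absent.
    return hasher.final();
}

test "only authoritative state moves the checksum" {
    var world = World{
        .actors = .{
            .{ .health = 10, .x = 0, .y = 0 },
            .{ .health = 5, .x = 3, .y = 4 },
        },
        .rng_state = 42,
        .tick_count = 0,
    };
    const before = authority_checksum(&world);

    // Changing debug and render state leaves the checksum alone.
    world.ai_trace_len = 7;
    world.render_prev_x = 1.5;
    try std.testing.expectEqual(before, authority_checksum(&world));

    // Changing gameplay state moves it.
    world.actors[1].health -= 1;
    try std.testing.expect(before != authority_checksum(&world));
}
```

Listing fields by hand is more typing than hashing the whole struct, but adding a field then forces a decision about which bucket it belongs in.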

This post is a walkthrough of that split: truth, cache, observation, and presentation. The examples are excerpts from a Zig ARPG game engine. These names are local, but the boundary has been useful.

The interesting part is not "determinism good." Fixed ticks, explicit RNG, stable iteration order, and initialized state are table stakes. The interesting part is deciding what counts as evidence when replay says the sim diverged.

◆ NOTE
In this post, "truth" means something narrow: state with the authority to change future gameplay. Not "important." Not "useful." Authority.

The tick is the unit replay can trust

The sim advances in fixed ticks. Not frames.

The outer tick function mostly schedules phases:

simulation.zig

    pub fn tick(
        self: *Simulation,
        sim_input: Input,
        maybe_tick_events: ?*TickEventQueue,
    ) void {
        // Every tick must start from idle with no leftover queued work.
        assert(self.world.phase == .idle);
        self.world.assert_idle_phase_queues_drained();
        self.world.ai_trace.clear();

        // Phases run in a fixed order.
        self.run_ingress();
        self.run_control();
        self.run_derive();
        self.run_plan(sim_input, maybe_tick_events);
        self.run_apply(maybe_tick_events);
        self.run_cleanup(maybe_tick_events);

        // Time advances exactly once, then the world returns to idle.
        self.world.tick_count += 1;
        self.world.transition_to(.idle);
        self.world.assert_idle_phase_queues_drained();

        if (builtin.mode == .Debug) {
            self.world.validate();
        }
    }

I like this function because it is plain, not clever.

    idle
      -> ingress   // admit queued world/session changes
      -> control   // update control state
      -> derive    // rebuild derived facts before decisions
      -> plan      // turn input and AI into planned work
      -> apply     // commit movement, physics, combat
      -> cleanup   // retire per-tick leftovers
      -> idle

Replay needs that kind of boring order. Every tick starts from idle, drains the queues it expects to drain, runs systems in a fixed order, increments time once, and returns to idle. That gives the checksum a clean place to stand.

If a queue leaks between phases, the next phase can read a command that belonged to the previous one. If a system mutates state in the wrong phase, it becomes harder to explain which tick caused which result. If the world does not return to idle, the next tick starts with unfinished work already loaded.

The tick matters because it says where work is allowed to happen.
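One way to make those failure modes loud is to have the phase transition itself reject anything but the next phase in line. The sketch below is an assumption about how that could look, not the engine's code: `Phase`, `next_phase`, the `pending_commands` counter, and the bodies of `transition_to` and `assert_idle_queues_drained` are all hypothetical.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical phase enum mirroring the order described above.
const Phase = enum { idle, ingress, control, derive, plan, apply, cleanup };

// The single legal successor of each phase.
fn next_phase(phase: Phase) Phase {
    return switch (phase) {
        .idle => .ingress,
        .ingress => .control,
        .control => .derive,
        .derive => .plan,
        .plan => .apply,
        .apply => .cleanup,
        .cleanup => .idle,
    };
}

const World = struct {
    phase: Phase = .idle,
    tick_count: u64 = 0,
    pending_commands: u32 = 0, // stand-in for a per-phase queue

    fn transition_to(self: *World, target: Phase) void {
        // Skipping or repeating a phase trips the assert immediately.
        assert(next_phase(self.phase) == target);
        self.phase = target;
    }

    fn assert_idle_queues_drained(self: *const World) void {
        assert(self.phase == .idle);
        assert(self.pending_commands == 0);
    }
};

fn tick(world: *World) void {
    world.assert_idle_queues_drained();
    world.transition_to(.ingress);
    world.transition_to(.control);
    world.transition_to(.derive);
    world.transition_to(.plan);
    world.pending_commands += 1; // plan queues work...
    world.transition_to(.apply);
    world.pending_commands -= 1; // ...and apply consumes it
    world.transition_to(.cleanup);
    world.tick_count += 1; // time advances exactly once per tick
    world.transition_to(.idle);
    world.assert_idle_queues_drained();
}

test "tick returns to idle with drained queues" {
    var world = World{};
    tick(&world);
    tick(&world);
    try std.testing.expectEqual(Phase.idle, world.phase);
    try std.testing.expectEqual(@as(u64, 2), world.tick_count);
}
```

If `apply` forgot to consume the queued command, the final drained-queue assert would fail in the same tick that leaked it, instead of surfacing later as an unexplained checksum mismatch.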

Replay stores inputs, not outcomes

For this replay setup, the file says what went in, not what supposedly happened.

If the replay file stores "the fireball hit for 18," replay is no longer checking the sim. It is just checking whether I can read yesterday's answer back correctly.

The contract is:

seed + input tape -> ticks -> same authoritative result
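As a rough illustration of that contract, here is a toy sim that is rebuilt from a seed and an input tape and compared by checksum. Everything in it is hypothetical: the `Input` fields, the `Sim` struct, the splitmix64-style `next_random`, and the `run` helper are stand-ins chosen to keep the example self-contained, not the engine's types.

```zig
const std = @import("std");

// Hypothetical minimal input: what the player asked for this tick.
const Input = struct { move_x: i8 = 0, fire: bool = false };

const Sim = struct {
    rng_state: u64,
    player_x: i32 = 0,
    damage_dealt: u64 = 0,
    tick_count: u64 = 0,

    fn init(seed: u64) Sim {
        return .{ .rng_state = seed };
    }

    // splitmix64-style step: the RNG state is explicit and lives in the sim.
    fn next_random(self: *Sim) u64 {
        self.rng_state +%= 0x9E3779B97F4A7C15;
        var z = self.rng_state;
        z = (z ^ (z >> 30)) *% 0xBF58476D1CE4E5B9;
        z = (z ^ (z >> 27)) *% 0x94D049BB133111EB;
        return z ^ (z >> 31);
    }

    fn tick(self: *Sim, input: Input) void {
        self.player_x += input.move_x;
        // The damage roll is an outcome: it is recomputed, never recorded.
        if (input.fire) self.damage_dealt += 10 + self.next_random() % 10;
        self.tick_count += 1;
    }

    fn checksum(self: *const Sim) u64 {
        var hasher = std.hash.Wyhash.init(0);
        hasher.update(std.mem.asBytes(&self.rng_state));
        hasher.update(std.mem.asBytes(&self.player_x));
        hasher.update(std.mem.asBytes(&self.damage_dealt));
        hasher.update(std.mem.asBytes(&self.tick_count));
        return hasher.final();
    }
};

// seed + input tape -> ticks -> checksum of the authoritative result
fn run(seed: u64, tape: []const Input) u64 {
    var sim = Sim.init(seed);
    for (tape) |input| sim.tick(input);
    return sim.checksum();
}

test "same seed and tape give the same checksum" {
    const tape = [_]Input{
        .{ .move_x = 1 },
        .{ .fire = true },
        .{ .move_x = -1, .fire = true },
    };
    try std.testing.expectEqual(run(7, &tape), run(7, &tape));
}
```

The tape carries only `move_x` and `fire`. If a later code change alters how damage is rolled, the recorded inputs still replay, and the checksum is what reports the divergence.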

The recorder is deliberately small:

replay.zig

    pub const Recorder = struct {
        inputs: [recording_ticks_max]Input = undefined,
        count: u32 = 0,
        seed: u64 = 0,

        pub fn push(self: *Recorder, input: Input) void {
            // Fail loudly rather than silently dropping recorded input.
            if (self.count >= recording_ticks_max) {
                @panic("replay input buffer overflow");
            }

            self.inputs[self.count] = input;
            self.count += 1;
        }
    };

▲ WARN
This is a...

