Three bugs that aren't in dial9

2026-05-30 · 6 min ·

Table of Contents

FxHash

Concurrency, part 1

Concurrency, part 2

Defense in depth

dial9 is a microscope for Tokio (and Rust applications in general): its compact binary format can record a log of runtime events so you can reconstruct what actually happened to understand bugs and performance behavior. It runs in production, where bugs have real blast radius.

As much as we attempt to avoid it, dial9 still has bugs. We catch most in CI, some in PR review, and some are discovered by customers.

We use AI to help build dial9, but this presents a challenge: today's models are jagged, strong in one domain and weak in another. Folks are probably familiar with "How many r's are in Strawberry" or "I want to wash my car, the car wash is 50m away. Should I walk or drive?" These are real, but they don't really capture what this looks like in practice. Here are three examples we hit building dial9.

FxHash

dial9 uses a fast but cryptographically insecure algorithm called FxHash for certain encoding operations. I told the agent to inline the code from the much larger FxHash crate because we only needed ~10 lines:

impl Hasher for FxHasher {
    #[inline]
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0.rotate_left(5) ^ b as u64)
                .wrapping_mul(0x517cc1b727220a95);
        }
    }
    // finish() is not shown in this excerpt
}

This code has a bug, but unless you really know what's going on, you won't spot it by looking at it.

The agent failed to apply a critical optimization to hash bytes 8-at-a-time in 64-bit words.

fn write(&mut self, mut bytes: &[u8]) {
    // critical optimization: hash 8 bytes at a time
    while bytes.len() >= 8 {
        self.hash_word(u64::from_ne_bytes(bytes[..8].try_into().unwrap()));
        bytes = &bytes[8..];
    }
    // then one 4-byte chunk, if at least 4 bytes remain
    if bytes.len() >= 4 {
        self.hash_word(u32::from_ne_bytes(bytes[..4].try_into().unwrap()) as u64);
        bytes = &bytes[4..];
    }
    // then the remaining 0 to 3 bytes, one at a time
    for &b in bytes {
        self.hash_word(b as u64);
    }
}

If a human inlined FxHash, there is a 0% chance they would create this bug: they would literally copy and paste the code from FxHash.

This bug stuck around in our code base for quite a while, until someone doing unrelated work that touched the Hasher realized it could be optimized.
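To make the cost concrete, here is a minimal sketch that counts mixing rounds for both strategies. CountingFx, its bytewise flag, and rounds_for are hypothetical names invented for this illustration, not dial9 or FxHash code; the mixing step and constant are the ones shown above.

```rust
use std::convert::TryInto;
use std::hash::Hasher;

const SEED: u64 = 0x517cc1b727220a95;

/// Hypothetical hasher (not dial9's) that counts mixing rounds so the
/// byte-at-a-time and word-at-a-time strategies can be compared.
#[derive(Default)]
struct CountingFx {
    state: u64,
    rounds: usize,
    /// When true, mimic the buggy version: one round per input byte.
    bytewise: bool,
}

impl CountingFx {
    fn hash_word(&mut self, word: u64) {
        self.state = (self.state.rotate_left(5) ^ word).wrapping_mul(SEED);
        self.rounds += 1;
    }
}

impl Hasher for CountingFx {
    fn write(&mut self, mut bytes: &[u8]) {
        if !self.bytewise {
            // Consume 8 bytes per round while possible.
            while bytes.len() >= 8 {
                self.hash_word(u64::from_ne_bytes(bytes[..8].try_into().unwrap()));
                bytes = &bytes[8..];
            }
            // Then at most one 4-byte chunk.
            if bytes.len() >= 4 {
                self.hash_word(u32::from_ne_bytes(bytes[..4].try_into().unwrap()) as u64);
                bytes = &bytes[4..];
            }
        }
        // Whatever is left (everything, in bytewise mode) goes one byte per round.
        for &b in bytes {
            self.hash_word(b as u64);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Number of mixing rounds needed to hash `len` bytes.
fn rounds_for(len: usize, bytewise: bool) -> usize {
    let mut h = CountingFx { bytewise, ..Default::default() };
    h.write(&vec![0xABu8; len]);
    h.rounds
}
```

For a 64-byte input, rounds_for(64, true) reports 64 rounds and rounds_for(64, false) reports 8. Both versions produce a usable hash, so a functional test stays green either way. Only a benchmark or a round count like this one would flag the difference.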

Concurrency, part 1

Early versions of dial9 didn't coordinate how events were flushed across multiple threads. This didn't drop events, but it meant that one wall clock interval of data could potentially be split across multiple files.

if buf.should_flush() || buf.flush_epoch.load() ...

I added a new mechanism to trigger simultaneous flushing of the buffers via an epoch counter. Everything seemed to work fine, but a test case that validated that buffers were properly flushed failed extremely reliably on GitHub's macOS runner.

With other flaky tests, I had become accustomed to AI's ability to debug these sorts of race conditions just by staring at the code, so I did what I normally do: I asked the AI to iterate with CI until it was green.

However, to my surprise, the agent couldn't fix it. It kept trying random changes but it couldn't seem to actually find the bug.

Eventually I started looking myself. After probably under 60 seconds, I spotted the bug:

if buf.should_flush() || buf.flush_epoch.load() ...

We were incrementing the atomic that claimed we had flushed before actually flushing.
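The fix amounts to an ordering rule: do the flush first, then publish the claim. Here is a minimal sketch of that rule. The Buffer type, its fields other than flush_epoch, and the flush_for and flush_then_observe functions are assumptions made up for this illustration, not dial9's actual implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a per-thread event buffer (not dial9's real type).
struct Buffer {
    pending: Mutex<Vec<u32>>,
    flushed: Mutex<Vec<u32>>,
    /// Last epoch this buffer claims to have flushed for.
    flush_epoch: AtomicU64,
}

impl Buffer {
    fn flush_for(&self, epoch: u64) {
        // 1. Do the actual flush first.
        let drained: Vec<u32> = self.pending.lock().unwrap().drain(..).collect();
        self.flushed.lock().unwrap().extend(drained);
        // 2. Only then publish the claim. Storing the epoch before step 1
        //    would let an observer see "flushed" while data is still pending.
        self.flush_epoch.store(epoch, Ordering::Release);
    }
}

/// Flushes on one thread, waits for the claim on another, and returns how
/// many events the observer can see once the claim is visible.
fn flush_then_observe() -> usize {
    let buf = Arc::new(Buffer {
        pending: Mutex::new(vec![1u32, 2, 3]),
        flushed: Mutex::new(Vec::new()),
        flush_epoch: AtomicU64::new(0),
    });

    let writer = {
        let buf = Arc::clone(&buf);
        thread::spawn(move || buf.flush_for(1))
    };

    // Wait until the buffer claims it has flushed for epoch 1.
    while buf.flush_epoch.load(Ordering::Acquire) < 1 {
        thread::yield_now();
    }
    let seen = buf.flushed.lock().unwrap().len();
    writer.join().unwrap();
    seen
}
```

With this ordering, an observer that sees the epoch is guaranteed to see all three events. If the store is moved above the drain, the observer can read an empty flushed list. That is the kind of failure that may appear on only one platform's scheduler.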

You cannot assume that, because AI tooling is better than you at one specific class of problem, it won't fail on a simple problem. I've seen a lot of talented engineers basically give up if Claude couldn't solve the problem. That is a dangerous mindset. Next-generation models like Claude Mythos show no signs of reduction in jaggedness.[1]

We only caught this because, for some reason, the GitHub macOS runner hit this race every time. Cross-platform CI is a cheap way to cover more of these issues, especially when coupled with tools like Shuttle, Loom, Turmoil, and fuzzing. Defense in depth is required.

Concurrency, part 2

A basic prompt is all it takes to find MANY bugs in code written by agents (and humans). Some folks have a mistaken notion that one model is not going to be able to find its own bugs, but this couldn't be further from the truth.

dial9 also includes its own stack unwinder and ring buffer to work in environments like Fargate where kernel unwinding is not available.

The code to support this is quite complex: it needs to be a ring buffer that can be safely used from a signal handler. The original version had a bug where two atomics interleaved in a way that could leave tombstones that led to dropped data.

pub(crate) unsafe fn claim_slot() -> Option {
    let idx = BUFFER.write_idx.fetch_add(1, Ordering::Relaxed);
    let slot_idx = idx % BUFFER_CAP;
    let slot = &BUFFER.slots[slot_idx];

    // If the slot isn't empty, the buffer wrapped around and the flush
    // thread hasn't caught up. Drop this sample.
    if...
