We measured whether a drift-checker changes what an AI coding agent writes

We measured whether a drift-checker changes what an AI agent writes — VibeDrift BLOG · BUILDING VIBEDRIFT We measured whether a drift-checker changes what an AI agent writes June 9, 2026 · ~9 min read We build a tool that scans codebases for drift, the gap between the patterns a codebase already uses and the patterns new AI-written code introduces. It now also runs as an MCP server, so a coding agent can query it whileit writes, and that raises the obvious question we didn't want to hand-wave. Does feeding the agent this signal actually change what it produces, or does it just make a nicer report after the fact? So we ran an eval designed to disprove it, and here's the whole thing, including, prominently, the conditions where it did nothing. The setup There are two arms. In the control , an agent (Opus 4.8) writes a new file for a repository on its own. In the treatment , the same agent does the same task but is handed VibeDrift's distilled view of the repo's dominant conventions. The output of both arms is then scored by an independent drift pass rather than by the signal that produced the guidance, so the test can't mark its own homework, and a lower score means less drift, which means the new file conformed better to the codebase. The knob that turned out to matter most is sampleCap, which is how many real files from the repo the agent gets to see as raw context. At cap=0it sees none, so its only signal about the repo's conventions is VibeDrift's, while higher caps let the agent read the neighbors and infer the convention itself. We crossed that against whether the repo's convention fights the model's default or matches it, and ran ten trials per cell on the adversarial repo and eight on the controls, with a 95% confidence interval over per-task deltas for every cell. Result 1: the null comes first On a repo whose conventions are what Opus writes anyway, like async/await, named exports, and throw-on-error, the signal changed nothing measurable. The delta sat around 0.03 to 0.05, and the 95% interval straddles zero at every context level, because the agent conforms for free and a hint it was already going to follow is redundant. That null is load-bearing, because a guardrail that fires when the model is already right isn't helping, it's theater. If our eval couldn't produce a clean zero here, you shouldn't trust any number it produces elsewhere. Result 2: the adversarial case So we built the case where the model is wrong by default. We used a repo whose house style is .then() chains, which is the opposite of what a modern model reaches for, and we removed the raw files entirely at cap=0, so the only thing telling the agent about the convention is VibeDrift. // control (agent alone): its default, which is drift in THIS repo export async function getSubscription(id) { const row = await db.query("SELECT * FROM subscriptions WHERE id = $1", [id]) return row ? enrich(row) : null

// treatment (agent + VibeDrift, "this repo uses .then() chains") export function getSubscription(id) { return db.subscriptions.findById(id).then((row) => enrich(row)) }Drift dropped from 1.84 to 1.00 , a delta of 0.84 with a 95% CI of [0.57, 1.11] . The interval clears zero, so it's signal rather than luck, and it replicated the earlier small-sample estimate across fifty runs. The signal pulled the agent off its own prior and onto the repo's convention, which is the one thing a stateless model can't do for itself. Result 3: now let the agent see the files This is the part that turned a slogan into an honest finding. If you keep the adversarial .then() repo but let the agent see two to four real sibling files at cap=2 and cap=4, the delta collapses to exactly zero . The reason is that the control arm, now able to read the neighbors, already writes perfect .then() chains by copying them.VibeDrift's distilled hint has nothing left to add, because the convention was inferable from context and so the agent inferred it. The residual drift both arms share at that point is a different problem, the structural duplication of the sibling shape, which a pattern hint was never trying to fix in the first place. The whole matrix ConditionConventioncapncontroltreatmentdelta95% CIthen-chain, no contextfights default05×101.841.000.84[0.57, 1.11]then-chain, some contextfights default25×102.002.000.00[0.00, 0.00]then-chain, real contextfights default45×102.002.000.00[0.00, 0.00]default repo, no contextmatches default010×81.051.000.05[-0.05, 0.15]default repo, real contextmatches default310×81.811.790.03[-0.01, 0.06] There's one significant cell and four tight nulls, and the significant cell is exactly the one theory predicts, where the convention fights the default and isn't visible in context. The effect didn't show up anywhere it shouldn't have, which is the strongest evidence that the one place it did show up is real. The shape The honest takeaway isn't a multiplier, it's a function, because VibeDrift's distilled-guidance benefit scales with how...

We measured whether a drift-checker changes what an AI coding agent writes

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

Claude Fable 5

It's Not Just X. It's Y

Show HN: GoPeek – open links in live mini browser windows without new tabs