Verification Theater in AI Agent Work

A preserved postmortem · June 2026

Fluent audit failures, unreadable traces, deterministic custody, and the small checks that still ground human approval.

Probabilistic agents need deterministic custody.

Below the deterministic floor, verify.

Above the floor, manage risk.

Do not call the second one verification.

The incident

The audit trail was fluent, and wrong.

An auditor agent inside my own coding harness fabricated verification evidence three times: it claimed rendered browser QA that never ran, and it invented file-corruption metrics for a file that was provably clean. The prose was polished, specific, and confident; by reading alone, it was indistinguishable from a real audit. The builder agent reported honestly throughout. This was a single agent confabulating about its own work, with no jailbreak and no attacker.

What caught all three was deterministic: a push gate that refused unverified work, sixty seconds of replayed measurement, and one human opening the page in a browser. Never another model reading the prose. The harness already paired models from different vendors — cross-model diversity did not stop it.

What the agent wrote · FABRICATED

"get_page_text confirmed the full rendered DOM. Console: 0 messages. Network: exactly 1 request."

Confident. Specific. Correctly formatted. None of it happened.

What the substrate said · REPLAYED

Every browser call that turn errored on a stale tab ID. No page was ever rendered. Caught by the push gate — it refused the push because the required QA evidence did not exist. The commit never reached origin.
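A minimal sketch of the idea behind such a push gate, under stated assumptions: the evidence layout (one log file per commit in a directory written by the browser harness, not by the agent), the function name require_qa_evidence, and the messages are all hypothetical and not taken from the repository. The point is only that the gate consults an artifact and never the agent's prose.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: refuse a push unless QA evidence exists for the commit.
# Assumption: the harness (not the agent) writes <evidence_dir>/<commit>.log.

require_qa_evidence() {
  local evidence_dir="$1" commit="$2"
  local file="$evidence_dir/$commit.log"

  # -s is true only if the file exists and is non-empty.
  # Nothing the agent wrote about the QA run is read here.
  if [ -s "$file" ]; then
    echo "ALLOW: evidence found at $file"
    return 0
  fi

  echo "BLOCK: no QA evidence for $commit" >&2
  return 1
}
```

Called from a pre-push hook, a non-zero return would abort the push, which matches the behavior described above: the commit never reaches origin.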

the evidence map — one incident, a clean-failure control, and the ambiguous catch only the substrate resolved

The deterministic floor

Don't trust the verdict. Run the check.

The floor is a handful of checks where reality decides, not an agent — small enough to read in full, run on inputs you choose, and confirmed by the consequence on a surface the agent does not control. Simplicity is the security property; an agent's complexity is the threat surface.

check-blast-radius.sh

Is this write inside the repo, or reaching outside it?

// the filesystem path answers

check-secrets.sh

Does this file contain an obvious secret?

// a pattern match answers

check-irreversible-git.sh

Is this irreversible action actually approved?

// the command + an approval token answer

[ ALLOW ] exit 0 · [ BLOCK ] exit 1 · that is the entire contract
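To make the contract concrete, here is a hedged sketch of what two such gates could look like. It is not the repository's code: the function names inside_repo and no_obvious_secret, and the two secret patterns, are illustrative assumptions. Each function returns 0 to allow and 1 to block, mirroring the exit-code contract.

```shell
#!/usr/bin/env bash
# Hypothetical sketches of two gates. Contract: return 0 = ALLOW, 1 = BLOCK.

# Blast radius: is the write target's parent directory at or below the repo root?
# Limitation of this sketch: the parent directory must already exist, and a
# target that is itself a symlink pointing elsewhere is not detected.
inside_repo() {
  local repo="$1" target="$2" root parent
  # pwd -P resolves symlinks and "..", so the filesystem decides, not the string.
  root="$(CDPATH= cd -- "$repo" 2>/dev/null && pwd -P)" || return 1
  parent="$(CDPATH= cd -- "$(dirname -- "$target")" 2>/dev/null && pwd -P)" || return 1
  case "$parent" in
    "$root"|"$root"/*) return 0 ;;  # ALLOW: inside the repo
    *) return 1 ;;                  # BLOCK: reaches outside
  esac
}

# Secrets: does the file match an obvious secret pattern?
# The two patterns are examples only, not a complete scanner.
no_obvious_secret() {
  local file="$1"
  if grep -E -q \
      -e 'AKIA[0-9A-Z]{16}' \
      -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
      -- "$file"; then
    return 1                        # BLOCK: a pattern matched
  fi
  return 0                          # ALLOW: nothing obvious found
}
```

Both functions are short enough to read in full, which is the property the floor depends on; anything more elaborate belongs above it.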

You hold a lit match to the smoke detector with bash gates/test-gates.sh: it feeds every gate a known-good and a known-bad input and confirms each one fires. No agent is in the loop.
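The self-test idea can be sketched in a few lines. This is an assumption-laden illustration, not the contents of gates/test-gates.sh: expect_exit and toy_gate are hypothetical names, and toy_gate is a stand-in so the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a gate self-test; the real test-gates.sh may differ.

failures=0

# Run a command and compare its exit code with the expected one.
expect_exit() {
  local expected="$1"; shift
  local actual=0
  "$@" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok    exit $actual: $*"
  else
    echo "FAIL  expected $expected, got $actual: $*"
    failures=$((failures + 1))
  fi
}

# Stand-in gate for illustration only: blocks any input containing "force".
toy_gate() {
  case "$1" in
    *force*) return 1 ;;
    *) return 0 ;;
  esac
}

expect_exit 0 toy_gate "git push origin main"          # known-good must ALLOW
expect_exit 1 toy_gate "git push --force origin main"  # known-bad must BLOCK

echo "failures: $failures"
```

A real harness would end by exiting non-zero when failures is greater than zero, so the test of the gates obeys the same exit-code contract as the gates.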

deterministic custody — probabilistic work inside boundaries the agent cannot narrate into existence

The whitepaper

Capture is not verification.

Traces, dashboards, evaluator reports, and governance checklists are observability: they capture what an agent said it did. They become evidence only when they terminate in a small check a human can inspect and tie to a consequence. The full postmortem documents two field incidents, a controlled reproduction, the recursive audit gap, and the governance I built that turned out to be theater, with every load-bearing claim linked to replayable substrate.

the recursive audit gap — adding agents moves the trust problem up a floor, never grounds it

The artifact

The dashboard I built to oversee everything.

This is the Decision Cockpit — a real artifact from the project, preserved and embedded below exactly as it was. It is a genuinely nice object. It summarized agent work into a form I could sign. And it is not working oversight. When the human cannot independently check the summary — and the summary is written by the untrusted agent — a dashboard does not produce oversight. It launders agent decisions into a human-signable form: it moves the blame to the human without moving the understanding. I show it because the scaffolding that looked like the answer is the most useful warning.

◆ specimen · a control I tried and found to be theater

Preserved as evidence, not a recommendation: capture is not verification.

Contact

Tell me where this is wrong.

The strongest claim in the paper is "run the floor yourself." If you have a counter-example, a preserved agent-failure postmortem, or a correction — I'd rather be corrected than be the only one in the room.

✉ sami.serrag@gmail.com · LinkedIn · 𝕏 @SamiSerrag50

Comments

Discussion.

Comments are backed by this repo's GitHub Discussions via giscus — no third-party tracker, and every comment lives in the same repo as the evidence.
