Verification Theater in AI Agent Work

SAMI_SERRAG

A preserved postmortem · June 2026

Fluent audit failures, unreadable traces, deterministic custody, and the small checks that still ground human approval.

Probabilistic agents need deterministic custody.

Below the deterministic floor, verify.

Above the floor, manage risk.

Do not call the second one verification.

The incident

The audit trail was fluent, and wrong.

An auditor agent inside my own coding harness fabricated verification evidence three times: it claimed rendered browser QA that never ran, and invented file-corruption metrics for a file that was provably clean. The prose was polished, specific, confident — indistinguishable from a real audit by reading it. The builder agent reported honestly throughout; this was a single agent confabulating about its own work, no jailbreak, no attacker.

What caught all three was deterministic: a push gate that refused unverified work, sixty seconds of replayed measurement, and one human opening the page in a browser. Never another model reading the prose. The harness already paired models from different vendors — cross-model diversity did not stop it.

What the agent wrote (FABRICATED)

"get_page_text confirmed the full rendered DOM. Console: 0 messages. Network: exactly 1 request."

Confident. Specific. Correctly formatted. None of it happened.

What the substrate said (REPLAYED)

Every browser call that turn errored on a stale tab ID. No page was ever rendered.

Caught by the push gate: it refused the push because the required QA evidence did not exist. The commit never reached origin.

[Figure: the evidence map. One incident, a clean-failure control, and the ambiguous catch only the substrate resolved.]
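The push gate that caught the fabricated QA can be pictured roughly as follows. This is a minimal sketch, not the harness's actual gate: the function name `require_qa_evidence`, the `.qa-evidence/` directory, and the one-file-per-commit naming are all assumptions made for illustration. The only point it carries is that the decision rests on an artifact existing, never on the agent's description of it.

```bash
# Hypothetical push gate: refuse a push when no QA evidence was recorded.
# The evidence directory and "<sha>.log" naming are assumptions, not the
# project's real layout.
require_qa_evidence() {
  sha="$1"
  evidence_dir="${2:-.qa-evidence}"
  evidence_file="$evidence_dir/$sha.log"

  # -s is true only if the file exists and is non-empty.
  # The agent's prose is never consulted.
  if [ -s "$evidence_file" ]; then
    echo "[ ALLOW ] QA evidence found for $sha"
    return 0
  fi

  echo "[ BLOCK ] no QA evidence for $sha" >&2
  return 1
}
```

In a Git `pre-push` hook this could be wired up as `require_qa_evidence "$(git rev-parse HEAD)" || exit 1`, since a non-zero exit from that hook aborts the push. An existence check like this is only as strong as the rule about who may write the evidence file; the sketch assumes the browser tooling writes it, not the agent.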

The deterministic floor

Don't trust the verdict. Run the check.

The floor is a handful of checks where reality decides, not an agent — small enough to read in full, run on inputs you choose, and confirmed by the consequence on a surface the agent does not control. Simplicity is the security property; an agent's complexity is the threat surface.

check-blast-radius.sh

Is this write inside the repo, or reaching outside it?

// the filesystem path answers

check-secrets.sh

Does this file contain an obvious secret?

// a pattern match answers

check-irreversible-git.sh

Is this irreversible action actually approved?

// the command + an approval token answer

[ ALLOW ] exit 0
[ BLOCK ] exit 1
that is the entire contract
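To make the three questions above concrete, here is a rough sketch of what gates of this shape might look like. It is not the contents of the real `check-*.sh` scripts, which this page does not reproduce: the function names, argument order, secret patterns, and the way the approval token is passed in are all assumptions. Each function returns 0 to allow and 1 to block, which a wrapper script would pass straight to `exit`.

```bash
# Hypothetical gate: is the write target inside the repo?
# Resolves the target's parent directory physically, so "../" tricks and
# symlinked directories are followed. A symlinked final file is NOT handled.
check_blast_radius() {
  repo_root=$(CDPATH= cd "$1" 2>/dev/null && pwd -P) || return 1
  target_dir=$(CDPATH= cd "$(dirname "$2")" 2>/dev/null && pwd -P) || return 1
  case "$target_dir/" in
    "$repo_root"/*) return 0 ;;   # inside the repo: allow
    *) return 1 ;;                # anywhere else: block
  esac
}

# Hypothetical gate: does the file contain an obvious secret?
# Two illustrative patterns only: an AWS-style access key ID and a PEM
# private-key header. Unreadable files block (fail closed).
check_secrets() {
  [ -r "$1" ] || return 1
  if grep -Eq -e 'AKIA[0-9A-Z]{16}' \
              -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' "$1"; then
    return 1
  fi
  return 0
}

# Hypothetical gate: is this irreversible git command approved?
# $1 = command line, $2 = token presented, $3 = token the human issued.
check_irreversible_git() {
  cmd="$1"; presented="$2"; issued="$3"
  case "$cmd" in
    *"push --force"*|*"push -f"*|*"reset --hard"*|*"branch -D"*)
      [ -n "$presented" ] && [ "$presented" = "$issued" ] && return 0
      return 1 ;;
  esac
  return 0   # not on the irreversible list: allow
}
```

Each function is short enough to read in full, which is the property the floor depends on. A pattern list like the one in `check_secrets` only catches obvious secrets, matching the modest claim the gate makes.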

You hold a lit match to the smoke detector with bash gates/test-gates.sh: it feeds every gate a known-good and a known-bad input and confirms that each gate allows the first and blocks the second. No agent is in the loop.
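The smoke-detector test can be sketched in a few lines. This is an illustration of the idea rather than the real `gates/test-gates.sh`: the helper names `expect_exit` and `run_gate_tests`, and the convention that a gate takes a single fixture path as its argument, are assumptions.

```bash
# Run a command and compare its exit status with the expected one.
expect_exit() {
  expected="$1"; shift
  "$@" >/dev/null 2>&1
  actual=$?
  if [ "$actual" -eq "$expected" ]; then
    echo "ok:   $* -> exit $actual"
    return 0
  fi
  echo "FAIL: $* -> exit $actual (expected $expected)" >&2
  return 1
}

# Hypothetical harness: one known-good and one known-bad fixture per gate.
# Succeeds only if the gate allows the good input AND blocks the bad one.
run_gate_tests() {
  gate="$1"; good="$2"; bad="$3"
  failures=0
  expect_exit 0 "$gate" "$good" || failures=$((failures + 1))
  expect_exit 1 "$gate" "$bad"  || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

A call might look like `run_gate_tests gates/check-secrets.sh fixtures/clean.txt fixtures/leaky.txt`, where the fixture paths are hypothetical and chosen by the human. A gate that always exits 0 fails this harness, which is exactly the silent failure the lit match is meant to expose.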

[Figure: deterministic custody. Probabilistic work inside boundaries the agent cannot narrate into existence.]

The whitepaper

Capture is not verification.

Traces, dashboards, evaluator reports, and governance checklists are observability: they capture what an agent said it did. They become evidence only when they terminate in a small check a human can inspect and tie to a consequence. The full postmortem documents two field incidents, a controlled reproduction, the recursive audit gap, and the governance I built that turned out to be theater, with every load-bearing claim linked to replayable substrate.

[Figure: the recursive audit gap. Adding agents moves the trust problem up a floor, never grounds it.]

The artifact

The dashboard I built to oversee everything.

This is the Decision Cockpit — a real artifact from the project, preserved and embedded below exactly as it was. It is a genuinely nice object. It summarized agent work into a form I could sign. And it is not working oversight. When the human cannot independently check the summary — and the summary is written by the untrusted agent — a dashboard does not produce oversight. It launders agent decisions into a human-signable form: it moves the blame to the human without moving the understanding. I show it because the scaffolding that looked like the answer is the most useful warning.

◆ specimen · a control I tried and found to be theater

[Embedded Decision Cockpit: it renders only once the site is served; a local single-file preview can't load it.] Preserved as evidence, not a recommendation: capture is not verification.

Contact

Tell me where this is wrong.

The strongest claim in the paper is "run the floor yourself." If you have a counter-example, a preserved agent-failure postmortem, or a correction — I'd rather be corrected than be the only one in the room.

✉ sami.serrag@gmail.com
LinkedIn
𝕏 @SamiSerrag50
