I audited 162 agent-written PRs – 27% were the AI fixing itself

aimattb2 pts0 comments

GitHub - commensa-ai/commensa-audit: What % of your AI engineering effort went to fixing the AI's own work? One-page rework report from git history. Read-only, local-first. · GitHub

commensa-audit

What % of your AI engineering effort went to fixing your AI's own work?

commensa-audit answers that from your git history. Point it at a GitHub repo; get a one-page report:

Rework tax — share of PRs (and changed lines) that corrected earlier work, vs. net-new value

Superseded work — PRs whose output was entirely replaced later (shown separately — discarded ≠ correcting)

Abandoned attempts — PRs closed without merging: the waste merge-based metrics never see

Churn clusters — chains of PRs rewriting each other ("it took 10 PRs to get dark mode right")

Line survival — how much merged code is still alive at the end of the window

Hotspots — rework share by module, against the repo-wide rate

Agent-marked share — "at least X% of PRs carry agent markers" (Co-Authored-By trailers, body signatures) — a stated lower bound, never an attribution claim
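As a rough illustration of the first metric, the rework tax can be read as two ratios over the classified PRs. The sketch below is not the tool's code: the `PullRequest` record, its field names, the `"corrective"`/`"generative"` labels, and the sample numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical per-PR record; the real units.csv columns may differ.
    number: int
    label: str          # assumed labels: "corrective" or "generative"
    lines_changed: int  # additions + deletions


def rework_tax(prs):
    """Return (share of PRs, share of changed lines) labelled corrective."""
    if not prs:
        return 0.0, 0.0
    corrective = [pr for pr in prs if pr.label == "corrective"]
    pr_share = len(corrective) / len(prs)
    total_lines = sum(pr.lines_changed for pr in prs)
    corrective_lines = sum(pr.lines_changed for pr in corrective)
    line_share = corrective_lines / total_lines if total_lines else 0.0
    return pr_share, line_share


# Invented sample data, only to exercise the function.
sample = [
    PullRequest(1, "generative", 300),
    PullRequest(2, "corrective", 50),
    PullRequest(3, "generative", 100),
    PullRequest(4, "corrective", 50),
]
pr_share, line_share = rework_tax(sample)
print(f"rework tax: {pr_share:.0%} of PRs, {line_share:.0%} of changed lines")
```

Reporting both ratios matters because they can diverge: many small corrective PRs give a high PR share but a low line share, as in the sample above.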

We built it because we needed it: our own agent-built product shipped 162 PRs in 13 days, and the audit showed 27% of them were the AI correcting itself.

Install & run

pip install commensa-audit
commensa-audit --repo owner/name --token $GH_TOKEN

Or straight from source:

pip install git+https://github.com/commensa-ai/commensa-audit

Output: report_.html (self-contained, forwardable), audit_.json (raw numbers), units.csv (per-PR data).

Scoping large repos

By default the audit covers the newest 500 PRs — a safety cap so a naive run on a huge repo stays fast and bounded. When it truncates, the run prints a notice telling you how to raise it. Two optional flags control the window (both newest-first):

commensa-audit --repo owner/name --since 2026-03-14 --max-prs 150

--since YYYY-MM-DD — only PRs created on/after this UTC date

--max-prs N — cap to the N newest PRs (default 500; use --max-prs 0 for no cap)

Both flags stop pagination early, so --max-prs 150 costs ~150 PRs' worth of API calls, not the repo's entire history. Run with no flags on a repo under 500 PRs and you get everything, exactly as before.
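To make the early-stop behaviour concrete, here is a minimal sketch of newest-first pagination that halts as soon as either limit is hit. It is an assumption about the approach, not the tool's implementation: `iter_prs`, `fetch_page`, and the `"created"` field are hypothetical names, and the fake pages stand in for real API responses.

```python
from datetime import date


def iter_prs(fetch_page, since=None, max_prs=500):
    """Yield PRs newest-first and stop fetching once a limit is reached.

    fetch_page(page_number) is a hypothetical callable returning a list of
    dicts sorted newest-first; an empty list means there are no more pages.
    """
    yielded = 0
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            return
        for pr in batch:
            if since is not None and pr["created"] < since:
                return  # everything from here on is older: stop paginating
            yield pr
            yielded += 1
            if max_prs and yielded >= max_prs:
                return  # cap reached (max_prs=0 means no cap)
        page += 1


calls = []  # records which pages were actually requested


def fake_fetch(page):
    calls.append(page)
    pages = {
        1: [{"number": 6, "created": date(2026, 3, 20)},
            {"number": 5, "created": date(2026, 3, 18)}],
        2: [{"number": 4, "created": date(2026, 3, 15)},
            {"number": 3, "created": date(2026, 3, 10)}],
        3: [{"number": 2, "created": date(2026, 3, 5)},
            {"number": 1, "created": date(2026, 3, 1)}],
    }
    return pages.get(page, [])


picked = [pr["number"] for pr in iter_prs(fake_fetch, since=date(2026, 3, 14))]
print(picked, calls)
```

In this run the third page is never requested: the generator returns as soon as it meets a PR older than the cutoff, which is what keeps the API cost proportional to the window rather than to the repo's history.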

Privacy, by architecture

Read-only. GET requests only; a token with read scope is sufficient.

Local-first. Everything runs and stays on your machine. No telemetry, no phone-home, nothing leaves your network.

Inspectable. Pure Python, stdlib + requests + jinja2. Read every line before you run it.

How classification works (and its honest limits)

Every PR is classified by a transparent signal cascade — explicit corrective titles/reverts → self-correction (a PR predominantly undoing lines added in the prior N days) → churn-cluster membership → otherwise generative. Every classification in the output carries the signal that fired and a human-readable why. Thresholds live in one config block; tune them and re-run offline with --reuse.
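The cascade can be pictured as an ordered list of checks where the first one that fires decides the outcome. The sketch below only illustrates that shape: the `PR` fields, the title pattern, the 0.5 threshold, and the signal names are assumptions, not the tool's actual config.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds, kept in one block to mirror a single config section.
CONFIG = {
    "corrective_title": re.compile(r"^(fix|revert|hotfix)\b", re.IGNORECASE),
    "self_correction_share": 0.5,  # "predominantly" read here as more than half
}


@dataclass
class PR:
    title: str
    lines_removed: int = 0
    recent_lines_removed: int = 0  # removed lines that were added in the prior N days
    in_churn_cluster: bool = False


def classify(pr, config=CONFIG):
    """Return the first signal that fires, plus a human-readable why."""
    if config["corrective_title"].search(pr.title):
        return {"signal": "explicit_corrective",
                "why": f"title {pr.title!r} matches the corrective pattern"}
    if pr.lines_removed:
        share = pr.recent_lines_removed / pr.lines_removed
        if share > config["self_correction_share"]:
            return {"signal": "self_correction",
                    "why": f"{share:.0%} of removed lines were recently added"}
    if pr.in_churn_cluster:
        return {"signal": "churn_cluster",
                "why": "member of a chain of PRs rewriting each other"}
    return {"signal": "generative", "why": "no corrective signal fired"}


result = classify(PR("Tweak theme", lines_removed=10, recent_lines_removed=8))
print(result["signal"], "-", result["why"])
```

Because order encodes priority, a revert that also sits in a churn cluster is reported under the stronger, more explicit signal.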

Known limits (also printed in the report footer): classification is heuristic; squash merges blur attribution; survival windows mean young repos read optimistic; agent-marked share is a lower bound — absence of a marker is not evidence of human authorship. We grade our own certainty rather than fake precision — that's the whole point of the project.
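The lower-bound framing of agent-marked share can be sketched as a simple marker scan: count PRs with a recognisable marker and report that count as a floor. The trailer regex and the `AGENT_HINTS` substrings below are illustrative guesses, not the tool's real marker list, and the sample bodies are invented.

```python
import re

# Assumed patterns for illustration; the real marker list is not shown here.
TRAILER = re.compile(r"^Co-Authored-By:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
AGENT_HINTS = ("claude", "copilot", "[bot]")  # hypothetical substrings


def has_agent_marker(text):
    """True if any Co-Authored-By trailer names something that looks like an agent."""
    for match in TRAILER.finditer(text):
        name = match.group(1).lower()
        if any(hint in name for hint in AGENT_HINTS):
            return True
    return False


def agent_marked_lower_bound(texts):
    """Share of texts carrying a marker; unmarked texts are unknown, not human."""
    if not texts:
        return 0.0
    return sum(has_agent_marker(t) for t in texts) / len(texts)


bodies = [
    "Add export\n\nCo-Authored-By: Claude <noreply@example.com>",
    "Fix typo",
    "Refactor\n\nCo-authored-by: Jane Doe <jane@example.com>",
    "Theme\n\nCo-Authored-By: Copilot <copilot@example.com>",
]
share = agent_marked_lower_bound(bodies)
print(f"at least {share:.0%} of PRs carry agent markers")
```

The function never labels the unmarked PRs: they simply fall outside the count, which is why the result can only be stated as "at least".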

Why "rework tax"?

Agent-era teams measure activity — PRs merged, lines shipped, velocity. None of that distinguishes progress from cleanup. The rework tax does: it's the share of motion that was correction, the closest git-only...
