Corpus Keeper – a zero-dependency auditor for the docs you point an AI at

forgedculture1 pts0 comments

GitHub - forgedculture/corpus-keeper · GitHub

/" data-turbo-transient="true" />

Skip to content

Search or jump to...

Search code, repositories, users, issues, pull requests...

-->

Search

Clear

Search syntax tips

Provide feedback

--><br>We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Cancel

Submit feedback

Saved searches

Use saved searches to filter your results more quickly

-->

Name

Query

To see all available qualifiers, see our documentation.

Cancel

Create saved search

Sign in

/;ref_cta:Sign up;ref_loc:header logged out"}"<br>Sign up

Appearance settings

Resetting focus

You signed in with another tab or window. Reload to refresh your session.<br>You signed out in another tab or window. Reload to refresh your session.<br>You switched accounts on another tab or window. Reload to refresh your session.

Dismiss alert

{{ message }}

forgedculture

corpus-keeper

Public

Notifications<br>You must be signed in to change notification settings

Fork

Star

main

BranchesTags

Go to file

CodeOpen more actions menu

Folders and files<br>NameNameLast commit message<br>Last commit date<br>Latest commit

History<br>2 Commits<br>2 Commits

.github/workflows

.github/workflows

demo_corpus

demo_corpus

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

check.sh

check.sh

corpus_keeper.py

corpus_keeper.py

View all files

Repository files navigation

Corpus Keeper

Your AI is only as good as the folder you point it at. Corpus Keeper<br>keeps that folder honest.

Every team wiring an AI assistant into their documents is accruing the<br>same disease: context rot. The old price sheet that still looks<br>current. The plan that was cancelled in a meeting but not in writing.<br>Two docs that disagree about the same fact. The AI cannot tell which<br>one is true, so it picks one - with total confidence. The failure gets<br>blamed on the AI. The cause is the corpus.

Corpus Keeper is a zero-dependency auditor and governance scaffold for<br>any folder of documents that you, your team, or your AI treats as<br>ground truth.

Try it in sixty seconds

roadmap.md<br>FINDING [links] welcome.md: broken link -> guides/setup.md<br>FINDING [ascii] welcome.md: non-ASCII byte at offset 101<br>FINDING [index] CORPUS_INDEX.md: index entry has no file -> roadmap.md<br>FINDING [index] unlisted_note.md: file not listed in CORPUS_INDEX.md<br>FINDING [stale] old_plan.md: line 3 marked stale with no pointer to current truth<br>info [stale] unlisted_note.md: line 3 has open marker (TODO)<br>scanned 6 files: 6 findings, 1 info">git clone https://github.com/forgedculture/corpus-keeper<br>cd corpus-keeper<br>python3 corpus_keeper.py audit demo_corpus

FINDING [links] CORPUS_INDEX.md: broken link -> roadmap.md<br>FINDING [links] welcome.md: broken link -> guides/setup.md<br>FINDING [ascii] welcome.md: non-ASCII byte at offset 101<br>FINDING [index] CORPUS_INDEX.md: index entry has no file -> roadmap.md<br>FINDING [index] unlisted_note.md: file not listed in CORPUS_INDEX.md<br>FINDING [stale] old_plan.md: line 3 marked stale with no pointer to current truth<br>info [stale] unlisted_note.md: line 3 has open marker (TODO)<br>scanned 6 files: 6 findings, 1 info

You will see 6 findings: two broken links, a non-ASCII character, a<br>phantom index entry, an unindexed file, and a deprecated document with<br>no pointer to its replacement. Then open demo_corpus/pricing_2025.md<br>and demo_corpus/pricing_current.md: both claim to be quotable<br>pricing, and they disagree on every number. That seventh defect is the<br>kind a script cannot catch - see "The semantic layer" below.

What it checks

Mechanical rot, on every run: broken relative links, non-ASCII bytes<br>(optional), index drift (files missing from your index, index entries<br>pointing at nothing), and stale markers (SUPERSEDED, DEPRECATED) with<br>no pointer to current truth.

Exit codes are the contract: 0 clean, 1 findings, 2 error. It drops<br>into a script, cron job, or CI unchanged.

Bring your own folder under management

python3 corpus_keeper.py init /path/to/your/folder<br>python3 corpus_keeper.py audit /path/to/your/folder

init scaffolds the governance layer: GOVERNANCE.md (the rules),<br>CORPUS_INDEX.md (the map), a numbered decision-record template, and an<br>append-only decisions log. Existing files are never overwritten.

The rules in one breath: one current truth at a time, decisions get<br>records, logs are append-only, the index is the map, audit after every<br>edit.

The semantic layer

The auditor catches rot of form. Rot of meaning - contradictions<br>between documents, a stale doc that still looks current, a change<br>nobody recorded a decision for - needs a reader. The Corpus Keeper<br>kit wires your AI assistant to do that pass: a Claude skill, a<br>ChatGPT Custom GPT setup, and an AGENTS.md for Codex, Cursor, Gemini<br>CLI, and Copilot, plus governance templates and support.

The kit, and the methodology behind it, live at<br>forgedculture.com.

Requirements and license

Python 3.8+, standard library only. Apache 2.0 (see...

finding index corpus keeper stale files

Related Articles