Corpus Keeper
Your AI is only as good as the folder you point it at. Corpus Keeper keeps that folder honest.
Every team wiring an AI assistant into its documents is developing the same disease: context rot. The old price sheet that still looks current. The plan that was cancelled in a meeting but not in writing. Two docs that disagree about the same fact. The AI cannot tell which one is true, so it picks one, with total confidence. The failure gets blamed on the AI. The cause is the corpus.
Corpus Keeper is a zero-dependency auditor and governance scaffold for any folder of documents that you, your team, or your AI treats as ground truth.
Try it in sixty seconds
```
git clone https://github.com/forgedculture/corpus-keeper
cd corpus-keeper
python3 corpus_keeper.py audit demo_corpus

FINDING [links] CORPUS_INDEX.md: broken link -> roadmap.md
FINDING [links] welcome.md: broken link -> guides/setup.md
FINDING [ascii] welcome.md: non-ASCII byte at offset 101
FINDING [index] CORPUS_INDEX.md: index entry has no file -> roadmap.md
FINDING [index] unlisted_note.md: file not listed in CORPUS_INDEX.md
FINDING [stale] old_plan.md: line 3 marked stale with no pointer to current truth
info [stale] unlisted_note.md: line 3 has open marker (TODO)
scanned 6 files: 6 findings, 1 info
```
You will see 6 findings: two broken links, a non-ASCII character, a phantom index entry, an unindexed file, and a deprecated document with no pointer to its replacement. Then open demo_corpus/pricing_2025.md and demo_corpus/pricing_current.md: both claim to be quotable pricing, and they disagree on every number. That seventh defect is the kind a script cannot catch; see "The semantic layer" below.
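The report is plain text, one line per issue. If you want a per-check tally (for a dashboard or a CI summary), a small parser is enough. This is a sketch that assumes the `LEVEL [check] file: message` line shape seen in the demo output stays stable, which the README does not promise; for pass/fail decisions, rely on the exit code described below. `tally` and `LINE_RE` are hypothetical names.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the demo output:
#   "<LEVEL> [<check>] <file>: <message>", where LEVEL is FINDING or info.
# The closing "scanned N files: ..." summary does not match and is skipped.
LINE_RE = re.compile(r"^(FINDING|info) \[(\w+)\] ([^:]+): (.+)$")


def tally(report: str) -> Counter:
    """Count report lines by (level, check)."""
    counts = Counter()
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the demo output above.
sample = """\
FINDING [links] CORPUS_INDEX.md: broken link -> roadmap.md
FINDING [links] welcome.md: broken link -> guides/setup.md
FINDING [ascii] welcome.md: non-ASCII byte at offset 101
info [stale] unlisted_note.md: line 3 has open marker (TODO)
scanned 6 files: 6 findings, 1 info"""

counts = tally(sample)
print(counts[("FINDING", "links")])  # 2
```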
What it checks
Mechanical rot, on every run: broken relative links, non-ASCII bytes (optional), index drift (files missing from your index, and index entries pointing at nothing), and stale markers (SUPERSEDED, DEPRECATED) with no pointer to current truth. Open markers such as TODO are reported as info, not as findings.
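To make the four checks concrete, here is a minimal, hypothetical re-implementation using only the standard library. It is not the code in corpus_keeper.py: it assumes Markdown `[text](target)` links, scans only `.md` files, and counts a stale line as having a pointer if it names any `.md` file. Its messages mimic the demo output only for readability; `audit`, `local_targets`, and the regexes are illustrative names.

```python
import re
import tempfile
from pathlib import Path, PurePosixPath
from typing import List

# Hypothetical heuristics, not taken from corpus_keeper.py:
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")      # markdown [text](target)
STALE_RE = re.compile(r"\b(SUPERSEDED|DEPRECATED)\b")  # stale markers
POINTER_RE = re.compile(r"\S+\.md\b")                  # "pointer" = names a .md file


def local_targets(text: str) -> List[str]:
    """Link targets that look like relative file paths (anchors stripped)."""
    targets = []
    for target in LINK_RE.findall(text):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or in-page anchor: out of scope here
        path = target.split("#", 1)[0]
        if path:
            targets.append(path)
    return targets


def audit(root: Path, index_name: str = "CORPUS_INDEX.md",
          check_ascii: bool = True) -> List[str]:
    findings = []
    docs = sorted(p for p in root.rglob("*.md") if p.is_file())
    for doc in docs:
        rel = doc.relative_to(root).as_posix()
        data = doc.read_bytes()
        if check_ascii:  # offset of the first byte outside 7-bit ASCII
            offset = next((i for i, b in enumerate(data) if b > 0x7F), None)
            if offset is not None:
                findings.append(f"[ascii] {rel}: non-ASCII byte at offset {offset}")
        text = data.decode("utf-8", errors="replace")
        for path in local_targets(text):  # links resolve relative to the doc
            if not (doc.parent / path).exists():
                findings.append(f"[links] {rel}: broken link -> {path}")
        for lineno, line in enumerate(text.splitlines(), 1):
            if STALE_RE.search(line) and not POINTER_RE.search(line):
                findings.append(f"[stale] {rel}: line {lineno} marked stale"
                                " with no pointer to current truth")
    index = root / index_name
    if index.is_file():  # index drift, checked in both directions
        index_text = index.read_text(encoding="utf-8", errors="replace")
        listed = {PurePosixPath(t).as_posix() for t in local_targets(index_text)}
        for entry in sorted(listed):
            if not (root / entry).exists():
                findings.append(f"[index] {index_name}: index entry has no file -> {entry}")
        for doc in docs:
            rel = doc.relative_to(root).as_posix()
            if rel != index_name and rel not in listed:
                findings.append(f"[index] {rel}: file not listed in {index_name}")
    return findings


# A three-file toy corpus with one defect of each kind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CORPUS_INDEX.md").write_text(
        "- [Welcome](welcome.md)\n- [Roadmap](roadmap.md)\n", encoding="utf-8")
    (root / "welcome.md").write_text(
        "See [setup](guides/setup.md).\nCaf\u00e9\n", encoding="utf-8")
    (root / "old_plan.md").write_text("# Plan\n\nDEPRECATED\n", encoding="utf-8")
    findings = audit(root)

for finding in findings:
    print("FINDING", finding)
```

Note that the missing `roadmap.md` surfaces twice, once as a broken link and once as index drift, which is the same overlap the demo output shows.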
Exit codes are the contract: 0 means clean, 1 means findings, 2 means an error. It drops into a script, cron job, or CI unchanged.
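Because the exit code carries the result, a wrapper never needs to parse the report. A minimal sketch, assuming you want to treat findings as a warning but fail hard when the tool itself errors; `interpret`, `run_audit`, and `audit_command` are hypothetical names, and treating any code outside 0, 1, and 2 as an error is a design choice, not documented behavior.

```python
import subprocess
import sys
from typing import List

# Exit codes as documented in this README: 0 clean, 1 findings, 2 error.
EXIT_MEANING = {0: "clean", 1: "findings", 2: "error"}


def interpret(code: int) -> str:
    """Map an exit code to a status; undocumented codes count as errors."""
    return EXIT_MEANING.get(code, "error")


def audit_command(folder: str, script: str = "corpus_keeper.py") -> List[str]:
    """Build an audit command that reuses the current interpreter (assumed script path)."""
    return [sys.executable, script, "audit", folder]


def run_audit(cmd: List[str]) -> str:
    """Run an audit command and return "clean" or "findings".

    Raises RuntimeError on an error status, so a crashed run is never
    mistaken for a clean corpus.
    """
    code = subprocess.run(cmd).returncode
    status = interpret(code)
    if status == "error":
        raise RuntimeError(f"audit errored (exit code {code})")
    return status
```

In a CI step you might call `run_audit(audit_command("docs"))` and decide per status whether findings block a merge; a plain shell step gets the stricter behavior for free, since any nonzero exit already fails most CI jobs.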
Bring your own folder under management
```
python3 corpus_keeper.py init /path/to/your/folder
python3 corpus_keeper.py audit /path/to/your/folder
```
init scaffolds the governance layer: GOVERNANCE.md (the rules), CORPUS_INDEX.md (the map), a numbered decision-record template, and an append-only decisions log. Existing files are never overwritten.
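The "never overwritten" guarantee is easy to get right in Python with exclusive-create mode (`open(path, "x")`), which fails instead of truncating an existing file. A minimal sketch of an init-style scaffold under that assumption; only GOVERNANCE.md and CORPUS_INDEX.md are named in this README, so the `decisions/` paths and all file contents below are hypothetical placeholders, as is the `scaffold` function.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical contents; the decisions/ file names are placeholders.
SCAFFOLD: Dict[str, str] = {
    "GOVERNANCE.md": "# Governance\n\nOne current truth at a time.\n",
    "CORPUS_INDEX.md": "# Corpus index\n",
    "decisions/TEMPLATE.md": "# NNNN: Title\n\nContext:\n\nDecision:\n",
    "decisions/LOG.md": "# Decisions log (append-only)\n",
}


def scaffold(root: Path) -> List[str]:
    """Create any missing scaffold files and return the ones created."""
    created = []
    for rel, body in SCAFFOLD.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        try:
            # Mode "x" creates the file and raises if it already exists,
            # so an existing file is never truncated or overwritten.
            with open(path, "x", encoding="utf-8") as fh:
                fh.write(body)
        except FileExistsError:
            continue
        created.append(rel)
    return created


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "GOVERNANCE.md").write_text("our own rules\n", encoding="utf-8")
    first = scaffold(root)   # skips the existing GOVERNANCE.md
    second = scaffold(root)  # everything exists now: creates nothing
    kept = (root / "GOVERNANCE.md").read_text(encoding="utf-8")

print(first, second, kept)
```

Checking `path.exists()` before writing would also work in the common case, but exclusive-create closes the gap between the check and the write.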
The rules in one breath: one current truth at a time, decisions get records, logs are append-only, the index is the map, audit after every edit.
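The append-only rule can be checked mechanically: an edit is a pure append exactly when the old content survives unchanged as a prefix of the new content. A minimal sketch, assuming you can get a "before" snapshot of the log (for example the committed version via `git show HEAD:<path>`); `append_entry`, `is_append_only`, and the log file name are hypothetical, not part of Corpus Keeper.

```python
import tempfile
from pathlib import Path


def append_entry(log: Path, entry: str) -> None:
    """Append one entry as a single line; earlier content is never rewritten."""
    # Mode "a" sends every write to the end of the file (creating it if needed).
    with open(log, "a", encoding="utf-8") as fh:
        fh.write(entry.rstrip("\n") + "\n")


def is_append_only(before: str, after: str) -> bool:
    """True if an edit only added text: the old content is an unchanged prefix."""
    return after.startswith(before)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "decisions_log.md"  # hypothetical file name
    append_entry(log, "0001 example decision")
    snapshot = log.read_text(encoding="utf-8")
    append_entry(log, "0002 another example decision")
    grown = log.read_text(encoding="utf-8")

print(is_append_only(snapshot, grown))       # True
print(is_append_only(grown, "rewritten\n"))  # False
```

The prefix test catches an edit or deletion anywhere in the existing text, not just at the end.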
The semantic layer
The auditor catches rot of form. Rot of meaning (contradictions between documents, a stale doc that still looks current, a change nobody recorded a decision for) needs a reader. The Corpus Keeper kit wires your AI assistant to do that pass: a Claude skill, a ChatGPT Custom GPT setup, and an AGENTS.md for Codex, Cursor, Gemini CLI, and Copilot, plus governance templates and support.
The kit, and the methodology behind it, live at forgedculture.com.
Requirements and license
Python 3.8+, standard library only. Apache 2.0 (see...