GitHub - evilsocket/audit: An 8-stage vulnerability-discovery agent. · GitHub
audit
An 8-stage vulnerability-discovery agent, driven by your Claude Pro / Max subscription through the official Claude Code Agent SDK. Many narrow agents, deliberate disagreement, and an explicit reachability gate.
MIT-licensed. No API key needed if you already use claude login.
Origin
This project is a from-scratch reimplementation of the pipeline described in Cloudflare's Project Glasswing post, which tested Anthropic's Mythos preview LLM against Cloudflare's own codebase. The blog argues that real-world vulnerability discovery does not come from asking one big model to "find bugs here"; it comes from:
Many narrow agents working in parallel on tightly-scoped questions ("Look for command injection in this specific function, with this trust boundary above it") rather than one exhaustive agent.
Deliberate disagreement: a second agent, on a different model, that tries to disprove the first agent's findings.
A reachability trace as the gating step: most "is this code buggy?" findings are noise unless an attacker-controlled input can actually reach the sink from outside the system.
A feedback loop so reachable bugs in one place automatically seed hunts for the same pattern elsewhere.
This repo packages that pipeline into a runnable agent. The Cloudflare post showed the architecture; this codebase ships the prompts, schemas, state store, and orchestrator.
The 8 stages
Diagram from Cloudflare's Project Glasswing post, reproduced here for reference.
| Stage | Default model | Purpose |
|---|---|---|
| Recon | Opus 4.7 | Map the repo, emit narrowly-scoped Hunt tasks |
| Hunt | Sonnet 4.6 | One attack class per agent; compile/run PoCs |
| Validate | Opus 4.7 | Adversarial re-read; tries to disprove (different model from Hunt) |
| Gapfill | Sonnet 4.6 | Re-queue under-covered areas |
| Dedupe | Sonnet 4.6 | Cluster findings by root cause |
| Trace | Opus 4.7 | Prove attacker-controlled input reaches the sink |
| Feedback | Sonnet 4.6 | Turn reachable traces into new Hunt tasks |
| Report | Sonnet 4.6 | Schema-validated structured report |
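The stage table can also be read as plain data. The following is a minimal sketch, not the repo's actual configuration format: the `Stage` class, the `STAGES` list, and `models_differ` are hypothetical names, and only the stage names, model labels, and purposes come from the table. It also shows one way to check the "different model from Hunt" property of Validate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    model: str    # default model label, copied from the table
    purpose: str


# Hypothetical in-memory form of the stage table (assumption: the real
# project may store this differently, e.g. under config/).
STAGES = [
    Stage("recon", "Opus 4.7", "Map the repo, emit narrowly-scoped Hunt tasks"),
    Stage("hunt", "Sonnet 4.6", "One attack class per agent; compile/run PoCs"),
    Stage("validate", "Opus 4.7", "Adversarial re-read; tries to disprove"),
    Stage("gapfill", "Sonnet 4.6", "Re-queue under-covered areas"),
    Stage("dedupe", "Sonnet 4.6", "Cluster findings by root cause"),
    Stage("trace", "Opus 4.7", "Prove attacker-controlled input reaches the sink"),
    Stage("feedback", "Sonnet 4.6", "Turn reachable traces into new Hunt tasks"),
    Stage("report", "Sonnet 4.6", "Schema-validated structured report"),
]


def models_differ(stages, first, second):
    """Return True if the two named stages run on different default models."""
    by_name = {stage.name: stage for stage in stages}
    return by_name[first].model != by_name[second].model


# Deliberate disagreement: Validate should not share a model with Hunt.
print(models_differ(STAGES, "hunt", "validate"))
```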
Each stage is one Markdown prompt in prompts/ plus one JSON Schema in schemas/. The orchestrator passes the schema into the system prompt so every output is shape-stable on the first try.
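As a rough illustration of the prompt-plus-schema pairing, the sketch below appends a schema to a stage prompt and then checks a model reply for missing required keys. It is an assumption-laden sketch: `build_system_prompt`, `missing_required`, and `EXAMPLE_SCHEMA` are hypothetical, the wording of the instruction line is invented, and the required-key check is a simplified stand-in for full JSON Schema validation.

```python
import json

# Hypothetical schema for illustration only; the real files live in schemas/.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["tasks"],
    "properties": {"tasks": {"type": "array"}},
}


def build_system_prompt(prompt_md: str, schema: dict) -> str:
    """Append the JSON Schema to the stage prompt so the model sees the shape."""
    return (
        prompt_md.rstrip()
        + "\n\nRespond with a single JSON object matching this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )


def missing_required(output_text: str, schema: dict) -> list:
    """Return required top-level keys that are absent from the model output.

    Simplified stand-in for real JSON Schema validation: it only checks
    that the reply is a JSON object and that required keys are present.
    """
    data = json.loads(output_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return [key for key in schema.get("required", []) if key not in data]


system_prompt = build_system_prompt("You are the Recon stage.", EXAMPLE_SCHEMA)
print(missing_required('{"tasks": []}', EXAMPLE_SCHEMA))  # no keys missing
```

A full validator would also check types and nested properties; this sketch only shows where the schema enters the prompt and where the reply is checked.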
Quickstart

    # 1. Install
    python -m venv .venv && source .venv/bin/activate
    pip install -e .

    # 2. Auth (pick one)
    # (a) Already logged in via claude login? You're done.
    # (b) Or generate a 1-year OAuth token for CI / non-interactive use
    #     (append the generated token after the "=" below):
    claude setup-token
    echo "CLAUDE_CODE_OAUTH_TOKEN=" > .env

    # 3. Verify
    audit auth-check

    # 4. Run
    audit run --repo /path/to/target --run-id my-run
    audit status --run-id my-run
    audit report --run-id my-run --format md > report.md
The agent uses subscription billing via your Claude.ai login; it does not call the metered API. The on-disk auth module scrubs ANTHROPIC_API_KEY from the environment so it can't silently route around the OAuth flow.
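The scrubbing step described above could look roughly like the following. This is a sketch under assumptions, not the repo's auth module: `scrubbed_env` is a hypothetical helper, and it simply returns a copy of the environment without `ANTHROPIC_API_KEY` so a child process cannot pick it up.

```python
import os


def scrubbed_env(env=None):
    """Return a copy of the environment with ANTHROPIC_API_KEY removed.

    Hypothetical helper: the copy leaves the caller's environment untouched,
    and pop() with a default is a no-op when the variable is not set.
    """
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("ANTHROPIC_API_KEY", None)
    return cleaned


example = scrubbed_env({"ANTHROPIC_API_KEY": "sk-example", "PATH": "/usr/bin"})
print(sorted(example))
```

The returned mapping can be passed as the `env` argument of `subprocess.run`, so only the child process sees the scrubbed environment.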
Cost containment
A real production codebase can produce 15-50 Hunt tasks and 25+ findings to validate. At default concurrency this gets expensive. Flags to keep it sane:

    # --max-concurrency 1    one claude subprocess at a time
    # --max-recon-tasks 15   cap initial Hunt fanout
    # --max-cost-usd 30      abort cleanly if exceeded
    audit run --repo /path/to/target \
      --max-concurrency 1 \
      --max-recon-tasks 15 \
      --max-cost-usd 30
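To make the interaction between a concurrency cap and a cost ceiling concrete, here is a minimal sketch using `asyncio`. Everything in it is hypothetical (`BudgetGuard`, `BudgetExceeded`, `run_tasks`, and the per-task costs); it illustrates the general technique of a semaphore plus a check before each task, not how this project implements its flags.

```python
import asyncio


class BudgetExceeded(RuntimeError):
    """Raised when accumulated spend is above the configured ceiling."""


class BudgetGuard:
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        self.spent += usd

    def check(self) -> None:
        if self.spent > self.max_cost_usd:
            raise BudgetExceeded(f"spent {self.spent:.2f} USD")


async def run_tasks(costs, max_concurrency, guard):
    """Run one fake task per cost; return indices of tasks that completed."""
    semaphore = asyncio.Semaphore(max_concurrency)
    completed = []

    async def one(index, cost):
        async with semaphore:
            guard.check()           # per-task check before starting work
            await asyncio.sleep(0)  # stand-in for the real agent call
            guard.charge(cost)
            completed.append(index)

    # return_exceptions=True lets over-budget tasks fail without a crash.
    await asyncio.gather(
        *(one(i, c) for i, c in enumerate(costs)), return_exceptions=True
    )
    return completed


guard = BudgetGuard(max_cost_usd=30)
done = asyncio.run(run_tasks([10, 15, 10, 5], max_concurrency=1, guard=guard))
print(done, guard.spent)
```

Because the check runs before a task starts, spend can overshoot the ceiling by up to one task's cost per concurrent slot; here the third task pushes the total to 35 and only the fourth is skipped.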
The budget guard fires between and within stages: a per-task check in Hunt cooperatively...