Scrutineer: scanning open source without flooding maintainers | Andrew Nesbitt
Scrutineer scans open source repositories for security vulnerabilities and then handles everything that follows: verifying each one, working out who to contact, drafting a fix, and tracking it through to a published advisory. I’ve been building it for Alpha-Omega for the past couple of months.
Large language models have made finding vulnerabilities in open source code much easier. Point one at a codebase and it turns up real bugs alongside invented ones, faster and cheaper than the fuzzers and scanners that came before. The bottleneck hasn’t moved, though: every finding still has to be read, confirmed, and fixed by a maintainer, whose time and attention are a finite resource the whole ecosystem depends on. Trying to secure everything by firing machine-generated reports at maintainers would burn out the people the effort relies on.
When I pointed a couple of AI scanners at curl back in May, most of the output collapsed against the project’s own disclosure policy, and the findings worth having were buried in the rest. Scrutineer is built so the volume a model can generate never lands directly on a maintainer.
You add a repo by URL, and Scrutineer runs a pipeline of skills against the code and presents the results in a web UI for triage. It’s already in the hands of ecosystem security engineers and several of the teams Alpha-Omega funds, and between us a fair number of vulnerabilities have been found, reported, fixed, and shipped in a release with its help.
How a scan runs
Every scan is a skill on disk: a SKILL.md file, a JSON schema for its output, and any scripts it needs. When you add a repo, the triage skill runs first and enqueues the rest of the pipeline to run in parallel. What comes back is a set of structured findings, each carrying a severity, a CWE, a location linked back to the source line, the affected versions, and a six-step trace of how it was reached.
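To make the shape of a finding concrete, here is a minimal Python sketch of a record with those fields. The class names, field names, and the GitHub-style link format are my own assumptions for illustration, not Scrutineer's actual schema, and the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Location:
    path: str   # file path relative to the repo root
    line: int   # 1-based source line the finding points at


@dataclass(frozen=True)
class Finding:
    # Field names are hypothetical; only the categories come from the text.
    title: str
    severity: str            # e.g. "low", "medium", "high", "critical"
    cwe: str                 # e.g. "CWE-78"
    location: Location
    affected_versions: str   # a version range, kept as free text here
    trace: List[str] = field(default_factory=list)  # steps from input to sink

    def source_link(self, repo_url: str, ref: str) -> str:
        """Build a GitHub-style permalink to the flagged line (assumed format)."""
        base = repo_url.rstrip("/")
        return f"{base}/blob/{ref}/{self.location.path}#L{self.location.line}"


# Invented example data, purely to show the record in use.
finding = Finding(
    title="Shell injection in archive helper",
    severity="high",
    cwe="CWE-78",
    location=Location("lib/archive.py", 42),
    affected_versions=">=1.2.0, <1.4.3",
    trace=["CLI argument parsed", "passed to extract()", "interpolated into a shell command"],
)
print(finding.source_link("https://github.com/example/project", "abc123"))
# prints https://github.com/example/project/blob/abc123/lib/archive.py#L42
```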
Because skills are just files in a directory, changing what runs is editing markdown rather than recompiling a scanner. The default set lives in skills/, the triage skill’s SKILL.md lists what to trigger, and dropping a new directory in adds a scan type with no code changes.
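The directory-as-registry idea can be sketched in a few lines of Python. This is an illustration under my own assumptions rather than Scrutineer's loader: `discover_skills` is a hypothetical helper that treats any subdirectory containing a SKILL.md as a skill.

```python
from pathlib import Path
from typing import List


def discover_skills(skills_dir: Path) -> List[str]:
    """Return the names of subdirectories that contain a SKILL.md file."""
    if not skills_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in skills_dir.iterdir()
        if entry.is_dir() and (entry / "SKILL.md").is_file()
    )
```

With a loader of this shape, dropping a new directory into skills/ is enough for it to be picked up on the next call, which is the property the paragraph above describes.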
The skills
Each skill is a directory in the skills folder on GitHub. triage runs first and gathers the context the audit feeds on, and a supporting cast of static-analysis, dedup, and export skills fills in around the edges. The ones that shape how the tool works:
security-deep-dive is the model-backed audit that produces the findings, and by a wide margin the skill that matters most; everything else either feeds it context or acts on what it returns. It runs in two phases. The first builds an inventory of every sink in the codebase, each place that executes code, shells out, or touches a path that could be hostile, without judging any of them yet. The second works through that inventory one entry at a time, tracing each sink back to a trust boundary and deciding whether hostile input can reach it. The inventory is part of the report rather than scratch work, so two runs against the same commit land on the same list. It audits the project’s own code, not its dependencies’ known CVEs: a finding counts only if the vulnerable logic lives in the repo.
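The two-phase structure is easier to see in code. The Python sketch below is a toy, not the skill itself: the real audit is model-backed, whereas this stands in a few regular expressions for phase one and a caller-supplied function for phase two. The pattern table, `Sink`, `build_inventory`, and `audit` are all hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy sink patterns for Python sources; a real inventory would be far broader.
SINK_PATTERNS = {
    "code-exec": re.compile(r"\b(eval|exec)\s*\("),
    "shell": re.compile(r"\b(os\.system|subprocess\.\w+)\s*\("),
    "path": re.compile(r"\bopen\s*\("),
}


@dataclass(frozen=True, order=True)
class Sink:
    path: str
    line: int
    kind: str


def build_inventory(files: Dict[str, str]) -> List[Sink]:
    """Phase one: list every sink without judging any of them."""
    sinks = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in SINK_PATTERNS.items():
                if pattern.search(line):
                    sinks.append(Sink(path, lineno, kind))
    return sorted(sinks)  # sorted so the same input always gives the same list


def audit(inventory: List[Sink], is_reachable: Callable[[Sink], bool]) -> List[Sink]:
    """Phase two: visit entries one at a time, keep those judged reachable."""
    return [sink for sink in inventory if is_reachable(sink)]
```

Keeping the inventory as a separate, ordered value is what lets it be stored in the report and compared across runs; the judgement in `is_reachable` is where the model would do its work.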
threat-model derives the project’s security contract before any auditing happens: what it assumes about its callers, the properties it guarantees under those assumptions, what it leaves to the integrator, and which code is out of scope. Every claim is tagged documented, with a file and line or a closed issue behind it, or inferred, reasoned from the code and flagged for a human to confirm. It lifts whatever SECURITY.md already says about scope verbatim, so the threat model is a superset of the project’s own stated position rather than a competing one. The deep-dive loads this instead of re-deriving boundaries on every run, which keeps it on the parts of the code the project claims to defend.
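The documented/inferred split amounts to a small invariant on each claim, sketched here in Python. The `Claim` class, its field names, and the example strings are assumptions for illustration, not the skill's real output format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    text: str
    basis: str                      # "documented" or "inferred"
    evidence: Optional[str] = None  # e.g. "SECURITY.md:12" or an issue reference

    def __post_init__(self) -> None:
        if self.basis not in ("documented", "inferred"):
            raise ValueError(f"unknown basis: {self.basis!r}")
        if self.basis == "documented" and not self.evidence:
            raise ValueError("a documented claim needs a file and line or an issue")


def needs_human_review(claims: List[Claim]) -> List[Claim]:
    """Inferred claims are the ones flagged for a human to confirm."""
    return [claim for claim in claims if claim.basis == "inferred"]
```

Rejecting a documented claim that has no evidence at construction time means an unsupported assertion cannot quietly pass as the project's stated position.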
maintainers works out who to contact about a disclosure and sorts the people it finds into active leads, regular maintainers, one-off contributors, and bots. It pulls commit history, issue and PR activity, and registry ownership from ecosyste.ms, and reads SECURITY.md and CODEOWNERS for a named security contact, rather than mailing whoever appears first in the git log. The output names a disclosure channel to go with the people: private vulnerability reporting where the repo has it enabled, a published contact where there is one.
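The channel choice can be read as a short preference order. The Python sketch below assumes the order in which the two channels are listed above is the order of preference, and that having neither is left to a human; both are my assumptions, as are the `RepoContactInfo` and `choose_channel` names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepoContactInfo:
    private_reporting_enabled: bool   # private vulnerability reporting is on
    published_contact: Optional[str]  # e.g. a contact named in SECURITY.md


def choose_channel(info: RepoContactInfo) -> Optional[str]:
    """Prefer private vulnerability reporting, then a published contact."""
    if info.private_reporting_enabled:
        return "private-vulnerability-reporting"
    if info.published_contact:
        return f"contact:{info.published_contact}"
    return None  # assumption: no channel found, leave the decision to a human
```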
patch proposes a fix for a confirmed finding as a unified diff against the scanned ref. It is held to a minimal change in place at the sink, matching the existing code style and reusing whatever sanitiser or validator the project already has, with a regression test when the suite makes one practical. If it cannot tell where the dangerous path diverges from legitimate use, it refuses rather than guess. A diff that parses, targets files that exist, and...