The Control Plane Was the Point: Revisiting autofz in the LLM Era
Preface
autofz is a meta-fuzzer: a runtime orchestrator for existing fuzzers. I developed it during the first few years of my PhD, and it was accepted to USENIX Security 2023. The paper and GitHub repository are both public.
A few years later, autofz is not one of the most cited fuzzing papers, but its control-plane framing has aged better than I expected.
I want to revisit autofz now because its core question feels more relevant than its original fuzzing context: when you have many imperfect workers, how should a system spend a fixed budget among them?
In 2023, the workers were fuzzers. Today, in CRS and LLM-agent systems, the workers may be fuzzers, static analyzers, code agents, patch generators, validators, or model variants. The surface has changed, but the control-plane question is similar: which worker should run, what evidence should be shared, when should the system switch direction, and when should it stop?
This matters more as security capability becomes cheaper and more widely available. I do not mean that security is solved, or that expert systems no longer matter. I mean something narrower: producing plausible bug candidates is becoming easier. The harder problem is turning noisy candidate generation into reliable evidence, reproducible PoVs, useful patches, and good budget decisions.
That is why I do not think the last decade of fuzzing research should be treated as obsolete. Even when the exact techniques are not reused literally, the field has accumulated hard-won lessons about cheap feedback, noisy evaluation, evidence sharing, and fixed-budget automation. autofz was one small version of that larger orchestration problem.
Why fuzzer selection is hard
The initial observation behind autofz was simple: no single fuzzer is always the best fuzzer. The paper made this concrete with four observations.
First, there is no universal fuzzer. Different fuzzers make different tradeoffs in mutation strategy, scheduling, instrumentation, seed management, and search pressure. In the paper’s motivating example, LearnAFL performed best on ffmpeg, but dropped to sixth place on exiv2. RedQueen also outperformed Radamsa by more than 10x on exiv2 under the same resource budget. That kind of target sensitivity is exactly why “just use the best fuzzer” is not a satisfying operational answer.
Second, the best fuzzer can change during the same campaign. We called this rank inversion. On exiv2, Angora made strong early progress, but LAF-Intel and RedQueen caught up after roughly two hours. Later, another inversion happened between LAF-Intel and RedQueen. A static decision made at the beginning would miss that change.
Third, equal resource allocation is wasteful. Collaborative fuzzing can improve over a single fuzzer by sharing seeds, but if every fuzzer receives the same CPU budget forever, the system still spends too much time on workers that are not currently useful.
Fourth, fuzzing randomness makes offline decisions fragile. Even if an expert finds a good combination for one benchmark run, that guidance may not reproduce for the next workload, the next seed corpus, or even another run of the same target. The selection burden does not disappear; it just moves into benchmark selection, training data, or manual tuning.
This was the practical problem autofz tried to remove. A user should be able to provide a pool of available fuzzers and let the system decide which ones deserve the current budget.
autofz compared with individual fuzzers such as AFL, AFLFast, Angora, QSYM, RedQueen, and others. The paper's point was not that individual fuzzers are weak; it was that a runtime composition layer can exploit whichever fuzzer is useful for the current target and phase.
How autofz works
autofz does not implement a new fuzzing algorithm. It runs existing fuzzers and adds a control plane above them.
The control loop has two phases. In the preparation phase, autofz gives baseline fuzzers a short, fair chance to run and observes their progress. Because fuzzers use different internal feedback, autofz maps their interesting inputs back to a common AFL bitmap view. That gives the orchestrator a unified way to compare runtime trends.
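To make the "common bitmap view" idea concrete, here is a minimal sketch, not the autofz code. It assumes each fuzzer's interesting inputs have already been replayed through one AFL-instrumented binary, so every input is represented as a set of covered bitmap indices. The function name `bitmap_gain` and the set-based representation are my illustration, not names from the paper or repository.

```python
from typing import Dict, Iterable, Set


def bitmap_gain(
    global_bitmap: Set[int],
    fuzzer_traces: Dict[str, Iterable[Set[int]]],
) -> Dict[str, int]:
    """Count bitmap entries each fuzzer reached that the campaign had not seen.

    global_bitmap: entries already covered before this window (assumed input).
    fuzzer_traces: per-fuzzer traces, each trace being the set of bitmap
                   indices hit when one interesting input is replayed.
    """
    gains: Dict[str, int] = {}
    for name, traces in fuzzer_traces.items():
        covered: Set[int] = set()
        for trace in traces:
            covered |= trace  # union of everything this fuzzer reached
        gains[name] = len(covered - global_bitmap)  # keep only new entries
    return gains
```

Because every fuzzer is scored against the same bitmap, the resulting numbers are comparable even when the fuzzers use different internal feedback.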
The second phase is the focus phase. autofz converts the observed trend into a resource allocation decision. If one fuzzer is clearly ahead, autofz can give the next budget window to that fuzzer. If several fuzzers look useful, it can distribute resources proportionally. Seeds are synchronized across fuzzers so that one worker’s discovery can become another worker’s starting point.
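The allocation decision can be sketched in a few lines. This is an illustration under stated assumptions rather than the rule from the paper: "clearly ahead" is modeled here as the leader's gain being at least `lead_ratio` times the runner-up's, and both `allocate_window` and `lead_ratio` are hypothetical names I chose for the example.

```python
from typing import Dict


def allocate_window(
    gains: Dict[str, int],
    window_secs: float,
    lead_ratio: float = 2.0,  # assumed threshold for "clearly ahead"
) -> Dict[str, float]:
    """Split one focus window among fuzzers based on their observed gains."""
    if not gains:
        return {}
    total = sum(gains.values())
    if total == 0:
        # Nobody made progress: fall back to an equal split.
        share = window_secs / len(gains)
        return {name: share for name in gains}

    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    leader, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0

    if best > runner_up and best >= lead_ratio * runner_up:
        # One fuzzer is clearly ahead: it gets the whole window.
        return {name: (window_secs if name == leader else 0.0) for name in gains}

    # Several fuzzers look useful: share the window in proportion to gain.
    return {name: window_secs * gain / total for name, gain in gains.items()}
```

For example, gains of 10 and 1 would hand the whole window to the first fuzzer, while gains of 3 and 2 would split it 60/40.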
Then the system repeats. It does not assume that the best fuzzer for the previous window is still the best fuzzer for the next one.
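The alternation itself is a small loop. The sketch below assumes three caller-supplied callbacks, all hypothetical: `run_prep` runs every fuzzer briefly and returns per-fuzzer gains, `allocate` turns gains into a per-fuzzer time plan, and `run_focus` executes that plan. Seed synchronization and process management are left out.

```python
from typing import Callable, Dict, List

Allocation = Dict[str, float]


def run_campaign(
    fuzzers: List[str],
    total_secs: float,
    prep_secs: float,
    focus_secs: float,
    run_prep: Callable[[List[str], float], Dict[str, int]],
    allocate: Callable[[Dict[str, int], float], Allocation],
    run_focus: Callable[[Allocation], None],
) -> List[Allocation]:
    """Alternate preparation and focus phases until the budget is spent."""
    history: List[Allocation] = []
    remaining = total_secs
    while remaining > 0:
        # Preparation: re-measure every fuzzer instead of trusting the last window.
        prep = min(prep_secs, remaining)
        gains = run_prep(fuzzers, prep)
        remaining -= prep
        if remaining <= 0:
            break
        # Focus: spend the next window according to the fresh measurement.
        focus = min(focus_secs, remaining)
        plan = allocate(gains, focus)
        run_focus(plan)
        remaining -= focus
        history.append(plan)
    return history
```

The point of the structure is that every focus window is preceded by a new measurement, so a fuzzer that led earlier can lose the budget as soon as another one overtakes it.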
Paper Figure 2: autofz alternates between preparation and focus phases. It measures fuzzer trends, converts them into resource allocation, and then spends the next window on the...