The Control Plane Was the Point: Revisiting autofz in the LLM Era

Preface

autofz is a meta-fuzzer: a runtime orchestrator for existing fuzzers. I developed it during the first few years of my PhD, and it was accepted to USENIX Security 2023. The paper and GitHub repository are both public.

A few years later, autofz is not one of the most cited fuzzing papers, but its control-plane framing has aged better than I expected.

I want to revisit autofz now because its core question feels more relevant than its original fuzzing context: when you have many imperfect workers, how should a system spend a fixed budget among them?

In 2023, the workers were fuzzers. Today, in cyber reasoning systems (CRS) and LLM-agent systems, the workers may be fuzzers, static analyzers, code agents, patch generators, validators, or model variants. The surface has changed, but the control-plane question is similar: which worker should run, what evidence should be shared, when should the system switch direction, and when should it stop?

This matters more as security capability becomes cheaper and more widely available. I do not mean that security is solved, or that expert systems no longer matter. I mean something narrower: producing plausible bug candidates is becoming easier. The harder problem is turning noisy candidate generation into reliable evidence, reproducible proofs of vulnerability (PoVs), useful patches, and good budget decisions.

That is why I do not think the last decade of fuzzing research should be treated as obsolete. Even when the exact techniques are not reused literally, the field has accumulated hard-won lessons about cheap feedback, noisy evaluation, evidence sharing, and fixed-budget automation. autofz was one small version of that larger orchestration problem.

Why fuzzer selection is hard

The initial observation behind autofz was simple: no single fuzzer is always the best fuzzer. The paper made this concrete with four observations.

First, there is no universal fuzzer. Different fuzzers make different tradeoffs in mutation strategy, scheduling, instrumentation, seed management, and search pressure. In the paper’s motivating example, LearnAFL performed best on ffmpeg, but dropped to sixth place on exiv2. RedQueen also outperformed Radamsa by more than 10x on exiv2 under the same resource budget. That kind of target sensitivity is exactly why “just use the best fuzzer” is not a satisfying operational answer.

Second, the best fuzzer can change during the same campaign. We called this rank inversion. On exiv2, Angora made strong early progress, but LAF-Intel and RedQueen caught up after roughly two hours. Later, another inversion happened between LAF-Intel and RedQueen. A static decision made at the beginning would miss that change.
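
To make rank inversion concrete, here is a minimal sketch that scans per-fuzzer coverage samples and reports each point where the leader changes. The function name, the fuzzer names, and the numbers are all made up for illustration; they are not measurements from the paper.

```python
from typing import Dict, List, Tuple


def find_rank_inversions(coverage: Dict[str, List[int]]) -> List[Tuple[int, str, str]]:
    """Return (sample_index, old_leader, new_leader) each time the leader changes.

    `coverage` maps a fuzzer name to cumulative coverage, sampled at the same
    times for every fuzzer. A tie keeps the current leader.
    """
    names = list(coverage)
    n = min(len(series) for series in coverage.values())
    inversions = []
    leader = None
    for t in range(n):
        best = max(names, key=lambda name: coverage[name][t])
        if leader is None:
            leader = best
        elif coverage[best][t] > coverage[leader][t]:
            # A different fuzzer is now strictly ahead: record the inversion.
            inversions.append((t, leader, best))
            leader = best
    return inversions


# Hypothetical samples, for illustration only.
samples = {
    "fuzzer_a": [100, 150, 160, 165, 170],
    "fuzzer_b": [40, 90, 170, 200, 205],
    "fuzzer_c": [30, 60, 120, 190, 230],
}
inversions = find_rank_inversions(samples)
```

With these samples, fuzzer_a leads early, fuzzer_b overtakes it at the third sample, and fuzzer_c overtakes fuzzer_b at the fifth. A choice fixed at the first sample would have stayed with fuzzer_a for the whole run.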

Third, equal resource allocation is wasteful. Collaborative fuzzing can improve over a single fuzzer by sharing seeds, but if every fuzzer receives the same CPU budget forever, the system still spends too much time on workers that are not currently useful.

Fourth, fuzzing randomness makes offline decisions fragile. Even if an expert finds a good combination for one benchmark run, that guidance may not reproduce for the next workload, the next seed corpus, or even another run of the same target. The selection burden does not disappear; it just moves into benchmark selection, training data, or manual tuning.

This was the practical problem autofz tried to remove. A user should be able to provide a pool of available fuzzers and let the system decide which ones deserve the current budget.

The paper compared autofz against individual fuzzers such as AFL, AFLFast, Angora, QSYM, RedQueen, and others. The point was not that individual fuzzers are weak; it was that a runtime composition layer can exploit whichever fuzzer is useful for the current target and phase.

How autofz works

autofz does not implement a new fuzzing algorithm. It runs existing fuzzers and adds a control plane above them.

The control loop has two phases. In the preparation phase, autofz gives baseline fuzzers a short, fair chance to run and observes their progress. Because fuzzers use different internal feedback, autofz maps their interesting inputs back to a common AFL bitmap view. That gives the orchestrator a unified way to compare runtime trends.
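
One way to picture the common bitmap view is the sketch below. It assumes each fuzzer's interesting inputs have already been replayed on an AFL-instrumented binary, so every input is represented by the set of bitmap indices it hits. The function name `measure_trends` and the choice of "new bitmap entries in this window" as the trend metric are my assumptions for illustration; this is not the exact formula from the paper.

```python
from typing import Dict, Iterable, Set


def measure_trends(
    baseline: Set[int],
    discovered: Dict[str, Iterable[Set[int]]],
) -> Dict[str, int]:
    """Count the bitmap entries each fuzzer added beyond `baseline` in one window.

    `baseline` is the set of bitmap indices covered before the window started.
    `discovered` maps a fuzzer name to the bitmap indices hit by each of its
    interesting inputs during the window (hypothetical replay output).
    """
    trends = {}
    for name, per_input_edges in discovered.items():
        seen: Set[int] = set()
        for edges in per_input_edges:
            seen |= edges
        # Only coverage that is new relative to the shared baseline counts.
        trends[name] = len(seen - baseline)
    return trends


# Hypothetical bitmap indices, for illustration only.
baseline = {1, 2, 3}
discovered = {
    "fuzzer_a": [{1, 4}, {4, 5}],
    "fuzzer_b": [{2, 3}],
    "fuzzer_c": [{5, 6, 7}, {8}],
}
trends = measure_trends(baseline, discovered)
```

Because every fuzzer is scored against the same bitmap, a fuzzer that reports many "interesting" inputs under its own feedback but adds nothing new in the shared view (fuzzer_b here) scores zero.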

The second phase is the focus phase. autofz converts the observed trend into a resource allocation decision. If one fuzzer is clearly ahead, autofz can give the next budget window to that fuzzer. If several fuzzers look useful, it can distribute resources proportionally. Seeds are synchronized across fuzzers so that one worker’s discovery can become another worker’s starting point.
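
The allocation decision can be sketched roughly as follows. The function `allocate_budget`, the `lead_ratio` threshold, and the equal-split fallback are hypothetical choices for this sketch; the paper's actual thresholds and policy details are not reproduced here.

```python
from typing import Dict


def allocate_budget(
    trends: Dict[str, int],
    budget_seconds: float,
    lead_ratio: float = 2.0,  # hypothetical threshold for "clearly ahead"
) -> Dict[str, float]:
    """Split the next budget window among fuzzers based on observed trends."""
    if not trends:
        return {}
    total = sum(trends.values())
    if total == 0:
        # No fuzzer made progress: fall back to an equal split.
        share = budget_seconds / len(trends)
        return {name: share for name in trends}
    ranked = sorted(trends, key=trends.get, reverse=True)
    leader = ranked[0]
    runner_up = trends[ranked[1]] if len(ranked) > 1 else 0
    if trends[leader] >= lead_ratio * runner_up:
        # One fuzzer is clearly ahead: it receives the whole window.
        return {name: (budget_seconds if name == leader else 0.0) for name in trends}
    # Several fuzzers look useful: distribute in proportion to their trends.
    return {name: budget_seconds * value / total for name, value in trends.items()}


# Hypothetical trend values, for illustration only.
focused = allocate_budget({"fuzzer_a": 10, "fuzzer_b": 2, "fuzzer_c": 1}, 300.0)
shared = allocate_budget({"fuzzer_a": 6, "fuzzer_b": 4, "fuzzer_c": 2}, 300.0)
```

In the first call fuzzer_a is far enough ahead to take the full 300 seconds. In the second call no fuzzer clears the threshold, so the window is split 150, 100, and 50 seconds.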

Then the system repeats. It does not assume that the best fuzzer for the previous window is still the best fuzzer for the next one.

Paper Figure 2: autofz alternates between preparation and focus phases. It measures fuzzer trends, converts them into resource allocation, and then spends the next window on the...
