HartBreaker: Deterministic Fuzzing of Multi-Hart RISC-V CPUs with Non-Deterministic Programs – Computer Security Group
Skip to content
HartBreaker is the first general-purpose hardware fuzzer that tests the communication channels of multi-hart RISC-V CPUs, including shared memory and inter-processor interrupts. To make this possible, HartBreaker addresses a fundamental obstacle that has so far kept hardware fuzzing confined to single-core designs: the execution of a multi-hart program is inherently non-deterministic, so the deterministic golden-model comparison on which existing fuzzers rely does not work. HartBreaker overcomes this through a new technique we call determinism anchors, which allow non-deterministic behavior to occur freely at the instruction level while guaranteeing that the program as a whole still completes in a predictable state on correct hardware. Applied to five well-tested open-source RISC-V CPUs (Rocket, BOOM, Toooba, NaxRiscv, and XiangShan), HartBreaker discovered five previously unknown concurrency bugs.
HartBreaker is fully open-source and available on GitHub, with a user-friendly setup and extensive documentation. It is also permanently archived on Zenodo.
What is multi-hart fuzzing, and why is it hard?
A hart (hardware thread) is an independent execution unit in a RISC-V CPU. Modern processors contain several harts that communicate through shared memory and inter-processor interrupts (IPIs). Implementing these communication channels in hardware correctly is notoriously hard, because the underlying memory consistency model (RVWMO in RISC-V) deliberately allows many subtle reorderings of memory operations, and because interrupts arrive at times that the receiving hart cannot predict.
Hardware fuzzing has emerged as one of the most effective techniques for finding hardware bugs in single-core CPUs. The idea is simple: generate random programs, run them on the design under test and on a reference model in parallel, and flag any divergence as a candidate bug. This works because the program, the reference model, and the correctly implemented hardware all produce the same final state. Multi-hart execution breaks this assumption. The same program, run twice on the same correct CPU, can legitimately produce different intermediate states depending on how memory operations interleave or when an interrupt happens to fire. There is no single "expected" trace to compare against. Extending existing CPU fuzzers to this setting is therefore non-trivial, and until now it had not been done.
Limitations of current multi-hart testing
Two main approaches have been adopted to verify multi-hart CPUs correctness, but both fall short:
Litmus tests : Litmus tests are small concurrent programs that target specific scenarios in a memory consistency model. They are precise, but they are small: only usually 2-4 memory operations, and their instruction diversity is extremely narrow. Compared to programs generated by state-of-the-art single-hart fuzzers, litmus tests cover only a small fraction of the available instructions: no exceptions, floating-point operations, privilege switches, and only a thin slice of ALU and branch instructions. As a result, litmus tests rarely reach the complex microarchitectural conditions under which real bugs hide.
Formal methods : Formal approaches exhaustively explore microarchitectural states, but they require a manually constructed abstract model of the hardware for complex CPUs with caches and out-of-order execution, which is difficult to derive.
A fuzzer that combines the instruction diversity of modern hardware fuzzers with a verification strategy that can handle non-determinism has been missing: this is the gap HartBreaker fills.
Sources of Non-Determinism
To handle non-determinism, we first need to understand where it comes from. We identify two root causes:
Data-flow non-determinism : A load returns a value that depends on the interleaving of concurrent stores from other harts. The immediate control flow is unaffected, but the data in a register becomes unpredictable.
Control-flow non-determinism : An asynchronous event diverts execution to a trap handler at an instant that the receiving hart cannot anticipate. The program counter, rather than just a data value, becomes unpredictable.
Modeling these effects exposes why fuzzing multi-hart CPUs is hard. Each non-deterministic instruction can lead to several possible successor states, so the number of execution paths through a program grows exponentially with the number of such instructions. A generator that tried to anticipate every legal outcome ahead of time would therefore cannot scale. Worse, an RTL simulator follows only one path per run, so all the paths the generator reasoned about but the simulator did not take become dead code. Avoiding this path-explosion problem, avoiding it is what makes high-throughput multi-hart fuzzing possible.
Determinism Anchors
The central technique behind...