Verifying a Caliptra Boot-FSM Bug with Mununu

hasheddan1 pts0 comments

Verifying a Caliptra Boot-FSM Bug with mununu

Mariano's Substack

SubscribeSign in

Verifying a Caliptra Boot-FSM Bug with mununu

Mariano Cerrutti<br>May 24, 2026

Share

The design and the bug

Caliptra is an open-source silicon root-of-trust developed under the Open Compute Project, contributed to by Microsoft, AMD, Google, and NVIDIA, and intended to be reused as a hardened, auditable RoT block inside larger datacenter SoCs. Among its constituent modules is soc_ifc_boot_fsm.sv, the boot-time state machine in the SoC interface block, which sequences the chip from cold reset through fuse loading and into operation. It has a small, readable footprint — a handful of internal signals, a few reset-domain registers, and one typedef-enum carrying the FSM’s state.<br>The state encoding is declared in soc_ifc_pkg.sv like this:<br>typedef enum logic [2:0] {<br>BOOT_IDLE = 3'b000,<br>BOOT_FUSE = 3'b001,<br>BOOT_FW_RST = 3'b010,<br>BOOT_WAIT = 3'b011,<br>BOOT_DONE = 3'b100<br>} boot_fsm_state_e;<br>Five named values in a three-bit register. Synthesis arithmetic, however, does not care about typedefs — the register has eight admissible bit patterns, and the encodings 3'b101, 3'b110, 3'b111 are unnamed but reachable in principle (under simulation X-propagation, under post-place-and-route metastability on the reset deassertion edge, under cosmic-ray-induced single-event upsets, or simply under a synthesis pass that decides to repack the register). The buggy _pre_fix variant of the file dispatches on the state with a unique casez whose branches cover only the named five and which carries no default clause. The behaviour of the FSM when the register holds 5, 6, or 7 is therefore undefined: depending on the downstream tools, it may infer a latch, hold the previous state, or generate transition logic that lands somewhere arbitrary. This is CWE-1245 — “Improper Finite State Machines in Hardware Logic” — and it is one of the more common patterns auditors look for when reviewing security-critical RTL.<br>The fix is mechanical (add default: boot_fsm_ns ), but the interesting question is how you would find the issue, or — given that someone already pointed at it — how you would demonstrate end-to-end that the bug is reachable and the fix closes it.<br>Diagnostic alternatives

A static linter like SpyGlass, AscentLint, or verilator --lint-only will flag the syntactic pattern: case without default. That is useful and cheap, but the warning tells you that the construct is suspect, not whether the situation it warns about is reachable in the surrounding design, whether reset semantics expose it, or what the downstream consequences are. A team accumulating thousands of such warnings tends to triage them by hand, and uniqueness violations that are “obviously fine because reset always lands in IDLE” tend to be deprioritized until they aren’t.<br>Directed simulation can demonstrate the bug, but only if you can drive the FSM register into one of the offending encodings. RTL simulators usually do not let you do that without surgery on the source (forcing the register from the testbench, or instantiating a wrapper). SystemVerilog Assertions help: an assume property (boot_fsm_ns inside {…named values…}) paired with an assert that the FSM behaves correctly will surface the violation under a formal property checker, but you still need to run it through a formal tool.<br>Commercial formal tools — Jasper, Questa Formal, OneSpin — handle this kind of property well. They bit-blast the register, enumerate the reachable state space, and prove or disprove the assertion. They cost six figures per seat, lock you into a vendor flow, and the output is a counterexample trace in the vendor’s debugger. SBY, the open-source Yosys-driven verification framework, also handles it: write a .sby script, attach the right assume/assert SVA, point it at Z3 or Boolector, and read the verdict. SBY is excellent for what it does — bounded and unbounded model checking on a flat netlist — and it carries the same trade-off as the commercial tools: the verification language is SVA, the model lives inside the BMC engine, and abstraction is something you encode in extra assume statements rather than as a separable artifact.<br>mununu sits in a different corner of the design space. It is also open-source, also uses Yosys and BTOR2 as the front-end, and it also produces verdicts against a property language. The differences worth knowing are the language and the abstraction story. Properties are now written in a 3-valued modal mu-calculus rather than SVA, which means safety and liveness both compose into the same formula syntax and the verdict layer can return Kleene Bottom (”don’t know, refine”) rather than committing to a binary answer under unsound abstraction. Abstraction is declared in a sidecar JSON file that lives next to the SystemVerilog source — the RTL stays untouched, and the sidecar carries decisions like “treat this counter as {0, NONZERO}“ or “sample this register’s reset value nondeterministically.”...

register state reset source caliptra boot

Related Articles