Value numbering | Max Bernstein
home<br>blog<br>microblog<br>favorites<br>pl resources<br>bread<br>recipes<br>rss
Value numbering
April 4, 2026
Welcome back to compiler land. Today we’re going to talk about value<br>numbering, which is like SSA, but more.
Static single assignment (SSA) gives names to values: every expression has a<br>name, and each name corresponds to exactly one expression. It transforms<br>programs like this:
x = 0<br>x = x + 1<br>x = x + 1
where the variable x is assigned more than once in the program text, into<br>programs like this:
v0 = 0<br>v1 = v0 + 1<br>v2 = v1 + 1
where each assignment to x has been replaced with an assignment to a new<br>fresh name.
It’s great because it makes clear the differences between the two x + 1<br>expressions. Though they textually look similar, they compute different values.<br>The first computes 1 and the second computes 2. In this example, it is not<br>possible to substitute in a variable and re-use the value of x + 1, because<br>the xs are different.
But what if we see two “textually” identical instructions in SSA? That sounds<br>much more promising than non-SSA because the transformation into SSA form has<br>removed (much of) the statefulness of it all. When can we re-use the result?
Identifying instructions that are known at compile-time to always produce the<br>same value at run-time is called value numbering.
Eliminating common subexpressions
To understand value numbering, let’s extend the above IR snippet with two more<br>instructions, v3 and v4.
v0 = 0<br>v1 = v0 + 1<br>v2 = v1 + 1<br>v3 = v0 + 1 # new<br>v4 = do_something(v2, v3) # new
In this new snippet, v3 looks the same as v1: adding v0 and 1. Assuming our<br>addition operation is some ideal mathematical addition, we can absolutely<br>re-use v1; no need to compute the addition again. We can rewrite the IR to<br>something like:
v0 = 0<br>v1 = v0 + 1<br>v2 = v1 + 1<br>v3 = v1<br>v4 = do_something(v2, v3)
This is kind of similar to the destructive union-find representation that<br>JavaScriptCore and a couple other compilers use, where the optimizer doesn’t<br>eagerly re-write all uses but instead leaves a little breadcrumb<br>Identity/Assign instruction1.
We could then run our copy propagation pass (“union-find cleanup”?) and get:
v0 = 0<br>v1 = v0 + 1<br>v2 = v1 + 1<br>v4 = do_something(v2, v1)
Great. But how does this happen? How does an optimizer identify reusable<br>instruction candidates that are “textually identical”? Generally, there is no<br>actual text in the<br>IR.
One popular solution is to compute a hash of each instruction. Then any<br>instructions with the same hash (that also compare equal, in case of<br>collisions) are considered equivalent. This is called hash-consing.
When trying to figure all this out, I read through a couple of different<br>implementations. I particularly like the Maxine VM implementation.<br>For example, here is the valueNumber (hashing) and valueEqual<br>functions for most binary operations, slightly modified for clarity:
public abstract class Instruction extends Value { ... }
// The base class for binary operations<br>public abstract class Op2 extends Instruction {<br>// Each binary operation has an opcode and two opearands<br>public final int opcode; // (IMUL, IADD, ...)<br>Value x;<br>Value y;
@Override<br>public int valueNumber() {<br>// There are other fields but only opcode, and operands get hashed.<br>// Always set at least one bit in case the hash wraps to zero.<br>return 0x20000000<br>| (opcode<br>+ 7 * System.identityHashCode(x)<br>+ 11 * System.identityHashCode(y));
@Override<br>public boolean valueEqual(Instruction i) {<br>if (i instanceof Op2) {<br>Op2 o = (Op2) i;<br>return opcode == o.opcode && x == o.x && y == o.y;<br>return false;
The rest of the value numbering implementation assumes that if a valueNumber<br>function returns 0, it does not wish to be considered for value<br>numbering. Why might an instruction opt-out of value numbering?
Pure vs impure
An instruction might opt out of value numbering if it is not “pure”.
Some instructions are not pure. Purity is in the eye of the beholder, but in<br>general it means that an instruction does not interact with the state of the<br>outside world, except for trivial computation on its operands. (What does it<br>mean to de-duplicate/cache/reuse printf?)
A load from an array object is also not a pure operation2. The load operation<br>implicitly relies on the state of the memory. Also, even if the array was<br>known-constant, in some runtime<br>systems, the load might raise an exception. Changing the source location where<br>an exception is raised is generally frowned upon. Languages such as Java often<br>have requirements about where exceptions are raised codified in their<br>specifications.
We’ll work only on pure operations for now, but we’ll come back to this later.<br>We do often want to optimize impure operations as well!
We’ll start off with the simplest form of value numbering, which operates only<br>on linear sequences of instructions, like basic blocks or traces.
Local value numbering
Let’s build a small implementation of local value numbering (LVN). We’ll start with<br>straight-line...