A new Register Allocator for ZJIT | Rails at Scale
We recently landed a new register allocator in ZJIT and I thought I’d write a post about it!
What is a register allocator?
Whenever a compiler generates machine code it needs to decide where to put values.<br>Those values usually take the shape of a variable in your function, though the compiler can also compute intermediate values as well.<br>When we need to perform a calculation on some value, the CPU needs to know how to find the value.
The CPU can typically only compute output based on inputs that are in registers, though some architectures (like x86) allow computations on values stored in memory.<br>That said, reading and writing to registers is much faster than memory, so it behooves the compiler to keep values in registers as long as possible.
Any particular function in your program could have tons of variables, but the number of available registers is finite, and architecture dependent.<br>This is where a register allocator comes in.<br>The register allocator looks at all of the variables, then figures out which registers they should go in, and if there aren’t enough registers figures out how to “spill” those variables to memory.
How does it work?
There are several well-known approaches to register allocation, each with different trade-offs between compile time and code quality.<br>For ZJIT, we chose to implement a linear scan register allocator based on a reduced version of Christian Wimmer’s paper titled “Linear Scan Register Allocation on SSA Form”.<br>The paper isn’t very long so I highly recommend giving it a read when you have time!<br>Alternatively, Max Bernstein wrote a blog post breaking down the paper as well.
ZJIT uses Static single-assignment form, or SSA form in its back end.<br>“SSA form” is a representation of code that only allows a variable to be assigned once.
For example, this Ruby code:
a = 123<br>a += 1
Would be represented in our backend Low-level Intermediate Representation (or “LIR”) kind of like this:
v1 = Const(123)<br>v2 = Const(1)<br>v3 = Add v1, v2<br>CRet v3
In the above pseudocode (it’s not exactly the same as the IR we use in the backend, but quite close), v1, v2, and v3 are all different variables.<br>No variables are allowed to be re-assigned like in the corresponding Ruby code.<br>The generated intermediate representation has exactly the same semantics as the Ruby code, but adds the restriction that a variable can only be written to once.<br>When ZJIT generates an intermediate representation of the Ruby code, it translates all variables to these numeric SSA variables and also uses SSA variables for temporary values.<br>In the example translation above we can think of v1 as being equivalent to a, v2 as a temporary variable with the value 1, and v3 as a new version of a that has been added with v2.
Variables can be written to once, but you can read from the variable as many times as you want.<br>We call the place where the variable was written the “definition” and the place where the variable is read a “use”.
Lifetimes
The first thing a register allocator needs to understand is when each value is alive and for how long.<br>We call this duration a “lifetime” or “live range”.<br>A value’s lifetime (or live range) starts at the point where it’s defined (the definition) and ends at the place it is last used (its “last use”).<br>If two values have overlapping lifetimes they can’t share the same register.<br>If there are more overlapping lifetimes than registers, then we know we need to spill a value to memory.
Since these ranges refer to the “lifetime” of the variable, it’s common to say that the variable “came to life” at its definition, and then “died” at its last use.
Let’s look at an example. Consider this Ruby method that is already in SSA form:
def add_twice(a, b, c)<br>d = a + b<br>e = d + c<br>end
Here are the lifetimes for each value:
# Instruction | a | b | c | d | e<br>-------------------+-----+-----+-----+-----+----<br>1 d = a + b | x | x | . | . |<br>2 e = d + c | | | x | x | .<br>3 return e | | | | | x
A . character means that the variable is alive at that instruction.<br>An x character means the variable dies, but is still used at that instruction.
We can also express these lifetimes as ranges. The live range for each value is [definition, last use]:
a: [0, 1]<br>b: [0, 1]<br>c: [0, 2]<br>d: [1, 2]<br>e: [2, 3]
Where instruction 0 represents the function entry (parameter definitions).
The parameters a, b, and c are all alive at instruction 1 because they were defined before the method body.
a and b are last used at instruction 1, so they die after that.
c stays alive through instruction 2 because that’s where it’s last used.
d is defined at instruction 1 and last used at instruction 2, so it’s alive for both.
e is defined at instruction 2 and last used at instruction 3.
At instruction 1, a and b die and d comes to life.<br>Since we know that a and b are never used again, we’re free to reuse one of their registers for d.<br>So we only need three registers at instruction 1: one each...