Lunacy | Red Vice The virtue of laziness<br>Lunacy is a Lua 5.1 interpreter I’ve been working on as a side-project in Rust. Unlike the average Lua interpreter side-project, Lunacy does type specialization of bytecode operations via Lazy Basic Block Versioning (LBBV), plus includes a Just-in-Time compiler.<br>LBBV is a pretty cool compilation technique that came out of Maxime Chevalier-Boisvert’s research while at University of Montreal, culminating in her PhD thesis, around developing a JavaScript JIT, Higgs. Critically for me, her approach focuses a lot on simplicity: the goal of LBBV is that it is an effective compilation strategy that can be implemented by just one PhD student in a reasonable amount of time, or one engineer as a side-project.<br>The original LBBV paper is very approachable, and I highly recommend reading it. To summarize, it enables optimizing bytecode operations, such as ADD, which naively would need to branch on the types of its operands to figure out if it’s adding two numbers or two strings or what have you. Whenever the operation would inspect the type of an operand and check if it matches a type, it can statically be given a pass/fail based on a specialization context for the basic block. If the context doesn’t know the type of the operand, instead of needing to conservatively fall back to a dynamic operation which can handle any value, compilation can instead be suspended to a thunk: the thunk will eventually be forced by execution hitting it at runtime, at which point you can continue compiling with the context now filled with the runtime type of the value promoted to a static type, guarded by a runtime check to validate that later executions of the same codepath use the same type. In the failure case of the guard, you can suspend another thunk that will specialize the operation for the next runtime type you see.<br>This is very powerful! The end result is that you have blocks which seemingly by magic have type guards only for the types of values that actually were observed at runtime, and for things like loops “for free” unroll one iteration which contains all the type guards, and then have entirely known types inside the hot loop (because that block reached a fixpoint of the specialization context for the loop header, and so reuses the block to form a loop). Because you have the powerful primitive of blocks specialized for a static context based off observed runtime information, you can stuff whatever information in the static context you want and always have a correct “guess” for the value of speculative information. And if you ever end up with too many blocks, no matter what reason, you can fallback to running generic code instead of having exponential block blowup.
Comparison of Lunacy and Higgs specialized blocks for sum. Notice how the inner loop body, block 7, ends up with fully specialized ADD operations for the accumulator and loop induction variable. Lunacy doesn’t do small-integer optimization and so has no overflow checks.
Lunacy does a few things different from Higgs. For one, Higgs is a pure JIT: it has no interpreter, and immediately compiles all bytecode operations directly to assembly. Lunacy instead is implemented as an interpreter first and a JIT second: bytecode operations are implemented as Rust coroutines, which yield effects. Those effects may be things like “guard on stack slot X being type Y”, which are resumed with results like “the slot matches/fails the guard” - and like LBBV, if the type isn’t currently known, we instead suspend the coroutine inside a thunk via a closure. It may also yield an effect like “add operands A and B together”, which can be implemented so it only cares about how to add numbers since the coroutine would only reach the yield if the required guards matched.<br>Runtime behavior is emitted as separate residual operations, separate from the actual Lua bytecode operations which are implemented as coroutines that are driven at compile time. In order to avoid having to define and implement runtime behavior for every combination of operand type etc., Lunacy also uses a closure generating interpreter: complex runtime behavior, like adding two numbers, is instead implemented by a closure that captures the bytecode operands such as the stack slot indexes it uses, and at runtime we execute the closure in order to perform the operation. The interpreter repeatedly executes the residual operations that were the result of compiling blocks by running all of the coroutines for the Lua bytecode.<br>This approach means that the bytecode coroutines in Lunacy were very easy to implement. All of the behavior for an operation, including the different Residual::Exec variants, are located in a single coroutine which intermixes the static compilation portion via yielded effects and the runtime execution portion via closures which can capture arbitrary static information. Because Rust coroutines even implement Clone, when we suspend a coroutine to a thunk in order to...