WATaBoy: JIT-ing Game Boy Instructions to Wasm Beats a Native Interpreter

Background

This text assumes the reader is familiar with the concept of just-in-time compilation.

Dolphin isn’t on iOS, because you can’t do JIT compilation on iOS. That’s a quick summary of OatmealDome’s blog post “Why Dolphin Isn’t Coming to the App Store”. Ever since reading that, I’ve wondered what it would take to get a CPU-bound emulator like Dolphin working on iOS. Do we just... have to wait a few years for iPhone CPUs to get fast enough to run Dolphin with an interpreter?

Well, Apple has one exception to its JIT restrictions: web browsers. JavaScriptCore, WebKit’s JS engine, uses JIT compilation for its higher-performance tiers. So, if a JS function is called enough times, eventually it’ll be optimised and compiled into native machine code. The same is true for WebAssembly.

So, what if we just piggyback off of this? Instead of generating native machine code directly, we could just generate Wasm bytecode, which will eventually be compiled to native machine code by the web browser. After reading Andy Wingo's blog post "just-in-time code generation within webassembly", I knew such a thing would be possible. In fact, a handful of projects already use this technique, namely The Jiterpreter and v86, but at the time of writing, no emulators for game consoles have used it, and nobody has compared the performance to an interpreter running natively to see if it's faster.

So, for my undergraduate final-year project, I decided I’d build a Game Boy emulator, first using an interpreter, and then using a JIT-to-Wasm. This project primarily serves as a proof of concept and benchmark to compare the performance of each approach. For the rest of this blog post, I'll call this a “JIT-to-Wasm” instead of a “Wasm JIT” to avoid confusion with what the JS engine itself does (recompile Wasm to machine code).

Screenshot of WATaBoy, a Game Boy emulator that compiles SM83 to Wasm

Anyone reading this who knows a bit about emulation just rolled their eyes, because how the hell is a Game Boy emulator going to benefit from JIT compilation? Luckily, GameRoy’s blog post describes exactly how it’s possible while remaining cycle-accurate:

predict when interrupts are going to occur

whenever a JIT block might be interrupted, fall back to an interpreter

lazily evaluate any non-CPU Game Boy components accessed via MMIO
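To make those three ideas a little more concrete, here's a minimal sketch in Rust. It is not GameRoy's or WATaBoy's actual code: `Block`, `Step`, `choose_step`, and `LazyDivider` are names I made up for illustration, and it assumes the emulator already knows the cycle at which the next interrupt is predicted to fire. The only hardware fact it relies on is that the DIV register is the upper byte of a 16-bit counter that ticks once per clock cycle.

```rust
/// Hypothetical metadata for one compiled block.
struct Block {
    /// Worst-case number of cycles the block can take.
    max_cycles: u64,
}

#[derive(Debug, PartialEq)]
enum Step {
    RunBlock,
    Interpret,
}

/// Run the compiled block only if it is guaranteed to finish before the
/// predicted interrupt; otherwise fall back to the interpreter.
fn choose_step(block: Option<&Block>, now: u64, next_interrupt: u64) -> Step {
    match block {
        Some(b) if now + b.max_cycles <= next_interrupt => Step::RunBlock,
        _ => Step::Interpret,
    }
}

/// A divider that is only brought up to date when something reads it,
/// instead of being ticked on every CPU cycle.
struct LazyDivider {
    counter: u16,
    synced_at: u64,
}

impl LazyDivider {
    /// Catch up on the cycles that elapsed since the last sync, then read DIV.
    fn read_div(&mut self, now: u64) -> u8 {
        let elapsed = now - self.synced_at;
        // Truncating to u16 is fine: the counter wraps every 65536 cycles.
        self.counter = self.counter.wrapping_add(elapsed as u16);
        self.synced_at = now;
        (self.counter >> 8) as u8
    }
}
```

A real emulator has to handle more than this (for example, writes that change when the next interrupt fires), so treat it as a picture of the idea rather than a recipe.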

GameRoy’s JIT only targets x86, but nearly all of its optimisation techniques still apply to our JIT-to-Wasm. Definitely check it out if you’re interested in the nitty-gritty details of the Game Boy emulation side of things; it was a huge inspiration.

Still, a Game Boy emulator doesn't benefit from JIT compilation as much as an emulator for, say, a sixth-gen console would. But it was much faster to make, and actually fit within the scope of my final-year project.

Implementation

Now, to narrow the scope of this blog post, I’ll take you through the most broadly applicable part of WATaBoy that I couldn't find a guide for anywhere else: Wasm codegen and late-linking from within Rust. Plenty of other things make WATaBoy interesting from a Game Boy emulation perspective (e.g., SIMD tile rendering), but those implementation details deserve separate write-ups (you can also just read WATaBoy’s source, of course). If you aren’t interested, skip to the results.

Normally we'd reach for tools like wasm-bindgen and wasm-pack to generate glue code between Rust and JavaScript. But those tools cause some ergonomics issues when working with Wasm at a low level. Instead, I use an approach similar to the one described in “Rust to WebAssembly the hard way”. This just means we'll pass data across the Rust-JS boundary via the C ABI, using pointers and buffer lengths instead of JavaScript objects.
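As a rough illustration of what "pointers and buffer lengths" looks like on the Rust side, here's a small sketch. `sum_bytes` is a made-up function, not part of WATaBoy. I've left off the `#[unsafe(no_mangle)]` attribute so the snippet compiles on its own in any edition; in a real cdylib you'd add it so the function shows up under that name in the module's exports.

```rust
use std::slice;

/// Hypothetical export: sums `len` bytes starting at `ptr`.
///
/// The caller (JavaScript, in our case) writes the bytes into the module's
/// linear memory and passes the address and length as plain integers.
/// In a real cdylib, add `#[unsafe(no_mangle)]` above this function.
///
/// # Safety
/// `ptr` must point to at least `len` readable bytes.
pub unsafe extern "C" fn sum_bytes(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // Rebuild a slice from the raw parts handed over by the caller.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```

Nothing here depends on JavaScript types, which is the point: both sides only ever agree on integers and a shared block of memory.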

Just a heads up, you’ll need Nightly Rust, because we'll use a tiny bit of inline Wasm later. So run:

rustup default nightly

To switch back, just run this again but swap ‘nightly’ for ‘stable’.

Create a new library:

cargo new --lib jit-to-wasm

Hey look, we've already got some code here:

pub fn add(left: u64, right: u64) -> u64 {
    left + right
}

For our simple example, let’s try producing some Wasm bytecode at runtime that does the same thing.

Wasm code generation

The wasm-encoder crate will be our only dependency. With it, we can emit the bytes for Wasm instructions using a sort of builder pattern. It wasn’t designed for our JIT use case, so there are some ergonomics issues and a tiny bit of boilerplate, but it definitely beats writing an array of raw bytes by hand. :)
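For a sense of what the crate saves us from, here's a dependency-free sketch that writes the same kind of module by hand, following the Wasm binary format. The helper names (`leb128_u32`, `push_section`, `make_add_module_by_hand`) are my own for this illustration and have nothing to do with wasm-encoder's API. Note that Wasm integers are sign-agnostic, so Rust's `u64` maps to `i64` here.

```rust
/// Append `value` as unsigned LEB128, the variable-length integer encoding
/// that Wasm uses for sizes, counts, and indices.
fn leb128_u32(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        // More bytes follow, so set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Append one section: its id, the byte length of its contents, then the contents.
fn push_section(module: &mut Vec<u8>, id: u8, contents: &[u8]) {
    module.push(id);
    leb128_u32(contents.len() as u32, module);
    module.extend_from_slice(contents);
}

/// Build a module that exports `add: (i64, i64) -> i64`.
fn make_add_module_by_hand() -> Vec<u8> {
    const I64: u8 = 0x7e;
    let mut module = Vec::new();
    module.extend_from_slice(b"\0asm"); // magic number
    module.extend_from_slice(&[1, 0, 0, 0]); // binary format version 1

    // Type section: one function type with two i64 params and one i64 result.
    push_section(&mut module, 1, &[1, 0x60, 2, I64, I64, 1, I64]);
    // Function section: one function, which uses type index 0.
    push_section(&mut module, 3, &[1, 0]);
    // Export section: export function index 0 under the name "add".
    push_section(&mut module, 7, &[1, 3, b'a', b'd', b'd', 0x00, 0]);
    // Code section: one 7-byte body with no locals:
    // local.get 0, local.get 1, i64.add, end.
    push_section(&mut module, 10, &[1, 7, 0, 0x20, 0, 0x20, 1, 0x7c, 0x0b]);
    module
}
```

Every count and length above was worked out by hand, which is fine for a toy `add` but gets error-prone once function bodies are generated at runtime.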

[package]
name = "jit-to-wasm"
version = "0.1.0"
edition = "2024"

[lib]
# Required to produce a .wasm file.
crate-type = ["cdylib"]

[dependencies]
wasm-encoder = "0.252.0"

Now, let’s use it to produce the bytecode for a Wasm module containing an ‘add’ function. Here comes that boilerplate I mentioned:

use wasm_encoder::*;

fn make_add_module() -> Vec<u8> {
    let mut module = Module::new();

// Encode the...
