GameRoy: JIT Compilation in High-Accuracy Game Boy Emulation (2023)

tosh1 pts0 comments

GameRoy: JIT compilation in High-Accuracy Game Boy Emulation | Rodrigodd

Over the past two years, I have spent a lot of time working on my Game Boy<br>emulator, GameRoy. It has reached a good point, where it has a GUI interface (including a<br>debugger and disassembler) and passes numerous tests (comparable to some of the<br>most accurate emulators). I even made a port to Android!

However, there’s always been something that I wanted to do, even before<br>developing this emulator: implementing a dynamic recompiler, also known as a JIT<br>compiler. Since the first time I researched how an emulator works, one of<br>the initial descriptions I found was about the distinction between interpreters and<br>dynamic recompilers. This made me very interested in the subject.

Reinspired by JIT compilers (such as those in Java, JavaScript and LuaJIT), by<br>Dolphin’s development posts, and primarily by this blog post by Brook<br>Heisler regarding a JIT compiler for NES, I began implementing one in my<br>emulator.

One intriguing point in Brook Heisler’s post was a limitation in how interrupts<br>were handled in the compiled code. When developing his JIT compiler, he<br>encountered a trade-off between performance and precision in how to handle<br>interrupts in his emulator. He ultimately decided to sacrifice performance for<br>precision, but I was not satisfied with that approach.

After pondering for some time, I came up with a solution to this problem - a way<br>to achieve maximum performance without sacrificing precision.

In this blog post, I will describe the process and considerations of<br>implementing a JIT compiler in my emulator, and how I solved the problem of<br>handling interrupts.

It’s worth noting that I am presenting the topics in the same order they came up<br>in my thought process, which just happens to be the reverse order in which they<br>were implemented (excluding the interpreter).

Interpreter x Just-In-Time compilation&nbsp§

An emulator has the job of reproducing a computer system’s behavior in software.<br>A computer system, including the Game Boy, consists of many components, each<br>requiring emulation. Among these, the CPU stands out as the most crucial<br>component, responsible for directly executing the instructions of the programs<br>that you intend to run on your emulator.

There are two main ways of emulating the CPU (or any programmable component).

A very basic interpreter&nbsp§

The simplest way to emulate a system’s CPU is by implementing an interpreter.<br>Essentially, the interpreter reads instructions from the emulated memory,<br>decodes them, and emulates their execution by updating the emulator’s state.

For the Game Boy, the instruction to be executed is read byte by byte (since the<br>Game Boy is an 8-bit system) from the memory address pointed by the program<br>counter (PC), incrementing the PC after each read. To decode the<br>instruction only the first byte is needed for most instructions. Since there are<br>only 255 possible values for a byte, we can use a look-up table. The Game<br>Boy CPU also has 0xCB prefixed instructions that are decoded from the second<br>byte, using another look-up table.

I am using Rust for my emulator implementation, so I can use a match statement<br>to decode the instruction. Each branch of the match lead to a function that<br>implements the instruction’s execution. This function reads and writes to<br>registers or memory, reading additional bytes from the instructions when<br>necessary (e.g., for immediate values, like the address of jump instructions).

A very simplified interpreter would be something like this:

fn interpret_instruction(gb: &mut GameBoy) {<br>let opcode = gb.read(gb.cpu.pc);<br>gb.cpu.pc += 1;<br>match opcode {<br>0x00 => nop(gb),<br>0x01 => load_bc_im16(gb),<br>0x02 => load_bc_a(gb),<br>0x03 => inc_bc(gb),<br>0x04 => inc_b(gb),<br>// 252 more branches like that...

// ...

// INC B 1:4 Z 0 H -<br>fn inc_b(gb: &mut GameBoy) {<br>gb.cpu.b += 1; // increment B<br>gb.cpu.f &= 0x1F; // clear Z, N, H flags<br>if gb.cpu.b == 0 { gb.cpu.f |= 0x80; } // set Z flag on zero<br>if gb.cpu.b & 0xF == 0 { gb.cpu.f |= 0x80; } // set H flag on half carry

The interpret_instruction function is invoked in a loop, executing one<br>instruction per iteration.

One issue with this approach is that the speed at which the interpreter can<br>execute the code is far behind the theoretical speed that your computer’s CPU<br>could execute the code.

The interpreter requires multiple instructions of the host CPU, not only<br>to execute the instruction, but also to read from memory and decode it, task<br>that the host CPU does in a single instruction. Not only that, the interpreter<br>needs to decode the same instruction multiple times.

But that isn’t a problem, at least not for the Game Boy, whose CPU runs at a<br>clock speed of just a little over 1 MHz, while you might be reading this on<br>a device with much more than 1 GHz. In fact, GameRoy can run Game Boy games<br>at over 60 times the original system’s speed on my notebook with a 2.6 GHz<br>Intel i5-4200U.

Even so, this is still 41 times slower than the theoretical...

emulator game interpreter instruction instructions system

Related Articles