Theseus: Translating Win32 to WASM



May 24, 2026

This post is part of a series on Theseus, my win32/x86 emulator.

Theseus can now produce WebAssembly output, allowing it to translate a .exe file into something that runs on the web. Try it out here, but note it is full of bugs (e.g. Minesweeper crashes if you win).

This was pretty straightforward to get working, with the exception of one major detail that this post will go into.

The x86 emulation part of this is just recompiling the existing Theseus output with a different CPU target. This is one of the main benefits of this binary translation approach. The translated code is almost (with the exception of how main gets invoked) wholly agnostic to the environment it eventually runs in. In principle I now get optimized wasm compiler output more or less for free. The main challenge was figuring out the code layout to get Cargo to cooperate with my weird requirements.

The win32 part was changing things to abstract over a "Host" API that is able to do things like fetch mouse events and render pixels. That is now implemented once for SDL and once for the web. This was also relatively straightforward, at least in my first pass.
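A host abstraction of this kind might look something like the trait below. This is only a sketch: the names `Host`, `MouseEvent`, `poll_mouse`, `present`, `pump_once`, and the `RecordingHost` stand-in backend are all hypothetical and are not taken from Theseus.

```rust
use std::collections::VecDeque;

/// Hypothetical mouse event; the real Theseus type is not shown in the post.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MouseEvent {
    pub x: i32,
    pub y: i32,
    pub button_down: bool,
}

/// Assumed shape of a host interface: everything the win32 layer needs
/// from the outside world goes through this trait.
pub trait Host {
    /// Return the next pending mouse event, if any.
    fn poll_mouse(&mut self) -> Option<MouseEvent>;
    /// Display a frame of 32-bit pixels, `width * height` entries long.
    fn present(&mut self, width: u32, height: u32, pixels: &[u32]);
}

/// Stand-in backend for illustration; an SDL or web backend would
/// implement the same trait.
#[derive(Default)]
pub struct RecordingHost {
    pub queued: VecDeque<MouseEvent>,
    pub frames_presented: usize,
    pub last_frame_len: usize,
}

impl Host for RecordingHost {
    fn poll_mouse(&mut self) -> Option<MouseEvent> {
        self.queued.pop_front()
    }

    fn present(&mut self, width: u32, height: u32, pixels: &[u32]) {
        assert_eq!(pixels.len(), (width * height) as usize);
        self.frames_presented += 1;
        self.last_frame_len = pixels.len();
    }
}

/// Emulator-side code is written against `dyn Host` and never names a
/// concrete backend. Returns how many mouse events were drained.
pub fn pump_once(host: &mut dyn Host, framebuffer: &[u32], w: u32, h: u32) -> usize {
    let mut handled = 0;
    while let Some(_event) = host.poll_mouse() {
        // A real implementation would turn this into a window message.
        handled += 1;
    }
    host.present(w, h, framebuffer);
    handled
}
```

The point of the shape is that `pump_once` compiles unchanged whichever backend is plugged in, which is what lets the same translated program target both SDL and the browser.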

So what was hard? It comes down to a part of the design space I hadn't previously explored well: whether the emulator is allowed to block.

To block or not to block

In retrowin32, the emulator was designed to be able to step through some instructions and then return control to the caller. This is critical for the web version in particular, where you cannot block the main thread. In my earlier post "threading in two ways" I went into some detail on the various tradeoffs on how I could emulate threads in a browser, ultimately choosing a single thread.

This has its advantages, but is unsatisfying in a few important ways:

The main thread must repeatedly call into the emulator in a loop that yields control back to the browser.

Any Windows API implementation that might transfer control to the emulator must be made async, so that it can be suspended and resumed. This is obvious for functions that take a callback, but even a function like MoveWindow will synchronously send the window the Windows messages related to the move, so it is also async with respect to the message handling.

And finally, all the normal reasons async code is yucky: getting object lifetimes correct, how stack traces are busted, confusing debugging, and so on.
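The stepping design in the first point can be sketched roughly as follows. The `Machine` type, its `run_slice` method, and the toy countdown workload are all hypothetical stand-ins for a real CPU emulator; in a browser the outer loop would be driven by a timer or animation-frame callback rather than a plain `loop`.

```rust
/// Result of running one slice of emulated work.
#[derive(Debug, PartialEq)]
pub enum Status {
    /// Budget used up; the caller should yield to its event loop and call again.
    Yielded,
    /// The emulated program finished.
    Exited,
}

/// Toy stand-in for an emulator: the "program" just counts down to zero.
pub struct Machine {
    pub remaining: u32,
    pub steps_executed: u32,
}

impl Machine {
    pub fn new(work: u32) -> Self {
        Machine { remaining: work, steps_executed: 0 }
    }

    /// Execute at most `budget` steps, then return control to the caller.
    pub fn run_slice(&mut self, budget: u32) -> Status {
        for _ in 0..budget {
            if self.remaining == 0 {
                return Status::Exited;
            }
            self.remaining -= 1;
            self.steps_executed += 1;
        }
        if self.remaining == 0 {
            Status::Exited
        } else {
            Status::Yielded
        }
    }
}

/// Drive the machine to completion and return how many times control
/// came back to the caller.
pub fn drive(machine: &mut Machine, budget: u32) -> u32 {
    assert!(budget > 0, "a zero budget would never make progress");
    let mut slices = 0;
    loop {
        slices += 1;
        if machine.run_slice(budget) == Status::Exited {
            return slices;
        }
        // In a browser, this is where control would go back to the event loop.
    }
}
```

Every place the emulated program could pause has to be expressible as "return `Yielded` and pick up later", which is why API implementations that call back into the emulator end up async in this design.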

In the spirit of exploring the design space, when I got to revisit this choice in Theseus I instead made everything synchronous and implemented threads using real OS threads. In particular because Theseus maps the original program's code to function calls, it makes the debugging experience pretty pleasant: if I set a breakpoint or if something crashes, I get a stack trace that goes through both the source program and emulator code.

Picture: a Theseus program in a native debugger, with a stack trace including a generated x86 address on the left, and with a thread picker showing the Windows "winmm" multimedia thread on the right.

I mostly care about the developer experience here, but one additional reason this approach is nice is performance. Computers are really good at quickly running simple code made of nested function calls that store things on the stack. My asynchronous approach meant there was a lot of control overhead, even in tight loops.

Blocking on the web

In all, blocking is great. But on the web, you cannot block. Even in a single-threaded program a call to a Windows API like GetMessage is supposed to block until a message is available, but browser events will only come in via the browser event loop once you've returned control. It would seem you're stuck.

What it really means is that fundamentally, if you want to block, you must use a thread, even in the case where the program you're emulating is itself single-threaded. So here's the approach: I run the emulator's threads in web workers. When the emulator needs something from the browser, it can send a message via the postMessage API that comes in on the main thread's event loop. Critically, at this point I make the worker block until the message is handled.

This is where the atomics API comes in. (Uh oh, synchronization code! The chances that I got this wrong are extremely high; I welcome your feedback on this, and I post it in part to provoke some reader who knows more than me to correct me.)

If you share memory between the main thread and worker, you can make the worker block on an atomic until the main thread is done. To do this, the worker sends the address of a local when it posts its message:

    fn blocking_call() {
        let mut buf = 0i32;
        let msg = create_message(
            /* ... some JavaScript data indicating what function to call ... */,

            // ... and include the *address* of the above 'buf' variable
            &mut buf as *mut _ as u32
        );
        post_message(msg);
        unsafe {
            // wait while buf==0 until we get an Atomic notify on...
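The same handshake can be tried out natively, without a browser. The sketch below is an analogue rather than the Theseus code: a standard-library channel stands in for postMessage, `thread::park` and `unpark` stand in for the atomics wait/notify pair, and the names `Request` and `serve_one` are made up for illustration. As in the snippet above, the shared value starts at 0 and the worker waits while it is still 0.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::mpsc::{Receiver, Sender};
use std::sync::Arc;
use std::thread::{self, Thread};

/// Message from the worker to the "main thread" (hypothetical type).
pub struct Request {
    /// Stand-in for the data saying which function to call.
    pub arg: i32,
    /// Shared slot the handler writes into; 0 means "not done yet".
    pub done: Arc<AtomicI32>,
    /// Handle used to wake the blocked worker.
    pub waiter: Thread,
}

/// Worker side: post a request, then block until the handler marks it done.
/// Returns the value the handler stored (always nonzero).
pub fn blocking_call(tx: &Sender<Request>, arg: i32) -> i32 {
    let done = Arc::new(AtomicI32::new(0));
    tx.send(Request {
        arg,
        done: Arc::clone(&done),
        waiter: thread::current(),
    })
    .expect("main thread went away");

    // park() may return spuriously, so always re-check the flag.
    loop {
        let value = done.load(Ordering::Acquire);
        if value != 0 {
            return value;
        }
        thread::park();
    }
}

/// Main-thread side: handle one request, publish completion, wake the worker.
/// Returns the argument it handled.
pub fn serve_one(rx: &Receiver<Request>) -> i32 {
    let req = rx.recv().expect("worker went away");
    // ... the real work (e.g. talking to the browser) would happen here ...
    req.done.store(1, Ordering::Release); // nonzero means "done"
    req.waiter.unpark();
    req.arg
}
```

Two details carry over to any version of this pattern: the waiter re-checks the value in a loop because a wakeup alone does not prove the work is finished, and the handler stores the value before waking the waiter so the wakeup cannot be observed without the result.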
