Theseus: Translating Win32 to WASM

May 24, 2026

This post is part of a series on Theseus, my win32/x86 emulator.

Theseus can now produce WebAssembly output, allowing it to translate a .exe file into something that runs on the web. Try it out here, but note it is full of bugs (e.g. Minesweeper crashes if you win).

This was pretty straightforward to get working, with the exception of one major detail that this post will go into.

The x86 emulation part of this is just recompiling the existing Theseus output with a different CPU target. This is one of the main benefits of this binary translation approach. The translated code is almost wholly agnostic to the environment it eventually runs in (the one exception is how main gets invoked). In principle I now get optimized wasm compiler output nearly for free. The main challenge was figuring out the code layout to get Cargo to cooperate with my weird requirements.

The win32 part was changing things to abstract over a "Host" API that is able to do things like fetch mouse events and render pixels. That is now implemented once for SDL and once for the web. This was also relatively straightforward, at least in my first pass.
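As a rough illustration of that kind of abstraction, here is a minimal sketch of a host trait with one stand-in backend. Every name here (`Host`, `HostEvent`, `RecordingHost`, `pump_once`) is hypothetical and chosen for the example; the actual Theseus Host API is not shown in this post and likely differs.

```rust
/// Hypothetical input events; not the real Theseus types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HostEvent {
    MouseMove { x: i32, y: i32 },
    MouseDown { x: i32, y: i32 },
    Quit,
}

/// Hypothetical host abstraction: what the win32 layer needs from its environment.
pub trait Host {
    /// Return the next pending input event, if any.
    fn poll_event(&mut self) -> Option<HostEvent>;
    /// Present a frame of `width * height` 32-bit pixels.
    fn render(&mut self, width: u32, height: u32, pixels: &[u32]);
}

/// A test double standing in for an SDL or web backend.
pub struct RecordingHost {
    pub pending: Vec<HostEvent>,
    /// (width, height, pixel count) of each frame that was rendered.
    pub frames: Vec<(u32, u32, usize)>,
}

impl Host for RecordingHost {
    fn poll_event(&mut self) -> Option<HostEvent> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending.remove(0))
        }
    }

    fn render(&mut self, width: u32, height: u32, pixels: &[u32]) {
        self.frames.push((width, height, pixels.len()));
    }
}

/// Code written only against the trait: drain events, then draw one 4x4 frame.
/// Returns false once the host asks to quit (and draws nothing in that case).
pub fn pump_once(host: &mut dyn Host) -> bool {
    while let Some(event) = host.poll_event() {
        if event == HostEvent::Quit {
            return false;
        }
    }
    let pixels = vec![0u32; 4 * 4];
    host.render(4, 4, &pixels);
    true
}
```

The point of the shape is that `pump_once` never names a backend, so swapping SDL for the web only means providing another `impl Host`.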

So what was hard? It comes down to a part of the design space I hadn't previously explored well: whether the emulator is allowed to block.

To block or not to block

In retrowin32, the emulator was designed to be able to step through some instructions and then return control to the caller. This is critical for the web version in particular, where you cannot block the main thread. In my earlier post "threading in two ways" I went into some detail on the various tradeoffs on how I could emulate threads in a browser, ultimately choosing a single thread.

This has its advantages, but is unsatisfying in a few important ways:

The main thread must repeatedly call into the emulator in a loop that yields control back to the browser.

Any Windows API implementation that might transfer control to the emulator must be made async, so that it can be suspended and resumed. This is obvious for functions that take a callback, but even a function like MoveWindow will synchronously send Windows messages about the move to the window, so it too must be async with respect to the message handling.

And finally, all the normal reasons async code is yucky: getting object lifetimes correct, how stack traces are busted, confusing debugging, and so on.
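The first point in the list above, a driver that repeatedly calls into the emulator and gets control back, can be sketched with a toy emulator that runs a bounded number of instructions per call. This is only an illustration of the step-and-yield shape, not retrowin32's code; `ToyEmulator`, `StepResult`, `run_slice`, and `drive` are made-up names.

```rust
/// Outcome of running the emulator for one time slice.
#[derive(Debug, PartialEq)]
pub enum StepResult {
    /// Budget exhausted; the caller should call `run_slice` again later.
    Yielded,
    /// The emulated program finished.
    Exited,
}

/// Toy stand-in for an emulator: "executing" just counts down instructions.
pub struct ToyEmulator {
    pub remaining: u32,
}

impl ToyEmulator {
    /// Run at most `budget` instructions, then hand control back to the caller.
    pub fn run_slice(&mut self, budget: u32) -> StepResult {
        let executed = budget.min(self.remaining);
        self.remaining -= executed;
        if self.remaining == 0 {
            StepResult::Exited
        } else {
            StepResult::Yielded
        }
    }
}

/// Drive the emulator to completion and return how many slices it took.
/// In a browser, each iteration would be scheduled from the event loop
/// rather than run back to back in a `loop`.
pub fn drive(emu: &mut ToyEmulator, budget: u32) -> u32 {
    assert!(budget > 0, "a zero budget would never make progress");
    let mut slices = 0;
    loop {
        slices += 1;
        if emu.run_slice(budget) == StepResult::Exited {
            return slices;
        }
    }
}
```

Anything that needs to pause in the middle of a slice (such as a Windows API call that re-enters the emulator) has to be expressible as "return now, resume later", which is where the async requirement in the second point comes from.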

In the spirit of exploring the design space, when I got to revisit this choice in Theseus I instead made everything synchronous and implemented threads using real OS threads. In particular because Theseus maps the original program's code to function calls, it makes the debugging experience pretty pleasant: if I set a breakpoint or if something crashes, I get a stack trace that goes through both the source program and emulator code.

Picture: a Theseus program in a native debugger, with a stack trace including a generated x86 address on the left, and with a thread picker showing the Windows "winmm" multimedia thread on the right.

I mostly care about the developer experience here, but one additional reason this approach is nice is performance. Computers are really good at quickly running simple code made of nested function calls that store things on the stack. My asynchronous approach meant there was a lot of control overhead, even in tight loops.

Blocking on the web

In all, blocking is great. But on the web, you cannot block. Even in a single-threaded program a call to a Windows API like GetMessage is supposed to block until a message is available, but browser events will only come in via the browser event loop once you've returned control. It would seem you're stuck.

What it really means is that fundamentally, if you want to block, you must use a thread, even in the case where the program you're emulating is itself single-threaded. So here's the approach: I run the emulator's threads in web workers. When the emulator needs something from the browser, it can send a message via the postMessage API that comes in on the main thread's event loop. Critically, at this point I make the worker block until the message is handled.

This is where the atomics API comes in. (Uh oh, synchronization code! The chances that I got this wrong are extremely high; I welcome your feedback on this, and I post it in part to provoke some reader who knows more than me to correct me.)

If you share memory between the main thread and worker, you can make the worker block on an atomic until the main thread is done. To do this, the worker sends the address of a local when it posts its message:

    fn blocking_call() {
        let mut buf = 0i32;
        let msg = create_message(
            /* ... some JavaScript data indicating what function to call ... */,

            // ... and include the *address* of the above 'buf' variable
            &mut buf as *mut _ as u32
        );
        post_message(msg);
        unsafe {
            // wait while buf==0 until we get an Atomic notify on...
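For readers who want to experiment with the same handshake outside a browser, here is a native-Rust analog, offered as a sketch rather than as the Theseus implementation. It swaps in different primitives: an `mpsc` channel stands in for `postMessage`, an `Arc<AtomicI32>` stands in for the address of `buf` in shared memory, and `thread::park`/`unpark` stand in for atomic wait and notify. The names `Request`, `handle_one`, and `round_trip` are hypothetical, and the "result is the value plus one" behavior is invented for the example.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// A request "posted" to the main thread, carrying the flag the worker waits on.
pub struct Request {
    pub value: i32,
    pub done: Arc<AtomicI32>,
    pub waiter: thread::Thread,
}

/// Worker side: post a request, then block until the flag becomes nonzero.
pub fn blocking_call(tx: &mpsc::Sender<Request>, value: i32) -> i32 {
    let done = Arc::new(AtomicI32::new(0));
    tx.send(Request {
        value,
        done: Arc::clone(&done),
        waiter: thread::current(),
    })
    .expect("main thread hung up");

    // park() may return spuriously, so always re-check the flag in a loop.
    loop {
        let result = done.load(Ordering::Acquire);
        if result != 0 {
            return result;
        }
        thread::park();
    }
}

/// Main-thread side: handle one request, publish the result, wake the worker.
pub fn handle_one(rx: &mpsc::Receiver<Request>) {
    let req = rx.recv().expect("worker hung up");
    // Zero means "not done yet" in this sketch, so the result must be nonzero
    // (callers must not pass -1 as the value).
    req.done.store(req.value + 1, Ordering::Release);
    req.waiter.unpark();
}

/// Spawn a worker, service its single request, and return what it observed.
pub fn round_trip(value: i32) -> i32 {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || blocking_call(&tx, value));
    handle_one(&rx);
    worker.join().expect("worker panicked")
}
```

The detail that carries over to the web version is the ordering: the handler writes the flag first and only then wakes the waiter, and the waiter re-checks the flag after every wakeup, so a wakeup that arrives before the worker has started waiting is not lost.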
