Hardware Is Asynchronous. Most of Our Operating Systems Still Aren't. | Matheus Santos
2026-06 | Rust, OS, Systems
This past weekend I had maybe the best technical thread I've had in years, and it left me chewing on something I want to write down before it fades.
I'd posted a small writeup about wire-probe, a tiny L4 latency tool I built. What started as the usual "this looks AI-written" sniping turned, somewhere around the third reply, into a long and genuinely good back-and-forth with Mohit D. Patel, who is building a from-scratch operating system in Rust. We went deep: page-fault stacks, lazy FPU state, scheduler activations, cancellation semantics. And in the middle of it a single idea surfaced that I haven't been able to put down since.
Hardware is asynchronous, and almost every operating system we use is not.
Figure: on the left, hardware components run independently and signal completion via queues; on the right, Unix/POSIX blocks the calling thread until I/O returns. An async-first OS inverts that default.
The retrofit we all live with
The dominant model, the Unix and POSIX lineage that Linux and the BSDs inherit, was designed around blocking calls and untyped byte streams. You call read and your thread stops until the data is there. That was a reasonable model for the machines of the early 1970s, and it is still a clean way to think. The problem is that real hardware has never worked that way. A disk controller, a NIC, and a DMA engine all run independently of the CPU and signal completion later. The synchronous call is a fiction the kernel maintains on top of fundamentally asynchronous machinery.
For decades we have been bolting asynchrony back onto that blocking foundation, usually badly. POSIX AIO was awkward and widely avoided. Linux's native AIO only really worked for direct I/O and quietly blocked the rest of the time. Then came io_uring, which is the best async interface Linux has ever had, and it exists in large part because everything before it was inadequate.
Here is the part worth noticing. io_uring is a completion-ring design: you submit operations into one queue, the kernel completes them, and you collect results from another. That is not a new idea. Windows NT shipped overlapped I/O and I/O completion ports back in the mid-1990s, and they are built on the same submit-then-collect completion model. So io_uring is not Linux inventing something. It is Linux converging, roughly twenty-five years later, on a completion model another mainstream OS has had the whole time. The async-first idea is old. What is rare is making it the foundation instead of a late addition.
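To make the submit-then-collect shape concrete, here is a minimal sketch in plain Rust. It is not io_uring or IOCP: the two queues are ordinary std channels, the "kernel" is a worker thread, and every name in it (Submission, Completion, start_device, run_batch) is made up for illustration. The one detail borrowed from the real designs is the caller-chosen tag that travels with each operation so a completion can be matched back to its submission.

```rust
use std::sync::mpsc;
use std::thread;

/// A submitted operation, tagged with a caller-chosen `user_data`
/// so its completion can be matched back to it later.
struct Submission {
    user_data: u64,
    input: u64,
}

/// The result of one operation, carrying the same tag.
#[derive(Debug, PartialEq)]
struct Completion {
    user_data: u64,
    result: u64,
}

/// Spawn a stand-in "kernel" thread and return both queue ends:
/// the submission sender and the completion receiver.
fn start_device() -> (mpsc::Sender<Submission>, mpsc::Receiver<Completion>) {
    let (sq_tx, sq_rx) = mpsc::channel::<Submission>();
    let (cq_tx, cq_rx) = mpsc::channel::<Completion>();
    thread::spawn(move || {
        // Drain submissions until the submitting side hangs up.
        for sub in sq_rx {
            let done = Completion {
                user_data: sub.user_data,
                result: sub.input * 2, // pretend work
            };
            if cq_tx.send(done).is_err() {
                break; // nobody is collecting completions any more
            }
        }
    });
    (sq_tx, cq_rx)
}

/// Submit a whole batch without waiting, then reap every completion.
fn run_batch(inputs: &[u64]) -> Vec<Completion> {
    let (sq, cq) = start_device();
    for (i, &input) in inputs.iter().enumerate() {
        sq.send(Submission { user_data: i as u64, input })
            .expect("device thread exited early");
    }
    drop(sq); // closing the submission queue lets the device thread finish
    cq.iter().collect()
}

fn main() {
    for c in run_batch(&[10, 20, 30]) {
        println!("op {} finished with result {}", c.user_data, c.result);
    }
}
```

With a single worker the completions happen to come back in submission order. A real completion queue makes no such promise, which is exactly why the tag exists.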
Everything is the same handle
One thing from the conversation stuck with me more than the rest. A Unix file descriptor, a Windows HANDLE, and the observer capability in his design are all the same primitive: a kernel-managed reference to something you can wait on. Strip away the abstraction layers and every system converges on roughly the same shape. Submit an operation, get back a handle, then poll it, block on it, or hang a callback off it. The interesting differences are not in that shape. They are in lifetime and cancellation, which is exactly where these designs get hard.
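Here is a rough sketch of that shared shape, assuming nothing about any real kernel: a hypothetical Handle that can be polled, blocked on, or given a callback. The names (Handle, Slot, submit) and the u32 result are my own inventions, and the sketch deliberately leaves out the hard parts named above: there is no cancellation, and lifetime is simply whatever Arc gives you.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

type Callback = Box<dyn FnOnce(u32) + Send>;

/// State shared between the waiter and whoever completes the operation.
struct Slot {
    value: Option<u32>,
    callback: Option<Callback>,
}

/// Hypothetical handle to one in-flight operation.
#[derive(Clone)]
struct Handle {
    shared: Arc<(Mutex<Slot>, Condvar)>,
}

impl Handle {
    /// Non-blocking check: `Some(result)` once the operation has finished.
    fn poll(&self) -> Option<u32> {
        self.shared.0.lock().unwrap().value
    }

    /// Park the calling thread until the result is available.
    fn wait(&self) -> u32 {
        let (lock, cvar) = &*self.shared;
        let slot = cvar
            .wait_while(lock.lock().unwrap(), |s| s.value.is_none())
            .unwrap();
        slot.value.expect("wait_while returned without a value")
    }

    /// Run `f` on completion, or right away if already complete.
    fn on_complete(&self, f: impl FnOnce(u32) + Send + 'static) {
        let mut slot = self.shared.0.lock().unwrap();
        let ready = slot.value; // Option<u32> is Copy
        if let Some(v) = ready {
            drop(slot); // never run user code while holding the lock
            f(v);
        } else {
            slot.callback = Some(Box::new(f));
        }
    }

    /// Called by the completing side: store the result, wake waiters,
    /// then fire the callback if one was registered.
    fn complete(&self, v: u32) {
        let (lock, cvar) = &*self.shared;
        let callback = {
            let mut slot = lock.lock().unwrap();
            slot.value = Some(v);
            slot.callback.take()
        };
        cvar.notify_all();
        if let Some(cb) = callback {
            cb(v);
        }
    }
}

/// Start a pretend operation on another thread and return its handle.
fn submit(input: u32) -> Handle {
    let slot = Slot { value: None, callback: None };
    let handle = Handle { shared: Arc::new((Mutex::new(slot), Condvar::new())) };
    let completer = handle.clone();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10)); // simulated latency
        completer.complete(input + 1);
    });
    handle
}

fn main() {
    let handle = submit(41);
    println!("poll right after submit: {:?}", handle.poll());
    let (tx, rx) = mpsc::channel();
    handle.on_complete(move |v| tx.send(v).expect("main stopped listening"));
    println!("wait returned {}", handle.wait());
    println!("callback delivered {}", rx.recv().expect("callback never ran"));
}
```

Even in a toy this small, the questions that matter show up quickly: who owns the callback if the handle is dropped, and what should wait return if the operation is abandoned. The sketch dodges both.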
That convergence is the tell. If everyone ends up at the same primitive regardless of where they started, maybe the right move is to start there on purpose rather than arrive at it by accretion.
What async-first actually looks like
His approach, and I want to be clear this is an early and openly work-in-progress project, is to make asynchrony the default rather than the exception. Events in the kernel are modeled with an observer pattern. Threads are made cheap enough that you can just spawn one for a task instead of reaching for the thread pools and tasklets that exist mainly because traditional kernel threads are too heavy to use casually. Notifications into userspace are delivered as upcalls, which sit in the same family as the scheduler-activations research from the early 90s: a kernel-originated interrupt into user code instead of a thread parked on a call.
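I have not seen his kernel code, so the following is only my own userspace sketch of what an observer-style event source can look like, not his design. Event, EventSource, subscribe, notify, and record_events are all hypothetical names. The point is just the direction of control: the source calls into the observer when something happens, which is the same direction an upcall takes, instead of the observer sitting parked on a blocking call.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical event kinds a kernel might publish.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    PacketArrived { len: usize },
    TimerFired,
}

/// A registry of observers; each one is a closure run on every event.
struct EventSource {
    observers: Vec<Box<dyn FnMut(Event)>>,
}

impl EventSource {
    fn new() -> Self {
        Self { observers: Vec::new() }
    }

    /// Register interest. Nothing blocks; the observer is called later.
    fn subscribe(&mut self, observer: impl FnMut(Event) + 'static) {
        self.observers.push(Box::new(observer));
    }

    /// Deliver `event` to every observer, in registration order.
    fn notify(&mut self, event: Event) {
        for observer in self.observers.iter_mut() {
            observer(event);
        }
    }
}

/// Feed a list of events through a source and return what one observer saw.
fn record_events(events: &[Event]) -> Vec<Event> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let sink = Rc::clone(&log);
    let mut source = EventSource::new();
    source.subscribe(move |e| sink.borrow_mut().push(e));
    for &e in events {
        source.notify(e);
    }
    let seen = log.borrow().clone();
    seen
}

fn main() {
    let seen = record_events(&[Event::PacketArrived { len: 64 }, Event::TimerFired]);
    println!("observer saw {seen:?}");
}
```

A real kernel version would also have to decide which thread or context the observer runs in, which is where the cheap-threads part of the design comes in. This sketch runs everything on the caller's stack and sidesteps that entirely.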
Underneath all of it is one claim. Since the hardware is asynchronous with respect to the CPU, the OS I/O model should be lightweight-async by default, and the blocking call should be the thing you build on top, not the thing everything else gets wrapped around.
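The layering in that claim is easy to show in miniature. In this sketch the asynchronous operation is the primitive and the blocking call is a one-line wrapper over it. Both read_async and read_blocking are hypothetical names, the "device" is a thread that fabricates bytes from the offset, and a std channel stands in for whatever completion mechanism a real kernel would use.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Async primitive (hypothetical): start a read and return at once.
/// The receiver yields the data when the pretend device finishes.
fn read_async(offset: usize, len: usize) -> Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Stand-in for a device working independently of the caller:
        // byte i of the "disk" simply holds the value i mod 256.
        let data: Vec<u8> = (offset..offset + len).map(|i| (i % 256) as u8).collect();
        let _ = tx.send(data); // the caller may have stopped listening
    });
    rx
}

/// The blocking call is a thin layer: submit, then wait for completion.
fn read_blocking(offset: usize, len: usize) -> Vec<u8> {
    read_async(offset, len)
        .recv()
        .expect("device thread dropped the request")
}

fn main() {
    // Async use: start two reads, then collect both results.
    let first = read_async(0, 4);
    let second = read_async(4, 4);
    let both = [first.recv().unwrap(), second.recv().unwrap()].concat();
    println!("overlapped reads: {both:?}");

    // Blocking use: same primitive, with the wait folded in.
    println!("blocking read: {:?}", read_blocking(8, 2));
}
```

Going the other way, wrapping a blocking read to make it look asynchronous, is what costs a parked thread per outstanding operation, and that is the retrofit this post is complaining about.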
Why I think this is a good direction
A surprising amount of operating-system machinery turns out to be a workaround for the blocking-first assumption.
Async-signal-safety, the rule that you can only call a small set of functions from a signal handler, exists because signals are asynchronous delivery bolted onto a synchronous world, so any critical section can now be interrupted at the worst possible moment. Thread pools, tasklets, and workqueues exist because spawning a real kernel thread per small task is too expensive in the traditional model. A lot of accumulated complexity is there to paper over a foundation that assumed your code...