I built a GPU back end for Emacs

How I built a GPU backend for Emacs | Andros Fenollosa

A few months ago I became obsessed with a silly question: why does my Emacs, on a laptop with a perfectly capable GPU, draw all of its text using the CPU? And that led to others: why can't I play a video inside a buffer? Why can't I have animated cursor effects? Why can't I cross-fade between buffers? I needed to satisfy my curiosity, so I started digging.

I started reading the code, with an AI as my companion. I discovered that every glyph, every underline, every scroll is recomputed and repainted by the processor. Emacs's redisplay engine (xdisp.c) was born in an era when there was no other option, and it is tuned to the millimeter for exactly that. And nobody had managed to slip a GPU underneath without rewriting half of Emacs... until recently.

So I decided to try. What began as a weekend experiment ended up being a complete display backend for macOS with Metal, a second backend for GNU/Linux with OpenGL, a video player inside the buffer, shader-based cursor effects, and a debate of more than a hundred messages on the Emacs developers' mailing list that ranged from cairo's performance to software freedom and the ethics of artificial intelligence.

This article exists because I feel like telling the story, and it might be useful for future implementations. At the end I leave the lessons I take away and a conclusion that is not the one I expected when I started.

A note of honesty up front: I built this project with the help of an LLM as a copilot, from start to finish. I say it here just as I said it in public when I was asked. I will come back to it, because it turned out to be the most important plot twist of the whole journey.

Phase 1: the architecture decision

Anyone's first instinct would be to open the macOS code, the Cocoa backend (nsterm.m), and start replacing CoreGraphics calls with Metal calls. It is the most direct path. And it is exactly what I decided not to do.

The problem with that approach is that it ties you to one platform. If I write "Emacs with Metal", I have an Emacs for Mac and nothing else. I needed to write a display-backend abstraction that would let me have one driver per platform. So I sketched a three-layer architecture on a Post-it:

flowchart TD X["Redisplay engine (xdisp.c, untouched)"]:::core --> P["src/gfxterm.c Neutral drawing policy (plain C)"]:::policy P --> D["src/gfxdrv.h Driver interface (~25 operations)"]:::iface D --> M["src/mtlterm.m (macOS) Metal driver"]:::mtl D --> G["src/glterm.c (GNU/Linux, X11) OpenGL ES / EGL driver"]:::gl

classDef core fill:#37474F,stroke:#263238,stroke-width:2px,color:#fff classDef policy fill:#00897B,stroke:#00695C,stroke-width:2px,color:#fff classDef iface fill:#7CB342,stroke:#558B2F,stroke-width:2px,color:#fff classDef mtl fill:#8E24AA,stroke:#6A1B9A,stroke-width:2px,color:#fff classDef gl fill:#D32F2F,stroke:#B71C1C,stroke-width:2px,color:#fff The idea is that all the drawing logic (how a glyph string is composed, where the wavy underline goes, how an image is clipped to the window, how scrolling works) lives in a plain-C file, without a single platform-specific line. And that each platform only has to implement a small contract: about 25 primitive operations of the kind "upload this texture", "draw this quad", "present the frame". That contract is gfxdrv.h. The first driver would be Metal, in mtlterm.m.

The golden rule, the one I imposed on myself and never broke once: xdisp.c is not touched . The redisplay engine computes the glyph matrices exactly as always; I only hook into the drawing interface that already exists. If the experiment went wrong, Emacs was still Emacs.

In hindsight, this was the best decision of the whole project!

Phase 2: the Metal backend and the tyranny of the pixel

With the architecture clear, I dove into Metal. The technical plan was that of any modern text renderer:

Rasterize each glyph just once, via CoreText, into a grayscale texture (a glyph atlas in R8 format).

Draw the text as textured quads that sample that atlas.

Upload images (PNG, JPEG, SVG, GIF) as textures.

Composite the whole frame on the GPU, in a persistent texture, and present it.

On paper, two afternoons. In practice, weeks. The reason has a name: pixel parity .

My success criterion was not "it looks good". It was that the result be identical, pixel for pixel, to the original Cocoa backend. Same binary, with the GPU on and off, and the diff between the two captures had to be practically zero. I built a harness that launched the same Emacs twice, loaded an identical scenario, captured the screen on both and compared them with Python and PIL. The bar landed around 0.055% of differing pixels in the baseline, and anything that strayed from there was a bug to hunt down.

That harness was relentless, and it surfaced a collection of details I had to look at under a magnifying glass:

The ink weight . CoreText and my shader applied antialiasing differently.

The relief colors...

I built a GPU back end for Emacs

Related Articles

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI