Structural diffing in Emacs; deterministic agent harnesses


Published: 2026 May 28

Difftron

Last newsletter I wished for structural diffs in a Magit-like UI and, a few weekends later, it turned out as awesome as I’d imagined.

Behold, Difftron:

[Screenshot: entity type hierarchy with one function change expanded to show a side-by-side diff with highlighting]

See the 15-minute demo video for more details.

Conceptually, the tool works as follows:

A Rust binary parses code into semantic entities (functions, types, classes, etc., depending on the language)

Entities are matched between the left and right sides of the diff based on entity type and name

Matched entities are shown as a side-by-side diff

Unmatched entities are marked as added/removed
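The matching steps above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not Difftron's actual code: the `Entity` and `Change` types, their fields, and the `entity` and `match_entities` functions are hypothetical names, parsing is assumed to have already happened, and each (kind, name) pair is assumed to be unique within one side of the diff.

```rust
use std::collections::BTreeMap;

// Hypothetical entity record; Difftron's real data model is not shown in the post.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    kind: String, // e.g. "function", "struct", "class"
    name: String,
    text: String, // source text of the entity
}

// Outcome of comparing the left and right sides of a diff.
#[derive(Debug, PartialEq)]
enum Change {
    Matched { left: Entity, right: Entity }, // shown as a side-by-side diff
    Added(Entity),                           // only on the right side
    Removed(Entity),                         // only on the left side
}

// Small constructor to keep examples readable.
fn entity(kind: &str, name: &str, text: &str) -> Entity {
    Entity {
        kind: kind.to_string(),
        name: name.to_string(),
        text: text.to_string(),
    }
}

// Match entities between the two sides by (kind, name).
// Assumption: keys are unique per side; a duplicate key on the right
// would silently overwrite the earlier entry in this sketch.
fn match_entities(left: &[Entity], right: &[Entity]) -> Vec<Change> {
    // Index the right side; BTreeMap keeps the order of additions deterministic.
    let mut right_by_key: BTreeMap<(String, String), &Entity> = right
        .iter()
        .map(|e| ((e.kind.clone(), e.name.clone()), e))
        .collect();

    let mut changes = Vec::new();
    for l in left {
        let key = (l.kind.clone(), l.name.clone());
        match right_by_key.remove(&key) {
            Some(r) => changes.push(Change::Matched {
                left: l.clone(),
                right: r.clone(),
            }),
            None => changes.push(Change::Removed(l.clone())),
        }
    }
    // Anything still in the index had no counterpart on the left.
    changes.extend(right_by_key.into_values().map(|r| Change::Added(r.clone())));
    changes
}
```

Under this scheme a renamed function shows up as one removal plus one addition, which is exactly the kind of case the fancier tree-diffing heuristics would try to handle.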

This barely scratches the surface of the complex computer sciencey questions about tree diffing and heuristics for matching entities, but so far I don’t mind at all. As it turns out, just this basic scheme combined with a rich user interface already feels miles better than the standard file-and-line-based diffs I’ve been using.

All of the UI credit goes to Magit and Emacs, for a few reasons:

First, Magit espouses the idea that everything can be interacted with:

Commands are invoked, not by typing them out, but by pressing short mnemonic key sequences. Many commands act on the thing the cursor is on, either by showing more detailed or alternative information about it in a separate buffer or by performing an action with side-effects on it.

In Difftron:

the hierarchy levels can be expanded/collapsed (individually or globally) to quickly switch between “overview” and “detail” views of the diff

the diff text itself can be used to jump to the code: to the file in the current worktree if applicable, in a buffer, or in a new worktree at that ref (thus allowing one to use LSP, etc. while exploring)

the left- and right-side labels can be used to select different comparisons (press “N” or “P” to move to the next/previous commit; “enter” to pull up an autocomplete across all refs)

Second, Emacs is pretty much just text with different colors, so the default data density is quite high.
I’m sure Emacs is capable of supporting excessive amounts of whitespace to match even the most “minimal, clean, modern” (i.e., useless) web UI, but you’d have to work against the grain to mess things up.

Third, the entire system is “live”, meaning that I can imagine some new feature, prompt an LLM to implement it, and then test it without restarting Emacs.

I just defined:

(defun difftron-reload ()
  (interactive)
  (when (featurep 'difftron)
    (unload-feature 'difftron t))
  (load "/Users/dev/work/difftron/emacs/difftron.el"))

and ran it whenever I wanted to test some changes.

This made the iteration loop quite fast, which made it low-friction and fun to polish away rough edges.

All in all, it took about 8 hours over two afternoons to knock out the initial implementation in Magit and record the first demo video, then another 16 hours polishing it as I used it myself and ran it by a few friends for critique.

The entire implementation was done by Codex with GPT-5.5 High (via my $20/month ChatGPT Plus subscription), and I’ve barely touched the code myself.
I mainly provided guidance in terms of:

telling it which Rust crates to use for the language analysis (Rust Analyzer’s ra_ap_syntax for Rust; arborium for Clojure and TypeScript)

telling it to set up dedicated scripts to lint, format, and test

telling it to set up a minimal Emacs environment that it could use to drive the package itself and reproduce firsthand any bugs I ran into

My agents.md instructions were minimal:

when editing Emacs code, review Magit and other package source code in /emacs/test-config/straight/repos/

review /scripts/ for available project commands like formatting and testing

use red/green TDD workflow

never add tests for config or script changes

Same advice I give myself, honestly: Read the source code of what you’re using, put frequent/complex tasks into scripts, and specify the success condition before trying to implement it.

On open-sourcing a vibe-coded project

Last newsletter I said:

Mayyybe if I’m happy with it I’ll end up releasing something.
But I’m not trying to collect Github stars or HN karma, so I might just happily use it in the privacy of my own home without trying to “commercialize it”.

While I am indeed happy with Difftron so far, I’ve hesitated about sharing it because of a few lingering questions in the back of my mind:

for the code, how should I distinguish between:

code I thought hard about and wrote myself

code I prompted and reviewed

code I prompted and only “black box” tested

how open am I to accepting potentially YOLO’d code from others (e.g., adding support for languages I...
