Always Be Blaming

A few tips on 4D-ing your code comprehension skills.

I wrote on the importance of reading code before: Look Out For Bugs. My default approach to reading is “predictive”: I don’t actually read the code line by line. Rather, I try to understand the problem that it wants to solve, then imagine my own solution, and read the “diff” between what I have in my mind and what I see in the editor. A non-empty “diff” signifies either a bug in my understanding, or an opportunity to improve the code.

This is 2D reading, understanding a snapshot of code, frozen in time. This is usually enough to spot “this feels odd” anomalies, worthy of further investigation.

Ideal code is memoryless: it precisely solves the problem at hand. Most real code is Markov: the shape of the code at time T depends not only on the problem statement, but also on the shape of the code at time T - 1. The 3D step is to trace the evolution of code over time: Where Do We Come From? What Are We? Where Are We Going?

The step after that is to understand the why. What were we thinking back then, when we wrote this code? It’s useful to have the “theory of mind” concept ready here. I personally learned the term way too late in my life, so let me give a short intro for today’s lucky 10 000. Theory of mind is the ability to imagine yourself in someone else’s skin. Not just in their shoes (“I certainly would have acted differently in that situation”), but with their mind (“I wouldn’t have acted that way, but I get why they did”). This is something people learn. The experimental setup here is to have a child in a room with toys, with a doll sitting near the opposite end of the room, and to ask the child “what does the doll see?”. Younger children describe the room from their own perspective; older ones begin to intuit that the doll’s perspective is different.

So this is the goal of reading code — understanding what the original author was thinking, and why.

End of the mumbo-jumbo, some practical advice. First, read “Every line of code is always documented”; it is very good.

Second, make sure it is effortless for you to find out how a given snippet of code evolved. This is harder than it seems! Just git blame isn’t an answer — mind the gap between the problem that’s easy to solve, and the problem in need of solving.

git blame answers the spatial question of “how each line appeared in this file”, because there’s a relatively straightforward UI for this: annotate each line with a commit hash. But this is not the question you are asking most of the time! You don’t care about the file! There’s a small snippet of code in the middle, and you want a temporal history of that.
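For comparison, git itself has one built-in way to ask the temporal question: git log -L, which follows a line range backwards through history. This is not the workflow described in the rest of this post, just a minimal sketch of that alternative. The Python wrapper and the helper names line_history_cmd and line_history are hypothetical.

```python
import subprocess


def line_history_cmd(path: str, start: int, end: int) -> list[str]:
    """Build a `git log -L` command that traces lines start..end of path."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # -L<start>,<end>:<file> asks git log to follow that line range over time.
    return ["git", "log", f"-L{start},{end}:{path}"]


def line_history(path: str, start: int, end: int, repo: str = ".") -> str:
    """Run the command inside repo and return git's output as text."""
    result = subprocess.run(
        line_history_cmd(path, start, end),
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

The output is a sequence of commits, each with the diff restricted to the chosen range. That answers “how did this snippet change”, but it still shows diffs rather than the whole repository as of each commit.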

As much as I don’t like working in the browser, GitHub’s web interface for blaming is probably better than what you get locally by default. It starts with the y shortcut, which resolves a symbolic reference like

https://github.com/tigerbeetle/tigerbeetle/blob/main/src/vsr/replica.zig

into the one which has a commit hash in the URL:

https://github.com/tigerbeetle/tigerbeetle/blob/c54f613a2eb2a127a0ba212704e3fa988c42e5cb/src/vsr/replica.zig

This commit hash is critical, because it anchors the entire repository — if you open a different file from the web UI, it will be shown as of that commit. This enables you to not myopically focus on just the diff in question, but to absorb the entire context at that point in time.

So my usual web workflow is:

ctrl+f to find the line I am interested in

b to toggle blame

Click “blame prior to change” a couple of times, repeating ctrl+f to go back to the snippet I am curious about.

cmd-click on the commits that are potentially relevant, pinning their commit hashes in the URL in new tabs.

Then, from the commit page, use the “Browse files” button and then t to jump to other files. Or, cmd+l to focus the browser’s address bar, and s/commit/tree/ (or back!) as needed, to switch between diff and snapshot views.

Again, my goal here is not to annotate a diff on a file but rather to get a “virtual checkout” as of the interesting commit.
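The s/commit/tree/ address-bar edit from the workflow above can be sketched as a tiny URL rewrite. This is only an illustration, assuming GitHub’s /commit/<sha> and /tree/<sha> URL shapes; the function name toggle_view is hypothetical.

```python
import re

# Matches https://github.com/<owner>/<repo>/(commit|tree)/<sha>[/<path>]
_VIEW = re.compile(
    r"^(https://github\.com/[^/]+/[^/]+)/(commit|tree)/([0-9a-f]{7,40})(/.*)?$"
)


def toggle_view(url: str) -> str:
    """Switch a pinned GitHub URL between the diff and the snapshot view."""
    match = _VIEW.match(url)
    if match is None:
        raise ValueError(f"not a commit-pinned GitHub URL: {url}")
    base, view, sha, _path = match.groups()
    other = "tree" if view == "commit" else "commit"
    # A commit page has no file path, so any trailing path is dropped.
    return f"{base}/{other}/{sha}"
```

The important part is that the commit hash survives the rewrite, so both views stay anchored to the same point in time.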

This web approach is what I was using throughout most of my career, but I’ve finally found a way to replicate it locally. The idea is to make blaming “in-place”. Instead of git blame annotating lines of code, I directly switch to a historical commit. I have the following devil hydra of shortcuts:

, b l blames line. It notes the $line the cursor is at, runs git blame -L $line,$line to find $commit that introduced the line, and then runs git switch --detach $commit to check it out. I have a dedicated worktree for code archeology, so I don’t worry about trashing my work. There’s also a half-hearted attempt to maintain “logical” cursor position, but it doesn’t work very well. Is there some git command that tells me directly “what’s the equivalent of $file:$line:column in $sha-A for $sha-B?”
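A minimal sketch of what the blame-line shortcut does, not my actual editor configuration. It assumes Python as the glue and adds --porcelain to the git blame call so that the output is easy to parse; the names blamed_commit and blame_line_in_place are hypothetical.

```python
import re
import subprocess

# First line of `git blame --porcelain` output:
#   <sha> <original line> <final line> <lines in group>
_HEADER = re.compile(r"^([0-9a-f]{40,64}) \d+ \d+( \d+)?$")


def blamed_commit(porcelain: str) -> str:
    """Extract the commit hash from porcelain blame output for one line."""
    lines = porcelain.splitlines()
    match = _HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("unexpected git blame --porcelain output")
    sha = match.group(1)
    if set(sha) == {"0"}:
        # git blame reports an all-zero hash for uncommitted lines.
        raise ValueError("line is not committed yet")
    return sha


def blame_line_in_place(path: str, line: int, repo: str = ".") -> str:
    """Check out, detached, the commit that last touched path:line."""
    blame = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{line},{line}", "--", path],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    sha = blamed_commit(blame)
    # Run this in a dedicated worktree: it moves HEAD away from your branch.
    subprocess.run(["git", "switch", "--detach", sha], cwd=repo, check=True)
    return sha
```

The sketch makes no attempt to restore the cursor position after the switch, which is the part that remains unsolved.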

, b p blames parent. Which is just switching to the parent commit of the current HEAD, what “blame before this change” does on GitHub (it works...
