Git Is Not Fine

This is a piece about git. But I wrote it because of jj.

The thing about jj is that I’m in love with it. I love it, and I’m convinced that you’ll love it too. I think that if jj doesn’t have any dealbreakers for you, you should give it a serious shot.

But you probably won’t if you think git is fine. And that’s unfortunate, because git is not fine.

See, git does two jobs: it’s a distributed store for source, and it’s a distributed workflow tool. It knocked the first job so far out of the park that most of us fail to see that its solutions for the second job were mostly an afterthought. And if you actually work in a meaningfully distributed way (and you do: across time, with yourself or others), then whether you know it or not, you are feeling the pain. Because, like East River Source Control says, async development is table stakes.

Some Throatclearing About Git

If you’re not familiar with git (and you are), git is a distributed version control system, the first DVCS to hit critical mass and practically the only VCS anyone uses anymore. Almost every engineer who knows what a rebase is learned it using git commands, in terms of git constructs. It’s still a little miracle of a tool, too, economical and fast. As a result, most all of us have seen or written little diagrams that look like this (which represents a local feature branch in a steady state):

Diagrams like this are the heart of thinking in git: commits and branches. The commits are the source code and its history, and they are immutable. The branches are mutable pointers with a log attached.
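To make "immutable commits, mutable branch pointers" concrete, here is a minimal toy model in Python. It is a sketch under loud assumptions: the `Commit` and `Branch` classes, the `snapshot` field, and the `move` method are hypothetical names for illustration, and real git hashes more than this (author, timestamps, a full tree object).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """A toy commit: a snapshot plus parent ids. Frozen, so it cannot be edited."""
    message: str
    snapshot: str          # stand-in for the real tree of files
    parents: tuple = ()    # ids of parent commits; there are no child pointers

    @property
    def id(self) -> str:
        # The id is derived from the content, so any change yields a different commit.
        payload = "\n".join([self.message, self.snapshot, *self.parents])
        return hashlib.sha1(payload.encode()).hexdigest()


@dataclass
class Branch:
    """A toy branch: a movable name for one commit id, with a log of past positions."""
    name: str
    target: str
    log: list = field(default_factory=list)

    def move(self, new_target: str) -> None:
        # Remember where the pointer was, then repoint it.
        self.log.append(self.target)
        self.target = new_target


c1 = Commit("Release 4.51.4", "v1")
c2 = Commit("Fix key entry race", "v2", parents=(c1.id,))
bugfix = Branch("wp/bugfix", c1.id)
bugfix.move(c2.id)
```

Notice what the model does and does not store: a commit knows its parents and nothing else about its neighbors, while the branch is the only thing that ever changes.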

Behind these perfect diagrams hide devils, imperfections in git’s model of how we work with code. Let us uncover them.

There is No C

Say you’re collaborating with someone in a faraway time zone. You don’t want to merge anything without getting their review first. How do you maintain throughput in the presence of that time zone latency?

The same way CPUs do it: by pipelining your work. Instead of writing a single PR, submitting it, and waiting for it to finish before starting the next one, you write the first PR, submit it, write the second PR on top of it, submit that one, and so on and so forth, submitting many sequential PRs for review simultaneously. Like this:

The term of art for this is “stacked PRs”. And unfortunately, git makes stacked PRs very hard to work with.

To see why, let’s look at how a fast-forward plus rebase flow is represented in git. Here’s our repo after a fresh fetch:

Here’s the same repo after fast-forwarding trunk and rebasing our bugfix branch onto it:

The rebase takes the diff from C1 to C2 and applies it on top of the new commit we received from origin, C3, creating C2’.
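The rebase step can be sketched as a toy in Python. This is not how git implements it; commits here are plain dicts, the file names `app.py` and `keys.py` are made up, and the helper names (`make_commit`, `diff`, `rebase`) are hypothetical. The point is only the shape of the operation: compute a change against the old base, replay it on the new base, get a brand new commit.

```python
import hashlib


def commit_id(message, files, parent):
    """Toy content-derived id (real git hashes more fields than this)."""
    payload = repr((message, sorted(files.items()), parent))
    return hashlib.sha1(payload.encode()).hexdigest()[:8]


def make_commit(message, files, parent=None):
    return {"id": commit_id(message, files, parent), "message": message,
            "files": dict(files), "parent": parent}


def diff(old_files, new_files):
    """Files added or changed going from old to new (deletions omitted for brevity)."""
    return {path: text for path, text in new_files.items()
            if old_files.get(path) != text}


def rebase(commit, old_base, new_base):
    """Replay the change old_base -> commit on top of new_base, as a new commit."""
    change = diff(old_base["files"], commit["files"])
    files = {**new_base["files"], **change}
    return make_commit(commit["message"], files, parent=new_base["id"])


c1 = make_commit("C1", {"app.py": "v1"})
c2 = make_commit("C2", {"app.py": "v1", "keys.py": "fix"}, parent=c1["id"])  # our bugfix
c3 = make_commit("C3", {"app.py": "v2"}, parent=c1["id"])                    # from origin
c2_prime = rebase(c2, old_base=c1, new_base=c3)
```

In this sketch `c2_prime` carries the same message as `c2` but has a different id and a different parent, and nothing in it refers back to `c2`.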

Those relationships are pretty clear in the diagram. That’s why people do the diagrams that way! Pro Git includes diagrams with exactly that shape.

But these commit names are unlike anything you’d find in a real repo. This is closer to reality:

And after you completed the rebase, you’d get something like this:

Take a moment to read these diagrams and the previous ones with fresh eyes, taking in what they point to in the underlying system.

You might see then that we’ve lost some important information in the new diagrams. The two “Fix key entry race” commits had an ordered relationship indicated with an apostrophe. But that’s not there in the new diagrams. Git has no knowledge of that relationship, and can’t tell you about it.

The commit names in the old diagram also imply that all the commits named C belong to an ordered series in a branch. You can still visually see that in the new diagram, too, but the arrows tell a different story: actually finding “Release 4.51.4”’s successors in code or with git commands is not trivial in a real repo. You’d have to scan all the branches for commits visible on a path to “Release 4.51.4”.
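Here is a rough sketch of what that scan looks like, again as a toy in Python rather than real git internals. The short ids, the `parents` and `branches` tables, and the assumption that "Release 4.51.4" is the commit both lines of work started from are all illustrative.

```python
from collections import defaultdict

# Toy repo: each commit records only its parents. Ids are made up.
parents = {
    "a1f3": [],            # Release 4.51.4
    "9c0e": ["a1f3"],      # new commit from origin, now on trunk
    "77b2": ["a1f3"],      # Fix key entry race (before the rebase)
    "e4d8": ["9c0e"],      # Fix key entry race (after the rebase)
}
branches = {"trunk": "9c0e", "wp/bugfix": "e4d8"}


def successors(commit, parents, branches):
    """Find children of `commit` by walking back from every branch tip."""
    children = defaultdict(set)
    seen = set()
    stack = list(branches.values())
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Only parent links exist, so child links must be rebuilt during the walk.
        for parent in parents[current]:
            children[parent].add(current)
            stack.append(parent)
    return children[commit]


found = successors("a1f3", parents, branches)
```

The walk has to start from every branch because no commit can be asked for its children directly. It also never visits `77b2`, the pre-rebase commit: once no branch points at it or its descendants, a scan like this has no way to know it exists.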

So when we read classic git diagrams, or even these more detailed git diagrams, the diagrams themselves and sometimes even our own eyeballs are misleading us about the capabilities of our tool. There is no “C2” that you can look for and see various permutations of. There’s not even a “C” linking these commits together. These notions do not exist.

As a result, git commits cannot tell you and have no idea about:

Successor commits

Revision history (if you amend a commit, you can’t get to the old one from the new one)

Rebase history

Whether they are garbage or not

Branches can’t do it either. They do have a notion of history, but:

Branches aren’t 1:1 with code changes. They are in some cases, but this is a convention you can’t rely on

Branches do not have relationships with one another. For example, it’s impossible to reliably find wp/bugfix from trunk in the above example — it’s not even reachable from trunk, since there are no forward references.

Got it? Great. Because this is, of course, a discussion of stacked PRs. (Remember?)

Let’s go back to that example. Say we write a...
