The Biggest and Weirdest Commits in Linux Kernel Git History
Destroy All Software
Posted on 2017-02-12
We normally think of git merges as having two parent commits.<br>For example, the most recent Linux kernel merge as I write this is commit 2c5d955, which is part of the run-up to release 4.10-rc6.<br>It has two parents:
2c5d955 Merge branch 'parisc-4.10-3' of ...<br>*- 2ad5d52 parisc: Don't use BITS_PER_LONG in use ...<br>*- 53cd1ad Merge branch 'i2c/for-current' of ...
Git also supports octopus merges, which have more than two parents.<br>This seems strange for those of us who work on smaller projects: wouldn't a merge with three or four parents be confusing?<br>Well, it depends.<br>Sometimes, a kernel maintainer needs to merge dozens of separate histories together at once.<br>Having 30 merge commits, one after another, would be more confusing than a single 30-way merge, especially if that 30-way merge was conflict-free.
Octopuses are more common than you might expect.<br>There are 649,306 commits in the kernel's history.<br>46,930 (7.2%) are merges.<br>Of the merges, 1,549 (3.3%) are octopus merges.<br>(This is as of commit 566cf87, which is my current HEAD.)
$ git log --oneline | wc -l<br>649306<br>$ git log --oneline --merges | wc -l<br>46930<br>$ git log --oneline --min-parents=3 | wc -l<br>1549
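The percentages quoted above follow directly from those three counts. As a quick check, here is a minimal Ruby sketch of the arithmetic; the `percent` helper and the variable names are mine, not part of the original commands.

```ruby
# Counts copied from the three `git log ... | wc -l` runs above.
total_commits  = 649_306
merges         = 46_930
octopus_merges = 1_549

# Percentage of `part` within `whole`, rounded to one decimal place.
# (Hypothetical helper, introduced only for this illustration.)
def percent(part, whole)
  (100.0 * part / whole).round(1)
end

merge_share   = percent(merges, total_commits)   # merges as a share of all commits
octopus_share = percent(octopus_merges, merges)  # octopus merges as a share of merges

puts "merges: #{merge_share}% of commits"
puts "octopus merges: #{octopus_share}% of merges"
```

Note that the second figure is a share of merges, not of all commits.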
As a comparison point, 20% of all Rails commits are merges (12,401 out of 63,111), but it has zero octopus merges.<br>Rails is probably more representative of the average project; I expect that most git users don't know that octopus merges are even possible.
Now, the obvious question: how big do these octopus merges get?<br>The ">" lines here are continuations; the command is written across four lines, followed by one line of output.<br>All of the commands in this post are as I typed them into the terminal while experimenting, so they're not necessarily easy to read.<br>I'm more interested in the conclusions and include code only for the curious.
$ (git log --min-parents=2 --pretty='format:%h %P' |<br>> ruby -ne '/^(\w+) (.*)$/ =~ $_; puts "#{$2.split.count} #{$1}"' |<br>> sort -n |<br>> tail -1)<br>66 2cde51f
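For readers who find the one-liner hard to follow, the same idea can be spelled out as a small Ruby script. This is a sketch rather than the exact command used above: it assumes each input line has the shape produced by `--pretty='format:%h %P'` (the commit's abbreviated hash followed by its parents' hashes), and the method names and sample hashes are made up for illustration.

```ruby
# Each input line is "<commit hash> <parent hash> <parent hash> ...".
# Returns one [parent_count, commit_hash] pair per non-blank line.
def parent_counts(lines)
  lines
    .reject { |line| line.strip.empty? }
    .map do |line|
      hash, *parents = line.split
      [parents.length, hash]
    end
end

# The [parent_count, commit_hash] pair with the most parents.
def biggest_merge(lines)
  parent_counts(lines).max_by { |count, _hash| count }
end

# Hypothetical sample input; these hashes are not real kernel commits.
sample = [
  "aaaaaaa 1111111 2222222",
  "bbbbbbb 1111111 2222222 3333333 4444444",
  "ccccccc 1111111 2222222 3333333",
]

count, hash = biggest_merge(sample)
puts "#{count} #{hash}"
```

Taking the maximum directly replaces the `sort -n | tail -1` step; on real data the lines would come from the `git log` output instead of the `sample` array.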
66 parents!<br>That's a lot of parents.<br>What happened?
$ git log -1 2cde51f<br>commit 2cde51fbd0f310c8a2c5f977e665c0ac3945b46d<br>Merge: 7471c5c c097d5f 74c375c 04c3a85 5095f55 4f53477<br>2f54d2a 56d37d8 192043c f467a0f bbe5803 3990c51 d754fa9<br>516ea4b 69ae848 25c1a63 f52c919 111bd7b aafa85e dd407a3<br>71467e4 0f7f3d1 8778ac6 0406a40 308a0f3 2650bc4 8cb7a36<br>323702b ef74940 3cec159 72aa62b 328089a 11db0da e1771bc<br>f60e547 a010ff6 5e81543 58381da 626bcac 38136bd 06b2bd2<br>8c5178f 8e6ad35 008ef94 f58c4fc4 2309d67 5c15371 b65ab73<br>26090a8 9ea6fbc 2c48643 1769267 f3f9a60 f25cf34 3f30026<br>fbbf7fe c3e8494 e40e0b5 50c9697 6358711 0112b62 a0a0591<br>b888edb d44008b 9a199b8 784cbf8<br>Author: Mark Brown<br>Date: Thu Jan 2 13:01:55 2014 +0000
Merge remote-tracking branches [65 remote branch names]
This broke some history visualization tools, provoking a reaction from Linus Torvalds:
I just pulled the sound updates from Takashi, and as a result got your merge commit 2cde51fbd0f3. That one has 66 parents.
[...]
It's pulled, and it's fine, but there's clearly a balance between "octopus merges are fine" and "Christ, that's not an octopus, that's a Cthulhu merge".
From what I can see, this unusual 66-parent commit was an otherwise mundane merge of various changes to the ASoC code.<br>ASoC stands for ALSA System on Chip.<br>ALSA is the sound subsystem; "system on a chip" is a term for a computer packed into a single piece of silicon.<br>Putting those together, ASoC is sound support for embedded devices.
Now, how often do merges like this happen?<br>Never!<br>The second-place merge is fa623d1 with "only" 30 parents.<br>However, the large distance from 30 to 66 parents isn't surprising with sufficient context.
The number of parents for a git commit probably follows a fat one-sided distribution (often informally called a power law distribution, but that's usually not strictly correct for reasons that aren't interesting here).<br>Many properties of software systems fall into fat one-sided distributions.<br>Hold on; I'll generate a plot to be sure... (much nitpicking of chart layout ensues).<br>Yes, it's fat and one-sided.
To be terse and coarse about it, "fat one-sided" means that there are far more small things than large things, but also that the maximum size of the things is unbounded.<br>The kernel contains 45,381 two-parent merges, but only one 66-parent merge.<br>Given enough additional development history, we can expect to see a merge with more than 66 parents.
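One way to see the shape of such a distribution is to tally how many merges have each parent count. The sketch below assumes the same `%h %P` line format as the earlier command; the `parent_histogram` name and the sample data are hypothetical, and this is not necessarily how the plot above was produced.

```ruby
# Tally how many commits have each number of parents.
# Each input line is "<commit hash> <parent hash> <parent hash> ...".
# Returns a Hash of parent_count => number_of_commits, ordered by parent_count.
def parent_histogram(lines)
  counts = lines
    .reject { |line| line.strip.empty? }
    .each_with_object(Hash.new(0)) do |line, tally|
      parent_count = line.split.length - 1  # the first field is the commit itself
      tally[parent_count] += 1
    end
  counts.sort.to_h
end

# Hypothetical sample: three 2-parent merges, one 3-parent, one 5-parent.
sample = [
  "aaaaaaa 1111111 2222222",
  "bbbbbbb 1111111 2222222",
  "ccccccc 1111111 2222222",
  "ddddddd 1111111 2222222 3333333",
  "eeeeeee 1111111 2222222 3333333 4444444 5555555",
]

parent_histogram(sample).each do |parents, commits|
  puts "#{parents} parents: #{commits} commits"
end
```

On a fat one-sided distribution, the counts fall off quickly as the number of parents grows, with a few isolated entries far out in the tail.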
Lines of code per function or per module are also fat and one-sided (most functions and modules will be small, but some will be large; think of a "User" class in a web app).<br>Likewise for the rate of change for modules (most modules will change infrequently, but...