Improve Git monorepo performance with a file system monitor - The GitHub Blog
Jeff Hostetler·@jeffhostetler
June 29, 2022
Updated June 30, 2022
If you have a monorepo, you’ve probably already felt the pain of slow Git commands, such as git status and git add. These commands are slow because they need to search the entire worktree looking for changes. When the worktree is very large, Git needs to do a lot of work.
The Git file system monitor (FSMonitor) feature can speed up these commands by reducing the size of the search, and this can greatly reduce the pain of working in large worktrees. For example, this chart shows status times dropping to under a second on three different large worktrees when FSMonitor is enabled!
In this article, I want to talk about the new builtin FSMonitor, git fsmonitor--daemon, added in Git version 2.37.0. It is easy to set up and use since it is "in the box" and does not require any third-party tooling or additional software. It only requires a config change to enable it. It is currently available on macOS and Windows.
To enable the new builtin FSMonitor, just set core.fsmonitor to true. A daemon will be started automatically in the background by the next Git command.
FSMonitor works well with core.untrackedcache, so we’ll also turn it on for the FSMonitor test runs. We’ll talk more about the untracked-cache later.
$ time git status
On branch main
Your branch is up to date with 'origin/main'.

It took 5.25 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

real 0m17.941s
user 0m0.031s
sys 0m0.046s

$ git config core.fsmonitor true
$ git config core.untrackedcache true

$ time git status
On branch main
Your branch is up to date with 'origin/main'.

It took 6.37 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

real 0m19.767s
user 0m0.000s
sys 0m0.078s

$ time git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

real 0m1.063s
user 0m0.000s
sys 0m0.093s

$ git fsmonitor--daemon status
fsmonitor-daemon is watching 'C:/work/chromium'
Note that when the daemon first starts up, it needs to synchronize with the state of the index, so the next git status command may be just as slow (or slightly slower) than before, but subsequent commands should be much faster.
In this article, I’ll introduce the new builtin FSMonitor feature and explain how it improves performance on very large worktrees.
How FSMonitor improves performance
Git has a "What changed while I wasn’t looking?" problem. That is, when you run a command that operates on the worktree, such as git status, it has to discover what has changed relative to the index. It does this by searching the entire worktree. Whether you immediately run it again or run it again tomorrow, it has to rediscover all of that same information by searching again. Whether you edit zero, one, or a million files in the meantime, the next git status command has to do the same amount of work to rediscover what (if anything) has changed.
The cost of this search is relatively fixed and is based upon the number of files (and directories) present in the worktree. In a monorepo, there might be millions of files in the worktree, so this search can be very expensive.
What we really need is a way to focus on the changed files without searching the entire worktree.
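To make the cost difference concrete, here is a minimal Python sketch. It is not how Git itself is implemented: the dictionary standing in for the index, and the names snapshot, full_scan, focused_scan, and demo, are all assumptions made for illustration. It counts how many lstat() calls are needed to find one edited file when every file is visited, versus when something has already said which paths to look at.

```python
import os
import tempfile


def snapshot(root):
    """Record (size, mtime_ns) for every file under root: a toy stand-in for the index."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            index[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return index


def full_scan(root, index):
    """Find changed files by visiting every file. Returns (changed, stat_calls)."""
    changed, stat_calls = [], 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.lstat(path)
            stat_calls += 1
            if index.get(rel) != (st.st_size, st.st_mtime_ns):
                changed.append(rel)
    return sorted(changed), stat_calls


def focused_scan(root, index, candidates):
    """Check only the candidate paths. Returns (changed, stat_calls).

    Simplification: assumes every candidate still exists on disk.
    """
    changed, stat_calls = [], 0
    for rel in candidates:
        st = os.lstat(os.path.join(root, rel))
        stat_calls += 1
        if index.get(rel) != (st.st_size, st.st_mtime_ns):
            changed.append(rel)
    return sorted(changed), stat_calls


def demo(num_files=200):
    """Build a throwaway worktree, edit one file, and run both scans."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(num_files):
            sub = os.path.join(root, f"dir{i % 10}")
            os.makedirs(sub, exist_ok=True)
            with open(os.path.join(sub, f"file{i}.txt"), "w") as f:
                f.write("original\n")
        index = snapshot(root)

        # Edit exactly one file; the new content has a different size,
        # so the change is visible even with coarse timestamps.
        edited = os.path.join("dir3", "file3.txt")
        with open(os.path.join(root, edited), "w") as f:
            f.write("edited, and longer than before\n")

        return full_scan(root, index), focused_scan(root, index, [edited]), edited
```

Calling demo(200) finds the same single changed file both ways, but the full scan needs 200 lstat() calls while the focused scan needs one. The full-scan count grows with the size of the worktree, not with the number of edits.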
How FSMonitor works
FSMonitor is a long-running daemon or service process.
It registers with the operating system to receive change notification events on files and directories.
It adds the pathnames of those files and directories to an in-memory, time-sorted queue.
It listens for IPC connections from client processes, such as git status.
It responds to client requests for a list of files and directories that have been modified recently.
FSMonitor must continuously watch the worktree to have a complete view of all file system changes, especially ones that happen between Git commands. It must therefore be a long-running daemon or service process that is not tied to an individual Git command instance, which means it cannot be a traditional Git hook (child) process. This design also allows it to service multiple (possibly concurrent) Git commands.
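The responsibilities listed above can be sketched as a small in-process Python class. This is only an illustration under stated assumptions, not the daemon's actual code: ToyMonitor, record_event, and changes_since are hypothetical names, a simple counter stands in for time ordering, and direct method calls stand in for both operating system notifications and IPC.

```python
import threading


class ToyMonitor:
    """Toy model of a change monitor: an ordered event queue plus a query method."""

    def __init__(self):
        self._lock = threading.Lock()  # several clients may query concurrently
        self._events = []              # (sequence, path); append-only, so always ordered
        self._next_seq = 1

    def record_event(self, path):
        """Called when the (simulated) OS reports that path was touched."""
        with self._lock:
            self._events.append((self._next_seq, path))
            self._next_seq += 1

    def changes_since(self, last_seen):
        """Return (latest_sequence, paths touched after last_seen)."""
        with self._lock:
            latest = self._next_seq - 1
            paths = sorted({p for seq, p in self._events if seq > last_seen})
        return latest, paths


monitor = ToyMonitor()
monitor.record_event("src/main.c")
monitor.record_event("docs/readme.md")

# First client has seen nothing yet, so it asks from 0.
latest, paths = monitor.changes_since(0)          # (2, ['docs/readme.md', 'src/main.c'])

# A later edit arrives between commands; the next query reports only that.
monitor.record_event("src/main.c")
latest2, paths2 = monitor.changes_since(latest)   # (3, ['src/main.c'])
```

Because the monitor outlives any single client, the second query can return only what happened after the first one, which is what lets a client avoid rescanning everything.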
FSMonitor synchronization
FSMonitor has the concept of a "token":
A token is an opaque string defined by FSMonitor and can be thought of as a globally unique sequence number or timestamp.
FSMonitor creates a new token whenever file system events happen.
FSMonitor groups file system changes into sets by these ordered tokens.
A Git client command sends a (previously generated) token to FSMonitor to request the list of...