Porting btrfs-progs to Rust · xfbs's blog<br>Last weekend, I was itching to write some code. But finding a good project can<br>be difficult. What I try to look for is something challenging enough to learn<br>from, yet self-contained so I know what success looks like.<br>The idea that came to mind was porting btrfs-progs to Rust. For the<br>uninitiated, btrfs is a copy-on-write (CoW) filesystem for Linux, popular<br>enough that Fedora has used it as the default filesystem since Fedora 33.<br>Btrfs has several neat features. If you are familiar with ZFS, you may<br>recognize that the two feature sets partly overlap. In my mind, btrfs is<br>like ZFS, but for everyday usage.<br>Copy-on-write: Writes are never in-place. Modified data goes to new<br>blocks while old blocks remain intact. This enables cheap snapshots and atomic<br>operations. You can take atomic snapshots of your entire filesystem and back<br>them up, even incrementally (with btrfs send / btrfs receive).<br>Integrity: Btrfs checksums all data and metadata, so it can detect silent<br>data corruption (bit rot). When combined with redundancy, it can<br>automatically repair corrupted blocks.<br>Subvolumes: Btrfs supports subvolumes, which are lightweight, independently<br>snapshottable directory trees that share the same underlying storage.<br>Multi-device: Btrfs filesystems can span multiple devices, which you can<br>add and remove on-the-fly. It has support for different data redundancy<br>profiles built in (such as single, RAID0, RAID1, RAID10), so you don’t need<br>to use things like LVM.<br>Compression: Built-in transparent compression, and support for out-of-band<br>deduplication.<br>Online maintenance: You can defragment and resize a btrfs filesystem while<br>it is mounted and in use.<br>Since btrfs has capabilities that go beyond traditional filesystems, it comes<br>with a userspace utility called btrfs. 
This command-line tool lets you<br>interact with the features that are specific to btrfs.<br># Take a read-only snapshot of the root filesystem<br>btrfs subvolume snapshot -r / /snapshot
# Write out the entire snapshot to the file `snapshot.bin`<br>btrfs send /snapshot > snapshot.bin<br>This tool is part of btrfs-progs, and is unsurprisingly written in C. The<br>tools work well and don’t have a significant attack surface (they’re not exposed<br>to the network or anything), so there’s no pressing need to rewrite them in a<br>memory-safe language. But I wanted to do it anyway: it would help me understand<br>how these tools actually work, and I wanted to see if I could create a simpler<br>implementation that might be easier to maintain and test.<br>Making a plan<br>Before starting this rewrite, I put some thought into how I wanted to approach<br>it. When rewriting an existing codebase, there are broadly two strategies: a<br>“clean-room” rewrite, where you look only at the interface and effects of the<br>tool but not its code, or a source-informed rewrite, where you study and<br>translate the original code directly. The advantage of a clean-room approach is<br>that your rewritten code is original work that you can license however you<br>want. I chose the latter, source-informed approach, because I thought it would<br>be easier if I actually studied the existing code. However, that means my<br>rewrite needs to carry the same license as the original, since studying the<br>code would likely influence the outcome, making it a derivative work.<br>I also decided to explore how useful an LLM could be for automating tedious<br>tasks. I’m somewhat ambivalent about LLM usage: I’ve had bad experiences where<br>they produced low-quality, incorrect code. At the same time, LLMs can be<br>genuinely helpful for mechanical tasks, such as translating CLI command<br>structures into clap declarations. I wanted to see if I could find the sweet<br>spot of maintaining control of the architecture and quality of the codebase,<br>while accelerating the process.<br>A first approach<br>Initially, I wanted to understand how the original btrfs-progs codebase was<br>organized. What’s the architecture? 
Does it use loose coupling with tidy,<br>separated modules that I could translate individually to Rust and tie together<br>using FFI? How does the compilation process work? How are the tools tested?<br>I started by cloning the repository<br>and browsing around to understand what the pieces are and how they fit<br>together. Here’s what I understood about the structure:<br>Path | Description<br>kernel-lib/ | Low-level data structures and algorithms extracted from the Linux kernel, intended for reuse in userspace tools.<br>kernel-shared/ | Btrfs-specific kernel code synchronized with the Linux kernel’s btrfs implementation. It implements core btrfs algorithms and on-disk format handling.<br>libbtrfs/ | Library for interacting with btrfs filesystems (also exposed as a Python module).<br>common/ | Shared utility code for btrfs tools (parsing, formatting, device scanning, filesystem utilities, etc.).<br>libbtrfsutil/ | Higher-level library for managing btrfs filesystems with official Python bindings (subvolume, filesystem, and qgroup operations).<br>cmds/ | Implementation code for...