verifying /proc | arya dradjica
I think it’s fairly obvious that I’m a perfectionist. Sometimes that drags me into the land of Linux system APIs — I once wrote a 300-line C program to replace the classic “run date every 60 seconds” loop you might find in your status bar. My goal was to use all the cool system calls I could find, be as efficient as possible, and update the date exactly on time. It was a lot of fun! I ran into a similar rabbit hole while trying to build a file system watcher, and the results were cool enough to deserve their own blog post.
I wanted to build a “perfect” file system watcher. Here’s how I define “perfect” — I can build a file system cache around the watcher, such that opening and reading some path is exactly equivalent to loading path from the cache. Thus, the watcher has to catch every possible change that could affect path — creation and deletion, renaming, changes to symlinks, etc. etc. If you’ve looked into file system watching before, you probably know about inotify(7), which seems like the “perfect” tool for the job.
Unfortunately, inotify is missing something: notifications about mounts. If you are watching the path /foo/bar with inotify, a new filesystem mount at /foo would go undetected. So how do we detect filesystem mounts? With a little bit of research, I learned that /proc/self/mounts lists information about the mounts visible to the current process and that it is pollable. That seems perfect! But … am I supposed to assume that /proc just works? What if it’s not mounted?
/proc is an important part of Linux’s system API — it goes beyond system calls and offers a lot more information and control. For example, the *at() family of system calls lets you traverse the filesystem using file descriptors, which is more robust against race conditions and can be security relevant. The man page for linkat(2) suggests using /proc/self/fd/$fd to link an anonymous file (see openat(2) with O_TMPFILE) into a filesystem (AT_EMPTY_PATH also exists, but it requires the CAP_DAC_READ_SEARCH capability). Apparently glibc sometimes polyfills system calls by reading /proc. We should have a clear, reliable way of getting access to /proc … but as far as I know, everybody just assumes it’s there. Can we do better?
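The following is a minimal C sketch of the linkat(2) trick through /proc/self/fd. The helper name link_tmpfile and its parameters are made up for illustration. It assumes that the target directory's filesystem supports O_TMPFILE. It also quietly depends on /proc being mounted, which is exactly the assumption this post questions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous file in `dir`, fill it with `data`,
   and only then give it the name `dest` through its /proc/self/fd entry.
   Returns 0 on success, -1 on failure with errno set. */
int link_tmpfile(const char *dir, const char *dest, const char *data, size_t len)
{
    /* EOPNOTSUPP (or EISDIR on old kernels) means no O_TMPFILE support. */
    int fd = open(dir, O_TMPFILE | O_WRONLY | O_CLOEXEC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    int rc = -1;
    ssize_t written = write(fd, data, len);
    if (written == (ssize_t)len) {
        char path[64];
        snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
        /* AT_SYMLINK_FOLLOW resolves the magic link to the anonymous file
           itself; unlike AT_EMPTY_PATH, no capability is required. */
        rc = linkat(AT_FDCWD, path, AT_FDCWD, dest, AT_SYMLINK_FOLLOW);
    } else if (written >= 0) {
        errno = EIO; /* short write */
    }

    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

The data is written before the link is created. As a result, the file only ever appears under `dest` with its full contents.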
It took a few hours, but I think I’ve come up with a simple and secure mechanism for verifying /proc:
Set an inotify watch for /. This lets us detect modifications to /proc, excluding mounts. Depending on the nature of the change, we can restart the verification (up to N times) or give up.
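Here is a minimal sketch of this first step. It assumes a non-blocking inotify instance that is drained once the remaining steps finish. The function names watch_root and proc_entry_changed are hypothetical. The event mask is my own guess at what “modifications to /proc” covers: creating, deleting, or renaming the proc entry, or changing its attributes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch / for anything that could replace or alter its "proc" entry.
   Mounts are NOT reported by inotify; later steps handle those. */
int watch_root(void)
{
    int ifd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (ifd < 0)
        return -1;
    uint32_t mask = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO |
                    IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF | IN_ONLYDIR;
    if (inotify_add_watch(ifd, "/", mask) < 0) {
        int saved = errno;
        close(ifd);
        errno = saved;
        return -1;
    }
    return ifd;
}

/* Drain all queued events. Returns 1 if any of them may concern "/proc",
   0 if none did, -1 on error. Lost events (queue overflow) or a removed
   watch count as a change, since we can no longer be sure. */
int proc_entry_changed(int ifd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int changed = 0;
    for (;;) {
        ssize_t n = read(ifd, buf, sizeof buf);
        if (n < 0)
            return errno == EAGAIN ? changed : -1; /* EAGAIN: queue empty */
        if (n == 0)
            return changed;
        for (char *p = buf; p < buf + n;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if ((ev->mask & (IN_Q_OVERFLOW | IN_IGNORED | IN_DELETE_SELF | IN_MOVE_SELF)) ||
                (ev->len > 0 && strcmp(ev->name, "proc") == 0))
                changed = 1;
            p += sizeof *ev + ev->len;
        }
    }
}
```

If proc_entry_changed returns 1 after the later steps have run, the verification would be restarted (up to N times) or abandoned.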
Open /proc and use the resulting file descriptor for all later operations.
If /proc is mounted over afterwards, the fd will continue referring to the pre-mount contents. We’ll verify that the pre-mount contents refer to a real proc file system and use them to check for such mounts.
Use statfs(2) and verify that f_type is PROC_SUPER_MAGIC (0x9fa0). At this point, we know /proc is a proc filesystem, but this is not enough! A bind mount of a real proc filesystem could result in a /proc that is rooted at some subdirectory (e.g. /proc points to /real_proc/foo/bar).
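Opening /proc and checking its filesystem type could look like the following sketch. It calls fstatfs(2) on the already-opened descriptor rather than statfs(2) on the path, so the check applies to exactly the directory we keep using. The fallback PROC_SUPER_MAGIC definition and the helper names are my own additions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0 /* same value as in <linux/magic.h> */
#endif

/* Returns 1 if fd lives on a proc filesystem, 0 if not, -1 on error. */
int is_procfs(int fd)
{
    struct statfs sfs;
    if (fstatfs(fd, &sfs) < 0)
        return -1;
    return sfs.f_type == PROC_SUPER_MAGIC;
}

/* Open /proc once; later lookups go through this fd, so anything mounted on
   /proc afterwards does not change what the fd refers to. Returns -1 if
   /proc cannot be opened or is not a proc filesystem. */
int open_proc(void)
{
    int fd = open("/proc", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    if (is_procfs(fd) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}
```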
Check the inode number of /proc (e.g. with statx(2)). The /proc root should have an inode number of 1; I assume it won’t change.
I spent a long time on this step; initially I considered verifying /proc/self/fd by making a temporary file descriptor and checking for it. Then I looked at the inode number of /proc/self (which is somewhat reliable; at least in recent kernel versions, it’s PROC_DYNAMIC_START, i.e. 0xF0000000, because it’s the first dynamically allocated inode number). Then I realized /proc would itself have a verifiable inode number. I like my final solution!
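The inode check takes only a few lines on top of the same descriptor. This sketch uses fstat(2) for brevity. statx(2) with AT_EMPTY_PATH and STATX_INO should report the same inode number. The constant 1 matches PROC_ROOT_INO in the kernel sources, and, as noted above, I assume it stays that way. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Inode number of a proc filesystem's root directory (PROC_ROOT_INO in the
   kernel). A bind mount of a proc subdirectory has a different one. */
#define PROC_ROOT_INO 1

/* Returns 1 if fd is the root directory of a proc instance, 0 if not,
   -1 on error. Assumes fd already passed the f_type check. */
int is_proc_root(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    return S_ISDIR(st.st_mode) && st.st_ino == PROC_ROOT_INO;
}
```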
Open /proc/$PID, verify that nothing is mounted on it, and use the resulting file descriptor for all later operations.
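The step above does not spell out how to check that nothing is mounted. This sketch uses one possible approach, not necessarily the author's: statx(2) sets STATX_ATTR_MOUNT_ROOT on the root of a mount (Linux 5.8 and later). It assumes that $PID means our own process ID from getpid(2). The directory is opened with openat(2) relative to the verified /proc descriptor, so the path is never re-resolved from /. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef STATX_ATTR_MOUNT_ROOT
#define STATX_ATTR_MOUNT_ROOT 0x00002000 /* <linux/stat.h>, Linux 5.8+ */
#endif

/* Returns 1 if fd is the root of a mount, 0 if not, -1 if statx fails or
   the kernel does not report this attribute (errno = EOPNOTSUPP). */
int is_mount_root(int fd)
{
    struct statx stx;
    if (statx(fd, "", AT_EMPTY_PATH, STATX_BASIC_STATS, &stx) < 0)
        return -1;
    if (!(stx.stx_attributes_mask & STATX_ATTR_MOUNT_ROOT)) {
        errno = EOPNOTSUPP;
        return -1;
    }
    return (stx.stx_attributes & STATX_ATTR_MOUNT_ROOT) != 0;
}

/* Open our own /proc/<pid> relative to a verified /proc fd, refusing it if
   something is mounted on it or if we cannot tell. */
int open_own_pid_dir(int procfd)
{
    char name[32];
    snprintf(name, sizeof name, "%ld", (long)getpid());
    int fd = openat(procfd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;
    int r = is_mount_root(fd);
    if (r != 0) {
        int saved = (r == 1) ? EBUSY : errno; /* EBUSY: something is mounted here */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```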
Set a poll(2) watch for /proc/$PID/mounts. Ideally we’d watch /proc/$PID/mountinfo instead, but the man pages document poll(2) support only for the former. If a change is detected (poll(2) reports it as POLLPRI), repeat the next step.
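The poll(2) watch can be sketched as follows. The mounts file is always readable, so waiting for POLLIN would return immediately. The proc man page documents that a mount or unmount is signaled as POLLPRI instead. The helper names are hypothetical, and the directory descriptor is assumed to be the verified /proc/$PID from the previous step.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Open the mounts file relative to a verified /proc/<pid> directory fd. */
int open_mounts(int piddirfd)
{
    return openat(piddirfd, "mounts", O_RDONLY | O_CLOEXEC);
}

/* Wait up to timeout_ms (-1 = forever) for the mount table to change.
   Only POLLPRI is requested. The kernel raises it (together with POLLERR)
   if the table changed since the file was opened or since the last time a
   change was reported. Returns 1 on change, 0 on timeout, -1 on error. */
int wait_for_mount_change(int mountsfd, int timeout_ms)
{
    struct pollfd pfd = { .fd = mountsfd, .events = POLLPRI };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0 || (pfd.revents & POLLNVAL))
        return -1;
    return n > 0 && (pfd.revents & (POLLPRI | POLLERR)) != 0;
}
```

Several changes between two polls collapse into a single notification. That is fine here, because every notification triggers a full re-read of mountinfo anyway.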
Open /proc/$PID/mountinfo and use it to learn about all mountpoints on the system. Check that /proc is still correctly mounted. Thanks to the poll(2) watch, this is a perfect live view!
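Finally, here is a sketch of the mountinfo check. The proc man pages document the line format: mount ID, parent ID, major:minor, root, mount point, options, optional fields, a "-" separator, then the filesystem type. As an assumption, “correctly mounted” is taken to mean that the mount visible at /proc has type proc and root "/". Mounts stacked on the same mount point are resolved through parent IDs. The limit of 16 entries and the helper name are arbitrary.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PROC_MOUNTS 16

struct proc_mount {
    long id, parent; /* mount ID and parent mount ID */
    int ok;          /* proc filesystem mounted from its root "/" */
};

/* Returns 1 if the mount visible at /proc is a proc filesystem rooted at
   "/", 0 if not (or nothing is mounted there), -1 on error. */
int proc_mounted_correctly(int piddirfd)
{
    int fd = openat(piddirfd, "mountinfo", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    FILE *f = fdopen(fd, "r");
    if (f == NULL) {
        close(fd);
        return -1;
    }

    struct proc_mount seen[MAX_PROC_MOUNTS];
    int count = 0, overflow = 0;
    char *line = NULL;
    size_t cap = 0;
    while (getline(&line, &cap, f) != -1) {
        char *fields[64], *save = NULL;
        int n = 0;
        for (char *tok = strtok_r(line, " \n", &save); tok != NULL && n < 64;
             tok = strtok_r(NULL, " \n", &save))
            fields[n++] = tok;
        /* fields[3] = root within the fs, fields[4] = mount point */
        if (n < 7 || strcmp(fields[4], "/proc") != 0)
            continue;
        int sep = 6; /* skip optional fields up to the "-" separator */
        while (sep < n && strcmp(fields[sep], "-") != 0)
            sep++;
        if (sep + 1 >= n)
            continue; /* malformed line */
        if (count == MAX_PROC_MOUNTS) {
            overflow = 1;
            break;
        }
        seen[count].id = strtol(fields[0], NULL, 10);
        seen[count].parent = strtol(fields[1], NULL, 10);
        seen[count].ok = strcmp(fields[3], "/") == 0 && strcmp(fields[sep + 1], "proc") == 0;
        count++;
    }
    free(line);
    fclose(f); /* also closes fd */
    if (overflow)
        return -1;

    /* A mount stacked on /proc has the covered /proc mount as its parent,
       so the visible mount is the one no other /proc mount sits on. */
    for (int i = 0; i < count; i++) {
        int covered = 0;
        for (int j = 0; j < count; j++)
            if (seen[j].parent == seen[i].id)
                covered = 1;
        if (!covered)
            return seen[i].ok;
    }
    return 0;
}
```

In a full watcher, this would run once at startup and again each time the POLLPRI watch fires.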
I’m pretty happy with this! I think the verification is simple enough that any Linux application that cares about low-level details should use it. I’m excited to write my own filesystem watcher, but it’d be nice if the standard inotify crate supported mount notifications. I might look into contributing mount notifications based on this mechanism there.
I was talking to people on #linux on Libera.Chat while working on this. Somebody asked me what my use case and threat model were. My answer is simple: I just like having reliable ways to do things. Waving away potential problems because “it’s probably fine” doesn’t sit well with me. I like taking my time and properly understanding every problem I come across; I learned a lot along the way, and best of all, I had fun.
Until the next silly rabbit hole :)