Toward better handling of major page faults [LWN.net]
By Jonathan Corbet
May 22, 2026
LSFMM+BPF
A major page fault occurs when a process attempts to access a page that is not currently present in RAM; satisfying such faults usually involves I/O, and can thus take some time. When many threads sharing an address space are generating page faults, the result can be significant lock contention while that I/O takes place. During the memory-management track at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, Barry Song led a session to try, yet again, to find an enduring solution to this problem.
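As an aside for readers who want to see whether a workload is taking major faults at all, a process can read its own fault counters. The sketch below uses the POSIX getrusage() call and the ru_majflt and ru_minflt fields that Linux fills in; the two helper names are invented for illustration and were not part of the session.

```c
#include <sys/resource.h>

/*
 * Return the number of major page faults (those that required I/O)
 * charged to the calling process so far, or -1 if getrusage() fails.
 */
long major_faults_so_far(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/*
 * Return the number of minor page faults (those satisfied without I/O)
 * charged to the calling process so far, or -1 if getrusage() fails.
 */
long minor_faults_so_far(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}
```

Sampling these counters before and after a phase of the workload gives a rough count of the faults that phase caused; it says nothing about lock contention, which needs kernel-side tracing.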
Song began by saying that per-VMA locks had been introduced a few years back; that work moved much of page-fault handling under a lock at the virtual-memory-area (VMA) level, in an attempt to relieve pressure on the process-level mmap_lock. But, when satisfying the fault requires initiating I/O, the kernel will release the per-VMA lock and then, once the I/O completes, retry the handling of the fault with mmap_lock held. That can create significant mmap_lock contention, causing threads to stall. He wanted to find ways of reducing that contention, and had a few options to consider.
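The fallback described above can be pictured with a toy user-space model. This is only a sketch under stated assumptions: it takes no real locks, every name in it is hypothetical, and it simply counts how often each lock would be taken so that the extra mmap_lock acquisition on the I/O path is visible.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_result { TOY_FAULT_DONE, TOY_FAULT_RETRY };

struct toy_stats {
    int vma_lock_attempts;   /* attempts made under the per-VMA lock */
    int mmap_lock_attempts;  /* attempts made under mmap_lock */
};

/*
 * First attempt, under the per-VMA lock. If the page is not in RAM,
 * I/O must be started; the lock is dropped and a retry is requested.
 */
static enum toy_result toy_try_under_vma_lock(bool page_in_ram,
                                              struct toy_stats *st)
{
    st->vma_lock_attempts++;
    return page_in_ram ? TOY_FAULT_DONE : TOY_FAULT_RETRY;
}

/* Second attempt, made after the I/O has completed, under mmap_lock. */
static enum toy_result toy_retry_under_mmap_lock(struct toy_stats *st)
{
    st->mmap_lock_attempts++;
    return TOY_FAULT_DONE;
}

/* Model of the current behavior: any retry falls back to mmap_lock. */
enum toy_result toy_handle_fault(bool page_in_ram, struct toy_stats *st)
{
    if (toy_try_under_vma_lock(page_in_ram, st) == TOY_FAULT_DONE)
        return TOY_FAULT_DONE;
    return toy_retry_under_mmap_lock(st);
}
```

In this model a fault on a resident page never touches mmap_lock, while every fault that needs I/O does; with many threads faulting on non-resident pages, those second attempts all meet at the one process-wide lock.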
The first is to simply retry the handling of the fault using the VMA lock rather than mmap_lock after I/O completion; he has posted a patch series implementing this idea. That would introduce even more complexity into the fault-handling path, though, he said.
An alternative would be to completely remove the retry code and, instead, simply hold the VMA lock while waiting for I/O. Lorenzo Stoakes worried that this approach, too, would add complexity. Shakeel Butt asked how bad the additional complexity would be; Matthew Wilcox answered that it would not be that much worse, but that the fault-handling code is already too complex. He said there might be a third option: apply Song's change to retry under the VMA lock, but only for anonymous pages, where the change is relatively simple.
Ryan Roberts said that the retry flag (the VM_FAULT_RETRY value returned when fault handling must be retried) is covering too many cases. It is used for compatibility with code that is not able to deal with the VMA lock and, as a result, retry has to use mmap_lock. Suren Baghdasaryan said that retries are called for when an operation cannot be done under the VMA lock, at least not at the current time. There might be a place for a separate flag to call for a retry under the VMA lock. Butt asked whether the contention stalls Song had observed were associated with anonymous or file-backed pages; the answer was that the problem is mostly seen with the latter.
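The idea of a separate retry flag can be sketched as a small decision table. Everything here is hypothetical: the kernel has the single VM_FAULT_RETRY value mentioned above, and the outcome names, the two-way split, and the helper functions below are assumptions made purely to illustrate how distinguishing the two reasons for a retry would let the caller choose a lock.

```c
#include <stdbool.h>

/* Hypothetical outcomes; not kernel definitions. */
enum toy_outcome {
    TOY_DONE,
    TOY_RETRY_MMAP_LOCK,  /* the operation cannot run under the VMA lock */
    TOY_RETRY_VMA_LOCK    /* the lock was dropped only to wait for I/O */
};

enum toy_lock { TOY_LOCK_NONE, TOY_LOCK_VMA, TOY_LOCK_MMAP };

/* Classify why the first attempt under the VMA lock could not finish. */
enum toy_outcome toy_classify(bool needs_io, bool vma_lock_capable)
{
    if (!vma_lock_capable)
        return TOY_RETRY_MMAP_LOCK;
    if (needs_io)
        return TOY_RETRY_VMA_LOCK;
    return TOY_DONE;
}

/* Choose the lock for the retry; no retry means no lock is needed. */
enum toy_lock toy_lock_for_retry(enum toy_outcome outcome)
{
    switch (outcome) {
    case TOY_RETRY_VMA_LOCK:
        return TOY_LOCK_VMA;
    case TOY_RETRY_MMAP_LOCK:
        return TOY_LOCK_MMAP;
    default:
        return TOY_LOCK_NONE;
    }
}
```

With a single retry value, both retry cases would have to map to the mmap_lock row, which is the overloading Roberts described.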
Song returned to the option of removing the retry code entirely. He said it would be possible, but has the potential to create priority-inversion problems. Threads running within an Android app have different priorities; in the wrong scenario, one thread holding mmap_lock could block the high-priority user-interface thread, causing visible stalls. Wilcox said that the real problem is threads waiting for I/O with the VMA locked, but Song said that the problem comes up even if the high-priority thread is not accessing the same VMA. After some discussion on whether the priority-inversion scenario was a real problem, the consensus seemed to be that it indeed is.
Song concluded with a few other discussion points, the first of which was whether it makes sense to use different approaches for anonymous and file-backed pages. In the case of anonymous pages, the kernel can allow page-fault handling and other VMA changes (an mprotect() call, for example) to happen concurrently. The file-backed side might benefit more from removing the retry logic entirely once the priority-inversion problem has been fully understood and avoided. Then, he said, there are cases where multiple threads are faulting in the same sets of pages; rather than contend on the mmap_lock and folio locks, the handler could check the up-to-date status of the folio. If it is up to date, then somebody else has already performed the I/O to handle the fault, so the retry can be avoided.
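The up-to-date check for pages that several threads fault in together can be illustrated with another toy model. It is single-threaded and leaves out all locking, and the structure and function names are hypothetical; the only point is that a second fault on a folio whose contents have already been read can skip the retry path.

```c
#include <stdbool.h>

/* Toy stand-in for a folio; not the kernel's struct folio. */
struct toy_folio {
    bool uptodate;  /* contents have already been read from storage */
    int io_count;   /* number of reads this model has issued */
};

enum toy_step {
    TOY_MAP_PAGE,           /* data is present: map it, no retry */
    TOY_START_IO_AND_RETRY  /* data is missing: read it, then retry */
};

/*
 * Decide what a faulting thread does with this folio. The read is
 * modeled as completing immediately, so later callers see the folio
 * as up to date.
 */
enum toy_step toy_fault_on_folio(struct toy_folio *folio)
{
    if (folio->uptodate)
        return TOY_MAP_PAGE;

    folio->io_count++;
    folio->uptodate = true;
    return TOY_START_IO_AND_RETRY;
}
```

In this model only the first fault pays for the I/O and the retry; later faults on the same folio go straight to mapping the page.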
His final question was whether the kernel should retry fault handling under the VMA lock by default and only fall back to the mmap_lock in cases where it is known to be needed. Baghdasaryan repeated the...