Toward better handling of major page faults [LWN.net]

By Jonathan Corbet May 22, 2026

LSFMM+BPF

A major page fault occurs when a process attempts to access a page that is not currently present in RAM; satisfying such faults usually involves I/O, and can thus take some time. When many threads sharing an address space are generating page faults, the result can be significant lock contention while that I/O takes place. During the memory-management track at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, Barry Song led a session to try, yet again, to find an enduring solution to this problem.
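On Linux, user space can watch the kernel's accounting of these events through getrusage(). It reports minor faults (the page is already in memory and only needs to be mapped) separately from major faults (I/O is required). The following is an illustration, not something from the session. It maps a file, reads one byte from each page, and reports the change in both counters. The helper names fault_counts() and touch_file_pages() are hypothetical.

```python
import mmap
import os
import resource
import tempfile

def fault_counts():
    """Return (minor, major) page-fault totals for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

def touch_file_pages(path):
    """Map a file read-only, read one byte from each page, and return
    the (minor, major) fault deltas observed while doing so."""
    minor0, major0 = fault_counts()
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, mmap.PAGESIZE):
                _ = m[offset]  # the first access to each page can fault
    minor1, major1 = fault_counts()
    return minor1 - minor0, major1 - major0

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * (64 * mmap.PAGESIZE))
        tmp.flush()
        minor, major = touch_file_pages(tmp.name)
        print(f"minor faults: {minor}, major faults: {major}")
```

A freshly written file is still in the page cache, so the faults seen here will normally be minor. The major-fault count rises only when the data must actually be read from storage.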

Song began by noting that per-VMA locks were introduced a few years ago; that work moved much of page-fault handling under a lock at the virtual-memory-area (VMA) level, in an attempt to relieve pressure on the process-wide mmap_lock. But when satisfying a fault requires initiating I/O, the kernel releases the per-VMA lock and then, once the I/O completes, retries the handling of the fault with mmap_lock held. That can create significant mmap_lock contention, causing threads to stall. He wanted to find ways of reducing that contention, and had a few options to consider.
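The path described above can be modelled in a few lines. This is a toy, single-threaded Python sketch, not kernel code. The classes and functions are hypothetical, and a plain threading.Lock stands in for each kernel lock, ignoring the reader/writer semantics the real ones have. "I/O" is simply adding a page number to a set. The model shows the shape of the problem: every fault that needs I/O ends with a second pass under mmap_lock.

```python
import threading

VM_FAULT_DONE = "done"
VM_FAULT_RETRY = "retry"   # named after the kernel's VM_FAULT_RETRY

class Vma:
    """Hypothetical stand-in for a VMA with its own lock."""
    def __init__(self):
        self.lock = threading.Lock()
        self.resident = set()          # pages currently in memory

class Mm:
    """Hypothetical stand-in for an address space."""
    def __init__(self):
        self.mmap_lock = threading.Lock()
        self.mmap_lock_takes = 0       # how often faults fell back to it

def read_page(vma, page):
    vma.resident.add(page)             # pretend the read from storage finished

def fault_under_vma_lock(vma, page):
    with vma.lock:
        if page in vma.resident:
            return VM_FAULT_DONE       # no I/O needed: just map the page
    # Major fault: the VMA lock is released, then the read is waited for
    # (modelled here as completing instantly).
    read_page(vma, page)
    return VM_FAULT_RETRY

def handle_fault(mm, vma, page):
    """Return the name of the lock under which the fault finally completed."""
    if fault_under_vma_lock(vma, page) == VM_FAULT_DONE:
        return "vma_lock"
    # Retry path: redo the fault with mmap_lock held.
    with mm.mmap_lock:
        mm.mmap_lock_takes += 1
        return "mmap_lock"
```

Calling handle_fault() twice for the same page shows the pattern. The first call needs I/O and completes under mmap_lock; the second finds the page resident and never touches mmap_lock.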

The first is to simply retry the handling of the fault using the VMA lock rather than mmap_lock after I/O completion; he has posted a patch series implementing this idea. That would introduce even more complexity into the fault-handling path, though, he said.

An alternative would be to remove the retry code completely and, instead, simply hold the VMA lock while waiting for I/O. Lorenzo Stoakes worried that this approach, too, would add complexity. Shakeel Butt asked how bad the additional complexity would be; Matthew Wilcox answered that it would not be much worse, but that the fault-handling code is already too complex. He suggested a possible third option: apply Song's change to retry under the VMA lock, but only for anonymous pages, where the change is relatively simple.
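The approaches discussed so far differ mainly in two respects: which lock, if any, is held while waiting for the I/O, and which lock the retry uses. The Python sketch below is hypothetical and not taken from any posted patch. It lists the lock operations that a single fault needing I/O would perform under the existing behavior and under each option. For anonymous memory, that I/O is a swap-in.

```python
from enum import Enum

class Strategy(Enum):
    CURRENT = "drop the VMA lock, retry under mmap_lock"
    RETRY_VMA = "drop the VMA lock, retry under the VMA lock"
    HOLD_VMA = "keep the VMA lock across the I/O; no retry"
    RETRY_VMA_ANON = "retry under the VMA lock for anonymous pages only"

def lock_events(strategy, anonymous):
    """List the lock operations for one fault that has to wait for I/O."""
    if strategy is Strategy.HOLD_VMA:
        return ["vma_lock", "start_io", "wait_io", "finish_fault", "vma_unlock"]
    events = ["vma_lock", "start_io", "vma_unlock", "wait_io"]
    retry_under_vma = strategy is Strategy.RETRY_VMA or (
        strategy is Strategy.RETRY_VMA_ANON and anonymous)
    if retry_under_vma:
        events += ["vma_lock", "finish_fault", "vma_unlock"]
    else:
        events += ["mmap_lock", "finish_fault", "mmap_unlock"]
    return events

if __name__ == "__main__":
    for s in Strategy:
        for anon in (True, False):
            print(f"{s.name:15} anon={anon!s:5} {' '.join(lock_events(s, anon))}")
```

HOLD_VMA produces the shortest sequence, but it is the only one in which a thread sleeps on I/O while still holding a lock.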

Ryan Roberts said that the retry flag (the VM_FAULT_RETRY value returned when fault handling must be retried) covers too many cases. It is used for compatibility with code that cannot cope with the VMA lock and, as a result, every retry has to use mmap_lock. Suren Baghdasaryan said that retries are called for when an operation cannot be done under the VMA lock, at least not at present. There might be a place for a separate flag to call for a retry under the VMA lock. Butt asked whether the contention stalls Song had observed were associated with anonymous or file-backed pages; the answer was that the problem is mostly seen with the latter.
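A separate flag might look something like the Python sketch below, which uses IntFlag. RETRY is modelled on the kernel's VM_FAULT_RETRY; RETRY_VMA is a hypothetical modifier invented for this example. Because RETRY_VMA is a modifier on top of RETRY rather than a replacement for it, a caller that only understands RETRY still retries. Such a caller simply falls back to the conservative mmap_lock path.

```python
from enum import IntFlag

class VmFault(IntFlag):
    RETRY = 1        # modelled on the kernel's VM_FAULT_RETRY
    RETRY_VMA = 2    # hypothetical: the retry may stay under the VMA lock

def retry_lock(result):
    """Return the lock a retried fault should take, or None for no retry."""
    if not result & VmFault.RETRY:
        return None
    if result & VmFault.RETRY_VMA:
        return "vma_lock"
    return "mmap_lock"           # the conservative, existing behavior

def legacy_retry_lock(result):
    """A hypothetical caller that predates RETRY_VMA and ignores it."""
    return "mmap_lock" if result & VmFault.RETRY else None
```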

Song returned to the option of removing the retry code entirely. He said that it would be possible, but that it could create priority-inversion problems. Threads running within an Android app have different priorities; in the wrong scenario, one thread holding mmap_lock could block the high-priority user-interface thread, causing visible stalls. Wilcox said that the real problem is threads waiting for I/O with the VMA locked, but Song said that the problem comes up even if the high-priority thread is not accessing the same VMA. After some discussion of whether the priority-inversion scenario was a real problem, the consensus seemed to be that it is.

Song concluded with a few other discussion points, the first of which was whether it makes sense to use different approaches for anonymous and file-backed pages. In the case of anonymous pages, the kernel can allow page-fault handling and other VMA changes (an mprotect() call, for example) to happen concurrently. The file-backed side might benefit more from removing the retry logic entirely once the priority-inversion problem has been fully understood and avoided. Then, he said, there are cases where multiple threads are faulting in the same sets of pages; rather than contend on the mmap_lock and folio locks, the handler could check the up-to-date status of the folio. If it is up to date, then somebody else has already performed the I/O to handle the fault, so the retry can be avoided.
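The up-to-date check can be illustrated with a small threaded simulation, which sketches the idea rather than any patch. Several hypothetical waiter threads have faulted on the same file page while another thread's read is in flight. A threading.Event stands in for waiting on the folio, and a counter records how many waiters would go back and retry the fault under mmap_lock. With the check enabled, a waiter that wakes to find the folio up to date maps it and skips the retry.

```python
import threading

class Folio:
    """Toy stand-in for a page-cache folio being filled by a read."""
    def __init__(self):
        self.uptodate = False
        self.io_done = threading.Event()

    def complete_read(self):
        self.uptodate = True   # mark the data valid before waking waiters
        self.io_done.set()

def simulate(waiters, check_uptodate):
    """Return how many waiters would retry the fault under mmap_lock."""
    folio = Folio()
    retries = 0
    counter_lock = threading.Lock()

    def waiter():
        nonlocal retries
        folio.io_done.wait()              # sleep until the reader finishes
        if check_uptodate and folio.uptodate:
            return                        # someone else did the I/O: map it
        with counter_lock:
            retries += 1                  # would retry under mmap_lock

    threads = [threading.Thread(target=waiter) for _ in range(waiters)]
    for t in threads:
        t.start()
    folio.complete_read()                 # the one thread that issued the read
    for t in threads:
        t.join()
    return retries
```

Running simulate(8, check_uptodate=False) reports eight retries, one for each waiter. Running simulate(8, check_uptodate=True) reports none.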

His final question was whether the kernel should retry fault handling under the VMA lock by default and only fall back to the mmap_lock in cases where it is known to be needed. Baghdasaryan repeated the...
