Swap tables, flash-friendly swap, swap_ops, and more

By Jonathan Corbet May 18, 2026

LSFMM+BPF

The kernel's swap subsystem is charged with managing anonymous pages in secondary storage when those pages are (hopefully) not being used and the memory they occupy is needed elsewhere. This long-unloved subsystem has seen a resurgence of developer interest in recent times, so it is not surprising that it was the topic of three separate sessions in the memory-management track at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit. Two of those sessions were concerned with improving the performance and maintainability of the swap code, while one (shared with the storage track) was about how swapping could be friendlier to solid-state storage devices.

Status and roadmap

The first session was a breakneck-paced presentation from Kairui Song on recent changes in the swap subsystem and what is coming next. Song began by describing his work introducing the swap table and removing a lot of swap-subsystem complexity; see this article and its successor for details on this work. Before his changes were merged for 7.0, the swap subsystem incurred an overhead of between three and 11 bytes per page; that overhead is now reduced to between two and ten bytes. That news was greeted by applause in the room.

Song is not done, though; he intends to cut the static overhead to zero bytes, albeit still with a maximum of ten. His goal of capping that overhead at eight bytes will not be realized in the short term, because refault tracking for the memory resource controller requires more data. In the long term, he still hopes to cut the maximum overhead to three bytes per page.
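To put those per-page figures in perspective, the following sketch converts them into total metadata size for a swap area. The per-page byte counts come from the session as reported above; the 4 KiB page size, the 64 GiB swap area, and the function name are assumptions chosen for illustration only.

```python
# Rough arithmetic only: per-page byte counts are those quoted in the talk;
# the 4 KiB page size and the 64 GiB swap area are illustrative assumptions.
PAGE_SIZE = 4096
MIB = 1 << 20
GIB = 1 << 30


def swap_metadata_bytes(swap_bytes: int, bytes_per_page: int,
                        page_size: int = PAGE_SIZE) -> int:
    """Return the metadata needed to track `swap_bytes` of swap space."""
    pages = swap_bytes // page_size
    return pages * bytes_per_page


if __name__ == "__main__":
    swap_size = 64 * GIB
    figures = [
        ("before 7.0, minimum", 3),
        ("before 7.0, maximum", 11),
        ("now, minimum", 2),
        ("now, maximum", 10),
        ("eight-byte cap (deferred goal)", 8),
        ("long-term maximum (hoped for)", 3),
    ]
    for label, per_page in figures:
        total = swap_metadata_bytes(swap_size, per_page)
        print(f"{label:32s} {per_page:2d} B/page -> {total // MIB} MiB")
```

Under these assumptions, a one-byte reduction per page is worth 16 MiB of metadata for every 64 GiB of swap, which is why small per-page savings attract attention on large systems.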

The need for some operations to bypass the swap cache has been removed, and most of the swap-oriented helpers are now folio based. Most operations only need the folio lock now; there are opportunities, he said, to optimize further by applying some lockless algorithms. Work to unify folio allocation with the swap cache is still in progress. Currently, anonymous and shared-memory folios come with their own allocation logic that may bypass readahead; he described this code as a long, complex, and racy fallback loop. He is working to replace it with a single allocation helper.

Other work is aimed at letting the system make better use of the swap cache; better readahead support is an important step in that direction. The zram subsystem can take advantage of it now but, he said, whether that is beneficial is not entirely clear. It may be that zram is fast enough already.

Swapping I/O is asynchronous and takes time; that means that there can be a long delay between the onset of memory pressure and the completion of the I/O that allows that pressure to be relieved. By the time that happens, it may turn out that the system has overshot and swapped out more pages than really needed. This could be helped by immediately dropping pages from the swap cache once writeout has completed. He is not sure why that is not always done now; more research is needed there.
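The overshoot effect can be illustrated with a deliberately simple toy model. It is not how the kernel's reclaim code works: the tick-based loop, the fixed batch size, and the rule that reclaim ignores in-flight writes are all assumptions made to isolate the effect of I/O latency.

```python
from collections import deque


def simulate_reclaim(deficit_pages: int, batch_pages: int,
                     io_latency_ticks: int) -> int:
    """Toy model: return the total number of pages submitted for swap-out.

    Each tick, reclaim submits one batch while the memory deficit is still
    visible. A batch only frees memory once its I/O completes,
    `io_latency_ticks` after submission. In-flight writes are ignored when
    deciding whether to submit more (a simplifying assumption).
    """
    in_flight = deque()  # entries are (completion_tick, pages)
    freed = 0
    submitted = 0
    tick = 0
    while True:
        # Retire any writes whose latency has elapsed.
        while in_flight and in_flight[0][0] <= tick:
            freed += in_flight.popleft()[1]
        if freed >= deficit_pages:
            break
        # Pressure has not been relieved yet, so keep swapping out.
        in_flight.append((tick + io_latency_ticks, batch_pages))
        submitted += batch_pages
        tick += 1
    return submitted


if __name__ == "__main__":
    for latency in (1, 5, 20):
        total = simulate_reclaim(deficit_pages=100, batch_pages=10,
                                 io_latency_ticks=latency)
        print(f"latency={latency:2d} ticks: submitted {total} pages "
              f"for a 100-page deficit")
```

In this model, a 100-page deficit with ten-page batches and a five-tick latency results in 140 pages being written out; the excess grows with the latency, which matches the intuition behind the problem Song described.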

There are a number of other problems yet to be solved. Swapping of PMD-level huge pages is not as efficient as it could be. Readahead can end up bringing in pages used for hibernation, which is wasteful but not a huge problem, though the workaround is ugly. He is contemplating adding a special bit to mark pages reserved for hibernation. There are users who would like to be able to resize swap areas on the fly; that should be practical to implement now.

Another problem arises when both anonymous and shared-memory (shmfs) folios are swapped to the same device. If shmfs-backed transparent huge pages (THPs) are being swapped, they can end up overlapping an anonymous page's slot; when that happens now, the offending folio is simply dropped. The problem will worsen, though, if readahead gains support for THPs. He is contemplating creating a new swap-table type to address this problem. Matthew Wilcox said the problem may come down to a confusion of logical (within the owning process's address space) and physical readahead; we are doing something wrong somewhere, he suggested.

Song is looking into compaction of the swap table. The system manages...
