Controlling Memory Management with BPF

By Jonathan Corbet
May 15, 2026

LSFMM+BPF

Roman Gushchin began his session in the memory-management track of the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit by saying that the community has seen a lot of proposals adding BPF-based interfaces for memory management. None of them have made their way into the mainline, though. He wanted to explore the ways in which BPF might be helpful and the obstacles that have kept BPF-based solutions out so far. This session was followed by a discussion led by Shakeel Butt on what the requirements for a new, BPF-based interface for memory control groups might look like.

Obstacles to BPF integration

Existing efforts have tried to capture a number of different memory-management heuristics, he began. There have been proposals to use BPF to control out-of-memory handling, NUMA balancing, memory control groups, page-cache eviction, and more. There are more interesting ideas that have not yet been pursued, including readahead control, madvise(), kernel samepage merging, and guest-memory control. Readahead, in particular, is a messy set of heuristics, but it is important for performance.

There are a number of obstacles to the addition of BPF interfaces for the memory-management subsystem, he said; he would cover them from the least important to the most. The first was concerns about out-of-tree BPF programs. Kernel developers want to see production-quality code land in the mainline, but that is not how BPF is working now. There are production-quality sched_ext schedulers, for example, but they are all stubbornly out of tree. BPF maintainer Alexei Starovoitov said that "sched_ext was a mistake", in that it did not bring any production schedulers with it into the mainline. That is a hard situation to fix now, he said. It would be good to have a good, in-tree out-of-memory handler; if nothing else, it would help developers to judge the proposed interfaces.

Including BPF programs in the kernel tree does not seem to be controversial, Gushchin said, so the real question is how far developers should go. A first step would be to just include the source for people to examine and play with. Automatic loading of included BPF programs could be a good second step, Starovoitov said; it would let people use the included BPF programs easily. Gushchin suggested that a BPF implementation of systemd-oomd would provide a good example of how that subsystem works.

Another obstacle is the current inability to attach struct ops programs to control groups. BPF programs can be attached, but not those using the struct ops interface. He has an implementation for the out-of-memory handler, but sched_ext uses a different solution.

Then, there is the issue of safety and fallback; a broken BPF memory-management program could easily make the system unusable. This is the hardest issue to solve, at least from Gushchin's perspective; it is hard even to define what "safety" means in this context. Time-based fallbacks are hard to implement and ugly, he said. Memory-management actions can be wrapped into monitored kfuncs, but that leads to non-generic solutions that can hurt performance. The acceptable level of service needs to be defined; a traffic-control program that drops all packets is OK, but a sched_ext scheduler that starves half of the tasks in the system is less so. What should happen if a faulty BPF program is loaded and the system can no longer reclaim memory?
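To make the fallback question concrete, here is a toy user-space model, written in Python rather than as kernel or BPF code, of a progress-based fallback: a loaded policy that repeatedly fails to pick anything to reclaim is detached, and a built-in default takes over. Everything in it (the `MonitoredReclaim` class, the `max_stalls` threshold, the notion of a "stall") is a hypothetical illustration and not something proposed in the session; it counts failed calls rather than elapsed time, and it sidesteps the hard part, which is deciding what counts as acceptable progress.

```python
from typing import Callable, List, Optional

# A policy takes the candidate pages and returns the ones it wants reclaimed.
Policy = Callable[[List[int]], List[int]]


class MonitoredReclaim:
    """Toy model of a reclaim path with a detachable policy (hypothetical)."""

    def __init__(self, policy: Policy, default_policy: Policy,
                 max_stalls: int = 3) -> None:
        self.policy: Optional[Policy] = policy
        self.default_policy = default_policy
        self.max_stalls = max_stalls  # assumed threshold, chosen arbitrarily
        self.stalls = 0

    @property
    def attached(self) -> bool:
        return self.policy is not None

    def reclaim(self, candidates: List[int]) -> List[int]:
        if not candidates:
            return []  # nothing to do, so this is not counted as a stall
        if self.policy is not None:
            try:
                # Accept only pages that were actually offered to the policy.
                chosen = [p for p in self.policy(candidates) if p in candidates]
            except Exception:
                chosen = []  # a crashing policy is treated as making no progress
            if chosen:
                self.stalls = 0
                return chosen
            self.stalls += 1
            if self.stalls >= self.max_stalls:
                self.policy = None  # detach the misbehaving policy for good
        # Fall back to the built-in behavior for this call.
        return self.default_policy(candidates)
```

With a policy that never selects anything and `max_stalls=2`, the first two calls are served by the default policy and the policy is detached after the second. Even in this tiny model, the wrapper adds a validation pass and bookkeeping to every call, which hints at the performance cost of monitoring.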

There will always be concerns about performance in hot paths, which will make it hard to justify adding BPF programs in the hottest of them. The memory-management subsystem depends heavily on batching for performance, raising the question of whether BPF programs should run before or after batching is done. He suggested that batching should happen first, but that makes it impossible to control the batching itself with a BPF program.
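The batching tradeoff can be illustrated with a small user-space simulation, again in Python and not kernel code. The function names, the batch size of 32, and the "evict even-numbered pages" policy are all assumptions made up for this sketch. It compares how often a policy hook is invoked when it runs once per page, before batching, with how often it runs when it is handed whole batches.

```python
from typing import Callable, List, Sequence, Tuple

# A policy takes a list of pages and returns the ones to evict.
Policy = Callable[[List[int]], List[int]]

BATCH_SIZE = 32  # hypothetical batch size fixed by the non-BPF side


def make_batches(pages: Sequence[int], size: int = BATCH_SIZE) -> List[List[int]]:
    """Group pages into fixed-size batches; the last batch may be short."""
    pages = list(pages)
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def hook_before_batching(pages: Sequence[int],
                         policy: Policy) -> Tuple[List[int], int]:
    """The policy sees each page individually: one call per page."""
    evicted: List[int] = []
    calls = 0
    for page in pages:
        calls += 1
        evicted.extend(policy([page]))
    return evicted, calls


def hook_after_batching(pages: Sequence[int],
                        policy: Policy) -> Tuple[List[int], int]:
    """The policy sees pre-built batches: one call per batch."""
    evicted: List[int] = []
    calls = 0
    for batch in make_batches(pages):
        calls += 1
        evicted.extend(policy(batch))
    return evicted, calls


def evict_even(pages: List[int]) -> List[int]:
    """Example policy: evict even-numbered pages."""
    return [p for p in pages if p % 2 == 0]
```

For 100 pages, the per-page variant calls the policy 100 times while the per-batch variant calls it four times, and both pick the same pages. The per-batch variant gets that saving because `make_batches()` runs outside the policy, which is also why the policy has no say over the batch size.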

Finally, the most important obstacle, he said, is ABI stability; this concern had been most recently raised by David Hildenbrand on the mailing list. In person, Hildenbrand said that there was some confusion about what providing hooks for BPF programs means; are they a permanent memory-management feature? The community may not want to commit to keeping those hooks around indefinitely. That concern has led to a decision not to provide hooks for the management of transparent huge pages; nobody knows what the picture...
