Controlling Memory Management with BPF

By Jonathan Corbet May 15, 2026

LSFMM+BPF

Roman Gushchin began his session in the memory-management track of the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit by saying that the community has seen a lot of proposals adding BPF-based interfaces for memory management. None of them have made their way into the mainline, though. He wanted to explore the ways in which BPF might be helpful and the obstacles that have kept BPF-based solutions out so far. This session was followed by a discussion led by Shakeel Butt on what the requirements for a new, BPF-based interface for memory control groups might look like.

Obstacles to BPF integration

Existing efforts have tried to capture a number of different memory-management heuristics, he began. There have been proposals to use BPF to control out-of-memory handling, NUMA balancing, memory control groups, page-cache eviction, and more. There are more interesting ideas that have not yet been pursued, including readahead control, madvise(), kernel samepage merging, and guest-memory control. Readahead, in particular, is a messy set of heuristics, but it is important for performance.

There are a number of obstacles to the addition of BPF interfaces for the memory-management subsystem, he said; he would cover them from the least important to the most. The first was the concern about out-of-tree BPF programs. Kernel developers want to see production-quality code land in the mainline, but that is not how BPF is working now. There are production-quality sched_ext schedulers, for example, but they are all stubbornly out of tree. BPF maintainer Alexei Starovoitov said that "sched_ext was a mistake", in that it did not bring any production schedulers with it into the mainline. That is a hard situation to fix now, he said. It would be useful to have a high-quality, in-tree out-of-memory handler; if nothing else, it would help developers to judge the proposed interfaces.

Including BPF programs in the kernel tree does not seem to be controversial, Gushchin said, so the real question is how far developers should go. A first step would be to just include the source for people to examine and play with. Automatic loading of included BPF programs could be a good second step, Starovoitov said; it would let people use the included BPF programs easily. Gushchin suggested that a BPF implementation of systemd-oomd would provide a good example of how that subsystem works.

Another obstacle is the current inability to attach struct ops programs to control groups. Other kinds of BPF programs can be attached to control groups, but not those using the struct ops interface. He has implemented a solution for the out-of-memory handler, but sched_ext uses a different one.

Then, there is the issue of safety and fallback; a broken BPF memory-management program could easily make the system unusable. This is the hardest issue to solve, at least from Gushchin's perspective; it is hard even to define what "safety" means in this context. Time-based fallbacks are hard to implement and ugly, he said. Memory-management actions can be wrapped into monitored kfuncs, but that leads to non-generic solutions that can hurt performance. The acceptable level of service needs to be defined; a traffic-control program that drops all packets is OK, but a sched_ext scheduler that starves half of the tasks in the system is less so. What should happen if a faulty BPF program is loaded and the system can no longer reclaim memory?
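
One way to picture the fallback problem is a reclaim loop that hands victim selection to a pluggable policy and reverts to a built-in default when that policy stops making progress. The Python model below is only an illustrative sketch under stated assumptions: the names (reclaim(), default_policy(), max_stalls) are hypothetical and do not correspond to any kernel or BPF interface discussed in the session, and "no progress for a few rounds" is just one possible definition of safety.

```python
from typing import Callable, List, Tuple

# Hypothetical policy signature: given the reclaimable pages, return the
# pages it would like to evict this round.
Policy = Callable[[List[int]], List[int]]


def default_policy(pages: List[int]) -> List[int]:
    """Hypothetical built-in fallback: treat lower page numbers as older
    and evict the oldest page first."""
    return sorted(pages)[:1]


def reclaim(pages: List[int], target: int, policy: Policy,
            max_stalls: int = 3) -> Tuple[List[int], bool]:
    """Evict up to `target` pages using `policy`, switching to
    default_policy after `max_stalls` consecutive rounds with no progress.

    Returns the evicted pages and whether the fallback was triggered.
    """
    remaining = list(pages)
    evicted: List[int] = []
    stalls = 0
    fell_back = False
    active = policy
    while len(evicted) < target and remaining:
        # Ignore duplicates and choices that are not reclaimable pages.
        chosen = [p for p in dict.fromkeys(active(remaining)) if p in remaining]
        if not chosen:
            stalls += 1
            if stalls >= max_stalls and not fell_back:
                active = default_policy  # stop trusting the custom policy
                fell_back = True
            continue
        stalls = 0
        for p in chosen[: target - len(evicted)]:
            remaining.remove(p)
            evicted.append(p)
    return evicted, fell_back


if __name__ == "__main__":
    def broken_policy(pages: List[int]) -> List[int]:
        return []  # a faulty policy that never selects a victim

    print(reclaim([7, 3, 9, 1], target=2, policy=broken_policy))
    # prints ([1, 3], True)
```

Even this toy shows why the problem is hard: the stall threshold is arbitrary, and a policy that makes slow but nonzero progress would never trigger the fallback.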

There will always be concerns about performance in hot paths, which will make it hard to justify adding BPF programs in the hottest of them. The memory-management subsystem depends heavily on batching for performance, raising the question of whether BPF programs should run before or after batching is done. He suggested that batching should happen first, but that makes it impossible to control the batching itself with a BPF program.
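
The ordering trade-off can be modeled with a toy loop in which pages are first gathered into fixed-size batches and a policy hook is then invoked once per batch. This is a hypothetical Python sketch, not a kernel interface: BATCH_SIZE, batches(), process(), and the hook signature are invented for illustration. Running the hook after batching amortizes its cost over many pages, but the batch size is fixed before the hook ever runs, so the hook has no say in it.

```python
from typing import Callable, List, Tuple

BATCH_SIZE = 4  # hypothetical; chosen by the "kernel", not by the hook

# A hook receives one whole batch and returns the pages it wants to keep.
Hook = Callable[[List[int]], List[int]]


def batches(pages: List[int], size: int) -> List[List[int]]:
    """Split pages into consecutive batches of at most `size` entries."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def process(pages: List[int], hook: Hook) -> Tuple[List[int], int]:
    """Run `hook` once per batch; return the kept pages and the number of
    hook invocations (a stand-in for per-call overhead)."""
    kept: List[int] = []
    calls = 0
    for batch in batches(pages, BATCH_SIZE):
        calls += 1
        kept.extend(hook(batch))
    return kept, calls


if __name__ == "__main__":
    def keep_even(batch: List[int]) -> List[int]:
        return [p for p in batch if p % 2 == 0]

    kept, calls = process(list(range(10)), keep_even)
    print(kept, calls)
    # prints [0, 2, 4, 6, 8] 3
```

With ten pages and a batch size of four, the hook runs three times rather than ten; a per-page hook placed before batching would run ten times, but could in principle influence how pages are grouped.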

Finally, the most important obstacle, he said, is ABI stability; this concern was raised most recently by David Hildenbrand on the mailing list. In person, Hildenbrand said that there was some confusion about what providing hooks for BPF programs means; are they a permanent memory-management feature? The community may not want to commit to keeping those hooks around indefinitely. That concern has led to a decision not to provide hooks for the management of transparent huge pages; nobody knows what the picture...