Kubernetes memory requests don't do what you think (until you enable MemoryQoS)


Kubernetes Management: Memory Request and Limit in Practice
April 19, 2026

This is the second part of an article series about resource management in Kubernetes. In the first part we discussed how Kubernetes manages CPU; in this part we focus on memory.

How does Kubernetes manage memory?

From the previous part (Kubernetes CPU request and limit) we learned that Kubernetes uses cgroups for resource management, so now we can take a closer look at how this works for memory.

We will focus on cgroups v2. The v2 interface has been available since Linux kernel 4.5 and is enabled by default in current distributions such as Ubuntu 21.10 and later, Red Hat Enterprise Linux 9 (available, but not the default, since RHEL 8), and Amazon Linux 2023.

It is worth noting that cgroups v2 itself offers memory.min, which guarantees a minimum amount of memory to a cgroup (v1 had no such mechanism). However, in the default Kubernetes configuration the memory request is a scheduling input, not a kernel setting, regardless of the cgroups version. Kubelet does not pass the request value to the kernel as memory.min, so a process in a container can still have its memory reclaimed or be OOM-killed even if its usage is within the declared request. Only the MemoryQoS feature gate (alpha since 1.22, and still alpha and disabled by default in 1.36) makes kubelet map requests.memory to memory.min in cgroup v2.
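You can check this on a node by reading the cgroup interface files directly. The Go program below is a hypothetical helper, not part of kubelet or any runtime: it assumes cgroup v2 is mounted at /sys/fs/cgroup, that you pass it a Pod or container cgroup directory (the exact path depends on the cgroup driver and the runtime), and that it runs with read access to that directory. Without MemoryQoS, memory.min of such a cgroup is expected to read 0 whatever request the Pod declares.

```go
// Hypothetical helper, not part of kubelet: prints memory.min and memory.max
// for a cgroup v2 directory. Assumes cgroup v2 is mounted at /sys/fs/cgroup.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// cgroupRoot is the assumed cgroup v2 mount point.
const cgroupRoot = "/sys/fs/cgroup"

// isCgroupV2 reports whether mountPoint is the root of a cgroup v2 (unified)
// hierarchy; cgroup.controllers is a cgroup v2 interface file.
func isCgroupV2(mountPoint string) bool {
	_, err := os.Stat(filepath.Join(mountPoint, "cgroup.controllers"))
	return err == nil
}

// parseMemoryValue parses the content of a memory.* interface file: either
// the literal "max" (no limit) or a byte count.
func parseMemoryValue(raw string) (value int64, unlimited bool, err error) {
	s := strings.TrimSpace(raw)
	if s == "max" {
		return 0, true, nil
	}
	value, err = strconv.ParseInt(s, 10, 64)
	return value, false, err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: memprot <cgroup directory>")
		return
	}
	if !isCgroupV2(cgroupRoot) {
		fmt.Println("no cgroup v2 hierarchy at", cgroupRoot)
		return
	}
	dir := os.Args[1]
	for _, name := range []string{"memory.min", "memory.max"} {
		raw, err := os.ReadFile(filepath.Join(dir, name))
		if err != nil {
			fmt.Printf("%s: %v\n", name, err)
			continue
		}
		v, unlimited, err := parseMemoryValue(string(raw))
		switch {
		case err != nil:
			fmt.Printf("%s: cannot parse %q\n", name, raw)
		case unlimited:
			fmt.Printf("%s: max (no limit)\n", name)
		default:
			fmt.Printf("%s: %d bytes\n", name, v)
		}
	}
}
```

The parsing is kept separate from file access so the "max" versus byte-count handling can be reused for the other memory.* files.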
The feature remains alpha because of a potential kernel livelock under aggressive allocation near memory.high; it requires kernel 5.9 or newer and a supported runtime (containerd or CRI-O).

The memory controller documentation (https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst) says the controller tracks the utilization of:
Userland memory: page cache and anonymous memory.
Kernel data structures such as dentries and inodes.
TCP socket buffers.

It exposes interface files such as:
memory.current - the total amount of memory currently used by the cgroup and its descendants.
memory.min - a minimum amount of memory guaranteed to the cgroup, "hard protection" (this is what the memory request maps to under MemoryQoS).
memory.low - memory within this boundary is not reclaimed as long as reclaimable memory is available in unprotected cgroups, "best-effort memory protection".
memory.high - a throttling threshold: above it the cgroup's processes are throttled and put under heavy reclaim pressure, but the OOM killer is not invoked.
memory.max - the hard limit on the cgroup's memory usage. If usage reaches this limit and cannot be reduced by reclaim, the OOM killer is invoked in the cgroup and kills one of its processes (OOM kill).

Memory Request - memory.min

By default, the memory request is not turned into any kernel-level guarantee. With the MemoryQoS feature enabled, it is mapped to memory.min in cgroups v2. This means it is the minimum amount of memory guaranteed to the cgroup: the kernel will not reclaim this memory as long as usage stays within that boundary.

If we set in a Deployment:

```yaml
resources:
  requests:
    memory: 500Mi
```

then, with the MemoryQoS feature gate enabled, kubelet passes this value to the container runtime (containerd / CRI-O) via the Unified field in CRI, and the runtime sets memory.min in the container cgroup.
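Note the units: in Kubernetes quantity notation the suffix m means milli, so memory: 500m would be half a byte, while 500Mi is 500 mebibytes (524288000 bytes), the value used in the memory.min calculation further on. The Go function below is a hypothetical, simplified stand-in for the real resource.Quantity parser in k8s.io/apimachinery: it handles only an integer number followed by a few common suffixes, ignores overflow, and rounds milli values up to a whole byte, which is our own choice for the sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Multipliers for a subset of Kubernetes quantity suffixes (assumption:
// integer mantissa only; no decimals, exponents such as 1e3, or signs).
var binarySuffixes = map[string]int64{
	"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
}

var decimalSuffixes = map[string]int64{
	"k": 1_000, "M": 1_000_000, "G": 1_000_000_000, "T": 1_000_000_000_000,
}

// quantityToBytes is a hypothetical converter from a memory quantity string
// to the byte count that would end up in memory.min.
func quantityToBytes(q string) (int64, error) {
	q = strings.TrimSpace(q)
	// Find where the numeric part ends.
	i := 0
	for i < len(q) && q[i] >= '0' && q[i] <= '9' {
		i++
	}
	if i == 0 {
		return 0, fmt.Errorf("no digits in %q", q)
	}
	n, err := strconv.ParseInt(q[:i], 10, 64)
	if err != nil {
		return 0, err
	}
	suffix := q[i:]
	switch suffix {
	case "":
		return n, nil
	case "m":
		// "m" means milli: 500m is 0.5 bytes; round up to a whole byte.
		return (n + 999) / 1000, nil
	}
	if mul, ok := binarySuffixes[suffix]; ok {
		return n * mul, nil
	}
	if mul, ok := decimalSuffixes[suffix]; ok {
		return n * mul, nil
	}
	return 0, fmt.Errorf("unsupported suffix %q", suffix)
}

func main() {
	for _, q := range []string{"500Mi", "500M", "500m"} {
		b, err := quantityToBytes(q)
		if err != nil {
			fmt.Println(q, "error:", err)
			continue
		}
		fmt.Printf("%-6s -> %d bytes\n", q, b)
	}
}
```

Running it reports 524288000 bytes for 500Mi, 500000000 for 500M and 1 for 500m, which shows how easily a missing "i" changes the request by several orders of magnitude.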
Kubelet also sets memory.min on the Pod-level cgroup itself, using a value computed by the ResourceConfigForPod function:

```go
func ResourceConfigForPod(allocatedPod *v1.Pod, enforceCPULimits bool, cpuPeriod uint64,
	enforceMemoryQoS bool) *ResourceConfig {

	// ...

	// With MemoryQoS enabled, the Pod's memory request (reqs is computed
	// earlier in the function) is written, in bytes, to memory.min.
	if enforceMemoryQoS {
		memoryMin := int64(0)
		if request, found := reqs[v1.ResourceMemory]; found {
			memoryMin = request.Value()
			if memoryMin > 0 {
				result.Unified = map[string]string{
					Cgroup2MemoryMin: strconv.FormatInt(memoryMin, 10),
				}
			}
		}
	}

	// ...
}
```

Note that request.Value() returns the value in bytes, so in our example memory.min will be 524288000 (that is 500 * 1024 * 1024), not the literal "500Mi", written to the cgroup file.

The enforceMemoryQoS parameter in kubelet code comes from the MemoryQoS feature gate, which is disabled by default. This means that in the standard Kubernetes configuration:
memory request is not passed to the kernel as memory.min; the kernel never treats it as a reservation (it only reaches the kernel indirectly, through the oom_score_adj kubelet derives from it for Burstable Pods)
memory request is used by kube-scheduler for Pod placement decisions (and by kubelet to rank Pods for node-pressure eviction), not for kernel-level memory protection

When MemoryQoS is enabled and kubelet sets memory.min on the Pod cgroup, the kernel treats this value as a hard guarantee: as long as the cgroup's memory usage stays within its effective min boundary, its pages are not reclaimed under any condition, even during node memory pressure. If the kernel cannot keep this guarantee because no unprotected reclaimable memory is left, it invokes the OOM killer, which may terminate processes outside the protected cgroup to free the required memory. This is a fundamental difference from the default setup, where a Pod with a 512Mi request is, under node memory pressure, just another candidate for reclaim and eviction.

Memory Limit - memory.max

Memory limit in Kubernetes maps to memory.max in cgroups v2 (equivalent to memory.limit_in_bytes in cgroups v1). This is a hard memory limit a cgroup...

