Why Kubernetes nodes inherit problems they never asked for · siderolabs/awesome-talos Wiki · GitHub
Why Kubernetes nodes inherit problems they never asked for
Spooky Hyrax edited this page May 18, 2026 · 3 revisions
We build Talos Linux, a minimal OS designed for Kubernetes. That context shapes this analysis, but the kernel data and CVE methodology are independently reproducible. Links throughout.
The Linux kernel hit 40 million lines of code in early 2025. It seems to double in size roughly every decade. The kernel grows because Linux is designed to run everywhere, and every subsystem added for one use case ships to all of them.
A Kubernetes node is not a general-purpose machine, but it inherits the same kernel as everything else, including interfaces, APIs, and optimizations built for use cases it will never have. Copy Fail (CVE-2026-31431) is a direct consequence of that.
Surface area as a vulnerability
Copy Fail originates from three independent kernel decisions made between 2011 and 2017 that, together, produced a straight-line path from unprivileged user to root, one that requires neither a race condition nor per-kernel offsets.
AF_ALG, the interface that made it exploitable, exists because general-purpose Linux doesn't ask whether a particular machine needs a particular API. While AF_ALG is not a bug in the conventional sense, it adds exposure that is easy to exploit while offering little benefit to a machine that never uses it. Let’s call it a business-bug: a technically intended behavior that functions as a vulnerability in your environment.
Where general-purpose Linux leaves Kubernetes nodes exposed
Here's what that looks like in practice:
The page cache is host-wide. Namespace isolation doesn't partition it. A successful attack can extract node secrets, including the Kubernetes CA.
The exploit leaves no forensic trace on disk. The page cache is modified in memory. Standard disk forensics won't detect it.
Disabling user workloads on control plane nodes by default constrains the blast radius. The compromised workload still needs to land on the same node as the secrets it's targeting.
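To make the control plane isolation point concrete, here is a minimal sketch that flags control plane nodes still open to ordinary workloads. It assumes input shaped like the output of `kubectl get nodes -o json` and relies on the standard `node-role.kubernetes.io/control-plane` label and taint; the function name is hypothetical and this is not a Talos tool.

```python
# Sketch: find control plane nodes that lack a scheduling taint.
# Assumes `node_list` is the parsed JSON from `kubectl get nodes -o json`.

CONTROL_PLANE_KEY = "node-role.kubernetes.io/control-plane"
BLOCKING_EFFECTS = ("NoSchedule", "NoExecute")


def schedulable_control_planes(node_list: dict) -> list:
    """Return names of control plane nodes that accept ordinary workloads."""
    exposed = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        labels = metadata.get("labels") or {}
        if CONTROL_PLANE_KEY not in labels:
            continue  # worker node, not relevant here
        taints = node.get("spec", {}).get("taints") or []
        blocked = any(
            t.get("key") == CONTROL_PLANE_KEY and t.get("effect") in BLOCKING_EFFECTS
            for t in taints
        )
        if not blocked:
            exposed.append(metadata.get("name", "<unnamed>"))
    return exposed
```

An empty result only means the taint is present. A pod that carries a matching toleration can still be scheduled onto a tainted node, so this is a first check rather than proof of isolation.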
When Copy Fail dropped, our internal Slack was very busy. We needed to know whether, and how, we were vulnerable. The full summary is here.
For Talos Linux specifically, we found the kernel was vulnerable, and an upgrade was necessary. The canonical exploit path of interactive users and setuid binaries doesn't exist on Talos nodes. The Kubernetes PoC vector applies more broadly, but control plane isolation by default means a compromised workload still needs to land on the same node as the secrets it's targeting. The blast radius was narrower than it would have been on a general-purpose node.
The fix for Copy Fail reverts a 2017 optimization, so while the kernel is back to where it was, the AF_ALG interface remains.* That IPsec implementation detail became a general-purpose userspace API, shipped to every Kubernetes node that never needed it, for nine years. AF_ALG is one interface. The question is how many others like it are on your nodes right now.
* The next release of Talos will remove the crypto API completely.
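Because the fix leaves the interface in place, one way to see whether AF_ALG is reachable from a given process (for example, from inside a pod) is simply to try opening a socket of that family. The sketch below uses Python's standard `socket` module, which defines `AF_ALG` on Linux builds; the function name is hypothetical, and the probe says nothing about whether the kernel is patched.

```python
# Sketch: probe whether this process can open an AF_ALG socket.
import socket


def af_alg_reachable() -> bool:
    """Return True if an AF_ALG socket can be created from this process."""
    family = getattr(socket, "AF_ALG", None)  # absent on non-Linux builds of Python
    if family is None:
        return False
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        # Typically means the kernel lacks the interface, or a sandbox
        # policy (seccomp, LSM) refused the call.
        return False
    sock.close()
    return True
```

On kernels that build the interface as a loadable module, the probe itself may cause that module to load, so treat a `True` result as "reachable", not as "already in use".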
To examine surface area, let’s look at binary count across OSes: Ubuntu ships 2,780 binaries. Flatcar ships 2,391. Talos Linux ships fewer than 50. Many of these additional binaries have no function on a Kubernetes node.
Fewer binaries mean fewer CVEs, and we found that CVE exposure follows a similar pattern.
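For a rough count on your own nodes, the sketch below tallies executable files in the usual binary directories. The directory list and the function name are assumptions for illustration, and this is not necessarily how the figures above were produced; the linked binary comparison describes the actual methodology.

```python
# Sketch: count distinct executable files in common binary directories.
import os
from pathlib import Path

# Assumed search paths; adjust for the distribution being inspected.
DEFAULT_DIRS = ("/bin", "/sbin", "/usr/bin", "/usr/sbin",
                "/usr/local/bin", "/usr/local/sbin")


def count_executables(dirs=DEFAULT_DIRS) -> int:
    """Count unique executable regular files, following symlinks."""
    seen = set()
    for d in dirs:
        root = Path(d)
        if not root.is_dir():
            continue
        for entry in root.iterdir():
            try:
                real = entry.resolve()  # dedupes /bin -> /usr/bin style links
                if real.is_file() and os.access(real, os.X_OK):
                    seen.add(real)
            except (OSError, RuntimeError):
                continue  # unreadable entry or symlink loop
    return len(seen)
```

Resolving symlinks before counting avoids double-counting on merged-`/usr` layouts and on multi-call binaries exposed under several names.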
Using grype on default, up-to-date installations (as of September 2025), scanning the full system rather than just container images: Ubuntu carries 280 critical and 1,943 high CVEs. Flatcar carries 27 critical and 75 high. Talos Linux carries 0 critical and 29 high, all of which come from the Linux kernel itself.
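A scan along these lines can be summarized with a few lines of Python. The sketch assumes a report produced with something like `grype dir:/ -o json > report.json` and reads the `matches[].vulnerability` entries from it. Counting unique vulnerability IDs per severity is an assumption on our part here; the original numbers may have been tallied differently.

```python
# Sketch: count unique vulnerability IDs per severity in a grype JSON report.
import json


def severity_counts(report: dict) -> dict:
    """Map severity name -> number of distinct vulnerability IDs."""
    ids_by_severity = {}
    for match in report.get("matches", []):
        vuln = match.get("vulnerability", {})
        severity = vuln.get("severity", "Unknown")
        ids_by_severity.setdefault(severity, set()).add(vuln.get("id"))
    return {sev: len(ids) for sev, ids in ids_by_severity.items()}


def load_report(path: str) -> dict:
    """Read a grype JSON report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `severity_counts(load_report("report.json"))` returns a dictionary keyed by severity, which makes it easy to compare hosts side by side.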
For full methodology, see our CVE comparison and binary comparison. We also maintain a VEX database documenting which CVEs are actually exploitable on Talos.
But CVEs still require patches, and the window to act keeps shrinking. In...