Saturating 10 Gigabit on Linux

Terminal Thoughts :: 2026-06-28 :: quest :: 14 min read (2892 words) #linux #hardware #networking

Wringing full speed out of a network used to be a niche pursuit, the stuff of 10G home labs and datacenter transfers. That’s changing fast. Multi-gig fiber is landing in homes everywhere, even Hawaiian Telecom is selling 3 Gig out here on the Big Island. A fast link is no longer just the wire between your own machines; it’s the wire to the internet too. And Linux still ships network defaults sized for a slower era, so on a fast link, especially a long one, they leave throughput unused.

Here’s the sysctl set I run, what each knob does and where it actually matters, and the single-stream iperf3 result over 20 feet of in-wall Cat 5e: 9.75 Gbit/s.

This started on my own network. I’d built a 10-gigabit link between my laptop and my NAS, and I wanted to know how much of it I was really getting, and whether it’d be ready when my internet connection catches up. That meant a look at the kernel’s network defaults, which are more conservative than you’d expect. I’ll walk them in groups, and since a local LAN and a long-haul link lean on very different parts of the list, I’ll flag which is which as we go.

Congestion Control: BBR and fq

net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
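Before applying the pair above, it can be worth confirming that the running kernel actually offers BBR, since on many distributions it is built as a loadable module and only shows up as available once loaded. A minimal sketch, assuming the usual procfs files tcp_available_congestion_control and tcp_congestion_control under /proc/sys/net/ipv4; the helper names read_sysctl and bbr_status are illustrative, not part of any tool:

```python
from pathlib import Path
from typing import Optional

# Assumed standard procfs location for the net.ipv4.* sysctls.
IPV4 = Path("/proc/sys/net/ipv4")


def read_sysctl(name: str, root: Path = IPV4) -> Optional[str]:
    """Return the value of a sysctl file, or None if it cannot be read."""
    try:
        return (root / name).read_text().strip()
    except OSError:
        return None


def bbr_status(available: Optional[str], current: Optional[str]) -> str:
    """Summarise BBR state from the two sysctl values (hypothetical helper)."""
    if available is None or current is None:
        return "unknown: sysctl files not readable on this system"
    if current.strip() == "bbr":
        return "active: bbr is the current congestion control"
    if "bbr" in available.split():
        return "available: bbr is registered but not selected"
    return "missing: bbr not listed (the tcp_bbr module may not be loaded)"


if __name__ == "__main__":
    print(bbr_status(
        read_sysctl("tcp_available_congestion_control"),
        read_sysctl("tcp_congestion_control"),
    ))
```

The check only reads state; it does not change any setting, so it is safe to run before and after applying the sysctls.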

Congestion control is the algorithm that decides how fast a TCP sender pushes data and how it backs off when the network pushes back. The long-time default, CUBIC, treats packet loss as the signal to slow down. On a clean local wire that’s fine, but on a path with real distance or the occasional dropped packet, it retreats hard and leaves the pipe half full.

Enter BBR. Rather than waiting for loss, it measures the path directly - the bottleneck bandwidth and the round-trip time - and paces itself to sit right at that limit. On a long or lossy link it holds throughput that CUBIC would surrender, which is exactly where a fat pipe over distance struggles without it.

That pacing is the key, and it wants a qdisc that can space packets out evenly instead of letting them burst. That qdisc is fq (“fair queue”), so default_qdisc = fq and bbr are the standard pairing: BBR sets the pace, fq keeps it. Both have been in the mainline kernel for years; on anything current you just switch them on.

Where it helps, honestly: on your LAN, with sub-millisecond latency and no loss, CUBIC already saturates 10G and BBR won’t beat it by much. This pair pays off the moment distance enters the path, a transfer to a server across the country, or any link that isn’t pristine. It costs nothing locally and rescues the long-haul case, so it stays.

Socket Buffers

net.core.rmem_default = 1048576
net.core.rmem_max = 67108864
net.core.wmem_default = 1048576
net.core.wmem_max = 67108864
net.core.optmem_max = 65536
net.ipv4.tcp_rmem = 4096 1048576 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
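The byte counts in the list above are easier to sanity-check in binary units. A small sketch that parses the same lines and prints each value as KiB or MiB; the parse and human helpers are hypothetical names for illustration, and the settings text is copied from the list above:

```python
# The socket-buffer settings from the list above, as sysctl.conf-style text.
SETTINGS = """\
net.core.rmem_default = 1048576
net.core.rmem_max = 67108864
net.core.wmem_default = 1048576
net.core.wmem_max = 67108864
net.core.optmem_max = 65536
net.ipv4.tcp_rmem = 4096 1048576 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
"""


def human(n: int) -> str:
    """Format a byte count as whole MiB or KiB when it divides evenly."""
    for unit, size in (("MiB", 1 << 20), ("KiB", 1 << 10)):
        if n >= size and n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} B"


def parse(text: str) -> dict:
    """Map each 'key = v1 v2 ...' line to a list of integer values."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        result[key.strip()] = [int(v) for v in value.split()]
    return result


if __name__ == "__main__":
    for key, values in parse(SETTINGS).items():
        print(f"{key:28s} {' / '.join(human(v) for v in values)}")
```

Run as a script, it shows at a glance that 67108864 is 64 MiB, 1048576 is 1 MiB, and 65536 is 64 KiB.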

A socket buffer is the memory the kernel holds for data in flight, one buffer for sending and one for receiving. They matter for throughput because of a hard limit in TCP: a sender can have only one window of unacknowledged data on the wire at a time, and that window can’t exceed the buffer. If the buffer is too small, the sender fills the window and then stalls, waiting for acknowledgments to come back before it can send any more.

How big does it need to be? At least the bandwidth-delay product (BDP): the link’s bandwidth times its round-trip time, which is the amount of data in flight needed to keep the pipe full. At 10 Gbit/s over a 50 ms path that works out to about 60 MiB, which is where the 64 MiB ceiling (67108864 bytes) comes from. Size the buffer under the BDP and throughput is capped no matter how fast the link.

The three-number values, tcp_rmem and tcp_wmem, are min, default, and max. The kernel autotunes each connection between min and max, starting at the default and growing the buffer only as the connection demands it. So 64 MiB is a ceiling, not a reservation - a busy long-haul transfer can climb to it while a quiet connection stays small, and you get the headroom without paying for it on every socket.

The rmem_max/wmem_max pair sets that same ceiling for buffers an application requests explicitly with setsockopt (SO_RCVBUF and SO_SNDBUF), and optmem_max covers the small ancillary buffer used for control messages.

The two udp_*_min values raise the floor below which UDP buffers won’t shrink under memory pressure. UDP doesn’t autotune the way TCP does, so the floor keeps it steady, and since the large buffers above apply to UDP sockets too, the same settings quietly help QUIC, which rides on UDP.

Honestly, on your LAN almost none of this is in play. At sub-millisecond latency the BDP is a few hundred kilobytes, so the autotuner never climbs near 64 MiB and the stock defaults would fill the link just fine....
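The bandwidth-delay arithmetic behind those numbers is short enough to check directly. A minimal sketch: the 10 Gbit/s and 50 ms figures come from the text, while the 0.2 ms LAN round trip and the 4 MiB window are assumed values chosen only for illustration, and the function names are hypothetical:

```python
MIB = 1024 * 1024


def bdp_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def window_limited_bits_per_s(window_bytes: float, rtt_seconds: float) -> float:
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    long_haul = bdp_bytes(10e9, 0.050)   # 10 Gbit/s over 50 ms (from the text)
    lan = bdp_bytes(10e9, 0.0002)        # 0.2 ms RTT is an assumed LAN figure
    print(f"long-haul BDP: {long_haul / MIB:.1f} MiB")   # 59.6 MiB
    print(f"LAN BDP:       {lan / 1024:.1f} KiB")        # 244.1 KiB

    # What a given window allows at 50 ms, regardless of link speed.
    for window in (4 * MIB, 64 * MIB):   # 4 MiB is a hypothetical small buffer
        cap = window_limited_bits_per_s(window, 0.050)
        print(f"{window // MIB:2d} MiB window: {cap / 1e9:.2f} Gbit/s")
```

Under these assumptions the 64 MiB ceiling sits just above the roughly 59.6 MiB the 50 ms path needs, while a 4 MiB window would cap the same path well under 1 Gbit/s.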
