

Your Rust Service Isn't Leaking — It Could Be the Allocator

on 2026-06-17

While load testing one of our Rust services at work, we ran into something that took us way longer to figure out than we'd like to admit: memory that shoots up under load and just stays there.

Our service was event-driven:

Read events from a message queue (Kafka/Redis Streams/NATS)

For each event, spawn a Tokio task to process it

Use a Semaphore to cap concurrent tasks to handle backpressure

With this setup, we expected memory to come down once all the events were processed. Instead, it stayed pinned near the top of the container limit.

Our Workload

Our workload was bursty and sparse: bursts arrived 3-4 times per hour, and each burst carried ~100k events. We ran our services as Kubernetes pods on an Ubuntu cloud VM.

Here is a much simpler version of the workload pattern we were dealing with.

At a high level, the code below uses a Semaphore to keep at most 100 events in flight at a time. For each event, it spawns many short-lived Tokio tasks; each one waits on I/O and is dropped once its response has been collected.

    use std::sync::Arc;

    use bytes::Bytes;
    use tokio::{sync::Semaphore, task::JoinSet};

    struct Event {
        payload: Bytes,           // ~4KB
        user_tokens: Vec<String>, // up to 1000 tokens
        // other fields
    }

    ...
    // in main
    let semaphore = Arc::new(Semaphore::new(100));

    loop {
        let event: Event = fetch_next_event().await;
        // acquire_owned() consumes an Arc, so clone the handle on every iteration
        let permit = semaphore.clone().acquire_owned().await.unwrap();

        tokio::spawn(async move {
            // Hold the permit until this event is fully processed
            let _permit = permit;
            let data = event.payload.clone();

            let mut tasks = JoinSet::new();
            for token in &event.user_tokens {
                let token = token.clone();
                let data = data.clone();
                tasks.spawn(async move {
                    // hits an outbound API and returns response
                    process(token, data).await
                });
            }

            let mut responses = Vec::with_capacity(event.user_tokens.len());
            while let Some(res) = tasks.join_next().await {
                responses.push(res);
            }

            generate_response_event(event, responses);
        });
    }

The code above is deliberately simplified for readability. We tried several code-level optimizations; they reduced peak memory usage, but the overall pattern stayed the same: memory rose under load and never came back down.

First Check: Is It a Memory Leak?

With memory staying pinned, our immediate question was: is our code leaking somewhere?

Rust makes memory leaks harder to write accidentally, but not impossible. With so many tasks being spawned aggressively, it was easy for us to suspect that some of these Tokio tasks were hanging around longer than they should.

We used dhat to see if there were any memory leaks.

    At t-gmax: 1,455,866,178 bytes (100%) in 1,321,561 blocks (100%), avg size 1,101.63 bytes
    At t-end: 10,798 bytes (100%) in 26 blocks (100%), avg size 415.31 bytes

t-gmax is the point of peak heap usage over the entire run of the program, and t-end is the state of the heap when the program finishes executing.
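To make the two numbers concrete, here is a minimal sketch of the kind of bookkeeping behind them. It is not how dhat is implemented; `CountingAlloc` is a hypothetical wrapper around the standard library's `System` allocator that only tracks live bytes (the quantity reported at t-end) and their high-water mark (the quantity reported at t-gmax).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that tracks
/// live heap bytes and the highest value they ever reached.
pub struct CountingAlloc {
    live: AtomicUsize,
    peak: AtomicUsize,
}

impl CountingAlloc {
    pub const fn new() -> Self {
        Self {
            live: AtomicUsize::new(0),
            peak: AtomicUsize::new(0),
        }
    }

    /// Bytes currently allocated and not yet freed.
    pub fn live_bytes(&self) -> usize {
        self.live.load(Ordering::Relaxed)
    }

    /// Highest value `live_bytes` has reached so far.
    pub fn peak_bytes(&self) -> usize {
        self.peak.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Update the live counter, then raise the peak if we passed it.
            let now = self.live.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            self.peak.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.live.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// To count every allocation in a program, register it as the global allocator:
// #[global_allocator]
// static ALLOC: CountingAlloc = CountingAlloc::new();
```

Note that both counters describe what the program has asked for and given back. Neither says anything about what the underlying allocator has returned to the OS.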

Heap usage dropping from a peak of about 1.4GB to about 10KB confirms that the program frees almost everything it allocates. Yet Kubernetes was still showing high RSS. RSS (Resident Set Size) is the amount of physical memory that the OS currently counts as being used by the process.
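One way to watch RSS from inside the process, next to the heap numbers, is to read the `VmRSS` field of `/proc/self/status` on Linux. The sketch below is an illustration, not what we ran: the function names are made up for this example, and the figure Kubernetes reports comes from cgroup accounting, so it may not match `VmRSS` exactly.

```rust
use std::fs;

/// Extracts the VmRSS value (in kB) from the text of /proc/<pid>/status.
pub fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        // The line looks like "VmRSS:     123456 kB"
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|number| number.parse().ok())
}

/// Reads the current process's RSS on Linux.
/// Returns None on other platforms or if the file cannot be read.
pub fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status)
}
```

Logging `current_rss_kb()` before a burst, at its peak, and after it drains gives a timeline to compare against the heap profile.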

This gap between t-end and the RSS reported by K8s comes from how glibc's allocator manages freed memory.

glibc's Allocator

glibc's ptmalloc manages memory through arenas. Each arena allocates from one or more contiguous heap regions. For thread arenas, these regions are mmap-backed sub-heaps. The following is a simplified model of how the allocator works.

Allocation

When Tokio tasks run concurrently, their allocations are laid out sequentially in these arenas, in the order they are requested. Because Tokio's work-stealing scheduler can move a task between worker threads, a single task's allocations can end up spread across several arenas.

Here is a simplified view of how allocations can end up inside arenas.

            Arena 1                      Arena 2
    │                      │    │                      │
    │ Virtual expansion... │    │ Virtual expansion... │
    +----------------------+    +----------------------+
    │ TOP CHUNK            │    │ TOP CHUNK            │ ← (top pointer)
    +----------------------+    +----------------------+
    │ Task C: Box          │    │ Task D: Vec          │
    +----------------------+    +----------------------+
    │ Task A: Buffer       │    │ Task C: Integer      │
    +----------------------+    +----------------------+
    │ Task B: Vec          │    │ Task A: String       │
    +----------------------+    +----------------------+
    │ Task A: String       │    │ Task B: Vec          │
    +----------------------+    +----------------------+
           HEAP START                   HEAP START

There is no boundary separating Task A's memory from Task B's memory. They are interleaved.
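The interleaving can be reproduced with a toy model. The sketch below is not glibc's algorithm: `ToyArena` is a hypothetical bump allocator that places each request directly after the previous one, which is enough to show why chunks from different tasks end up side by side.

```rust
/// One chunk in the toy arena: which task owns it and where it sits.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub task: char,
    pub offset: usize,
    pub size: usize,
}

/// Toy arena that hands out chunks strictly in request order.
#[derive(Default)]
pub struct ToyArena {
    pub chunks: Vec<Chunk>,
    top: usize,
}

impl ToyArena {
    /// Places the new chunk at the current top and moves the top upward.
    pub fn alloc(&mut self, task: char, size: usize) -> usize {
        let offset = self.top;
        self.chunks.push(Chunk { task, offset, size });
        self.top += size;
        offset
    }

    /// Owning task of every chunk, from heap start to top.
    pub fn layout(&self) -> String {
        self.chunks.iter().map(|c| c.task).collect()
    }
}
```

Replaying the request order of Arena 1 in the diagram (A, B, A, C) gives the layout `ABAC`: Task A's two chunks are separated by a chunk that belongs to Task B.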

glibc serves an allocation from these arenas only when the requested size is below the mmap threshold. The default threshold is 128KB, but glibc adjusts it dynamically, up to 32MB on a 64-bit machine. Since most individual allocations in our workload were below the mmap threshold, they were handled inside glibc arenas instead of getting their own mmap regions.
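The threshold rule can be pictured with a small model. This is a simplified sketch, not glibc's code: `ThresholdModel` and its methods are hypothetical names, chunk headers are ignored, and it assumes that the threshold grows when an mmapped block larger than the current threshold is freed, capped at the 32MB limit.

```rust
/// Default mmap threshold: 128KB.
pub const DEFAULT_MMAP_THRESHOLD: usize = 128 * 1024;
/// Upper bound for the dynamic threshold on a 64-bit machine: 32MB.
pub const MMAP_THRESHOLD_MAX: usize = 32 * 1024 * 1024;

/// Simplified model of the dynamic mmap threshold.
pub struct ThresholdModel {
    threshold: usize,
}

impl ThresholdModel {
    pub fn new() -> Self {
        Self { threshold: DEFAULT_MMAP_THRESHOLD }
    }

    pub fn threshold(&self) -> usize {
        self.threshold
    }

    /// True if a request of `size` bytes would get its own mmap region
    /// in this model; false if it would be served from an arena.
    pub fn uses_mmap(&self, size: usize) -> bool {
        size >= self.threshold
    }

    /// Models freeing a block that had its own mmap region (assumption:
    /// the threshold is raised to the block's size, up to the maximum).
    pub fn free_mmapped(&mut self, size: usize) {
        if size > self.threshold && size <= MMAP_THRESHOLD_MAX {
            self.threshold = size;
        }
    }
}
```

In this model a ~4KB payload or a 1KB block always lands in an arena. After a 1MB mmapped block has been freed once, a later 256KB request lands in an arena as well.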

De-allocation

glibc shrinks this contiguous heap region by trimming from the top. The OS can reclaim memory only when the free space is at the end of the heap. Allocations above the mmap...
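The trim-from-the-top rule can be sketched as a function over a toy heap. This is a model under stated assumptions, not glibc's implementation: the heap is a list of `(size, is_free)` chunks ordered from heap start to top, the function names are hypothetical, and details such as glibc's trim threshold are left out.

```rust
/// Total free bytes in the toy heap, wherever they sit.
pub fn free_bytes(chunks: &[(usize, bool)]) -> usize {
    chunks
        .iter()
        .filter(|(_, free)| *free)
        .map(|(size, _)| *size)
        .sum()
}

/// Bytes that could be handed back to the OS by trimming from the top:
/// only the unbroken run of free chunks at the end of the heap counts.
pub fn trimmable_bytes(chunks: &[(usize, bool)]) -> usize {
    chunks
        .iter()
        .rev()
        .take_while(|(_, free)| *free)
        .map(|(size, _)| *size)
        .sum()
}
```

With chunks `[(4096, true), (64, false), (4096, true), (64, false)]`, 8192 bytes are free but none are trimmable, because a live 64-byte chunk sits at the top. Move the two live chunks to the start of the heap and all 8192 free bytes become trimmable.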
