A faster bump allocator for rust

Posted by 414owen

2026-06-02 23:02

Say hello to stumpalo.

Stumpalo is a bump allocator.

Stumpalo has scoped stack support.

Stumpalo is extremely fast.

Stumpalo has a logo, created very hastily.

Stumpalo’s logo is stumpy.

# Speed

You're probably using a bump allocator because you want raw allocation throughput.

Let's see how fast stumpalo is compared to other libraries. Each row is normalised to the fastest library for that operation (1.00x), so lower is better.

| operation | stumpalo | blink | bumpalo |
|---|---|---|---|
| alloc_u8 | ✅ 1.00x | 🔴 2.14x | 🟠 1.54x |
| alloc_u16 | ✅ 1.00x | 🔴 2.46x | 🟥 2.54x |
| alloc_u32 | ✅ 1.00x | 🟥 3.36x | 🟥 3.34x |
| alloc_u64 | ✅ 1.00x | 🟥 3.35x | 🟥 3.34x |
| alloc_u128 | ✅ 1.00x | 🟡 1.19x | 🟡 1.18x |
| alloc_multiple_u8 | ✅ 1.00x | 🔴 1.82x | 🔴 1.85x |
| alloc_multiple_u16 | ✅ 1.00x | 🔴 2.30x | 🔴 2.34x |
| alloc_multiple_u32 | ✅ 1.00x | 🟥 3.12x | 🟥 3.14x |
| alloc_multiple_u64 | ✅ 1.00x | 🟥 3.23x | 🟥 3.25x |
| alloc_multiple_u128 | ✅ 1.00x | 🟥 2.70x | 🟥 2.61x |
| alloc_array_u8_8 | ✅ 1.00x | 🔴 1.99x | 🔴 2.11x |
| alloc_array_u8_32 | ✅ 1.00x | 🟢 1.15x | 🟡 1.20x |
| alloc_array_u8_64 | ✅ 1.00x | 🟠 1.55x | 🟠 1.59x |
| alloc_array_u8_128 | ✅ 1.00x | 🟡 1.30x | 🟠 1.50x |
| alloc_slice_u8_8 | 🟢 1.11x | 🟡 1.27x | ✅ 1.00x |
| alloc_slice_u8_32 | 🟢 1.06x | ✅ 1.00x | 🟢 1.08x |
| alloc_slice_u8_64 | ✅ 1.05x | ✅ 1.00x | 🟢 1.09x |
| alloc_slice_u8_128 | ✅ 1.00x | 🟢 1.06x | ✅ 1.04x |
| alloc_slice_u16_8 | ✅ 1.00x | 🟡 1.33x | 🟡 1.16x |
| alloc_slice_u16_32 | ✅ 1.00x | 🟢 1.14x | 🟢 1.11x |
| alloc_slice_u16_64 | ✅ 1.00x | 🟢 1.14x | 🟢 1.10x |
| alloc_slice_u16_128 | ✅ 1.04x | ✅ 1.00x | ✅ 1.02x |
| alloc_slice_u32_8 | ✅ 1.00x | 🟢 1.14x | 🟢 1.09x |
| alloc_slice_u32_32 | ✅ 1.00x | 🟢 1.14x | 🟢 1.10x |
| alloc_slice_u32_64 | ✅ 1.05x | ✅ 1.00x | 🟢 1.06x |
| alloc_slice_u32_128 | 🟢 1.09x | ✅ 1.00x | 🟢 1.13x |
| alloc_slice_u64_8 | ✅ 1.00x | 🟡 1.25x | 🟢 1.11x |
| alloc_slice_u64_32 | ✅ 1.04x | ✅ 1.00x | ✅ 1.02x |
| alloc_slice_u64_64 | 🟢 1.08x | ✅ 1.00x | 🟢 1.10x |
| alloc_slice_u64_128 | 🟢 1.07x | ✅ 1.00x | 🟢 1.08x |
| alloc_slice_u128_8 | ✅ 1.00x | 🟢 1.12x | 🟢 1.11x |
| alloc_slice_u128_32 | 🟢 1.08x | ✅ 1.00x | 🟢 1.12x |
| alloc_slice_u128_64 | 🟢 1.07x | ✅ 1.00x | 🟢 1.08x |
| alloc_slice_u128_128 | ✅ 1.03x | ✅ 1.00x | ✅ 1.04x |
| alloc_struct_13 | ✅ 1.00x | 🟠 1.55x | 🟠 1.39x |
| alloc_struct_24 | ✅ 1.00x | 🔴 1.94x | 🔴 1.97x |
| alloc_struct_26 | ✅ 1.00x | 🟠 1.56x | 🟠 1.52x |
| alloc_struct_30 | ✅ 1.00x | 🟠 1.54x | 🟠 1.45x |
| alloc_struct_32 | ✅ 1.00x | 🟠 1.35x | 🟠 1.40x |
| alloc_struct_64 | ✅ 1.00x | 🟠 1.44x | 🟠 1.48x |
| alloc_struct_96 | ✅ 1.00x | 🟢 1.13x | 🟡 1.18x |
| alloc_struct_128 | ✅ 1.00x | 🟡 1.33x | 🟡 1.17x |
| alloc_struct_192 | ✅ 1.02x | ✅ 1.00x | 🟢 1.09x |
| alloc_struct_256 | ✅ 1.00x | 🟡 1.16x | ✅ 1.01x |
| alloc_struct_512 | 🟢 1.06x | ✅ 1.00x | ✅ 1.02x |
| alloc_struct_1k | ✅ 1.00x | 🟢 1.05x | ✅ 1.01x |
| alloc_str_8 | 🟢 1.11x | ✅ 1.05x | ✅ 1.00x |
| alloc_str_16 | 🟢 1.07x | ✅ 1.02x | ✅ 1.00x |
| alloc_str_32 | ✅ 1.04x | ✅ 1.00x | 🟢 1.07x |
| alloc_str_40 | ✅ 1.00x | 🟢 1.08x | 🟢 1.06x |
| alloc_str_48 | ✅ 1.00x | ✅ 1.03x | 🟢 1.06x |
| alloc_str_64 | ✅ 1.00x | ✅ 1.04x | 🟢 1.06x |
| alloc_str_72 | ✅ 1.04x | ✅ 1.00x | 🟢 1.07x |
| alloc_str_80 | ✅ 1.03x | ✅ 1.00x | 🟢 1.07x |
| alloc_str_128 | ✅ 1.00x | 🟢 1.11x | 🟢 1.08x |
| alloc_slice_lit_u8_8 | ✅ 1.00x | 🔴 2.47x | 🔴 2.23x |
| alloc_slice_lit_u8_32 | ✅ 1.00x | 🔴 1.83x | 🟠 1.71x |
| alloc_slice_lit_u8_64 | ✅ 1.00x | 🟡 1.34x | 🟠 1.42x |
| alloc_slice_lit_u8_128 | ✅ 1.00x | 🟡 1.31x | 🟡 1.31x |
| alloc_str_lit_8 | ✅ 1.00x | 🔴 2.02x | 🔴 1.82x |
| alloc_str_lit_16 | ✅ 1.00x | 🔴 1.78x | 🟠 1.60x |
| alloc_str_lit_32 | ✅ 1.00x | 🟠 1.51x | 🟠 1.42x |
| alloc_str_lit_40 | ✅ 1.00x | 🔴 1.76x | 🔴 1.93x |
| alloc_str_lit_48 | ✅ 1.00x | 🟠 1.74x | 🔴 1.82x |
| alloc_str_lit_64 | ✅ 1.00x | 🔴 1.75x | 🟠 1.69x |
| alloc_str_lit_72 | ✅ 1.00x | 🟠 1.53x | 🟠 1.61x |
| alloc_str_lit_80 | ✅ 1.00x | 🟠 1.54x | 🟠 1.63x |
| alloc_str_lit_128 | ✅ 1.00x | 🟠 1.36x | 🟠 1.35x |
| clear | ✅ 1.00x | ✅ 1.04x | ✅ 1.04x |
| clear_and_reuse | ✅ 1.00x | 🟥 3.35x | 🟥 3.35x |

Benchmark machine: AMD Ryzen 3900x, Arch Linux, kernel 7.0.3

# Where does the speed come from?

In an arena allocator, the fast path is everything. The fast path has to check whether there's room in the current chunk. If there is, it allocates the value in the current chunk. If not, it jumps to the slow path.
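To make that shape concrete, here is a minimal sketch of a downward-bumping arena over a single fixed buffer. `ToyArena`, `with_capacity` and `alloc_bytes` are hypothetical names made up for illustration, not stumpalo's API. The sketch returns `None` where a real arena would jump to its slow path, and it keeps an explicit underflow guard that a tuned allocator might avoid.

```rust
/// Toy downward-bumping arena over one fixed buffer.
/// Hypothetical names throughout; this is not stumpalo's API.
pub struct ToyArena {
    buf: Box<[u8]>,
    /// Offset into `buf` of the lowest allocated byte (starts at the end).
    top: usize,
}

impl ToyArena {
    pub fn with_capacity(cap: usize) -> Self {
        ToyArena { buf: vec![0u8; cap].into_boxed_slice(), top: cap }
    }

    /// Fast path: compute the new top, check for room, bump.
    /// `align` must be a power of two.
    #[inline]
    pub fn alloc_bytes(&mut self, size: usize, align: usize) -> Option<&mut [u8]> {
        debug_assert!(align.is_power_of_two());
        let base = self.buf.as_ptr() as usize;
        let top_addr = base + self.top;
        // Move down by `size` (guarding against underflow), then round
        // down to the requested alignment.
        let new_addr = top_addr.checked_sub(size)? & !(align - 1);
        if new_addr < base {
            // No room. A real arena would jump to its slow path here
            // (typically: grab a fresh chunk and retry).
            return None;
        }
        self.top = new_addr - base;
        Some(&mut self.buf[self.top..self.top + size])
    }
}
```

Everything before the `return None` is the fast path. The rest of the post is about making that part as small as possible.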

# Using more information

rustc and LLVM can eliminate if/else branches whose conditions are known at compile time.

Different types make different information available at compile time. Think alignment and size. When this information is available, stumpalo uses it, along with information about the hardware you're running on, to skip overflow/underflow checks where overflow or underflow couldn't possibly occur anyway.

Generally, stumpalo's fast paths contain a single conditional branch, and as few as six instructions.
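Here is a rough sketch of the idea, not stumpalo's actual code. `bump_down_for` and `ASSUMED_UNMAPPED_LOW_BYTES` are hypothetical, and the 4096-byte figure is an assumption picked for illustration. Because `size_of::<T>()` and `align_of::<T>()` are constants for each `T`, the `if` below has a compile-time-known condition, so the optimiser can keep just one arm per instantiation.

```rust
use std::mem::{align_of, size_of};

/// ASSUMPTION (illustrative only, not taken from stumpalo): no chunk ever
/// lives in the lowest 4096 bytes of the address space, so subtracting
/// fewer than 4096 bytes from an in-chunk address cannot wrap around.
const ASSUMED_UNMAPPED_LOW_BYTES: usize = 4096;

/// Hypothetical helper: given the current top address, compute where a `T`
/// would go when bumping downwards. The caller still has to compare the
/// result against the bottom of the chunk.
#[inline]
pub fn bump_down_for<T>(top: usize) -> Option<usize> {
    let size = size_of::<T>();
    let align = align_of::<T>();
    // `align` is a constant power of two, so this is a single `and`.
    let aligned = top & !(align - 1);
    if size < ASSUMED_UNMAPPED_LOW_BYTES {
        // Small type: under the assumption above the subtraction cannot
        // underflow, so no check is emitted.
        Some(aligned.wrapping_sub(size))
    } else {
        // Large type: keep the explicit underflow check.
        aligned.checked_sub(size)
    }
}
```

For `u32` this leaves a mask and a subtraction with no underflow branch, which is the kind of saving the section describes.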

# Less indirection

A stumpalo arena contains pointers to the top and bottom of the chunk. Other libraries contain a pointer to a chunk, whose header contains the pointer to its top. Stumpalo goes through one fewer layer of indirection to read the top.
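The two layouts can be sketched like this. The type names are hypothetical, and plain `usize` addresses plus a `Box` stand in for the raw pointers a real allocator would use.

```rust
/// Layout as the post describes stumpalo: the bounds sit in the arena itself.
pub struct InlineArena {
    pub top: usize,
    pub bottom: usize,
}

impl InlineArena {
    /// One load: read `top` straight out of the arena.
    pub fn top(&self) -> usize {
        self.top
    }
}

/// Layout as the post describes other libraries: the arena points at a
/// chunk, and the chunk's header holds the bounds.
pub struct ChunkHeader {
    pub top: usize,
    pub bottom: usize,
}

pub struct IndirectArena {
    pub chunk: Box<ChunkHeader>,
}

impl IndirectArena {
    /// Two dependent loads: first the chunk pointer, then `top` behind it.
    pub fn top(&self) -> usize {
        self.chunk.top
    }
}
```

The second load can't start until the first one finishes, so the indirect layout adds latency to every allocation, not just an extra instruction.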

# Example

The following function:

    fn alloc_u32(a: &mut Arena, n: u32) -> &mut u32 {
        a.alloc(n)
    }

Compiles down to this fast path:

    alloc_u32:
        mov rcx, qword ptr [rdi]
        and rcx, -4
        lea rax, [rcx - 4]
        cmp rax, qword ptr [rdi + 8]
        jb example::ArenaRef::alloc_slow_with::h903e68372b5b408b
        mov dword ptr [rcx - 4], esi
        mov...
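Read back into Rust, the visible instructions do roughly the following. This is a hand-written mirror of the listing rather than stumpalo's source: `Bounds` and `fast_path_u32` are hypothetical names, addresses are plain `usize`, and `None` stands in for the jump to the slow path. It stops where the listing is cut off.

```rust
/// Hypothetical stand-in for the two words the assembly reads:
/// `[rdi]` is the top, `[rdi + 8]` is the bottom.
pub struct Bounds {
    pub top: usize,
    pub bottom: usize,
}

/// Returns the address the `u32` would be written to, or `None` where the
/// assembly branches to the slow path.
pub fn fast_path_u32(b: &Bounds) -> Option<usize> {
    let aligned = b.top & !3;              // and rcx, -4
    let new_top = aligned.wrapping_sub(4); // lea rax, [rcx - 4]
    if new_top < b.bottom {                // cmp rax, [rdi + 8]; jb (unsigned)
        return None;
    }
    Some(new_top)                          // target of: mov dword ptr [rcx - 4], esi
}
```

Note that there is no underflow check on the subtraction, and both bounds are read directly from the arena, matching the two sections above.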
