Safe SIMD in Rust, even on the inside

g0xA52A2A1 pts0 comments

Safe SIMD in Rust, even on the inside | by Sergey "Shnatsel" Davidoff | Jun, 2026 | MediumSitemapOpen in appSign up<br>Sign in

Medium Logo

Get app<br>Write

Search

Sign up<br>Sign in

Safe SIMD in Rust, even on the inside

Sergey "Shnatsel" Davidoff

9 min read·<br>11 hours ago

Listen

Share

Rust’s SIMD abstractions were not as safe as I’d like. Until now.<br>It’s no secret that raw SIMD intrinsics are unpleasant to use.<br>You want to write a + b, not this monstrosity:<br>unsafe {<br>#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "avx2"))]<br>_mm256_add_ps(a, b)<br>#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse", not(target_feature = "avx2")))]<br>_mm_add_ps(a, b)<br>#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]<br>vaddq_f32(a, b)<br>}Look at it. It’s hideous. And the whole thing is wrapped in unsafe!<br>And that’s a simplified example. It still doesn’t handle:<br>Other common platforms: AVX-512, 32-bit ARM, WebAssembly<br>Platforms without SIMD or obscure platforms like RISC-V<br>Actually loading data like &[f32] into a form that each intrinsic accepts<br>Selecting the best implementation for the CPU it’s running on<br>Luckily, Rust provides many SIMD abstractions that handle all of that for you and let you simply write a + b.<br>There is just one wrinkle. Inside, they’re still full of unsafe. It wasn’t gone, just hidden. Vast quantities of it lurking just beneath the surface, getting screwed up occasionally.<br>Or rather, they were. Until now.<br>Why do we even need ‘unsafe’?<br>For the longest time you couldn’t get around wrapping the call to each intrinsic function such as _mm256_add_ps into unsafe because it is illegal to call one when it’s not available on the CPU you’re running on.<br>So you had to have some mechanism for tracking which instructions are needed for each intrinsic, and which instructions you have access to, and cross-referencing them to decide if it’s safe to call a given function.<br>It was either tedious if done by hand or complex if done by a code generator, always error-prone, and required unsafe around every intrinsic.<br>This changed in Rust 1.87 when the compiler started tracking the required instruction sets itself, so you could write this:<br>#[target_feature(enable = "avx2")]<br>fn add_avx2(a: __m256, b: __m256) -> __m256 {<br>_mm256_add_ps(a, b) // this is an avx2 intrinsic<br>}Look ma, no unsafe!<br>…yet.<br>You still cannot write a + b with this. The best you can do is this:<br>unsafe { add_avx2(a, b) }This only shifts the unsafe up a layer. You can call intrinsics inside functions annotated with the correct #[target_feature] now, but there still has to be unsafe somewhere in the chain.<br>The other problem is more fundamental. You cannot put #[target_feature] on the implementation of + for your type, because + must be available always. So no a + b for us using this mechanism.<br>Lemma: CPU feature tokens<br>To understand how the final solution works, you first need to understand how CPU feature detection works.<br>Normally, checking for a CPU feature like AVX2 is done at runtime using is_x86_feature_detected!("avx2"). But we definitely don’t want to run this check every single time we add two numbers together — that would completely tank performance. We want to check it once, and then prove to the compiler that it’s safe to use AVX2 instructions from that point on.<br>Instead we can encode this proof into the type system using an unforgeable token: a zero-sized type with a private inner field. The only way to obtain this token is to call a function that performs the CPU feature check. If the check passes, the function hands you the token:<br>pub struct Avx2(());

fn detect_avx2() -> Option {<br>if is_x86_feature_detected!("avx2") {<br>Some(Avx2(()))<br>} else {<br>None<br>}And because it’s a zero-sized type, passing this token around has no runtime overhead. It exists purely as a compile-time proof.<br>The upshot is that as long as you have an instance of the Avx2 struct, you can be sure that AVX2 instructions are available on the system.<br>The key insight<br>The compiler doesn’t know it, but this function is safe to call:<br>#[target_feature(enable = "avx2")]<br>fn add_avx2(token: Avx2, a: __m256, b: __m256) -> __m256 {<br>_mm256_add_ps(a, b)<br>}You can only call this function if you have an Avx2 token, which you can only get if AVX2 instructions are available on the system.<br>If we can explain to the compiler that this is valid (using unsafe), we can write that unsafe only once and reuse it everywhere.<br>What we need is a macro which is safe to invoke:<br>with_avx2!(<br>fn add_avx2(token: Avx2, a: __m256, b: __m256) -> __m256 {<br>_mm256_add_ps(a, b)<br>)but expands into this behind the scenes:<br>fn add_avx2(token: Avx2, a: __m256, b: __m256) -> __m256 {<br>// SAFETY: Avx2 is available according to the token,<br>// and we verified that the inner function is not an `unsafe fn`<br>unsafe { inner(token, a, b) }

#[target_feature(enable = "avx2")]<br>fn inner(token: Avx2, a: __m256, b: __m256) -> __m256 {<br>_mm256_add_ps(a, b)<br>}Now if you use an intrinsic that isn’t in...

avx2 __m256 unsafe token target_feature safe

Related Articles