Safe SIMD in Rust, even on the inside

by Sergey "Shnatsel" Davidoff | Jun, 2026 | 9 min read

Rust’s SIMD abstractions were not as safe as I’d like. Until now.

It’s no secret that raw SIMD intrinsics are unpleasant to use. You want to write a + b, not this monstrosity:

    unsafe {
        #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "avx2"))]
        _mm256_add_ps(a, b)
        #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), target_feature = "sse", not(target_feature = "avx2")))]
        _mm_add_ps(a, b)
        #[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
        vaddq_f32(a, b)
    }

Look at it. It’s hideous. And the whole thing is wrapped in unsafe! And that’s a simplified example. It still doesn’t handle:

- Other common platforms: AVX-512, 32-bit ARM, WebAssembly
- Platforms without SIMD, or obscure platforms like RISC-V
- Actually loading data like &[f32] into a form that each intrinsic accepts
- Selecting the best implementation for the CPU it’s running on

Luckily, the Rust ecosystem provides many SIMD abstractions that handle all of that for you and let you simply write a + b. There is just one wrinkle: inside, they’re still full of unsafe. It wasn’t gone, just hidden. Vast quantities of it were lurking just beneath the surface, and occasionally getting screwed up.

Or rather, they were. Until now.

Why do we even need ‘unsafe’?

For the longest time you couldn’t avoid wrapping every call to an intrinsic function such as _mm256_add_ps in unsafe, because calling one on a CPU that doesn’t support the instructions it uses is undefined behavior. So you needed some mechanism for tracking which instructions each intrinsic requires and which instructions you have access to, and for cross-referencing the two to decide whether it’s safe to call a given function. Done by hand, this was tedious; done by a code generator, it was complex. Either way it was error-prone and required unsafe around every intrinsic.
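For contrast, here is roughly what that manual bookkeeping looks like. This is a minimal sketch assuming an x86_64 target; the function names `add` and `add_avx` are made up for illustration. It uses the "avx" feature because that is what `_mm256_add_ps` and the unaligned load/store intrinsics actually require. Note that nothing ties the runtime check to the `#[target_feature]` attribute except the programmer's care, and every intrinsic call sits inside `unsafe`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// Adds `a` and `b` element-wise into `out`, 8 floats per AVX instruction.
///
/// # Safety
/// The caller must guarantee that the CPU supports AVX; calling this on a
/// CPU without AVX is undefined behavior.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f32], b: &[f32], out: &mut [f32]) {
    let n = out.len().min(a.len()).min(b.len());
    let mut i = 0;
    while i + 8 <= n {
        // Every intrinsic call is wrapped in `unsafe`, old-style.
        // SAFETY: i + 8 <= n, so all three pointers cover 8 valid f32s.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        }
        i += 8;
    }
    // Scalar tail for the last n % 8 elements.
    for j in i..n {
        out[j] = a[j] + b[j];
    }
}

/// Safe entry point: re-checks the CPU on every call, then dispatches.
pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: we just verified that this CPU supports AVX. A typo in
            // the feature name here would go unnoticed by the compiler.
            unsafe { add_avx(a, b, out) };
            return;
        }
    }
    // Portable scalar fallback for other CPUs and architectures.
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}
```

The `unsafe` appears twice here: once around the intrinsics and once at the call site, and both have to be audited by hand.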
This changed in Rust 1.87, when the compiler started tracking the required instruction sets itself, so you could write this:

    #[target_feature(enable = "avx2")]
    fn add_avx2(a: __m256, b: __m256) -> __m256 {
        _mm256_add_ps(a, b) // an AVX intrinsic; enabling "avx2" also enables "avx"
    }

Look ma, no unsafe! …yet. You still cannot write a + b with this. The best you can do is this:

    unsafe { add_avx2(a, b) }

This only shifts the unsafe up a layer. You can now call intrinsics inside functions annotated with the correct #[target_feature], but calling such a function from code that doesn’t have those features enabled is itself unsafe, so there still has to be unsafe somewhere in the chain.

The other problem is more fundamental. You cannot put #[target_feature] on the implementation of + for your type, because + must always be available. So no a + b for us using this mechanism.

Lemma: CPU feature tokens

To understand how the final solution works, you first need to understand how CPU feature detection works. Normally, checking for a CPU feature like AVX2 is done at runtime using is_x86_feature_detected!("avx2"). But we definitely don’t want to run this check every single time we add two numbers together; that would completely tank performance. We want to check once, and then prove to the compiler that it’s safe to use AVX2 instructions from that point on.

We can encode this proof in the type system using an unforgeable token: a zero-sized type with a private inner field. The only way to obtain this token is to call a function that performs the CPU feature check. If the check passes, the function hands you the token:

    pub struct Avx2(());

    fn detect_avx2() -> Option<Avx2> {
        if is_x86_feature_detected!("avx2") {
            Some(Avx2(()))
        } else {
            None
        }
    }

Because the inner field is private, code outside the module that defines Avx2 cannot construct one with Avx2(()). And because it’s a zero-sized type, passing this token around has no runtime overhead. It exists purely as a compile-time proof. The upshot is that as long as you have an instance of the Avx2 struct, you can be sure that AVX2 instructions are available on the system.

The key insight

The compiler doesn’t know it, but this function is safe to call:

    #[target_feature(enable = "avx2")]
    fn add_avx2(token: Avx2, a: __m256, b: __m256) -> __m256 {
        _mm256_add_ps(a, b)
    }

You can only call this function if you have an Avx2 token, which you can only get if AVX2 instructions are available on the system. If we can explain to the compiler that this is valid (using unsafe), we can write that unsafe only once and reuse it everywhere. What we need is a macro that is safe to invoke:

    with_avx2!(
        fn add_avx2(token: Avx2, a: __m256, b: __m256) -> __m256 {
            _mm256_add_ps(a, b)
        }
    );

but expands into this behind the scenes:

    fn add_avx2(token: Avx2, a: __m256, b: __m256) -> __m256 {
        // SAFETY: Avx2 is available according to the token,
        // and we verified that the inner function is not an `unsafe fn`
        unsafe { inner(token, a, b) }
    }

    #[target_feature(enable = "avx2")]
    fn inner(token: Avx2, a: __m256, b: __m256) -> __m256 {
        _mm256_add_ps(a, b)
    }

Now if you use an intrinsic that isn’t in...
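To see the token and a safe-to-invoke macro working together end to end, here is a minimal sketch. It assumes Rust 1.87 or newer on x86_64. The `with_avx2!` body is a hypothetical `macro_rules!` implementation written for this illustration, not the author's macro, and `add8` and `add_slices` are made-up names. Unlike the expansion shown above, it nests `inner` inside the outer function (declared before the call) so that several invocations don't clash over the name. It also takes `[f32; 8]` arrays instead of `__m256`, so the pointer-based load/store intrinsics still need their own `unsafe`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

mod token {
    /// Zero-sized proof that AVX2 is available. The field is private, so code
    /// outside this module cannot write `Avx2(())`; it must call `detect_avx2`.
    /// (Deriving `Copy` is an assumption here so the token can be reused.)
    #[derive(Clone, Copy)]
    pub struct Avx2(());

    /// Runs the runtime CPU check and hands out a token only if it passes.
    pub fn detect_avx2() -> Option<Avx2> {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Some(Avx2(()));
            }
        }
        None
    }
}

use token::{detect_avx2, Avx2};

/// Hypothetical `with_avx2!`: matches only a safe `fn` (never `unsafe fn`)
/// whose first parameter is an `Avx2` token, and hides the one `unsafe` call.
macro_rules! with_avx2 {
    (fn $name:ident($token:ident: Avx2 $(, $arg:ident: $ty:ty)*) -> $ret:ty $body:block) => {
        fn $name($token: Avx2 $(, $arg: $ty)*) -> $ret {
            // Nested so that each invocation gets its own private `inner`.
            #[target_feature(enable = "avx2")]
            fn inner($token: Avx2 $(, $arg: $ty)*) -> $ret $body

            // SAFETY: holding an `Avx2` token proves AVX2 is available, and the
            // macro pattern only accepts a safe `fn`.
            unsafe { inner($token $(, $arg)*) }
        }
    };
}

#[cfg(target_arch = "x86_64")]
with_avx2! {
    fn add8(_token: Avx2, a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
        // SAFETY: each array holds exactly 8 f32s, one unaligned 256-bit load.
        let (va, vb) = unsafe { (_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr())) };
        // No `unsafe` here: we are inside an AVX2-enabled function.
        let sum = _mm256_add_ps(va, vb);
        let mut out = [0.0f32; 8];
        // SAFETY: `out` has room for exactly 8 f32s.
        unsafe { _mm256_storeu_ps(out.as_mut_ptr(), sum) };
        out
    }
}

/// Safe public API: one runtime check per call, then the token is reused.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    let n = out.len().min(a.len()).min(b.len());
    let mut done = 0;
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = detect_avx2() {
            while done + 8 <= n {
                let (mut x, mut y) = ([0.0f32; 8], [0.0f32; 8]);
                x.copy_from_slice(&a[done..done + 8]);
                y.copy_from_slice(&b[done..done + 8]);
                out[done..done + 8].copy_from_slice(&add8(token, x, y));
                done += 8;
            }
        }
    }
    // Scalar fallback for non-AVX2 CPUs and for the last n % 8 elements.
    for i in done..n {
        out[i] = a[i] + b[i];
    }
}
```

Callers of `add_slices` and even the author of `add8` never write an `unsafe` block around the arithmetic. The only `unsafe` that vouches for the CPU feature lives in the macro, written once.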
