SIMD | Mojo
Version: 1.0.0b1
struct SIMD[dtype: DType, size: Int]
Represents a vector type that leverages hardware acceleration to process multiple data elements with a single operation.
SIMD (Single Instruction, Multiple Data) is a fundamental parallel computing paradigm where a single CPU instruction operates on multiple data elements at once. Modern CPUs can perform 4, 8, 16, or even 32 operations in parallel using SIMD, delivering substantial performance improvements over scalar operations. Instead of processing one value at a time, SIMD processes entire vectors of values with each instruction.
For example, when adding two vectors of four values, a scalar operation adds each value in the vector one by one, while a SIMD operation adds all four values at once using vector registers:
Scalar operation:                SIMD operation:
┌─────────────────────────┐      ┌───────────────────────────┐
│ 4 instructions          │      │ 1 instruction             │
│ 4 clock cycles          │      │ 1 clock cycle             │
│                         │      │                           │
│ ADD a[0], b[0] → c[0]   │      │ Vector register A         │
│ ADD a[1], b[1] → c[1]   │      │ ┌─────┬─────┬─────┬─────┐ │
│ ADD a[2], b[2] → c[2]   │      │ │a[0] │a[1] │a[2] │a[3] │ │
│ ADD a[3], b[3] → c[3]   │      │ └─────┴─────┴─────┴─────┘ │
└─────────────────────────┘      │             +             │
                                 │ Vector register B         │
                                 │ ┌─────┬─────┬─────┬─────┐ │
                                 │ │b[0] │b[1] │b[2] │b[3] │ │
                                 │ └─────┴─────┴─────┴─────┘ │
                                 │             ↓             │
                                 │         SIMD_ADD          │
                                 │             ↓             │
                                 │ Vector register C         │
                                 │ ┌─────┬─────┬─────┬─────┐ │
                                 │ │c[0] │c[1] │c[2] │c[3] │ │
                                 │ └─────┴─────┴─────┴─────┘ │
                                 └───────────────────────────┘
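The contrast in the diagram can be sketched in Mojo roughly as follows. This is an illustration only: the variable names are made up, and the per-lane loop merely mimics scalar code (the compiler may still choose to vectorize it).

```mojo
def main():
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](10.0, 20.0, 30.0, 40.0)

    # Scalar style: add one lane at a time, four separate additions.
    var c_scalar = SIMD[DType.float32, 4]()
    for i in range(4):
        c_scalar[i] = a[i] + b[i]

    # SIMD style: a single vector addition covers all four lanes.
    var c_simd = a + b

    # Both vectors hold the same sums: 11, 22, 33, 44.
    print(c_scalar)
    print(c_simd)
```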
The SIMD type maps directly to hardware vector registers and instructions. Mojo automatically generates optimal SIMD code that leverages CPU-specific instruction sets (such as AVX and NEON) without requiring manual intrinsics or assembly programming.
This type is the foundation of high-performance CPU computing in Mojo, enabling you to write code that automatically leverages modern CPU vector capabilities while maintaining code clarity and portability.
Caution: If you declare a SIMD vector size larger than the vector registers of the target hardware, the compiler will break up the SIMD into multiple vector registers for compatibility. However, you should avoid using a vector that's more than 2x the hardware's vector register size because the resulting code will perform poorly.
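One way to stay within the size guidance above is to derive the vector size from the target instead of hard-coding it. The sketch below assumes the `simd_width_of` helper from the `sys` module, which this page does not name; its spelling and import path may differ in your Mojo version, so treat it as an assumption.

```mojo
# Assumption: `simd_width_of` is importable from `sys` in your Mojo version.
from sys import simd_width_of


def main():
    # Number of float32 lanes that fit in one vector register on the target.
    print("native float32 lanes:", simd_width_of[DType.float32]())

    # Size the vector to exactly one hardware register.
    var v = SIMD[DType.float32, simd_width_of[DType.float32]()](1.0)

    # Every lane holds 1.0, so the sum equals the lane count.
    print(v.reduce_add())
```

Because the width is resolved at compile time, the same source adapts to the register size of whichever CPU it is compiled for.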
Key properties:
Hardware-mapped: Directly maps to CPU vector registers
Type-safe: Data types and vector sizes are checked at compile time
Zero-cost: No runtime overhead compared to hand-optimized intrinsics
Portable: Same code works across different CPU architectures (x86, ARM, etc.)
Composable: Seamlessly integrates with Mojo's parallelization features
Key APIs:
Construction:
Broadcast single value to all elements: SIMD[dtype, size](value)
Initialize with specific values: SIMD[dtype, size](v1, v2, ...)
Zero-initialized vector: SIMD[dtype, size]()
Element operations:
Arithmetic: +, -, *, /, %, //
Comparison: ==, !=, <, <=, >, >=
Math functions: sqrt(), sin(), cos(), fma(), etc.
Bit operations: &, |, ^, ~, <<, >>
Vector operations:
Horizontal reductions: reduce_add(), reduce_mul(), reduce_min(), reduce_max()
Element-wise conditional selection: condition.select(true_case, false_case), where condition is a SIMD vector of booleans
Vector manipulation: shuffle(), slice(), join(), split()
Type conversion: cast[target_dtype]()
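As a rough illustration of how several of the APIs listed above fit together, the following sketch builds vectors with each constructor form, then applies arithmetic, reductions, and a cast. The variable names are hypothetical and only calls named on this page are used.

```mojo
def main():
    # Construction: broadcast, explicit values, and zero-initialized.
    var ones = SIMD[DType.float32, 4](1.0)
    var ramp = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var zeros = SIMD[DType.float32, 4]()

    # Element-wise arithmetic: lanes become 3, 5, 7, 9.
    var scaled = ramp * 2.0 + ones + zeros

    # Horizontal reductions collapse the vector to a single scalar.
    print(scaled.reduce_add())  # sum of lanes: 24
    print(scaled.reduce_max())  # largest lane: 9

    # Type conversion: same lane count, different element type.
    var as_int = scaled.cast[DType.int32]()
    print(as_int)
```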
Examples:
Vectorized math operations:
# Process 8 floating-point numbers simultaneously
var a = SIMD[DType.float32, 8](1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
var b = SIMD[DType.float32, 8](2.0) # Broadcast 2.0 to all elements
var result = a * b + 1.0
print(result) # => [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]
Conditional operations with masking:
# Double the positive values and negate the negative values
var values = SIMD[DType.int32, 4](1, -2, 3, -4)
var is_positive = values.gt(0) # greater-than: gets SIMD of booleans
var result = is_positive.select(values * 2, values * -1)
print(result) # => [2, 2, 6, 4]
Horizontal reductions:
# Sum all elements in a vector
var data = SIMD[DType.float64, 4](10.5, 20.3, 30.1, 40.7)
var total = data.reduce_add()
var maximum = data.reduce_max()
print(total, maximum) # => 101.6 40.7
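Reductions combine naturally with element-wise arithmetic. As a minimal sketch (the values are chosen purely for illustration), a dot product is an element-wise multiply followed by `reduce_add()`:

```mojo
def main():
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](5.0, 6.0, 7.0, 8.0)

    # Multiply lane by lane, then sum the products:
    # 1*5 + 2*6 + 3*7 + 4*8 = 70
    var dot = (a * b).reduce_add()
    print(dot)
```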
Constraints:
The size of the SIMD vector must be positive and a power of 2.
Parameters
dtype (DType): The data type of SIMD vector elements.
size (Int): The size of the SIMD vector (number of elements).
Implemented...