Mojo: SIMD

tosh1 pts0 comments

SIMD | Mojo

IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md).<br>For the complete Mojo documentation index, see llms.txt.<br>Skip to main content

Version: 1.0.0b1On this page<br>For the complete Mojo documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /docs/manual/basics.md).

struct SIMD[dtype: DType, size: Int]

Represents a vector type that leverages hardware acceleration to process multiple data elements with a single operation.

SIMD (Single Instruction, Multiple Data) is a fundamental parallel<br>computing paradigm where a single CPU instruction operates on multiple data<br>elements at once. Modern CPUs can perform 4, 8, 16, or even 32 operations<br>in parallel using SIMD, delivering substantial performance improvements<br>over scalar operations. Instead of processing one value at a time, SIMD<br>processes entire vectors of values with each instruction.

For example, when adding two vectors of four values, a scalar operation<br>adds each value in the vector one by one, while a SIMD operation adds all<br>four values at once using vector registers:

Scalar operation: SIMD operation:

┌─────────────────────────┐ ┌───────────────────────────┐

│ 4 instructions │ │ 1 instruction │

│ 4 clock cycles │ │ 1 clock cycle │

│ │ │ │

│ ADD a[0], b[0] → c[0] │ │ Vector register A │

│ ADD a[1], b[1] → c[1] │ │ ┌─────┬─────┬─────┬─────┐ │

│ ADD a[2], b[2] → c[2] │ │ │a[0] │a[1] │a[2] │a[3] │ │

│ ADD a[3], b[3] → c[3] │ │ └─────┴─────┴─────┴─────┘ │

└─────────────────────────┘ │ + │

│ Vector register B │

│ ┌─────┬─────┬─────┬─────┐ │

│ │b[0] │b[1] │b[2] │b[3] │ │

│ └─────┴─────┴─────┴─────┘ │

│ ↓ │

│ SIMD_ADD │

│ ↓ │

│ Vector register C │

│ ┌─────┬─────┬─────┬─────┐ │

│ │c[0] │c[1] │c[2] │c[3] │ │

│ └─────┴─────┴─────┴─────┘ │

└───────────────────────────┘

The SIMD type maps directly to hardware vector registers and<br>instructions. Mojo automatically generates optimal SIMD code that leverages<br>CPU-specific instruction sets (such as AVX and NEON) without requiring<br>manual intrinsics or assembly programming.

This type is the foundation of high-performance CPU computing in Mojo,<br>enabling you to write code that automatically leverages modern CPU vector<br>capabilities while maintaining code clarity and portability.

Caution: If you declare a SIMD vector size larger than the vector<br>registers of the target hardware, the compiler will break up the SIMD into<br>multiple vector registers for compatibility. However, you should avoid<br>using a vector that's more than 2x the hardware's vector register size<br>because the resulting code will perform poorly.

Key properties:

Hardware-mapped : Directly maps to CPU vector registers

Type-safe : Data types and vector sizes are checked at compile time

Zero-cost : No runtime overhead compared to hand-optimized intrinsics

Portable : Same code works across different CPU architectures<br>(x86, ARM, etc.)

Composable : Seamlessly integrates with Mojo's parallelization features

Key APIs:

Construction:

Broadcast single value to all elements: SIMD[dtype, size](value)

Initialize with specific values: SIMD[dtype, size](v1, v2, ...)

Zero-initialized vector: SIMD[dtype, size]()

Element operations:

Arithmetic: +, -, *, /, %, //

Comparison: ==, !=, , , >, >=

Math functions: sqrt(), sin(), cos(), fma(), etc.

Bit operations: &, |, ^, ~, , >>

Vector operations:

Horizontal reductions: reduce_add(), reduce_mul(), reduce_min(), reduce_max()

Element-wise conditional selection: select(condition, true_case, false_case)

Vector manipulation: shuffle(), slice(), join(), split()

Type conversion: cast[target_dtype]()

Examples:

Vectorized math operations:

# Process 8 floating-point numbers simultaneously

var a = SIMD[DType.float32, 8](1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)

var b = SIMD[DType.float32, 8](2.0) # Broadcast 2.0 to all elements

var result = a * b + 1.0

print(result) # => [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]

Conditional operations with masking:

# Double the positive values and negate the negative values

var values = SIMD[DType.int32, 4](1, -2, 3, -4)

var is_positive = values.gt(0) # greater-than: gets SIMD of booleans

var result = is_positive.select(values * 2, values * -1)

print(result) # => [2, 2, 6, 4]

Horizontal reductions:

# Sum all elements in a vector

var data = SIMD[DType.float64, 4](10.5, 20.3, 30.1, 40.7)

var total = data.reduce_add()

var maximum = data.reduce_max()

print(total, maximum) # => 101.6 40.7

Constraints:

The size of the SIMD vector must be positive and a power of 2.

Parameters​

​dtype (DType): The data type of SIMD vector elements.

​size (Int): The size of the SIMD vector (number of elements).

Implemented...

simd vector dtype values size data

Related Articles