Mojo: SIMD

SIMD | Mojo

IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt. Skip to main content

Version: 1.0.0b1On this page For the complete Mojo documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /docs/manual/basics.md).

struct SIMD[dtype: DType, size: Int]

Represents a vector type that leverages hardware acceleration to process multiple data elements with a single operation.

SIMD (Single Instruction, Multiple Data) is a fundamental parallel computing paradigm where a single CPU instruction operates on multiple data elements at once. Modern CPUs can perform 4, 8, 16, or even 32 operations in parallel using SIMD, delivering substantial performance improvements over scalar operations. Instead of processing one value at a time, SIMD processes entire vectors of values with each instruction.

For example, when adding two vectors of four values, a scalar operation adds each value in the vector one by one, while a SIMD operation adds all four values at once using vector registers:

Scalar operation: SIMD operation:

┌─────────────────────────┐ ┌───────────────────────────┐

│ 4 instructions │ │ 1 instruction │

│ 4 clock cycles │ │ 1 clock cycle │

│ │ │ │

│ ADD a[0], b[0] → c[0] │ │ Vector register A │

│ ADD a[1], b[1] → c[1] │ │ ┌─────┬─────┬─────┬─────┐ │

│ ADD a[2], b[2] → c[2] │ │ │a[0] │a[1] │a[2] │a[3] │ │

│ ADD a[3], b[3] → c[3] │ │ └─────┴─────┴─────┴─────┘ │

└─────────────────────────┘ │ + │

│ Vector register B │

│ ┌─────┬─────┬─────┬─────┐ │

│ │b[0] │b[1] │b[2] │b[3] │ │

│ └─────┴─────┴─────┴─────┘ │

│ ↓ │

│ SIMD_ADD │

│ ↓ │

│ Vector register C │

│ ┌─────┬─────┬─────┬─────┐ │

│ │c[0] │c[1] │c[2] │c[3] │ │

│ └─────┴─────┴─────┴─────┘ │

└───────────────────────────┘

The SIMD type maps directly to hardware vector registers and instructions. Mojo automatically generates optimal SIMD code that leverages CPU-specific instruction sets (such as AVX and NEON) without requiring manual intrinsics or assembly programming.

This type is the foundation of high-performance CPU computing in Mojo, enabling you to write code that automatically leverages modern CPU vector capabilities while maintaining code clarity and portability.

Caution: If you declare a SIMD vector size larger than the vector registers of the target hardware, the compiler will break up the SIMD into multiple vector registers for compatibility. However, you should avoid using a vector that's more than 2x the hardware's vector register size because the resulting code will perform poorly.

Key properties:

Hardware-mapped : Directly maps to CPU vector registers

Type-safe : Data types and vector sizes are checked at compile time

Zero-cost : No runtime overhead compared to hand-optimized intrinsics

Portable : Same code works across different CPU architectures (x86, ARM, etc.)

Composable : Seamlessly integrates with Mojo's parallelization features

Key APIs:

Construction:

Broadcast single value to all elements: SIMD[dtype, size](value)

Initialize with specific values: SIMD[dtype, size](v1, v2, ...)

Zero-initialized vector: SIMD[dtype, size]()

Element operations:

Arithmetic: +, -, *, /, %, //

Comparison: ==, !=, , , >, >=

Math functions: sqrt(), sin(), cos(), fma(), etc.

Bit operations: &, |, ^, ~, , >>

Vector operations:

Horizontal reductions: reduce_add(), reduce_mul(), reduce_min(), reduce_max()

Element-wise conditional selection: select(condition, true_case, false_case)

Vector manipulation: shuffle(), slice(), join(), split()

Type conversion: cast[target_dtype]()

Examples:

Vectorized math operations:

# Process 8 floating-point numbers simultaneously

var a = SIMD[DType.float32, 8](1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)

var b = SIMD[DType.float32, 8](2.0) # Broadcast 2.0 to all elements

var result = a * b + 1.0

print(result) # => [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]

Conditional operations with masking:

# Double the positive values and negate the negative values

var values = SIMD[DType.int32, 4](1, -2, 3, -4)

var is_positive = values.gt(0) # greater-than: gets SIMD of booleans

var result = is_positive.select(values * 2, values * -1)

print(result) # => [2, 2, 6, 4]

Horizontal reductions:

# Sum all elements in a vector

var data = SIMD[DType.float64, 4](10.5, 20.3, 30.1, 40.7)

var total = data.reduce_add()

var maximum = data.reduce_max()

print(total, maximum) # => 101.6 40.7

Constraints:

The size of the SIMD vector must be positive and a power of 2.

Parameters

dtype (DType): The data type of SIMD vector elements.

size (Int): The size of the SIMD vector (number of elements).

Implemented...

Mojo: SIMD

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast