BaseRT, A fast inference runtime for local AI on Apple Silicon

Get BaseRT · Base Compute

BaseRT Meet BaseRT, the fastest runtime on Apple Silicon $ curl -LsSf https://basecompute.co/install.sh | shCopy Read the technical paper →BaseRT docs →GitHub →

Benchmarks Faster than MLX Up to 35% on Decode, up to 78% on Prefill. BaseRTMLXLlama.cpp Decode · tg128 Qwen3 0.6B

465(+35%)

344

297

Llama 3.2 1B

295(+15%)

258

230

Llama 3.2 3B

117(+5%)

112

102

Prefill · pp128 Qwen3 30B-A3B

738(+78%)

415

407

Gemma 4 26B-A4B

659(+42%)

464

414

Tokens / sec · Apple M4 Pro · 4-bit

Coding agents Run it with a local coding agent Serve a model with BaseRT, point your agent at it, and keep everything on your machine. No API keys, no data leaving your device.

# 1. Serve a model basert serve basecompute/gemma-4-E4B-it

# 2. Install the coding-agent plugin pi install git:github.com/basecompute/pi-basert

# 3. Run it — everything is set pi

Models Supported models Qwen3 Llama 3.2 Llama 3.1 Gemma 3 Gemma 4 Mistral Phi-3 Nomic BERT

Melbourne & Berlin

BaseRT, A fast inference runtime for local AI on Apple Silicon

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

The labor share of income in the US is at its lowest post-war level