BaseRT, A fast inference runtime for local AI on Apple Silicon

prabod2 pts0 comments

Get BaseRT · Base Compute

BaseRT<br>Meet BaseRT, the fastest runtime on Apple Silicon<br>$ curl -LsSf https://basecompute.co/install.sh | shCopy<br>Read the technical paper →BaseRT docs →GitHub →

Benchmarks<br>Faster than MLX<br>Up to 35% on Decode, up to 78% on Prefill.<br>BaseRTMLXLlama.cpp<br>Decode · tg128<br>Qwen3 0.6B

465(+35%)

344

297

Llama 3.2 1B

295(+15%)

258

230

Llama 3.2 3B

117(+5%)

112

102

Prefill · pp128<br>Qwen3 30B-A3B

738(+78%)

415

407

Gemma 4 26B-A4B

659(+42%)

464

414

Tokens / sec · Apple M4 Pro · 4-bit

Coding agents<br>Run it with a<br>local coding agent<br>Serve a model with BaseRT, point your agent at it, and keep everything on your machine. No API keys, no data leaving your device.

# 1. Serve a model<br>basert serve basecompute/gemma-4-E4B-it

# 2. Install the coding-agent plugin<br>pi install git:github.com/basecompute/pi-basert

# 3. Run it — everything is set<br>pi

Models<br>Supported models<br>Qwen3<br>Llama 3.2<br>Llama 3.1<br>Gemma 3<br>Gemma 4<br>Mistral<br>Phi-3<br>Nomic BERT

Melbourne & Berlin

Melbourne & Berlin

basert llama gemma apple basecompute install

Related Articles