Get BaseRT · Base Compute
BaseRT
Meet BaseRT, the fastest runtime on Apple Silicon
$ curl -LsSf https://basecompute.co/install.sh | sh
Read the technical paper → · BaseRT docs → · GitHub →
Benchmarks
Faster than MLX
Up to 35% on Decode, up to 78% on Prefill.

Decode · tg128
Model             BaseRT        MLX    Llama.cpp
Qwen3 0.6B        465 (+35%)    344    297
Llama 3.2 1B      295 (+15%)    258    230
Llama 3.2 3B      117 (+5%)     112    102

Prefill · pp128
Model             BaseRT        MLX    Llama.cpp
Qwen3 30B-A3B     738 (+78%)    415    407
Gemma 4 26B-A4B   659 (+42%)    464    414

Tokens / sec · Apple M4 Pro · 4-bit
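The percentage labels can be roughly reproduced from the listed throughput figures. The sketch below assumes that the three values per model follow the legend order (BaseRT, MLX, Llama.cpp) and that each label is BaseRT's gain over MLX; neither is stated explicitly on the page. The names `BENCHMARKS`, `speedup_pct`, and `report` are illustrative only. Because the inputs are rounded, a recomputed value can land about a point away from the printed label (Llama 3.2 1B comes out near 14%, not 15%), presumably because the labels were derived from unrounded measurements.

```python
# Throughput in tokens/sec as listed on this page (Apple M4 Pro, 4-bit).
# Assumption: values are in legend order (BaseRT, MLX, Llama.cpp).
BENCHMARKS = {
    "decode tg128": {
        "Qwen3 0.6B": (465, 344, 297),
        "Llama 3.2 1B": (295, 258, 230),
        "Llama 3.2 3B": (117, 112, 102),
    },
    "prefill pp128": {
        "Qwen3 30B-A3B": (738, 415, 407),
        "Gemma 4 26B-A4B": (659, 464, 414),
    },
}


def speedup_pct(new: float, baseline: float) -> float:
    """Return the percentage by which `new` throughput exceeds `baseline`."""
    return (new / baseline - 1.0) * 100.0


def report(benchmarks=BENCHMARKS) -> list[str]:
    """Build one line per model with BaseRT's gain over each baseline."""
    lines = []
    for workload, models in benchmarks.items():
        for model, (basert, mlx, llamacpp) in models.items():
            lines.append(
                f"{workload:<14}{model:<17}"
                f"vs MLX {speedup_pct(basert, mlx):+.1f}%  "
                f"vs Llama.cpp {speedup_pct(basert, llamacpp):+.1f}%"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the script prints five lines, one per model, which makes it easy to compare BaseRT against either baseline and not only against MLX.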
Coding agents
Run it with a local coding agent
Serve a model with BaseRT, point your agent at it, and keep everything on your machine. No API keys, no data leaving your device.
# 1. Serve a model
basert serve basecompute/gemma-4-E4B-it
# 2. Install the coding-agent plugin
pi install git:github.com/basecompute/pi-basert
# 3. Run it: everything is set
pi
Models
Supported models
Qwen3 · Llama 3.2 · Llama 3.1 · Gemma 3 · Gemma 4 · Mistral · Phi-3 · Nomic BERT
Melbourne & Berlin