llama.app - Official home for llama.cpp
curl -LsSf https://llama.app/install.sh | sh
Prefer Brew or Winget? Package managers · Rather build from source? Follow instructions
AI that lives on your computer.<br>Open-source, private, always local.<br>Run frontier AI entirely on your machine. No API keys, no telemetry, no limits. Take AI<br>back.
# 1. Serve a model<br>llama serve
# 2. Install the pi-llama plugin<br>pi install git:github.com/huggingface/pi-llama
# 3. Run Pi, everything is set<br>pi
Pair it with a local coding agent.<br>Run llama serve, then launch Pi. It auto-discovers your local model. No config, no API keys. Files stay on your machine,<br>requests never leave it.
Optimized for any hardware.<br>From your laptop to a cluster, llama.cpp runs on whatever you have. Same binary, same<br>models, same hand-tuned kernels for every GPU and CPU.
Apple Silicon M Ultra RTX 5090<br>H100 MI300 RTX 4090<br>M Max A100 DGX Spark T4<br>Jetson B200 Intel Arc<br>CPU Radeon RX M Pro RTX 3090
Run your first model<br>Qwen3.6-27B<br>27B params<br>Coding & reasoning. Single-GPU sweet spot.<br>Qwen3.6-35B-A3B<br>35B MoE · 3B active<br>MoE: 35B-class quality at 3B-class speed.<br>Gemma-4-26B-A4B<br>26B MoE · 4B active<br>Google's desktop MoE. Strong reasoning, fast inference.<br>Gemma-4-E4B<br>4B effective<br>Tiny footprint. Runs on phones and low-end laptops.<br>gpt-oss-20b<br>20B params<br>OpenAI's open weights. Frontier reasoning, local.<br>Step-3.5-Flash<br>Flash variant<br>Snappy generalist for everyday chat and writing.
Discover more models on Hugging Face