Llama.cpp now has an official website: llama.app

llama.app - Official home for llama.cpp

curl -LsSf https://llama.app/install.sh | sh

Prefer Brew or Winget? See the package managers page. Rather build from source? Follow the build instructions.

AI that lives on your computer. Open-source, private, always local. Run frontier AI entirely on your machine. No API keys, no telemetry, no limits. Take AI back.

# 1. Serve a model
llama serve

# 2. Install the pi-llama plugin
pi install git:github.com/huggingface/pi-llama

# 3. Run Pi, everything is set
pi

Pair it with a local coding agent. Run llama serve, then launch Pi. It auto-discovers your local model. No config, no API keys. Files stay on your machine, requests never leave it.
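To make "auto-discovers your local model" concrete, here is a minimal sketch of how a client could check what a local server is offering. It is not Pi's actual discovery logic, which the page does not describe. It assumes that `llama serve` exposes the OpenAI-compatible `/v1/models` endpoint that llama.cpp's server provides and that it listens on `http://localhost:8080`; both the address and the helper names (`models_url`, `parse_model_ids`, `discover_models`) are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: `llama serve` listens on this address and exposes the
# OpenAI-compatible API that llama.cpp's server provides. Adjust as needed.
BASE_URL = "http://localhost:8080"


def models_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the OpenAI-style model listing endpoint."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body."""
    body = json.loads(payload)
    return [entry["id"] for entry in body.get("data", [])]


def discover_models(base_url: str = BASE_URL, timeout: float = 2.0) -> list[str]:
    """Return the ids of models served locally, or [] if nothing answers."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except OSError:
        # Covers refused connections and timeouts (URLError subclasses OSError).
        return []


if __name__ == "__main__":
    print(discover_models())
```

Because the request only ever targets localhost, a check like this stays on the machine. An empty list simply means no server answered at the assumed address.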

Optimized for any hardware. From your laptop to a cluster, llama.cpp runs on whatever you have. Same binary, same models, same hand-tuned kernels for every GPU and CPU.

Apple Silicon, M Ultra, M Max, M Pro, RTX 5090, RTX 4090, RTX 3090, B200, H100, A100, T4, DGX Spark, Jetson, MI300, Radeon RX, Intel Arc, CPU

Run your first model

Qwen3.6-27B (27B params): Coding & reasoning. Single-GPU sweet spot.
Qwen3.6-35B-A3B (35B MoE · 3B active): 35B-class quality at 3B-class speed.
Gemma-4-26B-A4B (26B MoE · 4B active): Google's desktop MoE. Strong reasoning, fast inference.
Gemma-4-E4B (4B effective): Tiny footprint. Runs on phones and low-end laptops.
gpt-oss-20b (20B params): OpenAI's open weights. Frontier reasoning, local.
Step-3.5-Flash (Flash variant): Snappy generalist for everyday chat and writing.
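When choosing among these, a rough estimate of weight memory helps: parameters times bits per weight, divided by 8. The sketch below applies that arithmetic to the parameter counts in the model names. The bit widths are illustrative assumptions, not the formats the page actually ships, and the estimate ignores the KV cache, activations, and per-format overhead. For the MoE models, the total parameter count drives the memory needed to hold the weights, while the smaller active count drives the compute spent per token.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


# Parameter counts come from the model names above; the bit widths are
# illustrative assumptions, not the formats the listed downloads use.
MODELS = [("Qwen3.6-27B", 27), ("Qwen3.6-35B-A3B", 35), ("gpt-oss-20b", 20)]

if __name__ == "__main__":
    for name, params in MODELS:
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{name}: ~{size:.1f} GB at {bits} bits/weight")
```

Under these assumptions, a 27B model needs about 13.5 GB for weights at 4 bits per weight and about 54 GB at 16 bits.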

Discover more models on Hugging Face
