Llama-dash – One go-to control plane for local inference

llama-dash — self-hosted inference gatewaygateway online|running 3 · peer 2|req/s 0.87 LOCAL INFERENCE CONTROL PLANE One control plane for local inference. Monitor models, requests, API keys, routing rules, and proxy metrics from one dashboard for llama-swap and compatible upstreams. Read the docs →Star on GitHub WORKS WITHOpenAI SDK·Claude Code·Continue·Open WebUI

OPERATOR DASHBOARD2026-04-30 · 22:01 REQ/S · 1M 0.87

P50 LATENCY 1.83 s

MODEL RESIDENCY · 60 MIN gemma-4-26B

kokoro · peer

qwen-3.6-37B

RECENT REQUESTS /v1/messages● 200950 ms /v1/chat/completions● 2003.29 s /v1/messages● 200644 ms

REQUEST PIPELINE CLIENTS OpenAI SDK Claude Code Continue · Open WebUI

──▶ llama-dash :3000 dashboard · auth · logs routing · metrics

──▶ llama-swap :8080 llama.cpp · peers

direct /v1 upstreams OpenAI · Anthropic

WHAT IT DOES D01 Watch the box Live request, token, model, upstream, and GPU status in one dashboard.

M05 Manage models Load, unload, inspect per-model stats, and edit llama-swap config with validation.

R02 Track requests Searchable history with filters, histograms, token counts, and cost estimates.

K08 Control access Hashed API keys, per-key RPM/TPM limits, and model allow-lists.

P10 Enforce policy Routing rules for model rewrites, passthrough auth, and encrypted credentials.

P06 Test models Playgrounds for chat, image, speech, and article-to-speech transcription.

Llama-dash – One go-to control plane for local inference

Related Articles

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org