Show HN: Route LLM prompts to cheapest capable model – pydantic-AI and litellm

GitHub - Reactance0083/pydantic-ai-multi-llm-cost-optimizer: Route prompts to the cheapest model that handles them. Claude + GPT-4o + Groq. Live cost tracking. Built with pydantic-ai + litellm. · GitHub


Multi-LLM Cost Optimizer (pydantic-ai + litellm + FastAPI)

Routes every prompt to the cheapest model that can handle it well. Uses pydantic-ai for the routing decision and litellm for unified execution across Claude, GPT-4o, and Groq. Tracks cost per model with a live /stats endpoint.

What It Does

Receives a prompt with a quality tier (fast / standard / quality / max)

Routes to the cheapest appropriate model using claude-haiku-4-5 as the router

Executes via litellm (handles auth + API differences for all providers)

Returns the response with cost breakdown and latency

Quick Start

    pip install -r requirements.txt
    cp .env.example .env
    # Fill in at minimum ANTHROPIC_API_KEY. OPENAI and GROQ are optional.
    uvicorn main:app --reload --port 8002

API Usage

POST /complete

    curl -X POST http://localhost:8002/complete \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Summarize the key differences between REST and GraphQL",
        "quality": "standard",
        "task_type": "general"
      }'

Response:

    {
      "text": "...",
      "model_used": "anthropic/claude-haiku-4-5",
      "input_tokens": 42,
      "output_tokens": 218,
      "cost_usd": 0.000283,
      "latency_ms": 847
    }

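For callers who prefer Python to curl, the request above could be built roughly as follows. This is a minimal sketch using only the standard library; the helper names `build_complete_request` and `complete` are hypothetical, and the default `base_url` simply mirrors the port used in Quick Start.

```python
import json
import urllib.request

# The four quality tiers listed in this README.
VALID_TIERS = {"fast", "standard", "quality", "max"}


def build_complete_request(prompt, quality="standard", task_type="general",
                           base_url="http://localhost:8002"):
    """Build (but do not send) a POST /complete request."""
    if quality not in VALID_TIERS:
        raise ValueError(f"unknown quality tier: {quality!r}")
    body = json.dumps(
        {"prompt": prompt, "quality": quality, "task_type": task_type}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/complete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    req = build_complete_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


# Usage, with the server from Quick Start running:
#   result = complete("Summarize the key differences between REST and GraphQL")
#   print(result["model_used"], result["cost_usd"], result["latency_ms"])
```

Rejecting an unknown tier on the client side is a design choice of this sketch, not something the README states the server does.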
GET /stats

    {
      "models": {
        "anthropic/claude-haiku-4-5": {"calls": 47, "total_cost": 0.0134, "total_tokens": 52400}
      },
      "total_cost_usd": 0.0134,
      "total_calls": 47
    }

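One way to produce a payload of this shape is an in-memory accumulator keyed by model. The sketch below is an illustration, not the code in main.py: the `StatsTracker` class is hypothetical, and it assumes `total_tokens` counts input plus output tokens, which the README does not specify.

```python
import threading
from collections import defaultdict


class StatsTracker:
    """Accumulate per-model call counts, cost, and tokens in memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = defaultdict(
            lambda: {"calls": 0, "total_cost": 0.0, "total_tokens": 0}
        )

    def record(self, model, input_tokens, output_tokens, cost_usd):
        """Add one completed call to the running totals."""
        with self._lock:
            entry = self._models[model]
            entry["calls"] += 1
            entry["total_cost"] += cost_usd
            # Assumption: total_tokens is input + output combined.
            entry["total_tokens"] += input_tokens + output_tokens

    def snapshot(self):
        """Return a copy of the totals in the same shape as GET /stats."""
        with self._lock:
            models = {name: dict(entry) for name, entry in self._models.items()}
        return {
            "models": models,
            "total_cost_usd": sum(e["total_cost"] for e in models.values()),
            "total_calls": sum(e["calls"] for e in models.values()),
        }
```

Totals kept this way reset on every restart; persisting them would need a store the README does not mention.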
Cost Table (May 2026)

| Model | Input (USD per 1k tokens) | Output (USD per 1k tokens) | Best For |
|---|---|---|---|
| groq/llama-3.1-8b-instant | $0.00005 | $0.00008 | Fast, simple tasks |
| anthropic/claude-haiku-4-5 | $0.00025 | $0.00125 | Structured outputs, classification |
| openai/gpt-4.1-mini | $0.0004 | $0.0016 | General tasks, good value |
| anthropic/claude-sonnet-4-6 | $0.003 | $0.015 | Code, complex reasoning |
| openai/gpt-4.1 | $0.002 | $0.008 | Complex tasks |
| anthropic/claude-opus-4-7 | $0.015 | $0.075 | Hardest tasks only |
| openai/gpt-5.5 | $0.005 | $0.015 | Flagship reasoning, hardest tasks |
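The per-call cost follows directly from these prices: tokens divided by 1,000, multiplied by the per-1k rate, summed over input and output. The sketch below copies the table into a dictionary; the names `PRICES_PER_1K` and `cost_usd` are hypothetical, and rounding to six decimals is an assumption made to match the `cost_usd` field in the sample response.

```python
# (input USD per 1k tokens, output USD per 1k tokens), copied from the cost table.
PRICES_PER_1K = {
    "groq/llama-3.1-8b-instant": (0.00005, 0.00008),
    "anthropic/claude-haiku-4-5": (0.00025, 0.00125),
    "openai/gpt-4.1-mini": (0.0004, 0.0016),
    "anthropic/claude-sonnet-4-6": (0.003, 0.015),
    "openai/gpt-4.1": (0.002, 0.008),
    "anthropic/claude-opus-4-7": (0.015, 0.075),
    "openai/gpt-5.5": (0.005, 0.015),
}


def cost_usd(model, input_tokens, output_tokens):
    """Return the cost of one call in USD, rounded to 6 decimal places."""
    in_price, out_price = PRICES_PER_1K[model]
    total = input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
    return round(total, 6)


# The sample /complete response (42 input tokens, 218 output tokens on
# claude-haiku-4-5) works out to 0.0000105 + 0.0002725 = 0.000283 USD.
print(cost_usd("anthropic/claude-haiku-4-5", 42, 218))
```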

Quality Tiers

| Tier | Models Considered | Use When |
|---|---|---|
| fast | Groq llama-8b, Claude haiku | Low-stakes, high-volume, simple classification |
| standard | Groq llama-70b, GPT-4o-mini, Claude haiku | Most production tasks |
| quality | Claude sonnet, GPT-4o | Code generation, complex analysis |
| max | Claude opus | Hardest problems, highest stakes |
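A deterministic stand-in for the tier logic can help when reasoning about which model a request may reach. The sketch below is not the project's router (that is an LLM call, per the Architecture section). It assumes a tier-to-candidates mapping built from the exact model strings in the cost table, since the tier table names model families rather than litellm strings, and it ranks candidates by a simple blended price (input plus output per 1k tokens). `TIER_CANDIDATES`, `BLENDED_PRICE_PER_1K`, and `cheapest_for_tier` are all hypothetical names.

```python
# Hypothetical mapping: tier -> candidate litellm model strings.
TIER_CANDIDATES = {
    "fast": ["groq/llama-3.1-8b-instant", "anthropic/claude-haiku-4-5"],
    "standard": ["anthropic/claude-haiku-4-5", "openai/gpt-4.1-mini"],
    "quality": ["anthropic/claude-sonnet-4-6", "openai/gpt-4.1"],
    "max": ["anthropic/claude-opus-4-7"],
}

# Input + output price per 1k tokens, summed from the cost table.
BLENDED_PRICE_PER_1K = {
    "groq/llama-3.1-8b-instant": 0.00005 + 0.00008,
    "anthropic/claude-haiku-4-5": 0.00025 + 0.00125,
    "openai/gpt-4.1-mini": 0.0004 + 0.0016,
    "anthropic/claude-sonnet-4-6": 0.003 + 0.015,
    "openai/gpt-4.1": 0.002 + 0.008,
    "anthropic/claude-opus-4-7": 0.015 + 0.075,
}


def cheapest_for_tier(tier, enabled_providers):
    """Pick the cheapest candidate in a tier whose provider is enabled."""
    if tier not in TIER_CANDIDATES:
        raise ValueError(f"unknown quality tier: {tier!r}")
    usable = [
        model for model in TIER_CANDIDATES[tier]
        if model.split("/", 1)[0] in enabled_providers
    ]
    if not usable:
        raise LookupError(f"no enabled provider can serve tier {tier!r}")
    return min(usable, key=BLENDED_PRICE_PER_1K.__getitem__)
```

With only an Anthropic key configured, the `fast` tier would resolve to `anthropic/claude-haiku-4-5` under this mapping; adding Groq would shift it to `groq/llama-3.1-8b-instant`.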

Structured Routing (pydantic-ai)

    from pydantic import BaseModel

    class RoutingDecision(BaseModel):
        model: str            # exact litellm model string
        reason: str           # 1-sentence justification
        expected_tokens: int  # rough output estimate
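Because the router is itself a model, its output may name a model that is not configured, or may not parse at all. The sketch below shows one possible guard. It mirrors `RoutingDecision` with a standard-library dataclass so that it runs without third-party packages; the function `parse_decision` and its fallback behaviour are assumptions, not something the README describes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    """Dataclass stand-in for the pydantic model shown above."""
    model: str
    reason: str
    expected_tokens: int


def parse_decision(raw, allowed_models, fallback_model):
    """Parse router output, falling back when it is malformed or disallowed."""
    try:
        data = json.loads(raw)
        decision = RoutingDecision(
            model=str(data["model"]),
            reason=str(data["reason"]),
            expected_tokens=int(data["expected_tokens"]),
        )
    except (ValueError, KeyError, TypeError):
        # Covers invalid JSON, missing fields, and non-numeric token counts.
        return RoutingDecision(fallback_model, "fallback: unparseable router output", 0)
    if decision.model not in allowed_models:
        return RoutingDecision(
            fallback_model,
            f"fallback: {decision.model!r} is not an allowed model",
            decision.expected_tokens,
        )
    return decision
```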

Architecture

    POST /complete
      → routing agent (claude-haiku-4-5)
      → RoutingDecision
      → litellm.completion(model=decision.model, ...)
      → cost calculation
      → response + /stats update
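The flow above can be sketched as a single function with the three external steps passed in as callables, which keeps the example runnable without API keys. Everything here is illustrative: `handle_complete`, `Completion`, and the `route`, `execute`, and `price` parameters are hypothetical names. In the real service, `execute` would presumably wrap `litellm.completion(model=decision.model, ...)` and `route` would call the routing agent.

```python
import time
from typing import NamedTuple


class Completion(NamedTuple):
    """Minimal result of executing a prompt against a model."""
    text: str
    input_tokens: int
    output_tokens: int


def handle_complete(prompt, quality, route, execute, price):
    """Route, execute, price, and time one request.

    route(prompt, quality) -> model string
    execute(model, prompt) -> Completion
    price(model, input_tokens, output_tokens) -> cost in USD
    """
    start = time.perf_counter()
    model = route(prompt, quality)
    result = execute(model, prompt)
    cost = price(model, result.input_tokens, result.output_tokens)
    latency_ms = int((time.perf_counter() - start) * 1000)
    return {
        "text": result.text,
        "model_used": model,
        "input_tokens": result.input_tokens,
        "output_tokens": result.output_tokens,
        "cost_usd": cost,
        "latency_ms": latency_ms,
    }


# Stubbed usage: no network calls, fixed model and token counts.
response = handle_complete(
    "Summarize REST vs GraphQL",
    "standard",
    route=lambda prompt, quality: "anthropic/claude-haiku-4-5",
    execute=lambda model, prompt: Completion("...", 42, 218),
    price=lambda model, i, o: round(i / 1000 * 0.00025 + o / 1000 * 0.00125, 6),
)
print(response["model_used"], response["cost_usd"])
```

Note that latency measured this way includes the routing call, so a slow router shows up in `latency_ms` even when the chosen model is fast.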

Requirements

Python 3.11+

Anthropic API key (required)

OpenAI API key (optional, enables GPT routing)

Groq API key (optional, enables cheapest tier)
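A startup check along these lines could report which providers are usable before any routing happens. This is a sketch under assumptions: `ANTHROPIC_API_KEY` is named in Quick Start, while `OPENAI_API_KEY` and `GROQ_API_KEY` are assumed to be the variable names in `.env.example`, and `enabled_providers` is a hypothetical helper.

```python
import os

# Provider -> environment variable. Only ANTHROPIC_API_KEY is named in the
# README; the other two names are assumptions.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def enabled_providers(env=None):
    """Return the set of providers with a non-empty API key configured."""
    env = os.environ if env is None else env
    enabled = {
        provider
        for provider, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    }
    if "anthropic" not in enabled:
        # The router runs on claude-haiku-4-5, so this key is mandatory.
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    return enabled
```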

Get the Complete Bundle

All 5 templates are available individually or as a $39 bundle (saves $15 vs individual).

| Template | Price | Link |
|---|---|---|
| Slack → Notion Automation | $9 | Buy on Gumroad |
| GitHub Issue → Linear Triage | $9 | Buy on Gumroad |
| Multi-LLM Cost Optimizer | $12 | Buy on Gumroad |
| Web Scraper + Semantic Search | $9 | Buy on Gumroad |
| Prompt Engineering Runbook | $15 | Buy on Gumroad |
| Complete Bundle (all 5) | $39 | Buy on Gumroad |

Buying includes: all source files, README, requirements.txt, .env.example, and lifetime updates.

Free to use — the source is here on GitHub. Buying supports continued development and gets you a clean download with everything packaged.

Built by Wade Allen — AI Workflow Architect
