Reactance0083/pydantic-ai-multi-llm-cost-optimizer: Route prompts to the cheapest model that handles them. Claude + GPT-4o + Groq. Live cost tracking. Built with pydantic-ai + litellm.
Multi-LLM Cost Optimizer (pydantic-ai + litellm + FastAPI)
Routes every prompt to the cheapest model that can handle it well. Uses pydantic-ai for the routing decision and litellm for unified execution across Claude, GPT-4o, and Groq. Tracks cost per model with a live /stats endpoint.
What It Does
Receives a prompt with a quality tier (fast / standard / quality / max)
Routes to the cheapest appropriate model using claude-haiku-4-5 as the router
Executes via litellm (handles auth + API differences for all providers)
Returns the response with cost breakdown and latency
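The four steps above can be sketched as one function. This is a minimal illustration, not the repository's `main.py`: the names `handle_complete`, `Completion`, and the three injected callables are hypothetical, and the stubs stand in for the routing agent, the litellm call, and the price lookup so the sketch runs without API keys.

```python
import time
from typing import Callable, NamedTuple


class Completion(NamedTuple):
    """Hypothetical shape of what the execution step hands back."""
    text: str
    input_tokens: int
    output_tokens: int


def handle_complete(
    prompt: str,
    quality: str,
    route: Callable[[str, str], str],
    execute: Callable[[str, str], Completion],
    price: Callable[[str, int, int], float],
) -> dict:
    """Route, execute, price, and time a single prompt."""
    started = time.perf_counter()
    model = route(prompt, quality)      # step 2: pick a model for this tier
    result = execute(model, prompt)     # step 3: run the prompt on that model
    latency_ms = round((time.perf_counter() - started) * 1000)
    return {                            # step 4: response with cost and latency
        "text": result.text,
        "model_used": model,
        "input_tokens": result.input_tokens,
        "output_tokens": result.output_tokens,
        "cost_usd": price(model, result.input_tokens, result.output_tokens),
        "latency_ms": latency_ms,
    }


# Stubs (assumptions) so the flow can be exercised offline.
def fake_route(prompt: str, quality: str) -> str:
    return "anthropic/claude-haiku-4-5"


def fake_execute(model: str, prompt: str) -> Completion:
    return Completion(text="...", input_tokens=42, output_tokens=218)


def fake_price(model: str, input_tokens: int, output_tokens: int) -> float:
    return 0.000283


response = handle_complete("Summarize REST vs GraphQL", "standard",
                           fake_route, fake_execute, fake_price)
print(response["model_used"], response["cost_usd"])
```

Passing the router and executor in as callables keeps the flow testable; in a real deployment `execute` would presumably wrap `litellm.completion(model=..., messages=[...])`.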
Quick Start
    pip install -r requirements.txt
    cp .env.example .env
    # Fill in at minimum ANTHROPIC_API_KEY. OPENAI and GROQ are optional.
    uvicorn main:app --reload --port 8002
API Usage
POST /complete
    curl -X POST http://localhost:8002/complete \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Summarize the key differences between REST and GraphQL",
        "quality": "standard",
        "task_type": "general"
      }'
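The same request can be built from Python with only the standard library. A minimal sketch, assuming the server from Quick Start is listening on port 8002; `build_complete_request` and `send` are illustrative helper names, not part of the repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # assumption: port from the Quick Start command


def build_complete_request(prompt: str, quality: str = "standard",
                           task_type: str = "general") -> urllib.request.Request:
    """Build (but do not send) a POST /complete request."""
    body = json.dumps({"prompt": prompt, "quality": quality, "task_type": task_type})
    return urllib.request.Request(
        f"{BASE_URL}/complete",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `send(build_complete_request("Summarize the key differences between REST and GraphQL"))` should return the JSON object described under Response.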
Response:
    {
      "text": "...",
      "model_used": "anthropic/claude-haiku-4-5",
      "input_tokens": 42,
      "output_tokens": 218,
      "cost_usd": 0.000283,
      "latency_ms": 847
    }
GET /stats
    {
      "models": {
        "anthropic/claude-haiku-4-5": {"calls": 47, "total_cost": 0.0134, "total_tokens": 52400}
      },
      "total_cost_usd": 0.0134,
      "total_calls": 47
    }
Cost Table (May 2026)
| Model | Input (USD per 1k tokens) | Output (USD per 1k tokens) | Best For |
| --- | --- | --- | --- |
| groq/llama-3.1-8b-instant | $0.00005 | $0.00008 | Fast, simple tasks |
| anthropic/claude-haiku-4-5 | $0.00025 | $0.00125 | Structured outputs, classification |
| openai/gpt-4.1-mini | $0.0004 | $0.0016 | General tasks, good value |
| anthropic/claude-sonnet-4-6 | $0.003 | $0.015 | Code, complex reasoning |
| openai/gpt-4.1 | $0.002 | $0.008 | Complex tasks |
| anthropic/claude-opus-4-7 | $0.015 | $0.075 | Hardest tasks only |
| openai/gpt-5.5 | $0.005 | $0.015 | Flagship reasoning, hardest tasks |
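As a worked example of how these rates turn into a per-call cost, the sketch below prices a call from its token counts. The prices are copied from the table above; the function name `estimate_cost_usd` and the rounding to six decimals are assumptions, not taken from the repository.

```python
# (input, output) prices in USD per 1k tokens, copied from the cost table.
PRICES_PER_1K = {
    "groq/llama-3.1-8b-instant": (0.00005, 0.00008),
    "anthropic/claude-haiku-4-5": (0.00025, 0.00125),
    "openai/gpt-4.1-mini": (0.0004, 0.0016),
    "anthropic/claude-sonnet-4-6": (0.003, 0.015),
    "openai/gpt-4.1": (0.002, 0.008),
    "anthropic/claude-opus-4-7": (0.015, 0.075),
    "openai/gpt-5.5": (0.005, 0.015),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one call: tokens / 1000 * rate, summed over input and output."""
    input_price, output_price = PRICES_PER_1K[model]
    cost = input_tokens / 1000 * input_price + output_tokens / 1000 * output_price
    return round(cost, 6)  # assumption: six decimals, as in the sample response


# 42 input and 218 output tokens on claude-haiku-4-5:
# 0.0000105 + 0.0002725 = 0.000283
print(estimate_cost_usd("anthropic/claude-haiku-4-5", 42, 218))
```

The result agrees with the `cost_usd` value in the sample `POST /complete` response, which uses the same token counts.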
Quality Tiers
| Tier | Models Considered | Use When |
| --- | --- | --- |
| fast | Groq llama-8b, Claude haiku | Low-stakes, high-volume, simple classification |
| standard | Groq llama-70b, GPT-4o-mini, Claude haiku | Most production tasks |
| quality | Claude sonnet, GPT-4o | Code generation, complex analysis |
| max | Claude opus | Hardest problems, highest stakes |
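One way to combine the tiers with the optional API keys is to filter each tier's candidates down to providers that have a key configured. The sketch below is an assumption about how that could work, not the repository's logic. It covers only the `fast` and `max` tiers, because the tiers table names the other models informally and their exact litellm strings are not given here. `OPENAI_API_KEY` and `GROQ_API_KEY` are assumed variable names; only `ANTHROPIC_API_KEY` appears in this README.

```python
from typing import Mapping

# Candidates per tier, cheapest first, using model strings from the cost table.
# "standard" and "quality" are omitted: their exact model strings are not listed.
TIER_CANDIDATES = {
    "fast": ["groq/llama-3.1-8b-instant", "anthropic/claude-haiku-4-5"],
    "max": ["anthropic/claude-opus-4-7"],
}

# Assumed environment variable per provider prefix.
PROVIDER_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def available_candidates(tier: str, env: Mapping[str, str]) -> list[str]:
    """Return the tier's models whose provider has a non-empty key in env."""
    if tier not in TIER_CANDIDATES:
        raise ValueError(f"unknown or unsupported tier: {tier!r}")
    usable = []
    for model in TIER_CANDIDATES[tier]:
        provider = model.split("/", 1)[0]
        if env.get(PROVIDER_KEY_ENV[provider]):
            usable.append(model)
    return usable


# With only an Anthropic key, the fast tier falls back to Claude haiku.
print(available_candidates("fast", {"ANTHROPIC_API_KEY": "sk-example"}))
```

In an application, `env` would typically be `os.environ`; passing it as a parameter keeps the function easy to test.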
Structured Routing (pydantic-ai)
    from pydantic import BaseModel

    class RoutingDecision(BaseModel):
        model: str            # exact litellm model string
        reason: str           # 1-sentence justification
        expected_tokens: int  # rough output estimate
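Because the `model` field is produced by an LLM, it may be worth checking it against a list of allowed models before executing. The guard below is a hypothetical addition, not something this README says the project does. To run without dependencies it uses a dataclass stand-in with the same three fields as `RoutingDecision`; `enforce_allowed` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    """Dataclass stand-in mirroring the pydantic model's fields."""
    model: str
    reason: str
    expected_tokens: int


def enforce_allowed(decision: RoutingDecision, allowed: list[str],
                    fallback: str) -> RoutingDecision:
    """Keep the router's choice if it is allowed, otherwise use the fallback."""
    if decision.model in allowed:
        return decision
    return RoutingDecision(
        model=fallback,
        reason=f"router chose unlisted model {decision.model!r}; using fallback",
        expected_tokens=decision.expected_tokens,
    )


allowed = ["groq/llama-3.1-8b-instant", "anthropic/claude-haiku-4-5"]
bad = RoutingDecision(model="made-up/model", reason="cheapest", expected_tokens=200)
print(enforce_allowed(bad, allowed, fallback="anthropic/claude-haiku-4-5").model)
```

A guard like this keeps a mistyped or hallucinated model string from reaching the execution step.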
Architecture
    POST /complete
      → routing agent (claude-haiku-4-5) → RoutingDecision
      → litellm.completion(model=decision.model, ...)
      → cost calculation → response + /stats update
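The last step of this flow, the `/stats` update, can be sketched as a small in-memory tracker whose snapshot has the same shape as the `GET /stats` payload. This is an illustration under assumptions: the class name `StatsTracker` is hypothetical, and `total_tokens` is assumed to mean input plus output tokens, which the README does not state.

```python
from collections import defaultdict
from threading import Lock


class StatsTracker:
    """In-memory per-model counters shaped like the GET /stats payload."""

    def __init__(self) -> None:
        self._lock = Lock()  # guards updates if requests run in threads
        self._models = defaultdict(
            lambda: {"calls": 0, "total_cost": 0.0, "total_tokens": 0}
        )

    def record(self, model: str, input_tokens: int, output_tokens: int,
               cost_usd: float) -> None:
        """Add one completed call to the model's running totals."""
        with self._lock:
            entry = self._models[model]
            entry["calls"] += 1
            entry["total_cost"] += cost_usd
            # Assumption: total_tokens counts input and output together.
            entry["total_tokens"] += input_tokens + output_tokens

    def snapshot(self) -> dict:
        """Return a copy of the totals in the /stats response shape."""
        with self._lock:
            models = {name: dict(entry) for name, entry in self._models.items()}
        return {
            "models": models,
            "total_cost_usd": round(sum(m["total_cost"] for m in models.values()), 6),
            "total_calls": sum(m["calls"] for m in models.values()),
        }


tracker = StatsTracker()
tracker.record("anthropic/claude-haiku-4-5", 42, 218, 0.000283)
print(tracker.snapshot()["total_calls"])
```

Counters kept this way reset when the process restarts; persisting them would need a store that this sketch does not cover.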
Requirements
Python 3.11+
Anthropic API key (required)
OpenAI API key (optional, enables GPT routing)
Groq API key (optional, enables cheapest tier)
Get the Complete Bundle
All 5 templates are available individually or as a $39 bundle (saves $15 vs individual).
| Template | Price | Link |
| --- | --- | --- |
| Slack → Notion Automation | $9 | Buy on Gumroad |
| GitHub Issue → Linear Triage | $9 | Buy on Gumroad |
| Multi-LLM Cost Optimizer | $12 | Buy on Gumroad |
| Web Scraper + Semantic Search | $9 | Buy on Gumroad |
| Prompt Engineering Runbook | $15 | Buy on Gumroad |
| Complete Bundle (all 5) | $39 | Buy on Gumroad |
Buying includes: all source files, README, requirements.txt, .env.example, and lifetime updates.
Free to use — the source is here on GitHub. Buying supports continued development and gets you a clean download with everything packaged.
Built by Wade Allen — AI Workflow Architect