Show HN: Route LLM prompts to cheapest capable model – pydantic-AI and litellm

reactance0083 | 1 pts | 0 comments

GitHub - Reactance0083/pydantic-ai-multi-llm-cost-optimizer: Route prompts to the cheapest model that handles them. Claude + GPT-4o + Groq. Live cost tracking. Built with pydantic-ai + litellm. · GitHub

Multi-LLM Cost Optimizer (pydantic-ai + litellm + FastAPI)

Routes every prompt to the cheapest model that can handle it well. Uses pydantic-ai for the routing decision and litellm for unified execution across Claude, GPT-4o, and Groq. Tracks cost per model with a live /stats endpoint.

What It Does

1. Receives a prompt with a quality tier (fast / standard / quality / max)
2. Routes to the cheapest appropriate model using claude-haiku-4-5 as the router
3. Executes via litellm (handles auth + API differences for all providers)
4. Returns the response with cost breakdown and latency

Quick Start

    pip install -r requirements.txt
    cp .env.example .env
    # Fill in at minimum ANTHROPIC_API_KEY. OPENAI and GROQ are optional.
    uvicorn main:app --reload --port 8002

API Usage

POST /complete

    curl -X POST http://localhost:8002/complete \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Summarize the key differences between REST and GraphQL",
        "quality": "standard",
        "task_type": "general"
      }'
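The same request can be sent from Python. The sketch below uses only the standard library. The helper names `build_complete_request` and `complete` are hypothetical and not part of the repository, and `BASE_URL` assumes the server was started as in Quick Start.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8002"  # assumption: server started as in Quick Start


def build_complete_request(prompt: str, quality: str = "standard",
                           task_type: str = "general") -> urllib.request.Request:
    """Build the POST /complete request with the same JSON body as the curl call."""
    body = json.dumps(
        {"prompt": prompt, "quality": quality, "task_type": task_type}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/complete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, quality: str = "standard") -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_complete_request(prompt, quality)) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect or unit-test without a live server.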

Response:

    {
      "text": "...",
      "model_used": "anthropic/claude-haiku-4-5",
      "input_tokens": 42,
      "output_tokens": 218,
      "cost_usd": 0.000283,
      "latency_ms": 847
    }

GET /stats

    {
      "models": {
        "anthropic/claude-haiku-4-5": {"calls": 47, "total_cost": 0.0134, "total_tokens": 52400}
      },
      "total_cost_usd": 0.0134,
      "total_calls": 47
    }

Cost Table (May 2026)

| Model | Input (USD per 1k tokens) | Output (USD per 1k tokens) | Best For |
|---|---|---|---|
| groq/llama-3.1-8b-instant | $0.00005 | $0.00008 | Fast, simple tasks |
| anthropic/claude-haiku-4-5 | $0.00025 | $0.00125 | Structured outputs, classification |
| openai/gpt-4.1-mini | $0.0004 | $0.0016 | General tasks, good value |
| anthropic/claude-sonnet-4-6 | $0.003 | $0.015 | Code, complex reasoning |
| openai/gpt-4.1 | $0.002 | $0.008 | Complex tasks |
| anthropic/claude-opus-4-7 | $0.015 | $0.075 | Hardest tasks only |
| openai/gpt-5.5 | $0.005 | $0.015 | Flagship reasoning, hardest tasks |
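As a rough illustration of how a per-request cost could be derived from this table, the sketch below multiplies token counts by the per-1k rates. The names `PRICES_PER_1K` and `estimate_cost_usd` are hypothetical, and rounding to six decimals is an assumption based on the precision of the example response; the repository may compute cost differently.

```python
# Per-1k-token prices in USD as (input, output), copied from the cost table above.
PRICES_PER_1K = {
    "groq/llama-3.1-8b-instant": (0.00005, 0.00008),
    "anthropic/claude-haiku-4-5": (0.00025, 0.00125),
    "openai/gpt-4.1-mini": (0.0004, 0.0016),
    "anthropic/claude-sonnet-4-6": (0.003, 0.015),
    "openai/gpt-4.1": (0.002, 0.008),
    "anthropic/claude-opus-4-7": (0.015, 0.075),
    "openai/gpt-5.5": (0.005, 0.015),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD, rounded to 6 decimals (assumption)."""
    input_rate, output_rate = PRICES_PER_1K[model]  # KeyError for unknown models
    cost = input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate
    return round(cost, 6)
```

With the figures from the example response (42 input and 218 output tokens on anthropic/claude-haiku-4-5), this gives 0.0000105 + 0.0002725 = 0.000283 USD, matching the `cost_usd` value shown there.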

Quality Tiers

| Tier | Models Considered | Use When |
|---|---|---|
| fast | Groq llama-8b, Claude haiku | Low-stakes, high-volume, simple classification |
| standard | Groq llama-70b, GPT-4o-mini, Claude haiku | Most production tasks |
| quality | Claude sonnet, GPT-4o | Code generation, complex analysis |
| max | Claude opus | Hardest problems, highest stakes |
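Because the OpenAI and Groq keys are optional, the set of models a tier can actually use depends on which keys are configured. The sketch below shows one way to filter candidates. It is not taken from the repository: `TIER_CANDIDATES`, `PROVIDER_KEY_ENV`, and `candidates_for` are hypothetical names, and the model strings are an assumed mapping from the informal names in the tier table to entries in the cost table (llama-70b is omitted, and the GPT-4o variants are replaced by the gpt-4.1 entries, because the cost table does not list them).

```python
import os

# Assumed mapping from tier to litellm-style model strings (see caveats above).
TIER_CANDIDATES = {
    "fast": ["groq/llama-3.1-8b-instant", "anthropic/claude-haiku-4-5"],
    "standard": ["openai/gpt-4.1-mini", "anthropic/claude-haiku-4-5"],
    "quality": ["anthropic/claude-sonnet-4-6", "openai/gpt-4.1"],
    "max": ["anthropic/claude-opus-4-7"],
}

# Environment variable that holds each provider's API key.
PROVIDER_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def candidates_for(tier: str, env=None) -> list[str]:
    """Return the tier's models whose provider has a non-empty API key set."""
    env = os.environ if env is None else env
    if tier not in TIER_CANDIDATES:
        raise ValueError(f"unknown quality tier: {tier!r}")
    return [
        model
        for model in TIER_CANDIDATES[tier]
        if env.get(PROVIDER_KEY_ENV[model.split("/", 1)[0]])
    ]
```

With only an Anthropic key configured, every tier in this mapping still has at least one candidate, which is consistent with that key being the only required one.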

Structured Routing (pydantic-ai)

    from pydantic import BaseModel

    class RoutingDecision(BaseModel):
        model: str            # exact litellm model string
        reason: str           # 1-sentence justification
        expected_tokens: int  # rough output estimate

Architecture

    POST /complete
      → routing agent (claude-haiku-4-5) → RoutingDecision
      → litellm.completion(model=decision.model, ...)
      → cost calculation → response + /stats update
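The flow above can be sketched as a single function with the routing, execution, and pricing steps passed in as callables. This is an illustration under stated assumptions, not the code in main.py: `Decision`, `STATS`, and `handle_complete` are hypothetical names, `Decision` is a plain dataclass standing in for `RoutingDecision`, and counting `total_tokens` as input plus output is an assumption. In the real service, `route` would wrap the pydantic-ai agent and `execute` would wrap `litellm.completion`; neither library is called here.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    """Plain stand-in for the RoutingDecision model (hypothetical)."""
    model: str
    reason: str
    expected_tokens: int


# Running totals, mirroring the fields the /stats endpoint reports.
STATS: dict = {"models": {}, "total_cost_usd": 0.0, "total_calls": 0}


def handle_complete(
    prompt: str,
    quality: str,
    route: Callable[[str, str], Decision],
    execute: Callable[[str, str], tuple[str, int, int]],
    price: Callable[[str, int, int], float],
) -> dict:
    """Run route -> execute -> cost -> stats update for one request."""
    started = time.perf_counter()
    decision = route(prompt, quality)  # routing agent step
    text, tokens_in, tokens_out = execute(decision.model, prompt)  # execution step
    cost = price(decision.model, tokens_in, tokens_out)  # cost calculation
    latency_ms = int((time.perf_counter() - started) * 1000)

    # Update per-model and global counters (total_tokens = in + out, an assumption).
    entry = STATS["models"].setdefault(
        decision.model, {"calls": 0, "total_cost": 0.0, "total_tokens": 0}
    )
    entry["calls"] += 1
    entry["total_cost"] += cost
    entry["total_tokens"] += tokens_in + tokens_out
    STATS["total_cost_usd"] += cost
    STATS["total_calls"] += 1

    return {
        "text": text,
        "model_used": decision.model,
        "input_tokens": tokens_in,
        "output_tokens": tokens_out,
        "cost_usd": cost,
        "latency_ms": latency_ms,
    }
```

Injecting the three steps keeps the sketch runnable without API keys and makes each stage replaceable by a fake in tests.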

Requirements

- Python 3.11+
- Anthropic API key (required)
- OpenAI API key (optional, enables GPT routing)
- Groq API key (optional, enables cheapest tier)

Get the Complete Bundle

All 5 templates are available individually or as a $39 bundle (saves $15 vs individual).

| Template | Price | Link |
|---|---|---|
| Slack → Notion Automation | $9 | Buy on Gumroad |
| GitHub Issue → Linear Triage | $9 | Buy on Gumroad |
| Multi-LLM Cost Optimizer | $12 | Buy on Gumroad |
| Web Scraper + Semantic Search | $9 | Buy on Gumroad |
| Prompt Engineering Runbook | $15 | Buy on Gumroad |
| Complete Bundle (all 5) | $39 | Buy on Gumroad |

Buying includes: all source files, README, requirements.txt, .env.example, and lifetime updates.

Free to use — the source is here on GitHub. Buying supports continued development and gets you a clean download with everything packaged.

Built by Wade Allen — AI Workflow Architect
