Show HN: Our Claude Code Plugin Routes Lightweight AI Tasks to Specialized SLMs

joshdappier1 pts0 comments

How to Reduce AI Compute Costs with our Claude Code Plugin: Routing Lightweight AI Tasks to Small Language Models | by ZeroGPU | ZeroGPU | Jun, 2026 | Medium

ZeroGPU

ZeroGPU is where engineers, founders, and builders learn how to deploy fast, lightweight AI using edge devices, SLMs, and distributed inference, all with real examples from the ZeroGPU ecosystem.

An Explainer on ZeroGPU’s new Router for Claude Code

How to Reduce AI Compute Costs with our Claude Code Plugin: Routing Lightweight AI Tasks to Small Language Models

Our new plugin lets Claude Code route repeatable or lightweight tasks to specialized small and nano language models.

ZeroGPU

2 min read · Jun 3, 2026

Developers using Claude Code can now offload lightweight AI tasks like classification, extraction, tagging, and PII redaction to ZeroGPU’s small and nano language models directly from the terminal.

Use our ZeroGPU Router with Claude Code to automatically reduce costs on relevant tasks

The new zerogpu-router plugin integrates ZeroGPU’s inference platform into Claude Code’s plugin system, exposing ZeroGPU commands as Claude-accessible skills and slash commands. Instead of sending every task to a frontier model, developers can selectively or automatically route narrow NLP workloads to smaller, specialized models designed for speed and cost efficiency.

The release reflects a growing shift in agentic coding workflows: not every request inside an AI coding session needs Claude-level reasoning.

Turning Claude Code Into a Multi-Model Router

Claude Code, Anthropic’s terminal-based coding agent, allows developers to extend sessions with plugins, slash commands, and auto-invoked skills. The zerogpu-router plugin adds ZeroGPU’s inference layer directly into that workflow.

Once installed, Claude can automatically detect and route requests like:

- PII redaction
- Named entity extraction
- IAB taxonomy classification
- Sentiment and topic labeling
- JSON extraction from free text
- Short single-turn chat responses

Claude can also auto-invoke skills based on intent. Requests mentioning “redact,” “extract,” or “classify” automatically trigger the appropriate ZeroGPU model behind the scenes.

Specialized Nano Models for Structured Tasks

Our plugin routes requests to a catalog of smaller models hosted on ZeroGPU’s serverless inference platform.

Examples include:

- gliner-multi-pii-v1 for PII extraction and redaction
- gliner2-base-v1 for entity extraction and structured classification
- deberta-v3-small for zero-shot classification
- zlm-v1-iab-classify-edge for IAB taxonomy tagging
- LFM2.5-1.2B-Instruct for lightweight chat responses
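One way to picture the intent-based routing described above is as a keyword lookup. The following is a minimal Python sketch, not the plugin's actual implementation: the model names come from the catalog above, but the `ROUTES` table, the keyword stems, and the `pick_model` function are hypothetical names introduced purely for illustration. It assumes that a request matching no narrow task should stay with Claude.

```python
import re
from typing import Optional

# Hypothetical routing table. The model names appear in the article's catalog;
# the keyword stems paired with them are assumptions made for this sketch.
# Order matters: more specific intents (PII, IAB) are checked first.
ROUTES = [
    (("redact", "pii"), "gliner-multi-pii-v1"),
    (("iab",), "zlm-v1-iab-classify-edge"),
    (("extract",), "gliner2-base-v1"),
    (("classif", "sentiment", "topic"), "deberta-v3-small"),
]


def pick_model(request: str) -> Optional[str]:
    """Return a small-model name for a narrow task, or None to stay on Claude."""
    text = request.lower()
    for stems, model in ROUTES:
        for stem in stems:
            # Match the stem only at the start of a word, so "iab" does not
            # fire on an unrelated word such as "reliable".
            if re.search(r"\b" + re.escape(stem), text):
                return model
    return None


if __name__ == "__main__":
    for prompt in (
        "Redact the emails in this log",
        "Classify this article into IAB categories",
        "Refactor the auth module for reliable retries",
    ):
        print(prompt, "->", pick_model(prompt) or "keep on Claude")
```

A result of `None` stands for the fallback case, where the request needs broader reasoning and is left to the frontier model. A production router would likely rely on more than keyword stems, but the shape of the decision is the same: match a narrow intent, pick a specialized model, and otherwise do nothing.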

Using a ZeroGPU model to redact PII, an example of a lightweight AI task that can be automatically routed to an SLM.

Use large reasoning models where reasoning matters, and use smaller edge-optimized models for deterministic NLP tasks.

Instead of paying premium inference costs for those requests, developers can route them through more cost-efficient, specialized models.

Keep Claude focused on higher-context reasoning work. Let ZeroGPU do the rest.

Built for Faster, More Cost-Effective Inference

The Claude Code plugin extends ZeroGPU’s positioning as a compute-efficient inference layer into developer tooling, where inference routing is becoming increasingly important as teams balance cost, latency, and model capability. As AI coding agents become more central to engineering workflows, infrastructure layers that decide which model should handle which task are becoming part of the stack itself.

Get started today:

- 📑 Read the full Claude Code Plugin docs 📑
- ⭐️ Review the zerogpu-router README on GitHub ⭐️
- ℹ️ Learn more about ZeroGPU ℹ️

ZeroGPU x Claude Code
