How Tool Search Works and How It Saves Tokens - Mykhailo Chalyi
Let’s look into how tool search works, and why it saves tokens.
The Problem: Schemas Are Expensive
Every tool you give an agent is not just a name. It is a full contract: name, description, and a JSON Schema for parameters with types, enums, required fields, and usually a paragraph of docs so the model knows when to use it.
One decent tool definition is easily 500 to 1500 tokens. Multiply by 200 and you are burning tens of thousands of tokens at the very least on a menu the agent mostly will not order from.
And it is not only tokens. There is a second, sneakier cost. When the model has 200 tools in front of it, picking the right one gets harder. Selection accuracy drops. More tools means more chances to grab the wrong one, or to hallucinate parameters from a tool that looks similar.
So you pay twice: once in money and latency for the tokens, once in quality for the confusion.
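To get a feel for the token cost described above, here is a minimal sketch that sizes a single tool definition. The tool itself (`create_pull_request` and its fields) is made up for illustration, and the four-characters-per-token rule is a rough assumption, not a real tokenizer. Real definitions with enums and a paragraph of docs come out much larger than this toy one.

```python
import json

# Hypothetical tool definition: the name and fields are invented for illustration.
create_pull_request = {
    "name": "create_pull_request",
    "description": (
        "Open a pull request in a repository. Use this after changes are "
        "pushed to a branch and the user asks for a review."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository as owner/name."},
            "head": {"type": "string", "description": "Branch with the changes."},
            "base": {"type": "string", "description": "Branch to merge into."},
            "title": {"type": "string", "description": "Pull request title."},
            "draft": {"type": "boolean", "description": "Open as a draft."},
        },
        "required": ["repo", "head", "base", "title"],
    },
}


def estimate_tokens(obj) -> int:
    """Rough size estimate, assuming about 4 characters per token."""
    return len(json.dumps(obj)) // 4


per_tool = estimate_tokens(create_pull_request)
print(f"one tool: ~{per_tool} tokens, 200 similar tools: ~{per_tool * 200} tokens")
```

Swap in one of your own tool definitions to see where it lands relative to the 500 to 1500 range.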
The Idea: A Tool That Finds Tools
Tool search flips the default. Instead of loading every schema upfront, the harness loads a small index — just the names, and maybe one line of metadata each — plus one extra tool whose only job is to find tools.
The agent does not see 200 schemas. It sees 200 names and a tool_search tool. When it actually needs to start a resource or open a pull request, it searches, gets back the full schema for just that tool, and only then calls it.
You can think of it as a tool whose job is the toolbox itself. A tool inside the toolbox that hands you other tools on request. Brand inside a brand, if you like.
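The idea above can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular harness implements it: the `Tool` class, the three catalog entries, and the substring matching are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    summary: str  # the one line of metadata shown in the index
    schema: dict  # the full parameter schema, handed out only on request


# Hypothetical catalog; a real one would hold hundreds of entries.
CATALOG = {
    t.name: t
    for t in [
        Tool("start_resource", "Start a stopped cloud resource",
             {"type": "object", "properties": {"id": {"type": "string"}},
              "required": ["id"]}),
        Tool("open_pull_request", "Open a pull request",
             {"type": "object", "properties": {"title": {"type": "string"}},
              "required": ["title"]}),
        Tool("send_message", "Send a chat message",
             {"type": "object", "properties": {"text": {"type": "string"}},
              "required": ["text"]}),
    ]
}


def render_index() -> str:
    """What the agent sees upfront: names and one-liners, no schemas."""
    return "\n".join(f"{t.name}: {t.summary}" for t in CATALOG.values())


def tool_search(query: str, limit: int = 3) -> list[dict]:
    """Return full definitions for tools matching every word in the query."""
    words = query.lower().split()
    hits = [
        t for t in CATALOG.values()
        if all(w in f"{t.name} {t.summary}".lower() for w in words)
    ]
    return [
        {"name": t.name, "description": t.summary, "input_schema": t.schema}
        for t in hits[:limit]
    ]


print(render_index())
print(tool_search("pull request"))
```

The index goes into the prompt once. The schemas stay in the harness until `tool_search` is called.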
In this very environment it looks like this. Deferred tools show up by name only:
The following deferred tools are now available via ToolSearch.
Their schemas are NOT loaded — calling them directly will fail.
WebFetch, WebSearch, NotebookEdit, Monitor, ...
The names are there. The schemas are not. If I call WebFetch right now, I get an InputValidationError, because there is no parameter schema yet to validate against. I have to fetch it first.
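Why the direct call fails can be shown with a small guard. This is a sketch of the general mechanism, not the actual harness code: the `Harness` class is hypothetical, the error class only borrows its name from the message above, and the `url` parameter for WebFetch is an assumed placeholder schema.

```python
class InputValidationError(Exception):
    """Raised when a call cannot be validated against a loaded schema."""


class Harness:
    def __init__(self, catalog: dict):
        self.catalog = catalog  # name -> full schema, kept out of context
        self.loaded = {}        # schemas fetched so far via tool search

    def load(self, name: str) -> dict:
        """Fetch a schema on demand and remember it."""
        self.loaded[name] = self.catalog[name]
        return self.loaded[name]

    def call(self, name: str, args: dict) -> str:
        schema = self.loaded.get(name)
        if schema is None:
            # No schema yet means there is nothing to validate the arguments against.
            raise InputValidationError(f"{name}: schema not loaded, search for it first")
        missing = [k for k in schema.get("required", []) if k not in args]
        if missing:
            raise InputValidationError(f"{name}: missing required fields {missing}")
        return f"called {name}"


# Placeholder schema: the real WebFetch parameters are not shown in this article.
harness = Harness({
    "WebFetch": {"type": "object",
                 "properties": {"url": {"type": "string"}},
                 "required": ["url"]},
})

try:
    harness.call("WebFetch", {"url": "https://example.com"})
except InputValidationError as err:
    print("before loading:", err)

harness.load("WebFetch")
print("after loading:", harness.call("WebFetch", {"url": "https://example.com"}))
```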
How It Saves Tokens
The trick is that names are cheap and schemas are not.
A name plus one line of metadata is maybe 10-15 tokens. A full schema is 500-1500. So the index of 200 tools costs about the same as two or three full schemas. You keep the whole menu in context for almost nothing, and you only pay the real price for the handful of tools you actually load.
The math is boring but it works. 200 schemas upfront might be 60k tokens. The same catalog as names plus a search tool is maybe 2-3k. You pull two schemas during the task and add another 2k. You spent 5k instead of 60k, and the agent saw a shorter, cleaner menu the whole time.
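The arithmetic above, written out as a sketch. The inputs are the figures from the text: 200 tools, about 15 tokens per index entry, two schemas loaded at roughly 1k each. Note that 60k upfront corresponds to an average of about 300 tokens per schema. At the 500 to 1500 tokens quoted for a decent definition, the upfront bill is higher still, so the example also prints that range.

```python
def upfront_cost(n_tools: int, tokens_per_schema: int) -> int:
    """Every schema is loaded into context before the task starts."""
    return n_tools * tokens_per_schema


def deferred_cost(n_tools: int, tokens_per_entry: int,
                  n_loaded: int, tokens_per_schema: int) -> int:
    """Index entries for all tools, plus full schemas only for the ones used."""
    return n_tools * tokens_per_entry + n_loaded * tokens_per_schema


upfront = upfront_cost(200, 300)             # 200 * 300 = 60,000
deferred = deferred_cost(200, 15, 2, 1000)   # 3,000 + 2,000 = 5,000
savings = 1 - deferred / upfront

print(f"upfront:  {upfront} tokens")
print(f"deferred: {deferred} tokens")
print(f"saved:    {savings:.0%}")

# The same catalog at the heavier per-schema sizes mentioned earlier.
for size in (500, 1500):
    print(f"upfront at {size} tokens per schema: {upfront_cost(200, size)}")
```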
That is 80% of the value. Everything else is tuning.
Two Kinds of Tool Search
Here is where it gets interesting. There are really two layers, and they do not compete — they stack.
Generic tool search
This is the one built into the harness. It indexes everything (built-in tools, every connected MCP server, all of it) into one searchable catalog. You do not configure it per provider. When a new MCP server connects, its tools just show up in the same index next to everything else.
Generic search usually gives you two query styles:
select:Read,Edit,Grep # I know the exact names, just load them
notebook jupyter # keyword search, give me best matches
select: is direct loading by name. Use it when you already know what you want: the agent saw the name in the index, so there is no need to “search,” just fetch the schema. Keyword search is for when you know the capability but not the exact tool: “something that sends Slack messages.”
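A minimal sketch of how a harness might dispatch these two query styles. The `run_query` function, the one-line descriptions, and the word-overlap scoring are assumptions for illustration; a real implementation would likely rank matches with something better than substring counts.

```python
# Hypothetical index: tool name -> one-line description.
CATALOG = {
    "Read": "Read a file from disk",
    "Edit": "Edit a file in place",
    "Grep": "Search file contents with a regex",
    "NotebookEdit": "Edit a cell in a Jupyter notebook",
}


def run_query(query: str, catalog: dict, limit: int = 5) -> list:
    """Return matching tool names for a select: query or a keyword query."""
    query = query.strip()
    if query.startswith("select:"):
        # Direct loading: keep the requested order, skip unknown names.
        wanted = [n.strip() for n in query[len("select:"):].split(",") if n.strip()]
        return [n for n in wanted if n in catalog]

    # Keyword search: count how many query words appear in name + description.
    words = query.lower().split()
    scored = []
    for name, description in catalog.items():
        text = f"{name} {description}".lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


print(run_query("select:Read,Edit,Grep", CATALOG))
print(run_query("notebook jupyter", CATALOG))
```

In a full harness, the returned names would then be resolved to their schemas and appended to the context.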
Provider-specific tool search
This is the second layer, and it is easy to miss. A provider can ship its own search tool, scoped to its own domain. GitHub MCP has search_code and search_issues. A docs MCP server might expose a grep over its files. Figma has search_design_system.
These are not searching for tools. They are searching inside the provider’s data. But — and this is the neat part — from the agent’s point of view they are just more tools in the generic index. So the agent first uses generic tool search to find search_code, loads it, and then uses search_code to find actual code.
Search to find the search. It nests, and that is fine. Each layer does one job.
You Do Not Have to Build This
Good news. The model providers are shipping tool search as a native API feature, so in many cases you do not hand-roll the loop at all.
Anthropic has a Tool Search Tool. You mark tools with defer_loading: true. Those definitions get sent to the API but do not land in context — they are stripped from the rendered tools before the prompt cache key is even computed. Claude gets a small search tool, finds what it needs, and only then are the matched schemas appended. There are two flavors: regex search (tool_search_tool_regex_20251119) and BM25 search (tool_search_tool_bm25_20251119) for natural-language queries.
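As a rough sketch, the `tools` array for such a request could be assembled like this. The `defer_loading` flag and the two type strings come from the description above. The `name` values given to the search tool entry, the `build_tools` helper, and the `get_weather` example tool are assumptions for illustration, and any required beta headers are omitted, so check the current provider documentation before relying on this shape.

```python
import json


def build_tools(tool_defs: list, search_type: str = "tool_search_tool_regex_20251119") -> list:
    """Put a search tool first and mark every regular tool as deferred."""
    search_tool = {
        "type": search_type,
        # Assumed name for the search tool entry; verify against the docs.
        "name": "tool_search_tool_regex" if "regex" in search_type else "tool_search_tool_bm25",
    }
    deferred = [{**tool, "defer_loading": True} for tool in tool_defs]
    return [search_tool, *deferred]


# Hypothetical tool definition used only to show the shape.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

tools = build_tools([get_weather])
print(json.dumps(tools, indent=2))
```

Passing `"tool_search_tool_bm25_20251119"` as `search_type` selects the BM25 flavor instead. The original tool dictionaries are copied, not mutated.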
OpenAI has tool_search too. Same idea — load deferred...