Bad MCP design costs your Agent 5× more tokens

I recently tested two MCPs with identical functionality, and one of them performed far worse than the other. So I wanna share the bad MCP design patterns that caused this.

It all started when I wrote an MCP Server (MCP-A) for a to-do list app. Later, the app officially released its own MCP Server (MCP-B). Both MCPs offer the same functionality and hit the same backend API.

The experiment was set up as follows:

- Both MCP Servers connect to the same to-do list account, which is reset after each test.
- 40 test prompts simulate typical use cases for these MCPs.
- Every test was run with the same model, system prompt, and Agent framework.

Here are the results:

| Metric              | MCP-A       | MCP-B       | Gap   |
| ------------------- | ----------- | ----------- | ----- |
| Tool Desc Length    | 11,464      | 3,682       | -     |
| Pass Rate           | 36/40 (90%) | 36/40 (90%) | Same  |
| Total input tokens  | 637,244     | 3,174,329   | 4.98× |
| Total output tokens | 17,301      | 23,238      | 1.34× |
| Total Agent steps   | 122         | 157         | 1.29× |
| Total time          | 597s        | 676s        | 1.13× |

---

The results show that MCP-B took 35 more ReAct loops than MCP-A to complete the 40 test cases (157 vs. 122, about 29% more), along with 34% more output tokens. I examined the logs and found that the root cause is poor query tool design.

Take the `search tool`, for example. Its job is to find a to-do item in the list. In MCP-B, this tool returns:

    { "id": "6a1916b48f08cb3a4c857ed0", "title": "buy some groceries", "url": "https://todo.example.com/tasks/6a1916b48f08cb3a4c857ed0" }

But other CRUD operations require `project_id`, and `search_tool` doesn't return it. So the Agent has to call another tool, `get_task_by_id`, to get it.
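To make that extra round-trip concrete, here is a rough Python sketch of the call sequence. It is not MCP-B's actual code: the in-memory `TASKS` store, the short IDs, and the `complete_task` write tool are all made up for illustration, and only the shape of the search result follows the example above.

```python
# Hypothetical in-memory stand-in for the to-do backend. The IDs and the
# complete_task tool are made up for illustration.
TASKS = {
    "task-1": {"id": "task-1", "title": "buy some groceries",
               "projectId": "proj-1", "status": "Active"},
}

calls = []  # every tool call the simulated agent makes, in order


def search_tool(query: str) -> list[dict]:
    """Thin search result: id, title, and url only, no project ID."""
    calls.append("search_tool")
    return [
        {"id": t["id"], "title": t["title"],
         "url": f"https://todo.example.com/tasks/{t['id']}"}
        for t in TASKS.values() if query in t["title"]
    ]


def get_task_by_id(task_id: str) -> dict:
    """Full task record, including the project ID."""
    calls.append("get_task_by_id")
    return dict(TASKS[task_id])


def complete_task(project_id: str, task_id: str) -> str:
    """Hypothetical write operation that requires both IDs."""
    calls.append("complete_task")
    task = TASKS[task_id]
    if task["projectId"] != project_id:
        raise ValueError("task does not belong to that project")
    task["status"] = "Completed"
    return f"Completed: {task['title']}"


# What the agent has to do when search omits the project ID.
hit = search_tool("groceries")[0]
full = get_task_by_id(hit["id"])  # extra round-trip just to learn projectId
result = complete_task(full["projectId"], full["id"])

print(calls)
```

That is three tool calls where two would do. In a typical agent loop, each extra call is another step that re-sends the conversation so far, so the cost shows up mostly as input tokens.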
On the other hand, MCP-A's `query_tasks` returns all the info needed to perform the next action in a single call:

    Task 1:
    ID: 6a19143e8f084a8c8101612f
    Title: buy some groceries
    Project ID: 6a1914378f084a8c810160a9
    Start Date: 2025-07-19 10:00:00
    Priority: Medium
    Status: Active

Unfiltered API data was dumped into the context window

If an MCP passes raw API results into the Agent's context unprocessed, the context window fills up very fast.

Take MCP-B's `create_task` tool, for example. Its job is to create a to-do item. This is what it returns:

    { "id": "6a180de78f086bdead0608be", "projectId": "inbox125587327", ..... "createdTime": "2026-05-28T09:41:59+0000", "modifiedTime": "2026-05-28T09:41:59+0000", "focusSummaries": null }

These 600+ characters mean nothing to the Agent's task, but they are still dumped into its context. On the other hand, MCP-A's `create_tasks` adds a layer of filtering and formatting. This little tweak makes a huge difference in input token usage.

Another issue is tool count. More tools mean a larger candidate set for the model to choose from, which directly increases decision difficulty. In MCP-A, 47 tools were compressed down to 14, covering the same functionality with fewer tools.

---

So here are my takeaways on good MCP tool design:

- When designing a tool, think about what the Agent will need next, not just what it's asking for right now. Return enough context in the result so the Agent can take the next action without making another round-trip.
- Too many tools increase the model's decision burden, so keep the number of tools in an MCP to a minimum and make sure their functionality doesn't overlap.
- When your MCP returns data to the LLM, keep it LLM-friendly, which means readable. Filter out unnecessary fields from the API response and format the data, rather than passing through raw JSON.

---

All the tests above were run with MCP-Eval. It's an MCP Server benchmarking tool.
If you want to check your MCP's performance, feel free to check this out: https://github.com/Code-MonkeyZhang/mcp-eval
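For anyone who wants a starting point for the filtering advice above, here is a minimal Python sketch of trimming a raw API response before it reaches the model. It is not MCP-A's actual implementation: `KEEP_FIELDS`, `format_task`, the short IDs, and the `title` and `status` keys in the sample response are assumptions for illustration. Only `id`, `projectId`, `createdTime`, `modifiedTime`, and `focusSummaries` come from the example earlier in the post.

```python
import json

# Fields the agent is assumed to need for follow-up calls, paired with
# readable labels. This list is a guess for illustration.
KEEP_FIELDS = [
    ("id", "ID"),
    ("title", "Title"),
    ("projectId", "Project ID"),
    ("status", "Status"),
]


def format_task(raw: dict) -> str:
    """Keep only the whitelisted fields and render them as plain text."""
    lines = []
    for key, label in KEEP_FIELDS:
        value = raw.get(key)
        if value is not None:  # skip missing or null fields
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


# Made-up sample shaped like the raw create_task response.
raw_response = {
    "id": "task-1",
    "projectId": "proj-1",
    "title": "buy some groceries",
    "status": "Active",
    "createdTime": "2026-05-28T09:41:59+0000",
    "modifiedTime": "2026-05-28T09:41:59+0000",
    "focusSummaries": None,
}

summary = format_task(raw_response)
print(summary)
print(len(json.dumps(raw_response)), "chars raw vs", len(summary), "chars formatted")
```

A whitelist is used here instead of a blacklist so that new fields added to the backend API don't silently leak into the context window.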
