4 rules to build an efficient MCP server · Bump.sh
Tech
06/23/2026
Sébastien Charrier
8 minutes read
In this article
MCP is a UI: design it like a UI
Limit the number of tools, and optimize them
Instructions are context
Filter the outputs
Start building efficient MCP servers
Over the past few months, I’ve seen more and more posts claiming that MCP is dead, with many developers advocating for the Skills + CLI combo instead. This is a misunderstanding. The Skills + CLI approach is fantastic for developers. But when it comes to exposing your product capabilities to end users through AI assistants, MCP remains the best option.
Yet, it has earned a bad reputation for one particular reason: context bloat.
Every MCP server adds information to the model’s context before it’s even used. A Claude user measured that just seven MCP servers consumed more than 67,000 tokens of tool definitions before the first prompt was even entered. The GitHub MCP server alone accounted for nearly 18,000 tokens across only 27 tools.
But let’s be clear: the problem isn’t MCP. It’s how we build MCP servers.
Context bloat isn’t an unavoidable consequence of MCP. It’s mostly the result of design decisions: exposing too many tools, writing verbose descriptions, returning oversized payloads, or simply treating an MCP server like an API instead of what it really is, a user interface for an LLM.
By carefully designing your server, you can dramatically reduce its token footprint while improving reliability, usability, and the overall experience for both users and AI agents.
In this article, I’ll share four practical rules we’ve learned while building MCP servers that stay lightweight, reduce context usage, and help LLMs consistently make better decisions.
MCP is a UI: design it like a UI
In many discussions I’ve had with people exploring MCP, I noticed the same pattern over and over again: they were aiming to turn their existing API into an MCP server by simply exposing one tool per endpoint.
The result was always the same: it kind of worked. But that “kind of” is exactly why these projects stayed as POCs and never made it to production.
An API is not designed for end users. It is built to be the perfect abstraction layer between an application and a database (or any backend service). MCP is the opposite: it is built for end users. Whether the consumer is an agent or a human using a language model, they do not care about your data model or your internal systems. They simply want to accomplish a task using the tool you expose, which is your MCP server.
This means MCP is a user interface, and it should be designed as one.
Don’t start from your API and ask yourself what you should expose. Start from your users and ask:
What do they want to achieve?
How will they ask for it?
What information do they need to provide?
What information should the system return?
From these questions, the structure of your MCP server starts to appear:
User intentions become your tools.
The way users ask becomes your prompt examples.
Required information becomes your input schemas.
Returned information becomes your outputs.
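To make this mapping concrete, here is a minimal sketch in Python. The product, the tool name `list_api_changes`, the prompt examples, and the output fields are all hypothetical. Only the `name`, `description`, and `inputSchema` keys follow the shape of an MCP tool definition.

```python
# Hypothetical example: users of a documentation product want to know
# what changed in an API since a given date.

# User intention -> one tool, named after the task rather than an endpoint.
LIST_API_CHANGES_TOOL = {
    "name": "list_api_changes",
    "description": "List changes made to an API since a given date.",
    # Required information -> input schema (JSON Schema).
    "inputSchema": {
        "type": "object",
        "properties": {
            "api_name": {"type": "string", "description": "API name."},
            "since": {"type": "string", "description": "Start date, YYYY-MM-DD."},
            "breaking_only": {"type": "boolean", "description": "Only breaking changes."},
        },
        "required": ["api_name", "since"],
    },
}

# The way users ask -> prompt examples, reusable for testing and documentation.
PROMPT_EXAMPLES = [
    "What changed in the Payments API since 2026-05-01?",
    "Did we ship any breaking change on the Orders API this month?",
]

# Returned information -> output fields, limited to what the user needs.
OUTPUT_FIELDS = ("date", "summary", "breaking")


def describe(tool: dict) -> str:
    """Return a one-line summary of a tool definition."""
    required = ", ".join(tool["inputSchema"]["required"])
    return f'{tool["name"]}: requires {required}'


print(describe(LIST_API_CHANGES_TOOL))
```

Nothing in this definition mentions an endpoint, a table, or an internal identifier. Those belong to the implementation, not to the interface.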
Once you have that foundation, you can think about implementation details: which endpoints to call, which operations to perform, which validations and transformations are needed, and so on.
If this is your first MCP server, keep the scope small. Start with a read-only use case, build it, and test it with an LLM as quickly as possible.
This will help you iterate and learn how to structure your server so it works efficiently.
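A first read-only tool can be very small. The sketch below is one possible shape for its handler, written in plain Python without any MCP SDK. The function name, the in-memory `_CHANGES` data, and the returned fields are assumptions made for illustration. A real handler would call your own backend instead.

```python
from datetime import date

# Hypothetical in-memory stand-in for the backend a real handler would query.
_CHANGES = {
    "payments": [
        {"id": 1, "date": "2026-04-20", "summary": "Added refunds endpoint",
         "breaking": False, "author_id": 7},
        {"id": 2, "date": "2026-05-12", "summary": "Removed legacy amount field",
         "breaking": True, "author_id": 3},
    ]
}


def list_api_changes(api_name: str, since: str, breaking_only: bool = False) -> list[dict]:
    """Read-only handler: validate the input, read, and return a trimmed result."""
    # Validation: fail with a message the model can act on.
    try:
        since_date = date.fromisoformat(since)
    except ValueError:
        raise ValueError("since must be a date formatted as YYYY-MM-DD") from None
    key = api_name.strip().lower()
    if key not in _CHANGES:
        known = ", ".join(sorted(_CHANGES))
        raise ValueError(f"Unknown API '{api_name}'. Known APIs: {known}")

    # Read and transform only: nothing in this handler writes data.
    results = []
    for change in _CHANGES[key]:
        if date.fromisoformat(change["date"]) < since_date:
            continue
        if breaking_only and not change["breaking"]:
            continue
        # Keep the fields a user cares about and drop internal ones (id, author_id).
        results.append({
            "date": change["date"],
            "summary": change["summary"],
            "breaking": change["breaking"],
        })
    return results


print(list_api_changes("Payments", "2026-05-01"))
```

Because the handler only reads, a wrong call from the model costs nothing, which makes it a safe place to learn how an LLM actually uses your tool.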
Limit the number of tools, and optimize them
Claiming that MCP is completely broken and wrecks the context window is plain nonsense (I’ll probably write another article about that), but it is still important to understand what happens under the hood.
For every interaction, models receive the list of available tools, their descriptions, and their input schemas. This is how they decide which tools to use and when.
Currently, this means that the more tools you expose, the larger the initial context becomes before the user even sends their first message, even if they are simply asking ChatGPT to write a Father’s Day poem. (By the way, don’t do that. Ever. Please.) Keep the number of tools as low as possible.
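You can get a rough idea of this cost before shipping anything. The sketch below serializes tool definitions to JSON and applies a heuristic of about four characters per token. That ratio is an assumption, not a real tokenizer, and the placeholder tools are hypothetical, so treat the numbers as orders of magnitude only.

```python
import json

CHARS_PER_TOKEN = 4  # Assumption: rough heuristic, real tokenizers differ.


def estimate_tokens(tools: list[dict]) -> int:
    """Estimate the context cost of a list of tool definitions."""
    serialized = json.dumps(tools, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


def make_tool(index: int) -> dict:
    """Build a hypothetical tool definition with a one-sentence description."""
    return {
        "name": f"tool_{index}",
        "description": "Placeholder description of what this tool does and when to use it.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "id": {"type": "string", "description": "Resource identifier."}
            },
            "required": ["id"],
        },
    }


few = [make_tool(i) for i in range(5)]
many = [make_tool(i) for i in range(40)]

print("5 tools:", estimate_tokens(few), "tokens (estimated)")
print("40 tools:", estimate_tokens(many), "tokens (estimated)")
```

The estimate grows with every tool, every description sentence, and every schema property. For comparison, the GitHub figure quoted earlier (nearly 18,000 tokens across 27 tools) works out to roughly 670 tokens per tool.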
This has not been entirely true in Claude Code since January 2026. Lazy loading is now enabled by default: tool definitions are deferred rather than loaded into context upfront, and Claude searches for the relevant ones only when a task needs them (see the Claude Code MCP docs). It is likely that most AI...