MCP Is Dead


MCP is dead | Quandri Engineering



Chloe Kim<br>Backend Engineer @ Quandri

TL;DR: MCP eats context, is unreliable in operation, and overlaps with existing CLIs and APIs.

💡<br>Reference: MCP is dead. Long live the CLI<br>After reading the above article, we ran the experiments on our actual stack. This document covers the original argument, additional research, and our measurements.<br>📌<br>Update: Since these measurements were taken, Claude Code has rolled out Tool Search with Deferred Loading, which loads MCP tool schemas on-demand and reduces context usage by 85%+. The context bloat described in Problem 1 is largely addressed for users on current Claude Code versions. The performance, debugging, and architectural arguments below still apply.

What's Wrong with MCP<br>MCP (Model Context Protocol) connects LLMs to external tools (GitHub, Linear, Notion, Slack, etc.).<br>Since its launch in late 2024, it's been called "the USB-C of the AI ecosystem." But developers actually using it day-to-day are starting to think differently.

Problem 1: It Devours the Context Window<br>The context window is the LLM's desk. When you connect MCP servers, tool definitions alone take up a significant chunk of that desk.<br>Restaurant analogy:<br>You sit down and 10 menus (MCP tool definitions) are spread across the table<br>There's no room left for actual food (your work)<br>Every time you order, the menus have to be pulled out again<br>We extracted and measured the actual tool definitions from the MCP servers connected in our environment. With all 4 servers connected, tool definitions alone consume about 10.5% of a 200K-token context window.

Measurement: Tool Definition Sizes (Quandri Stack)

MCP Server<br>Tools<br>Estimated Chars<br>Estimated Tokens

Linear<br>42<br>~51,229<br>~12,807

Notion<br>14<br>~16,156<br>~4,039

Slack<br>12<br>~15,168<br>~3,792

Postgres<br>9<br>~1,755<br>~438

Total<br>77<br>~84,308<br>~21,077

Context Window Usage (all servers combined)

Model<br>Context Window<br>Usage by Tool Definitions

Claude (200K)<br>200,000 tokens<br>10.5%

GPT-4o (128K)<br>128,000 tokens<br>16.5%

Linear alone accounts for over 12,800 tokens. That's 42 tool definitions always loaded, even if you only ever use get_issue and save_issue.
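The token estimates in the tables above are consistent with a rough rule of 4 characters per token. That ratio is an assumption inferred from the published numbers, not a tokenizer measurement. A minimal sketch that reproduces the arithmetic:

```python
# Character counts per MCP server, taken from the measurement table above.
TOOL_DEF_CHARS = {
    "Linear": 51_229,
    "Notion": 16_156,
    "Slack": 15_168,
    "Postgres": 1_755,
}

# Assumption: ~4 characters per token (a common rule of thumb, not a tokenizer).
CHARS_PER_TOKEN = 4


def estimate_tokens(chars: int) -> int:
    """Rough token estimate from a character count."""
    return chars // CHARS_PER_TOKEN


def context_share(tokens: int, window: int) -> float:
    """Percentage of a context window consumed by `tokens`."""
    return 100 * tokens / window


total_chars = sum(TOOL_DEF_CHARS.values())
total_tokens = estimate_tokens(total_chars)

for name, window in [("Claude (200K)", 200_000), ("GPT-4o (128K)", 128_000)]:
    print(f"{name}: {context_share(total_tokens, window):.1f}%")
```

A real tokenizer would give somewhat different counts per server, but the order of magnitude is what matters for the argument.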

Biggest Tools by Size

Tool<br>Chars<br>~Tokens

linear/save_issue<br>2,479<br>~619

slack/search_public<br>1,614<br>~403

linear/list_issues<br>1,588<br>~397

notion/fetch<br>1,379<br>~344

slack/send_message<br>1,248<br>~312

Problem 2: Low Operational Reliability

Issue<br>Detail

Init failure, repeated re-auth<br>Requires starting and maintaining a separate process

Slower AI responses<br>External server round-trip on every tool call

Mid-session tool death<br>MCP server process crashes

Opaque permissions<br>Unclear what permissions each tool actually has

Performance is a known issue. The author of the original article benchmarked the Jira MCP server against the Jira REST API and found MCP was 3x slower per call, and 9.4x slower on the first call when initialization is included. This isn't Jira-specific; it's architectural: every MCP server adds a process layer between the LLM and the underlying API. The same kind of overhead applies to the Linear, Notion, and Slack servers in our stack.
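To separate the first-call cost from the steady-state cost, a timing harness along these lines could be used. This is a sketch, not the original benchmark: `FakeServerCall` and its delays are hypothetical stand-ins, and a real comparison would replace it with an actual MCP tool call and a direct API request.

```python
import time
from typing import Callable


def measure(call: Callable[[], object], repeats: int = 5) -> tuple[float, float]:
    """Return (first-call seconds, mean seconds over `repeats` later calls)."""
    start = time.perf_counter()
    call()
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        later.append(time.perf_counter() - start)
    return first, sum(later) / len(later)


class FakeServerCall:
    """Hypothetical stand-in: slow initialization once, then a fixed per-call cost."""

    def __init__(self, init_s: float, call_s: float) -> None:
        self.init_s = init_s
        self.call_s = call_s
        self.started = False

    def __call__(self) -> None:
        if not self.started:
            time.sleep(self.init_s)  # simulated process startup / auth
            self.started = True
        time.sleep(self.call_s)  # simulated request round-trip


# The delays are made-up values, chosen only to make the effect visible.
first, steady = measure(FakeServerCall(init_s=0.05, call_s=0.01))
print(f"first call: {first:.3f}s, later calls: {steady:.3f}s")
```

Reporting the first call separately matters because initialization is paid once per session, while the per-call overhead is paid on every tool invocation.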

Problem 3: Overlaps with Existing CLI/API

Aspect<br>CLI / API<br>MCP

Human-machine parity<br>Same commands for humans and LLMs<br>Only exists inside LLM conversations

Composability<br>Pipes, jq, grep freely combinable<br>Locked to server return format

Debugging<br>Reproduce immediately in terminal<br>Only reproducible inside conversation context

Training data<br>Already learned from man pages, StackOverflow<br>Requires separate tool definitions

Install cost<br>Mostly already installed<br>Server setup, auth, process management needed

Token Comparison: MCP vs CLI for Linear Issue Lookup<br>How many tokens does it cost to look up the same Linear issue?<br>MCP consumes ~65x more tokens than the CLI approach.<br>[ CLI approach: ~200 tokens ]<br>curl -s -H "Authorization: Bearer $LINEAR_TOKEN" \<br>-H "Content-Type: application/json" \<br>-d '{"query":"{ issue(id: \"ISSUE-ID\") { title state { name } assignee { name } } }"}' \<br>https://api.linear.app/graphql

-> Prompt (curl command): ~50 tokens<br>-> Response: ~150 tokens

[ MCP approach: ~12,957 tokens ]<br>-> Tool definitions (always loaded): ~12,807 tokens (42 tools)<br>-> Tool call + response: ~150 tokens
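The ~65x figure follows directly from the numbers listed for the two approaches. A short worked version of that arithmetic, using only the estimates stated above:

```python
# Token estimates from the comparison above.
CLI_PROMPT_TOKENS = 50         # the curl command itself
CLI_RESPONSE_TOKENS = 150      # JSON response
MCP_TOOL_DEF_TOKENS = 12_807   # 42 Linear tool definitions, always loaded
MCP_CALL_TOKENS = 150          # tool call + response

cli_total = CLI_PROMPT_TOKENS + CLI_RESPONSE_TOKENS
mcp_total = MCP_TOOL_DEF_TOKENS + MCP_CALL_TOKENS
ratio = mcp_total / cli_total

print(f"CLI: {cli_total}, MCP: {mcp_total}, ratio: ~{ratio:.0f}x")
```

Almost all of the MCP total is the fixed cost of the tool definitions; the call and response themselves cost about the same in both approaches.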

What Are the Alternatives?<br>Alternative 1: CLI-First Strategy<br>Give the LLM a CLI first, then the raw API, then documentation, in that order of preference. LLMs have already learned these tools from man pages and StackOverflow.<br>Using an existing CLI directly:<br>No context wasted on tool definitions<br>Same interface for humans and AI, easy to debug<br>Freely composable with pipelines<br>Alternative 2: Skills Pattern<br>If MCP is "spreading all menus on the table upfront", Skills is "asking the librarian for only the book you need".

Aspect<br>MCP<br>Skills

Loading time<br>All tool definitions loaded on connect<br>Only loaded when needed

Context consumption<br>Always occupied<br>Only when in use

Scalability<br>Context pressure grows with each server<br>Not proportional to skill count
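The difference between the two loading models in the table can be sketched as follows. This is a simplified illustration, not how MCP clients or Skills are actually implemented: the class names are hypothetical, the definitions are placeholder strings sized like three entries from "Biggest Tools by Size", and a real Skills setup may still keep a short description of each skill in context.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    definition: str  # full schema / instruction text that would enter the context


@dataclass
class EagerRegistry:
    """MCP-style, as described above: all definitions are in context on connect."""

    tools: dict[str, Tool]

    def context_chars(self) -> int:
        return sum(len(t.definition) for t in self.tools.values())


@dataclass
class LazyRegistry:
    """Skills-style, as described above: a definition enters context on first use."""

    tools: dict[str, Tool]
    loaded: set[str] = field(default_factory=set)

    def use(self, name: str) -> str:
        definition = self.tools[name].definition  # raises KeyError for unknown names
        self.loaded.add(name)
        return definition

    def context_chars(self) -> int:
        return sum(len(self.tools[n].definition) for n in self.loaded)


# Placeholder definitions; only their lengths are meaningful here.
SIZES = {"linear/save_issue": 2_479, "linear/list_issues": 1_588, "slack/send_message": 1_248}
TOOLS = {name: Tool(name, "x" * size) for name, size in SIZES.items()}

eager = EagerRegistry(TOOLS)
lazy = LazyRegistry(TOOLS)
lazy.use("linear/save_issue")

print(eager.context_chars())  # 5315: every definition counts
print(lazy.context_chars())   # 2479: only the one that was used
```

In the eager model the context cost grows with every registered tool; in the lazy model it grows only with the tools a session actually touches.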

The key is embedding...

