MCP Tool Schema Bloat: The Hidden Token Tax (and How to Fix It)
Your MCP tool descriptions are eating your context window.
I’ve been reviewing MCP implementations, and the same pattern keeps appearing: verbose tool schemas that burn thousands of tokens before the agent does any actual work. In a world where context is your scarcest resource, this matters more than most teams realize.
A recent proposal in the MCP repo measured a MySQL server with 106 tools: 207KB of schema data, roughly 54,600 tokens, on every initialization. Even when the model only needs 2-3 tools.
The good news: this problem is getting serious attention at multiple layers of the stack.
Understanding the Layers
Token efficiency isn’t a single problem. It’s three problems at three architectural layers:
- Server-side — How verbose are your tool schemas?
- Protocol-level — Does MCP support lazy or filtered discovery?
- Host-side — Does the MCP host send every tool to the LLM?
That last one is crucial and often overlooked. The MCP host (Claude Desktop, your custom integration, etc.) doesn’t have to forward every discovered tool to the model. It can implement its own filtering, search, or progressive disclosure before anything hits the context window.
Claude Code is now rolling out MCP Tool Search (as of v2.1.7), which automatically triggers when your MCP tool descriptions would consume more than 10% of the context window. Instead of preloading all tools, Claude Code loads them on demand via search. This directly addresses one of the most-requested features on GitHub: users were documenting setups with 7+ MCP servers consuming 67k+ tokens.
Anthropic’s engineering post on advanced tool use describes the underlying pattern. Their Tool Search Tool lets the agent query for relevant tools rather than receiving everything upfront. They report 85% token reduction for large tool libraries, with tool definitions dropping from 10K+ tokens to around 3K per request.
Let’s look at what’s happening at each layer.
Server-Side: Description Economy
Every word in your tool descriptions should earn its place.
Verbose vs. Concise Descriptions
Here’s a common pattern I see:
```json
{
  "name": "search_files",
  "description": "This tool allows you to search for files in the filesystem by providing a glob pattern. It will return a list of all files that match the pattern. You can use standard glob syntax including wildcards like * and ** for recursive matching."
}
```
Compare that to:
```json
{
  "name": "search_files",
  "description": "Search files by glob pattern. Returns matching paths."
}
```
The agent doesn’t need a tutorial. It needs enough context to decide when to use the tool and what parameters to provide. Everything else is wasted tokens.
Naming Matters for Discovery
Tool names carry semantic weight, especially when hosts implement search-based discovery. `search_customer_orders` beats `query_db_orders` because the semantics are in the name, not buried in documentation. This helps both humans reading manifests and embedding-based search systems matching intent to tools.
Server Instructions Become Critical
With tool search enabled, your MCP server’s instructions field becomes much more important. It helps the host understand when to search for your tools in the first place, similar to how skill descriptions work in Agent Skills.
If you’re building an MCP server, make sure your server-level instructions clearly describe what domain your tools cover and when they’d be useful. This metadata guides the search process before any individual tool descriptions are even considered.
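As a sketch, here's how that might look with the official Python SDK's FastMCP server. This assumes your installed SDK version exposes the `instructions` constructor argument; the server name, domain description, and tool are purely illustrative:

```python
from mcp.server.fastmcp import FastMCP

# Server-level instructions describe the domain so a searching host
# knows when these tools are worth surfacing at all.
mcp = FastMCP(
    "order-analytics",  # hypothetical server name
    instructions=(
        "Tools for querying customer orders, refunds, and fulfillment status. "
        "Use when the user asks about order history or shipping."
    ),
)

@mcp.tool()
def search_customer_orders(customer_id: str, status: str = "any") -> list[str]:
    """Search orders for a customer. Returns order IDs."""
    ...  # implementation elided

if __name__ == "__main__":
    mcp.run()
```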
Schema Redundancy
SEP-1576 from Huawei researchers analyzed the official GitHub MCP server’s 60 tools:
- 60% of tools share an identical `owner` field definition
- 65% share an identical `repo` field definition
That’s significant redundancy. The proposal suggests using JSON Schema’s $ref references for deduplication:
```json
{
  "$defs": {
    "owner": {
      "type": "string",
      "description": "Repository owner"
    },
    "repo": {
      "type": "string",
      "description": "Repository name"
    }
  },
  "tools": [
    {
      "name": "get_issues",
      "inputSchema": {
        "properties": {
          "owner": { "$ref": "#/$defs/owner" },
          "repo": { "$ref": "#/$defs/repo" }
        }
      }
    }
  ]
}
```
This requires LLMs to parse JSON references, which modern models handle well. The token savings for servers with shared parameter patterns could be substantial.
Schema Optimization Strategies
Beyond descriptions, the schema structure itself affects token count:
- Flatten where possible — Nested objects add structural tokens. If a flat parameter list works, use it.
- Defaults over documentation — Instead of describing what happens when a parameter is omitted, set a sensible default in the schema.
- Enum over string with description — `"format": {"enum": ["json", "csv"]}` gives the agent exactly what it needs without explanation.
- Required only when required — Every optional parameter with a description adds tokens the agent may never use.
- Tool use examples for complex cases — Anthropic’s data shows accuracy jumping from 72% to 90% on complex parameter handling when concrete examples are provided. A single good example can replace paragraphs of documentation.
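To make a few of those points concrete, here's a hypothetical inputSchema (written as a Python dict) that leans on enums and defaults instead of prose. The tool and parameter names are made up for illustration:

```python
# Hypothetical "export_report" tool: enums and defaults replace paragraphs
# of description, and only the truly required field is marked required.
export_report_schema = {
    "type": "object",
    "properties": {
        "report_id": {"type": "string", "description": "Report to export"},
        "format": {"enum": ["json", "csv"]},                         # enum, not free text plus explanation
        "include_archived": {"type": "boolean", "default": False},   # default, not documentation
    },
    "required": ["report_id"],
}
```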
Protocol-Level: What’s In Flight
The MCP community is actively working on several approaches to address token efficiency at the protocol level.
Lazy Tool Hydration
Issue #1978 proposes two protocol changes:
- A `minimal` flag for `tools/list` that returns only tool names, categories, and summaries (~5K tokens instead of full schemas)
- A new `tools/get_schema` method to fetch full schemas on demand for specific tools (~400 tokens each)
The proposal includes a reference implementation showing 91% token savings (54,604 → 4,899 tokens for 106 tools).
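If the proposal lands, the client-side flow might look something like this sketch. The method names follow the issue and are not yet part of the spec, and `session` here stands in for whatever request API your client exposes:

```python
# Hypothetical flow per issue #1978 — neither the "minimal" flag nor the
# "tools/get_schema" method exists in the current MCP specification.
async def load_tools_lazily(session, task_hint: str) -> list[dict]:
    # Cheap listing: names, categories, and one-line summaries only.
    listing = await session.request("tools/list", {"minimal": True})

    # Pick the handful of tools relevant to the current task...
    needed = [t["name"] for t in listing["tools"] if task_hint in t["summary"]]

    # ...then hydrate full schemas on demand, a few hundred tokens each.
    return [await session.request("tools/get_schema", {"name": n}) for n in needed]
```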
Progressive Disclosure
PR #1928 takes this further with standardized `<library>.searchTools` and `<library>.runSandbox` tools. The pattern:
- Agent searches for relevant tools by intent
- Agent composes operations in a sandbox
- Intermediate results stay in the sandbox, not the context
The proposal claims up to 98.7% token reduction by keeping intermediate results out of context entirely.
Scope-Filtered Discovery
SEP-1881 approaches the problem from authorization: only return tools the current user can actually invoke.
This prevents agents from:
- Seeing tools they can never use
- Planning with inaccessible capabilities
- Leaking information about privileged operations
The filtering happens based on OAuth scopes in the access token, aligning MCP discovery with standard authorization patterns.
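Server-side, the filtering itself is simple. A rough sketch, with a made-up scope-per-tool mapping and assuming the access token has already been validated and its scopes extracted:

```python
# Hypothetical scope requirements per tool; a real server would derive
# these from its own authorization model.
TOOL_SCOPES: dict[str, set[str]] = {
    "list_issues": {"repo:read"},
    "create_issue": {"repo:write"},
    "delete_repo": {"repo:admin"},
}

def visible_tools(all_tools: list[dict], granted_scopes: set[str]) -> list[dict]:
    """Return only the tools the caller's OAuth scopes allow it to invoke."""
    return [
        tool for tool in all_tools
        if TOOL_SCOPES.get(tool["name"], set()) <= granted_scopes
    ]
```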
Embedding-Based Tool Selection
SEP-1576 also proposes client-side embedding similarity matching:
- The LLM generates intent from user semantics
- The host creates embeddings for that intent
- Tool descriptions are also embedded (can be pre-computed)
- Similarity matching returns top-k most relevant tools
This keeps the full tool catalog accessible while only forwarding relevant tools to the model. The filtering happens at the orchestration layer without requiring protocol changes.
Gateways and Proxies: The Ecosystem Response
The market isn’t waiting for protocol changes. MCP gateway and proxy companies are already shipping solutions at both ends of the stack.
Server-Side Gateways
Gateways aggregate multiple MCP servers behind a unified interface. Some collapse hundreds of tools into just two operations:
- search — Find relevant tools by intent
- execute — Run a specific tool with parameters
The gateway handles tool discovery, selection, and routing internally. The host sees a minimal surface area regardless of how many servers sit behind it.
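In practice, the gateway's entire tool surface can be two definitions, something like the sketch below. The names and fields are illustrative, not any particular vendor's API:

```python
# The only two tools the host ever sees, regardless of how many
# MCP servers the gateway aggregates behind them.
GATEWAY_TOOLS = [
    {
        "name": "search",
        "description": "Find relevant tools by intent. Returns tool names and summaries.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "execute",
        "description": "Run a named tool with JSON arguments.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "tool": {"type": "string"},
                "arguments": {"type": "object"},
            },
            "required": ["tool"],
        },
    },
]
```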
This pattern mirrors Anthropic’s Tool Search Tool but implements it at the infrastructure layer rather than the host layer.
Host-Side Proxies
Proxy layers intercept tool manifests before they reach the model. They can:
- Apply semantic filtering based on conversation context
- Enforce token budgets per request
- Implement org-specific policies about tool exposure
- Cache and deduplicate across multiple MCP connections
This is healthy ecosystem innovation. The token efficiency problem is real, the stakes are high, and teams are building solutions at every layer. If you’re hitting context limits with MCP tools, there’s a good chance someone in the community has already solved your specific variant.
Host-Side Techniques
Even without a gateway or protocol changes, your MCP host can implement tool filtering today.
Semantic Search
Embed tool descriptions once at startup. When a user request comes in, embed the intent and find the k most similar tools. Forward only those to the model.
This is exactly what SEP-1576 proposes, and it’s implementable today without any spec changes.
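Here's a minimal sketch of that host-side filter, using sentence-transformers for the embeddings. Any embedding model works, and the tool catalog shown is illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed each tool's name + description once at startup.
tools = [
    {"name": "search_customer_orders", "description": "Search orders by customer and status."},
    {"name": "export_report", "description": "Export a report as JSON or CSV."},
    # ... the rest of the catalog
]
tool_texts = [f"{t['name']}: {t['description']}" for t in tools]
tool_embeddings = model.encode(tool_texts, convert_to_tensor=True)

def select_tools(user_request: str, k: int = 3) -> list[dict]:
    """Forward only the k most similar tools to the model."""
    query_embedding = model.encode(user_request, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, tool_embeddings)[0]
    top = scores.topk(k=min(k, len(tools)))
    return [tools[i] for i in top.indices.tolist()]
```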
Category-Based Loading
Structure your tools into categories. Load core tools always; expose category-specific tools when the conversation indicates relevance.
For example, a database MCP server might always expose query and list_tables, but only load schema modification tools when the user explicitly mentions migrations.
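A sketch of that gating, with hypothetical categories and trigger keywords; a real host would use something smarter than substring matching, but the shape is the same:

```python
# Core tools always go to the model; category tools load only on a trigger.
CORE_TOOLS = ["query", "list_tables"]
CATEGORY_TOOLS = {
    "migrations": ["create_table", "alter_table", "drop_table"],
    "admin": ["grant_access", "revoke_access"],
}
TRIGGERS = {
    "migrations": ["migration", "schema change"],
    "admin": ["permission", "grant"],
}

def tools_for(conversation_text: str) -> list[str]:
    """Return core tools plus any category whose trigger appears in the conversation."""
    selected = list(CORE_TOOLS)
    lowered = conversation_text.lower()
    for category, keywords in TRIGGERS.items():
        if any(kw in lowered for kw in keywords):
            selected.extend(CATEGORY_TOOLS[category])
    return selected
```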
Usage-Based Prioritization
Track which tools get used most frequently. Always load high-frequency tools; defer rarely-used tools behind a search interface or on-demand loading.
This adapts to actual usage patterns rather than assumptions about what might be needed.
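A sketch of that adaptive preload, with an in-memory counter standing in for whatever persistence you'd use across sessions in practice:

```python
from collections import Counter

call_counts: Counter[str] = Counter()  # persist this across sessions in real use

def record_call(tool_name: str) -> None:
    call_counts[tool_name] += 1

def preload_set(all_tools: list[str], budget: int = 10) -> list[str]:
    """Always load the most-used tools; everything else stays behind search."""
    ranked = [name for name, _ in call_counts.most_common()]
    preload = [t for t in all_tools if t in ranked[:budget]]
    return preload or all_tools[:budget]  # cold start: fall back to a fixed slice
```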
Agent Skills: Progressive Disclosure in Practice
There’s already a shipping example of progressive disclosure for agent capabilities: Agent Skills.
Skills are folders of instructions, scripts, and resources that agents discover and load on demand. In Claude Code, the implementation demonstrates the pattern clearly:
- At startup: Only a 1024-character description loads for each skill
- On activation: The full `SKILL.md` content loads when the skill is relevant
- Supporting files: Load via markdown links only when referenced
- Scripts: Execute without their code ever entering context
This is progressive disclosure implemented at the knowledge layer, complementing MCP’s tool layer:
| Layer | What It Provides | Example |
|---|---|---|
| MCP Tools | Capabilities (what tools exist) | Database queries, file operations |
| Agent Skills | Guidance (how to use them) | Query patterns, safety constraints |
Together, they let agents scale to complex domains without front-loading everything into context.
The Agent Skills format has become an open standard adopted by:
- Claude Code
- Cursor
- VS Code
- Gemini CLI
- OpenAI Codex
- And others
It’s proof that the industry is converging on lazy-loading patterns, whether at the protocol, host, or knowledge layer.
Measuring Token Overhead
Token cost is invisible until you track it.
What to Instrument
Add monitoring to see:
- Tokens per tool manifest at each layer
  - Server → Host: What the MCP server sends
  - Host → Model: What actually reaches the LLM
  - The gap between these is your filtering efficiency
- Tool usage frequency vs. description length
  - Find verbose tools that rarely get used
  - These are your highest-ROI optimization targets
- Cache hit rates (if implementing lazy loading)
  - How often do you fetch full schemas?
  - Are the same tools requested repeatedly?
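A rough way to get the first number — tokens per manifest at each boundary — is to serialize what each layer sends and count with a tokenizer. A sketch using tiktoken as an approximation (Claude's tokenizer differs, but relative sizes are what matter here):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # proxy tokenizer; exact counts vary by model

def manifest_tokens(tools: list[dict]) -> int:
    """Approximate token cost of a tool manifest as the model would see it."""
    return len(enc.encode(json.dumps(tools)))

def report(server_tools: list[dict], forwarded_tools: list[dict]) -> None:
    sent = manifest_tokens(server_tools)      # Server → Host
    seen = manifest_tokens(forwarded_tools)   # Host → Model
    print(f"server sent: {sent} tokens, model saw: {seen}, filtered out: {sent - seen}")
```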
Where to Start
You’ll likely find a Pareto pattern: a few verbose tools that rarely get used are consuming disproportionate context. That’s where optimization pays off fastest.
The Lazy Tool Hydration reference implementation includes instrumentation you can adapt for your own measurements.
Three Things to Remember
If you take nothing else from this post:
- The host layer is your lever, particularly if you're building your own agent. You don't have to wait for protocol changes. Your MCP host can filter, search, and prioritize tools before they ever reach the model. Claude Code is already doing this with MCP Tool Search.
- Every word should earn its place. Concise descriptions, semantic naming, and clear server instructions help both token budgets and tool discovery accuracy.
- You're not alone. Gateways, proxies, and patterns like Agent Skills already exist. If you're hitting context limits, someone in the ecosystem has likely solved your variant.
What’s Next
The MCP community is actively iterating on these problems. Key proposals to watch:
- #1978 - Lazy Tool Hydration
- #1576 - Token Bloat Mitigation (SEP)
- #1881 - Scope-Filtered Discovery
- PR #1928 - Progressive Disclosure
If you’re building MCP servers or hosts, consider contributing to these discussions. The decisions being made now will shape how the ecosystem handles scale.
Are you optimizing tool efficiency at the server, protocol, or host layer? I’d love to hear which approaches have worked for you. Reach out on LinkedIn or open an issue on the proposals above.