MCP Tool Schema Bloat: The Hidden Token Tax (and How to Fix It)
Your MCP tool descriptions are eating your context window.
I’ve been reviewing MCP implementations, and the same pattern keeps appearing: verbose tool schemas that burn thousands of tokens before the agent does any actual work. In a world where context is your scarcest resource, this matters more than most teams realize.
A recent proposal in the MCP repo measured a MySQL server with 106 tools: 207KB of schema data, roughly 54,600 tokens, on every initialization. Even when the model only needs 2-3 tools.
The good news: this problem is getting serious attention at multiple layers of the stack.
Understanding the Layers
Token efficiency isn’t a single problem. It’s three problems at three architectural layers:
- Server-side — How verbose are your tool schemas?
- Protocol-level — Does MCP support lazy or filtered discovery?
- Host-side — Does the MCP host send every tool to the LLM?
That last one is crucial and often overlooked. The MCP host (Claude Desktop, your custom integration, etc.) doesn’t have to forward every discovered tool to the model. It can implement its own filtering, search, or progressive disclosure before anything hits the context window.
Claude Code is now rolling out MCP Tool Search (as of v2.1.7), which automatically triggers when your MCP tool descriptions would consume more than 10% of the context window. Instead of preloading all tools, Claude Code loads them on demand via search. This directly addresses one of the most-requested features on GitHub: users were documenting setups with 7+ MCP servers consuming 67k+ tokens.
Anthropic’s engineering post on advanced tool use describes the underlying pattern. Their Tool Search Tool lets the agent query for relevant tools rather than receiving everything upfront. They report 85% token reduction for large tool libraries, with tool definitions dropping from 10K+ tokens to around 3K per request.
Let’s look at what’s happening at each layer.
Server-Side: Description Economy
Every word in your tool descriptions should earn its place.
Verbose vs. Concise Descriptions
Here’s a common pattern I see:
```json
{
  "name": "search_files",
  "description": "This tool allows you to search for files in the filesystem by providing a glob pattern. It will return a list of all files that match the pattern. You can use standard glob syntax including wildcards like * and ** for recursive matching."
}
```
Compare that to:
```json
{
  "name": "search_files",
  "description": "Search files by glob pattern. Returns matching paths."
}
```
The agent doesn’t need a tutorial. It needs enough context to decide when to use the tool and what parameters to provide. Everything else is wasted tokens.
Naming Matters for Discovery
Tool names carry semantic weight, especially when hosts implement search-based discovery. `search_customer_orders` beats `query_db_orders` because the semantics are in the name, not buried in documentation. This helps both humans reading manifests and embedding-based search systems matching intent to tools.
Server Instructions Become Critical
With tool search enabled, your MCP server’s instructions field becomes much more important. It helps the host understand when to search for your tools in the first place, similar to how skill descriptions work in Agent Skills.
If you’re building an MCP server, make sure your server-level instructions clearly describe what domain your tools cover and when they’d be useful. This metadata guides the search process before any individual tool descriptions are even considered.
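As a sketch, here's how that might look with the official Python SDK's FastMCP server. This assumes your installed SDK version exposes the `instructions` constructor argument; the server name, domain description, and tool are purely illustrative:

```python
from mcp.server.fastmcp import FastMCP

# Server-level instructions describe the domain so a searching host
# knows when these tools are worth surfacing at all.
mcp = FastMCP(
    "order-analytics",  # hypothetical server name
    instructions=(
        "Tools for querying customer orders, refunds, and fulfillment status. "
        "Use when the user asks about order history or shipping."
    ),
)

@mcp.tool()
def search_customer_orders(customer_id: str, status: str = "any") -> list[str]:
    """Search orders for a customer. Returns order IDs."""
    ...  # implementation elided

if __name__ == "__main__":
    mcp.run()
```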
Schema Redundancy
SEP-1576 from Huawei researchers analyzed the official GitHub MCP server’s 60 tools:
- 60% of tools share an identical `owner` field definition
- 65% share an identical `repo` field definition
That’s significant redundancy. The proposal suggests using JSON Schema’s $ref references for deduplication:
```json
{
  "$defs": {
    "owner": {
      "type": "string",
      "description": "Repository owner"
    },
    "repo": {
      "type": "string",
      "description": "Repository name"
    }
  },
  "tools": [
    {
      "name": "get_issues",
      "inputSchema": {
        "properties": {
          "owner": { "$ref": "#/$defs/owner" },
          "repo": { "$ref": "#/$defs/repo" }
        }
      }
    }
  ]
}
```
This requires LLMs to parse JSON references, which modern models handle well. The token savings for servers with shared parameter patterns could be substantial.
Schema Optimization Strategies
Beyond descriptions, the schema structure itself affects token count:
- Flatten where possible — Nested objects add structural tokens. If a flat parameter list works, use it.
- Defaults over documentation — Instead of describing what happens when a parameter is omitted, set a sensible default in the schema.
- Enum over string with description — `"format": {"enum": ["json", "csv"]}` gives the agent exactly what it needs without explanation.
- Required only when required — Every optional parameter with a description adds tokens the agent may never use.
- Tool use examples for complex cases — Anthropic’s data shows accuracy jumping from 72% to 90% on complex parameter handling when concrete examples are provided. A single good example can replace paragraphs of documentation.
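To make a few of those points concrete, here's a hypothetical inputSchema (written as a Python dict) that leans on enums and defaults instead of prose. The tool and parameter names are made up for illustration:

```python
# Hypothetical "export_report" tool: enums and defaults replace paragraphs
# of description, and only the truly required field is marked required.
export_report_schema = {
    "type": "object",
    "properties": {
        "report_id": {"type": "string", "description": "Report to export"},
        "format": {"enum": ["json", "csv"]},                         # enum, not free text plus explanation
        "include_archived": {"type": "boolean", "default": False},   # default, not documentation
    },
    "required": ["report_id"],
}
```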
Protocol-Level: What’s In Flight
The MCP community is actively working on several approaches to address token efficiency at the protocol level.
Lazy Tool Hydration
Issue #1978 proposes two protocol changes:
- A `minimal` flag for `tools/list` that returns only tool names, categories, and summaries (~5K tokens instead of full schemas)
- A new `tools/get_schema` method to fetch full schemas on demand for specific tools (~400 tokens each)
The proposal includes a reference implementation showing 91% token savings (54,604 → 4,899 tokens for 106 tools).
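If the proposal lands, the client-side flow might look something like this sketch. The method names follow the issue and are not yet part of the spec, and `session` here stands in for whatever request API your client exposes:

```python
# Hypothetical flow per issue #1978 — neither the "minimal" flag nor the
# "tools/get_schema" method exists in the current MCP specification.
async def load_tools_lazily(session, task_hint: str) -> list[dict]:
    # Cheap listing: names, categories, and one-line summaries only.
    listing = await session.request("tools/list", {"minimal": True})

    # Pick the handful of tools relevant to the current task...
    needed = [t["name"] for t in listing["tools"] if task_hint in t["summary"]]

    # ...then hydrate full schemas on demand, a few hundred tokens each.
    return [await session.request("tools/get_schema", {"name": n}) for n in needed]
```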
Progressive Disclosure
PR #1928 takes this further with standardized `<library>.searchTools` and `<library>.runSandbox` tools. The pattern:
- Agent searches for relevant tools by intent
- Agent composes operations in a sandbox
- Intermediate results stay in the sandbox, not the context
The proposal claims up to 98.7% token reduction by keeping intermediate results out of context entirely.
Scope-Filtered Discovery
SEP-1881 approaches the problem from authorization: only return tools the current user can actually invoke.
This prevents agents from:
- Seeing tools they can never use
- Planning with inaccessible capabilities
- Leaking information about privileged operations
The filtering happens based on OAuth scopes in the access token, aligning MCP discovery with standard authorization patterns.
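Server-side, the filtering itself is simple. A rough sketch, with a made-up scope-per-tool mapping and assuming the access token has already been validated and its scopes extracted:

```python
# Hypothetical scope requirements per tool; a real server would derive
# these from its own authorization model.
TOOL_SCOPES: dict[str, set[str]] = {
    "list_issues": {"repo:read"},
    "create_issue": {"repo:write"},
    "delete_repo": {"repo:admin"},
}

def visible_tools(all_tools: list[dict], granted_scopes: set[str]) -> list[dict]:
    """Return only the tools the caller's OAuth scopes allow it to invoke."""
    return [
        tool for tool in all_tools
        if TOOL_SCOPES.get(tool["name"], set()) <= granted_scopes
    ]
```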
Embedding-Based Tool Selection
SEP-1576 also proposes client-side embedding similarity matching:
- The LLM generates intent from user semantics
- The host creates embeddings for that intent
- Tool descriptions are also embedded (can be pre-computed)
- Similarity matching returns top-k most relevant tools
This keeps the full tool catalog accessible while only forwarding relevant tools to the model. The filtering happens at the orchestration layer without requiring protocol changes.
Gateways and Proxies: The Ecosystem Response
The market isn’t waiting for protocol changes. MCP gateway and proxy companies are already shipping solutions at both ends of the stack.
Server-Side Gateways
Gateways aggregate multiple MCP servers behind a unified interface. Some collapse hundreds of tools into just two operations:
- search — Find relevant tools by intent
- execute — Run a specific tool with parameters
The gateway handles tool discovery, selection, and routing internally. The host sees a minimal surface area regardless of how many servers sit behind it.
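In practice, the gateway's entire tool surface can be two definitions, something like the sketch below. The names and fields are illustrative, not any particular vendor's API:

```python
# The only two tools the host ever sees, regardless of how many
# MCP servers the gateway aggregates behind them.
GATEWAY_TOOLS = [
    {
        "name": "search",
        "description": "Find relevant tools by intent. Returns tool names and summaries.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "execute",
        "description": "Run a named tool with JSON arguments.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "tool": {"type": "string"},
                "arguments": {"type": "object"},
            },
            "required": ["tool"],
        },
    },
]
```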
This pattern mirrors Anthropic’s Tool Search Tool but implements it at the infrastructure layer rather than the host layer.
Host-Side Proxies
Proxy layers intercept tool manifests before they reach the model. They can:
- Apply semantic filtering based on conversation context
- Enforce token budgets per request
- Implement org-specific policies about tool exposure
- Cache and deduplicate across multiple MCP connections
This is healthy ecosystem innovation. The token efficiency problem is real, the stakes are high, and teams are building solutions at every layer. If you’re hitting context limits with MCP tools, there’s a good chance someone in the community has already solved your specific variant.
Host-Side Techniques
Even without a gateway or protocol changes, your MCP host can implement tool filtering today.
Semantic Search
Embed tool descriptions once at startup. When a user request comes in, embed the intent and find the k most similar tools. Forward only those to the model.
This is exactly what SEP-1576 proposes, and it’s implementable today without any spec changes.
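Here's a minimal sketch of that host-side filter, using sentence-transformers for the embeddings. Any embedding model works, and the tool catalog shown is illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed each tool's name + description once at startup.
tools = [
    {"name": "search_customer_orders", "description": "Search orders by customer and status."},
    {"name": "export_report", "description": "Export a report as JSON or CSV."},
    # ... the rest of the catalog
]
tool_texts = [f"{t['name']}: {t['description']}" for t in tools]
tool_embeddings = model.encode(tool_texts, convert_to_tensor=True)

def select_tools(user_request: str, k: int = 3) -> list[dict]:
    """Forward only the k most similar tools to the model."""
    query_embedding = model.encode(user_request, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, tool_embeddings)[0]
    top = scores.topk(k=min(k, len(tools)))
    return [tools[i] for i in top.indices.tolist()]
```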
Category-Based Loading
Structure your tools into categories. Load core tools always; expose category-specific tools when the conversation indicates relevance.
For example, a database MCP server might always expose query and list_tables, but only load schema modification tools when the user explicitly mentions migrations.
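A sketch of that gating, with hypothetical categories and trigger keywords; a real host would use something smarter than substring matching, but the shape is the same:

```python
# Core tools always go to the model; category tools load only on a trigger.
CORE_TOOLS = ["query", "list_tables"]
CATEGORY_TOOLS = {
    "migrations": ["create_table", "alter_table", "drop_table"],
    "admin": ["grant_access", "revoke_access"],
}
TRIGGERS = {
    "migrations": ["migration", "schema change"],
    "admin": ["permission", "grant"],
}

def tools_for(conversation_text: str) -> list[str]:
    """Return core tools plus any category whose trigger appears in the conversation."""
    selected = list(CORE_TOOLS)
    lowered = conversation_text.lower()
    for category, keywords in TRIGGERS.items():
        if any(kw in lowered for kw in keywords):
            selected.extend(CATEGORY_TOOLS[category])
    return selected
```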
Usage-Based Prioritization
Track which tools get used most frequently. Always load high-frequency tools; defer rarely-used tools behind a search interface or on-demand loading.
This adapts to actual usage patterns rather than assumptions about what might be needed.
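A sketch of that adaptive preload, with an in-memory counter standing in for whatever persistence you'd use across sessions in practice:

```python
from collections import Counter

call_counts: Counter[str] = Counter()  # persist this across sessions in real use

def record_call(tool_name: str) -> None:
    call_counts[tool_name] += 1

def preload_set(all_tools: list[str], budget: int = 10) -> list[str]:
    """Always load the most-used tools; everything else stays behind search."""
    ranked = [name for name, _ in call_counts.most_common()]
    preload = [t for t in all_tools if t in ranked[:budget]]
    return preload or all_tools[:budget]  # cold start: fall back to a fixed slice
```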
Agent Skills: Progressive Disclosure in Practice
There’s already a shipping example of progressive disclosure for agent capabilities: Agent Skills.
Skills are folders of instructions, scripts, and resources that agents discover and load on demand. In Claude Code, the implementation demonstrates the pattern clearly:
- At startup: Only a 1024-character description loads for each skill
- On activation: The full `SKILL.md` content loads when the skill is relevant
- Supporting files: Load via markdown links only when referenced
- Scripts: Execute without their code ever entering context
This is progressive disclosure implemented at the knowledge layer, complementing MCP’s tool layer:
| Layer | What It Provides | Example |
|---|---|---|
| MCP Tools | Capabilities (what tools exist) | Database queries, file operations |
| Agent Skills | Guidance (how to use them) | Query patterns, safety constraints |
Together, they let agents scale to complex domains without front-loading everything into context.
The Agent Skills format has become an open standard adopted by:
- Claude Code
- Cursor
- VS Code
- Gemini CLI
- OpenAI Codex
- And others
It’s proof that the industry is converging on lazy-loading patterns, whether at the protocol, host, or knowledge layer.
Measuring Token Overhead
Token cost is invisible until you track it.
What to Instrument
Add monitoring to see:
- Tokens per tool manifest at each layer
  - Server → Host: What the MCP server sends
  - Host → Model: What actually reaches the LLM
  - The gap between these is your filtering efficiency
- Tool usage frequency vs. description length
  - Find verbose tools that rarely get used
  - These are your highest-ROI optimization targets
- Cache hit rates (if implementing lazy loading)
  - How often do you fetch full schemas?
  - Are the same tools requested repeatedly?
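A rough way to get the first number — tokens per manifest at each boundary — is to serialize what each layer sends and count with a tokenizer. A sketch using tiktoken as an approximation (Claude's tokenizer differs, but relative sizes are what matter here):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # proxy tokenizer; exact counts vary by model

def manifest_tokens(tools: list[dict]) -> int:
    """Approximate token cost of a tool manifest as the model would see it."""
    return len(enc.encode(json.dumps(tools)))

def report(server_tools: list[dict], forwarded_tools: list[dict]) -> None:
    sent = manifest_tokens(server_tools)      # Server → Host
    seen = manifest_tokens(forwarded_tools)   # Host → Model
    print(f"server sent: {sent} tokens, model saw: {seen}, filtered out: {sent - seen}")
```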
Where to Start
You’ll likely find a Pareto pattern: a few verbose tools that rarely get used are consuming disproportionate context. That’s where optimization pays off fastest.
The Lazy Tool Hydration reference implementation includes instrumentation you can adapt for your own measurements.
Three Things to Remember
If you take nothing else from this post:
- The host layer is your lever, particularly if you're building your own agent. You don't have to wait for protocol changes. Your MCP host can filter, search, and prioritize tools before they ever reach the model. Claude Code is already doing this with MCP Tool Search.
- Every word should earn its place. Concise descriptions, semantic naming, and clear server instructions help both token budgets and tool discovery accuracy.
- You're not alone. Gateways, proxies, and patterns like Agent Skills already exist. If you're hitting context limits, someone in the ecosystem has likely solved your variant.
What’s Next
The MCP community is actively iterating on these problems. Key proposals to watch:
- #1978 - Lazy Tool Hydration
- #1576 - Token Bloat Mitigation (SEP)
- #1881 - Scope-Filtered Discovery
- PR #1928 - Progressive Disclosure
If you’re building MCP servers or hosts, consider contributing to these discussions. The decisions being made now will shape how the ecosystem handles scale.
Are you optimizing tool efficiency at the server, protocol, or host layer? I’d love to hear which approaches have worked for you. Reach out on LinkedIn or open an issue on the proposals above.