Anthropic Measured the Token Problem. The Number Is 134,000.

Jun 16, 2026

An AI influencer declared it this week: MCP servers are dead. The fix is programmatic tool calling, and anyone still connecting MCP servers the traditional way is burning tokens for no reason.

The provocative framing gets clicks. But the actual problem it points to is real, and it deserves a cleaner explanation.

The Token Problem Is Not Hype

When an AI agent connects to MCP servers the standard way, every tool definition loads into context upfront. Every single one. The agent needs to “see” the tools before it can use them.

Anthropic’s own engineering team documented what this looks like at scale. A five-server setup with 58 tools consumes approximately 55,000 tokens before the conversation even begins. Add more servers, including a tool-heavy one like Jira, and you quickly approach 100,000 tokens of overhead before the agent reads a single word of the user’s request. Internally, Anthropic has seen tool definitions consume 134,000 tokens before any optimization.

Because those definitions are sent again with every request, the overhead adds up fast at current API pricing. For teams running high-volume agentic workflows, it is not a rounding error. It is a line item.
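
To make "adds up fast" concrete, here is a back-of-the-envelope calculation. The request volume and the $3-per-million-input-tokens price are placeholder assumptions, not quoted rates, so substitute your own. It also assumes the definitions are paid for once per request with no prompt caching. Treat the result as a rough order of magnitude.

```python
def overhead_cost_usd(overhead_tokens: int, requests: int, usd_per_million_input: float) -> float:
    """Input cost of re-sending a fixed block of tool definitions on every request.

    Assumes each request pays the full input price for the overhead once
    (a single inference pass, no prompt caching).
    """
    total_tokens = overhead_tokens * requests
    return total_tokens / 1_000_000 * usd_per_million_input


# Hypothetical workload and price: both are assumptions, not quoted figures.
REQUESTS_PER_DAY = 10_000
USD_PER_MILLION_INPUT = 3.00

for label, tokens in [("five-server setup", 55_000), ("134K pre-optimization", 134_000)]:
    daily = overhead_cost_usd(tokens, REQUESTS_PER_DAY, USD_PER_MILLION_INPUT)
    print(f"{label}: ${daily:,.2f} per day")
# five-server setup: $1,650.00 per day
# 134K pre-optimization: $4,020.00 per day
```

That is before the agent does any useful work, and it scales linearly with request volume.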

The secondary problem is performance, not just cost. Standard tool calling works in a loop: agent requests Tool A, system runs it, result goes back into context, agent requests Tool B, and so on. Each round trip burns a full inference pass. Intermediate results accumulate in the context window whether or not they are useful. By the time the agent completes a multi-step task, you have padded the context with noise.
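
A rough way to see the loop cost: because each inference pass re-reads the entire context, total input tokens grow with every round trip. The sketch below is a simplified model under stated assumptions (one tool per pass, results appended verbatim, output tokens, tool-call tokens, and caching ignored). The function name and the token sizes in the example are hypothetical.

```python
def sequential_loop_input_tokens(base_tokens: int, result_tokens: list[int]) -> int:
    """Total input tokens processed by a one-tool-per-pass agent loop.

    Each inference pass re-reads the whole context: the fixed base (system prompt,
    tool definitions, user request) plus every tool result returned so far.
    The model runs one pass per tool call, plus a final pass to answer.
    Simplified model: ignores output tokens, tool-call tokens, and prompt caching.
    """
    total = 0
    context = base_tokens
    for tokens in result_tokens:
        total += context   # pass that decides to call the next tool
        context += tokens  # that tool's result is appended to the context
    total += context       # final pass that writes the answer
    return total


# Hypothetical run: 55,000 tokens of base context, three tool results.
print(sequential_loop_input_tokens(55_000, [2_000, 5_000, 3_000]))  # 239000
```

Three modest tool results turn a 55,000-token context into 239,000 input tokens processed, because the base is re-read on every pass.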

And it is not just tokens. The most common failures in standard tool use are wrong tool selection and incorrect parameters, particularly when tools have similar names. Loading 134,000 tokens’ worth of tool definitions into context does not help the agent pick the right one. It makes the problem worse.

What Programmatic Tool Calling Actually Does

Anthropic released three features in November 2025 to address this directly: a Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples.

The Tool Search Tool changes when tool definitions enter context. Instead of loading everything upfront, the agent discovers tools on-demand as they become relevant to the task. Anthropic measured the difference: the traditional approach consumes roughly 77,000 tokens before any work begins. With on-demand discovery, that drops to 8,700 tokens. An 85% reduction. More importantly, accuracy on MCP evaluations improved from 49% to 74% on Opus 4 with Tool Search enabled. Better token efficiency and better task performance at the same time.
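
The idea behind on-demand discovery can be sketched in a few lines. This is not Anthropic’s implementation or API: the `ToolDef` and `DeferredToolRegistry` names are hypothetical, and the naive keyword match stands in for whatever search a real system uses. It only shows the shape of the pattern: the full catalogue stays outside context, and only the definitions a search returns get loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str


class DeferredToolRegistry:
    """Holds every tool definition but exposes only what a search returns.

    Hypothetical sketch: the catalogue stays out of context, and matching
    definitions are pulled in when the task needs them.
    """

    def __init__(self, tools: list[ToolDef]):
        self._tools = {t.name: t for t in tools}
        self.loaded: dict[str, ToolDef] = {}  # definitions currently "in context"

    def search(self, query: str, limit: int = 3) -> list[str]:
        """Rank tools by how many query words appear in their name or description."""
        words = query.lower().split()
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower()
            score = sum(word in text for word in words)
            if score > 0:
                scored.append((score, tool.name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by name
        hits = [name for _, name in scored[:limit]]
        for name in hits:
            self.loaded[name] = self._tools[name]  # load only the matches
        return hits


catalogue = [
    ToolDef("github_create_issue", "Create an issue in a GitHub repository"),
    ToolDef("jira_create_ticket", "Create a ticket in a Jira project"),
    ToolDef("slack_post_message", "Post a message to a Slack channel"),
    ToolDef("sentry_list_errors", "List recent errors from Sentry"),
]
registry = DeferredToolRegistry(catalogue)
print(registry.search("jira ticket"))  # ['jira_create_ticket']
print(len(registry.loaded), "of", len(catalogue), "definitions in context")  # 1 of 4 definitions in context
```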

Programmatic Tool Calling addresses the loop problem. Instead of giving the agent a list of raw tools and letting it call them one at a time, the agent writes code that orchestrates multiple tool calls, processes the outputs, and controls what flows back into context. Intermediate results stay inside the execution environment. Only what the agent decides is relevant makes it into the context window. Anthropic built this into Claude for Excel specifically to handle spreadsheets with thousands of rows without overloading context.
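
Here is a minimal sketch of the kind of orchestration code an agent might write under Programmatic Tool Calling. `get_expense_rows` is a hypothetical stand-in for an MCP tool, fed with fake deterministic data, and `orchestrate` is an illustrative name, not part of any real API. The point is where the data lives: every raw row is processed inside the execution environment, and only a short summary is returned for the model to read.

```python
import json


def get_expense_rows(employee_id: str) -> list[dict]:
    """Hypothetical tool call. Returns 500 fake, deterministic rows per employee."""
    scale = 10 if employee_id == "e2" else 1
    return [{"employee": employee_id, "amount": scale * (i % 7)} for i in range(500)]


def orchestrate(employee_ids: list[str], budget: float) -> str:
    """Code the agent might write: call the tool many times, keep only what matters."""
    over_budget = {}
    for emp in employee_ids:
        rows = get_expense_rows(emp)  # raw rows stay in the execution environment
        total = sum(row["amount"] for row in rows)
        if total > budget:
            over_budget[emp] = total
    return json.dumps(over_budget)  # only this string goes back into context


summary = orchestrate(["e1", "e2", "e3"], budget=5_000)
print(summary)  # {"e2": 14940}
```

Three tool calls and 1,500 rows, but the model only reads one short JSON string instead of three full result sets.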

What This Does Not Mean

MCP is not dead. That framing misses the point entirely.

MCP is infrastructure. It is the standardized protocol that lets agents discover and connect to external tools and data sources without custom integrations for every combination. The industry converged on it fast. By late 2025 the MCP SDK had tens of millions of monthly downloads. OpenAI, Google, Microsoft, and Anthropic all signed on to shared governance under the Linux Foundation. That does not go away.

The problem was never MCP itself. The problem is a specific implementation pattern: loading every tool definition into context at session start, calling tools one at a time in a loop, and letting intermediate results stack up in context whether or not they matter.

Programmatic tool calling and on-demand discovery are smarter ways to use MCP. Not replacements for it.

What This Means for Operators

If you are building or evaluating AI agent deployments for your business, three things matter here.

First, token overhead is a cost driver most teams have not measured. If your agents connect to multiple MCP servers, pull the actual token counts for tool definitions and intermediate results. The number is likely higher than you expect. Anthropic’s 134,000-token figure came from its own internal setup before optimization. Yours may not be far behind.
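
A quick first pass at that measurement can be done offline. The sketch below assumes roughly four characters per token, a common rule of thumb that real tokenizers will not match exactly, and the tool definitions are hypothetical examples. For real numbers, use your model provider’s token-counting endpoint against the exact payload you send.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not a real tokenizer


def estimate_tokens(tool_defs: list[dict]) -> dict[str, int]:
    """Approximate each tool's context cost from its serialized JSON definition."""
    return {tool["name"]: len(json.dumps(tool)) // CHARS_PER_TOKEN for tool in tool_defs}


# Hypothetical definitions in the name/description/input_schema shape.
tools = [
    {
        "name": "search_issues",
        "description": "Search issues by JQL query and return matching keys and summaries.",
        "input_schema": {
            "type": "object",
            "properties": {"jql": {"type": "string"}, "max_results": {"type": "integer"}},
            "required": ["jql"],
        },
    },
    {
        "name": "get_issue",
        "description": "Fetch a single issue with all fields and comments.",
        "input_schema": {
            "type": "object",
            "properties": {"key": {"type": "string"}},
            "required": ["key"],
        },
    },
]
estimates = estimate_tokens(tools)
for name, tokens in sorted(estimates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ~{tokens} tokens")
print(f"total: ~{sum(estimates.values())} tokens")
```

Sorting by estimated cost makes it easy to spot which definitions are the heaviest before you reach for exact counts.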

Second, the architecture of how your agent calls tools affects both cost and reliability. The jump from 49% to 74% accuracy on Anthropic’s MCP evaluations is not a marginal gain. It is the difference between an agent that works and one that does not. If your agents are failing on multi-tool workflows, the fix may not be a better prompt. It may be a better tool architecture.

Third, these capabilities are available now. Anthropic released Tool Search and Programmatic Tool Calling in November 2025 as beta features on the Claude Developer Platform. If your current stack predates that release, it is worth a review.

MCP is the foundation. How you build on top of it determines whether your agents are efficient or expensive. That distinction is where operator-level decisions actually matter.

six50 Solutions works with founder-led and PE-backed businesses on AI strategy and implementation. More at six50.io.

six50 Solutions
