At the Ask 2026 conference on March 11, Perplexity CTO Denis Yarats dropped a statement that should have rattled every team building AI agent infrastructure: his company is moving away from MCP internally because tool descriptions consume 40–50% of available context windows before agents do any actual work.
Let that land for a second. Half the brain of your agent — gone — just from loading tool definitions.
This is not a theoretical concern from someone writing blog posts. This is the CTO of a company processing millions of AI-powered queries daily, saying out loud that the protocol everyone is adopting has a fundamental efficiency problem in production.
And he is not wrong. But the problem is not MCP itself. The problem is how we are building tools on top of it — especially database tools.
MCP’s Explosive Growth Hides a Scaling Problem
The numbers are hard to argue with. MCP crossed 97 million monthly SDK downloads in February 2026. Every major AI provider — Anthropic, OpenAI, Google, Microsoft, Amazon — now supports it. The Agentic AI Foundation under the Linux Foundation has taken governance, giving it the same institutional legitimacy as Kubernetes or GraphQL.
On March 9, the MCP project published its 2026 roadmap with four focus areas: scaling Streamable HTTP transport for horizontal deployments, closing lifecycle gaps in the Tasks primitive, building enterprise readiness features around audit trails and SSO, and publishing a standard metadata format for registry-based server discovery.
All of this is real progress. MCP has won the protocol war for connecting AI agents to tools. But winning adoption and scaling to production are different problems. And the context window tax is the first major scaling problem that adoption success has exposed.
The Context Window Tax, Explained
Every MCP server exposes a set of tools through tools/list. Each tool includes a name, description, and an inputSchema — a JSON Schema object describing every parameter the tool accepts.
When an AI agent connects to an MCP server, the client fetches that tool list and injects the definitions into the model’s context window. This is how the model knows what tools are available and how to call them. It is elegant, flexible, and completely invisible to the end user.
It is also expensive. Context window tokens are the scarcest resource in any LLM-powered system. Every token spent on tool definitions is a token not available for user queries, retrieved context, conversation history, or chain-of-thought reasoning.
Here is what a single database table tool definition might look like in MCP:
{
  "name": "query_orders",
  "description": "Query the orders table. Supports filtering by customer_id, status, created_at, updated_at, total_amount, shipping_address, billing_address, payment_method, tracking_number, and notes. Returns paginated results.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "filter": {
        "type": "string",
        "description": "Filter expression. Supported operators: eq, neq, gt, gte, lt, lte, like, in, is_null. Example: customer_id=eq.42&status=eq.shipped"
      },
      "select": {
        "type": "string",
        "description": "Comma-separated list of columns to return. Default: all columns."
      },
      "order": {
        "type": "string",
        "description": "Column to sort by, with optional .asc or .desc suffix. Example: created_at.desc"
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of rows to return. Default: 100."
      },
      "offset": {
        "type": "integer",
        "description": "Number of rows to skip for pagination. Default: 0."
      }
    }
  }
}
That is roughly 200 tokens for a single table. A production database with 50 tables means 10,000 tokens — just for tool definitions. A database with 200 tables? You are looking at 40,000 tokens before the agent processes a single user request.
Claude 4.6 Opus has a million-token context window (in beta). GPT-4.5 has 128K. Most production deployments use models with 128K or 200K context limits. Even in the best case, 40,000 tokens of tool definitions is 20% of a 200K working context; against a 128K window, it climbs past 30%. In the common case, it is devastating.
And those numbers assume clean, minimal tool definitions. In practice, many database MCP servers include verbose descriptions, examples in every field, enum lists for every column type, and nested schema objects for complex filters. It adds up fast.
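The scaling is easy to make concrete. Here is a back-of-the-envelope calculator, using the ~200 tokens-per-tool figure from the example above (an estimate, not a measured constant):

```python
# Back-of-the-envelope estimate of the context window tax for a
# one-tool-per-table MCP server. ~200 tokens per tool definition
# matches the query_orders example above.

TOKENS_PER_TOOL = 200

def context_tax(n_tables: int, context_limit: int) -> float:
    """Fraction of the context window consumed by tool definitions."""
    return (n_tables * TOKENS_PER_TOOL) / context_limit

# 50 tables against a 128K-token model
print(f"{context_tax(50, 128_000):.1%}")   # 7.8%
# 200 tables against the same model
print(f"{context_tax(200, 128_000):.1%}")  # 31.2%
```

Linear scaling is the whole problem: every table you add permanently shrinks the space left for actual work.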
Why Database Tools Are the Worst Offenders
Not all MCP tools are created equal. A tool that sends a Slack message has a fixed, small schema: channel, message, maybe a thread ID. A tool that queries a GitHub repository is similarly bounded.
Database tools are different. Their schemas scale with the complexity of the underlying data. Every table is a potential tool. Every column is a parameter or filter option. Every relationship is a join path that the tool description might need to express.
This is the fundamental tension: the more useful a database MCP server tries to be — the more tables it exposes, the more filter operators it supports, the more column metadata it includes — the more context window it consumes. Usefulness and efficiency are directly opposed.
Most existing database MCP servers resolve this tension by choosing usefulness, dumping the full schema into tool definitions and hoping the context window is big enough. That worked when MCP was a demo-day curiosity. It does not work when agents need to chain multiple tool calls across multiple servers within a single conversation.
The Math Gets Worse With Multi-Server Agents
The Perplexity scenario highlights why this matters. A real production agent does not connect to a single MCP server. It connects to several: a database server, a search server, a file system server, an observability server, a communication server.
If each server contributes 5,000–10,000 tokens of tool definitions, the combined tax across five servers is 25,000–50,000 tokens. That is where Yarats’s 40–50% figure comes from. It is not one bloated server — it is the aggregate cost of connecting an agent to real infrastructure.
Datadog just launched its own MCP server on March 9, giving AI agents real-time access to logs, metrics, and traces. That is unambiguously useful. But stack it alongside a database server, a code search server, and a deployment server, and you are playing context window Tetris with your agent’s cognitive capacity.
Three Strategies for Lean Database Tooling
The solution is not to abandon MCP. The protocol is fine. The solution is to build database tools that are deliberate about context efficiency.
1. Expose Navigation Tools, Not Table Tools
Instead of registering one tool per table — which scales linearly with database size — expose a small set of navigation tools that let the agent discover and query tables dynamically.
Faucet takes this approach with its MCP server:
# Start Faucet with MCP enabled
faucet serve --mcp
# The MCP server exposes a fixed set of tools regardless of database size:
# - list_tables: Returns available tables and their descriptions
# - describe_table: Returns column names, types, and relationships for one table
# - query_table: Executes a filtered, paginated read against any table
# - create_record: Inserts a row into any table
# - update_record: Updates rows matching a filter
# - delete_record: Deletes rows matching a filter
Six tools. Six tool definitions in the context window. Whether your database has 5 tables or 500, the context window cost is identical. The agent discovers the schema it needs at runtime through list_tables and describe_table calls, rather than having every schema pre-loaded.
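To see why the pattern works, here is a minimal sketch of three of those navigation tools over an in-memory stand-in for a database. This is illustrative pseudocode in Python, not Faucet's implementation; the table names and rows are invented:

```python
# Navigation-style tools over an in-memory "database". The agent
# discovers schema at runtime instead of having it preloaded.

SCHEMA = {
    "orders":    ["id", "customer_id", "status", "total_amount"],
    "customers": ["id", "name", "email"],
}

DATA = {
    "orders": [
        {"id": 1, "customer_id": 42, "status": "shipped", "total_amount": 99},
        {"id": 2, "customer_id": 7,  "status": "pending", "total_amount": 12},
    ],
}

def list_tables():
    """Tool 1: discover what exists -- no schemas preloaded into context."""
    return sorted(SCHEMA)

def describe_table(table):
    """Tool 2: fetch one table's schema only when the agent needs it."""
    return {"table": table, "columns": SCHEMA[table]}

def query_table(table, filters=None, limit=100):
    """Tool 3: filtered, paginated read against any table."""
    rows = DATA.get(table, [])
    if filters:
        rows = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    return rows[:limit]

print(list_tables())                                # ['customers', 'orders']
print(query_table("orders", {"status": "shipped"}))
```

Adding a hundred more tables grows `SCHEMA` and `DATA`, but the three tool definitions the agent sees never change.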
This is the same pattern that made Unix powerful: small, composable tools rather than monolithic commands that try to encode every option upfront.
2. Minimize Description Verbosity
Every word in a tool description costs tokens. Compare these two approaches:
Verbose (common in many MCP servers):
{
  "description": "Query the specified database table using filter expressions. This tool supports a wide range of SQL-compatible filter operations including equality checks (eq), inequality checks (neq), greater than (gt), greater than or equal to (gte), less than (lt), less than or equal to (lte), pattern matching (like), set membership (in), and null checks (is_null). Results are returned as a JSON array of objects. Use the 'select' parameter to specify which columns to return, 'order' to sort results, and 'limit'/'offset' for pagination. Default limit is 100 rows."
}
Lean (what Faucet generates):
{
  "description": "Query a table. Filters: eq, neq, gt, gte, lt, lte, like, in, is_null. Returns JSON array.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "table": { "type": "string" },
      "filter": { "type": "string" },
      "select": { "type": "string" },
      "order": { "type": "string" },
      "limit": { "type": "integer" },
      "offset": { "type": "integer" }
    },
    "required": ["table"]
  }
}
The verbose version is roughly 120 tokens. The lean version is roughly 60 tokens. That is a 2x reduction per tool, and it compounds across every tool in the server.
Modern LLMs do not need hand-holding in tool descriptions. Claude, GPT-4.5, and Gemini all understand what “filter,” “select,” and “order” mean in a database context. Over-explaining wastes context on information the model already has in its weights.
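You can check the savings yourself with the rough four-characters-per-token heuristic. The strings below are abbreviated copies of the two descriptions above; the heuristic is a sketch, fine for comparisons but not for billing:

```python
import json

def approx_tokens(obj):
    # Crude heuristic: ~4 characters per token. Good enough to compare
    # two tool definitions, not to predict exact model usage.
    return len(json.dumps(obj)) // 4

verbose = {"description": (
    "Query the specified database table using filter expressions. This tool "
    "supports a wide range of SQL-compatible filter operations including "
    "equality checks (eq), inequality checks (neq), greater than (gt), "
    "greater than or equal to (gte), less than (lt), less than or equal to "
    "(lte), pattern matching (like), set membership (in), and null checks "
    "(is_null). Results are returned as a JSON array of objects."
)}
lean = {"description": (
    "Query a table. Filters: eq, neq, gt, gte, lt, lte, like, in, is_null. "
    "Returns JSON array."
)}

print(approx_tokens(verbose), approx_tokens(lean))
```

Run this against your own server's `tools/list` output and the savings become a number you can track over time, not a vibe.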
3. Use Lazy Schema Loading
The MCP 2026 roadmap mentions a standard metadata format for server discovery. This hints at a future where clients can selectively load tools rather than fetching everything upfront.
You do not have to wait for the spec to implement this pattern. A well-designed database MCP server can expose a meta-tool — something like enable_table — that dynamically registers table-specific tools only when the agent needs them:
Agent: "I need to work with the orders and customers tables"
→ calls enable_table("orders")
→ calls enable_table("customers")
→ Now has typed CRUD tools for just those two tables
→ Context cost: proportional to 2 tables, not 200
This is the on-demand loading pattern that every modern frontend framework uses for code splitting. The same principle applies to tool definitions: load what you need, when you need it.
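A minimal sketch of that registry pattern, with a hypothetical `enable_table` meta-tool (the class, names, and CRUD operations here are invented for illustration; nothing like this is mandated by the MCP spec):

```python
# Lazy tool registration sketch. Only the enable_table meta-tool is
# advertised up front; per-table CRUD tools appear on demand.
# Hypothetical API, not from the MCP spec or any existing server.

class LazyToolRegistry:
    def __init__(self, tables):
        self._tables = set(tables)
        # Only the meta-tool is visible before any enable_table call.
        self.tools = {"enable_table"}

    def enable_table(self, table):
        """Register typed CRUD tools for one table, on demand."""
        if table not in self._tables:
            raise ValueError(f"unknown table: {table}")
        for op in ("query", "create", "update", "delete"):
            self.tools.add(f"{op}_{table}")

registry = LazyToolRegistry(["orders", "customers", "invoices"])
print(len(registry.tools))        # prints 1: just the meta-tool
registry.enable_table("orders")
registry.enable_table("customers")
print(sorted(registry.tools))     # CRUD tools for 2 tables, not all 3
```

The context cost after the two calls is proportional to the tables the agent actually asked for; `invoices` never enters the window.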
The Real Cost Is Not Tokens — It Is Quality
The most insidious effect of the context window tax is not that it limits what fits in the window. It is that it degrades agent performance on everything else.
LLM attention is not uniform across the context window. Research consistently shows that models perform worse on information in the middle of long contexts — the “lost in the middle” phenomenon. When 30–40% of the context is tool definitions, the remaining space for actual user intent, retrieved data, and conversation history gets compressed into a zone where the model pays less attention.
This means a bloated tool definition does not just waste space. It actively makes the agent worse at understanding what the user asked for, worse at interpreting query results, and worse at maintaining coherent multi-turn conversations.
For database operations — where precision matters, where a wrong filter or a misunderstood column type can return garbage data or worse — this degradation is particularly dangerous.
What the Industry Is Doing About It
The MCP community is not ignoring this. The 2026 roadmap’s metadata format work is directly aimed at enabling smarter tool loading. SurePath AI’s MCP Policy Controls, launched March 12, give security teams the ability to control which tools are exposed to which agents — which has the side effect of reducing per-agent tool counts.
Qualys published a detailed analysis on March 19 about MCP servers becoming “shadow IT for AI,” raising concerns about ungoverned tool proliferation. Their recommendation: audit and prune MCP server deployments to reduce both security surface area and context overhead.
Google’s developer guide to AI agent protocols explicitly recommends “keeping tool interfaces focused and minimal” as a best practice for MCP server authors.
But the most practical solution is architectural. Do not build database MCP servers that mirror the database schema one-to-one. Build servers that give agents the primitives to explore and query databases efficiently, with a fixed context footprint that does not grow with database complexity.
Measuring Your Context Window Tax
Here is a practical exercise. Take your current MCP setup and measure it:
# Connect to your MCP server and dump the tool list
# Count the tokens in the full tools/list response
# With Faucet, you can check this directly:
faucet serve --mcp --port 8080 &
curl -s http://localhost:8080/mcp/tools/list | wc -c
# Divide by ~4 for approximate token count
If your tool definitions exceed 5,000 tokens, you have a problem worth solving. If they exceed 20,000, your agents are operating with a significant cognitive handicap on every single request.
Getting Started
Faucet is designed from the ground up for lean context footprints. Six fixed tools, lazy schema loading, minimal descriptions. Whether your database has 10 tables or 1,000, the context window cost stays flat.
Install Faucet in 30 seconds:
curl -fsSL https://get.faucet.dev | sh
Point it at your database and start an MCP server:
faucet serve --dsn "postgres://user:pass@localhost/mydb" --mcp
Connect it to Claude Code, Cursor, or any MCP-compatible client:
claude mcp add faucet -- faucet serve --dsn "postgres://..." --mcp --stdio
Your agent gets full database access. Your context window stays intact.
The MCP protocol is not the problem. Bloated tool definitions are. As AI agents move from demos to production — and the 97 million monthly SDK downloads say they already are — the teams that treat context efficiency as a first-class design constraint will build agents that actually work. The ones that don’t will wonder why their agents keep losing the plot halfway through a conversation.