Essay · 45 min read
MCP and the Road Not Taken
An Architect's Reading of the Model Context Protocol
An architectural critique read through the lens of REST and distributed systems design.
What MCP gets right, what it reinvents, and what it left on the table.
§1
1. The Good — What MCP Gets Right
A critique that does not acknowledge what works is not a critique. It is a complaint. Before examining where MCP diverges from convention, it is worth being specific about what it gets right, because the good parts are not incidental. They are structural, and the rest of this essay will argue that the protocol's weaknesses exist precisely because the good parts were coupled too tightly to transport decisions that did not earn the same quality.
The Problem Statement Is Real
LLM applications need a standard way to discover and invoke external capabilities. Before MCP, every AI application built its own ad-hoc integrations: custom function calling schemas, bespoke plugin APIs, wrapper scripts with no shared vocabulary. A Claude user could not use a tool built for ChatGPT. A VS Code extension could not share a debugger adapter with a Cursor plugin. The integrations were N-squared: every application had to wire up every data source independently.
MCP addressed a coordination failure, not a technology gap. The technology for tool calling existed. What did not exist was the agreement that there should be one way to describe a tool, one way to list available resources, one handshake protocol for capability negotiation. MCP's core contribution is the assertion that this problem deserves a standard solution, and the market has responded: as of early 2026, Claude, ChatGPT, VS Code, Cursor, and dozens of other clients support MCP, and the official registry lists hundreds of publicly available servers. That level of adoption in under a year is not a rounding error. It is proof that the problem was real and the timing was right.
JSON-RPC 2.0: A Defensible Message Format
JSON-RPC 2.0 is simple, widely implemented, and language-agnostic. It has three message types (request, response, notification), a straightforward error model, and no opinion about transport. It is the wire format, not the transport, and that separation of concerns is exactly right. A JSON-RPC message that travels over stdio is the same message that travels over HTTP or WebSocket. The framing changes; the message does not.
The alternative would have been gRPC, which gives you strong typing, code generation, and bidirectional streaming, but at the cost of a protobuf dependency, a build step, and a transport commitment (HTTP/2). For a protocol whose primary constituency is scripting-language developers writing tool integrations, JSON-RPC's zero-dependency, human-readable approach is the pragmatic choice. The spec did not need to solve serialization sophistication. It needed to solve agreement.
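The three message types are small enough to sketch in full. The helper functions below are illustrative, not part of any SDK; `tools/list` is used only as a representative method name.

```python
import json

def request(msg_id, method, params=None):
    """Request: expects a response correlated by id."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def response(msg_id, result):
    """Response: carries the result for a prior request id."""
    return {"jsonrpc": "2.0", "id": msg_id, "result": result}

def notification(method, params=None):
    """Notification: fire-and-forget; no id, so no response is expected."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# The same message travels over stdio, HTTP, or WebSocket unchanged;
# only the framing around it differs.
wire = json.dumps(request(1, "tools/list"))
```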
Capability Negotiation: The Right Instinct
MCP's initialization handshake is one of its strongest design decisions. Before any tools are called or resources read, the client and server exchange capabilities. The client declares what it supports. The server declares what it offers. Both sides operate within those bounds for the rest of the session.
This is the right instinct for two reasons. Architecturally, it prevents the "try and fail" pattern that plagues ad-hoc integrations: a client calls a method the server does not implement, gets an error, falls back. With capability negotiation, the client knows what is possible before it asks. Operationally, it creates a contract that is auditable. A session's behavior is bounded by what was declared at initialization, which means debugging, monitoring, and security review all have a starting point.
The spec does not go as far as it could here. Capabilities are declared at the transport level and do not generalize across bindings. A server that supports tools over HTTP may not support tools over MQTT, and the capability declaration does not distinguish. But the pattern is correct, and a transport-agnostic redesign would preserve it while giving each binding the vocabulary to express its own constraints.
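A minimal sketch of how declared capabilities bound a session. The client-side guard is hypothetical; the capability keys mirror the objects exchanged at initialization, but the `REQUIRED_CAPABILITY` table is an illustration, not spec text.

```python
# What a server might declare during the initialize handshake.
SERVER_CAPS = {"tools": {"listChanged": True}, "resources": {"subscribe": True}}

# Hypothetical mapping from method to the capability that must back it.
REQUIRED_CAPABILITY = {
    "tools/list": "tools",
    "tools/call": "tools",
    "resources/read": "resources",
    "resources/subscribe": "resources",
    "prompts/list": "prompts",
}

def allowed(method, server_caps):
    """A method is callable only if the server declared its capability
    at initialization; the session stays within those bounds."""
    cap = REQUIRED_CAPABILITY.get(method)
    return cap is not None and cap in server_caps
```

With a guard like this, the client never needs the "try and fail" fallback: `allowed()` answers before any request leaves the process.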
stdio Transport: Unix Philosophy Done Right
The stdio transport is MCP at its most elegant. The client launches a server as a subprocess. The server reads JSON-RPC messages from stdin and writes responses to stdout. No network. No ports. No discovery. No authentication. No configuration. Process lifetime is session lifetime. When the client exits, the server exits. When the server crashes, the client knows immediately because the pipe closes.
This is Unix philosophy applied to an AI protocol, and it works for the same reason Unix pipes work: no ceremony, no setup, no indirection. A developer can write an MCP server in 50 lines of Python, test it from the command line by piping in a JSON message, and ship it without thinking about HTTP, TLS, or deployment topology. The friction reduction is immense, and it shows in the ecosystem: the majority of community MCP servers use stdio transport because it is the fastest path from idea to working integration.
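The whole transport pattern fits in a sketch. This is the shape, not a complete server (real servers use the official SDKs and handle initialization); only a `ping` method is implemented, and newline-delimited framing is assumed.

```python
import json
import sys

def handle(msg):
    """Dispatch one JSON-RPC message; only 'ping' is implemented here."""
    if msg.get("method") == "ping":
        return {"jsonrpc": "2.0", "id": msg["id"], "result": {}}
    return {"jsonrpc": "2.0", "id": msg.get("id"),
            "error": {"code": -32601, "message": "Method not found"}}

def serve(in_stream=sys.stdin, out_stream=sys.stdout):
    """One JSON-RPC message per line in, one per line out.
    Process lifetime is session lifetime: EOF on stdin ends the server."""
    for line in in_stream:
        line = line.strip()
        if not line:
            continue
        out_stream.write(json.dumps(handle(json.loads(line))) + "\n")
        out_stream.flush()

if __name__ == "__main__":
    serve()
```

Testing it really is as simple as piping in a line: `echo '{"jsonrpc":"2.0","id":1,"method":"ping"}' | python server.py`.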
The LSP Lineage: Standing on Shoulders That Worked
MCP takes explicit inspiration from the Language Server Protocol, which solved a similar problem: every IDE was building its own language support from scratch, and the solution was a standard protocol that separated language intelligence from editor implementation. LSP defined what a language server could do. Each editor wrote one LSP client. Each language wrote one LSP server. The N-squared problem became N+M.
LSP worked. It is one of the few protocol success stories in developer tooling from the last decade, and MCP's designers recognized that the pattern applied here too. The architectural insight is the same: separate the capability (what can be done) from the client (who is doing it). The structural difference is that LSP had one well-understood transport need (local process, stdio-ish) and one well-understood deployment model (1:1 client-server), while MCP needs to span local tools, remote APIs, and eventually distributed systems. LSP's pattern was right for LSP's problem. The question for MCP is whether LSP's transport choices are right for MCP's broader problem, and the rest of this essay will argue they are not.
Three Distinct Primitives
MCP defines three server-side primitives: tools, resources, and prompts. Each is genuinely different. Tools are executable functions with typed parameters. Resources are addressable data sources identified by URI. Prompts (which this essay will argue should become Skills) are reusable procedural knowledge. These are not three names for the same thing. They have different lifecycles, different trust requirements, and different interaction patterns.
The spec's decision to separate them, rather than collapsing everything into a single "capability" abstraction, is correct. Tools execute. Resources read. Skills compose. A protocol that treats "run this calculation" and "read this file" and "follow this workflow" as the same operation has lost information that matters to both the client and the human operating it.
The Market Got It Right First
Standard protocols succeed or fail on adoption, not architectural purity. USB-C is not the best possible connector design. HTTP/1.1 is not the best possible application protocol. They won because they were good enough, available at the right time, and backed by enough market momentum to cross the coordination threshold.
MCP has crossed that threshold. Anthropic, OpenAI, Microsoft, and GitHub are all backing it. The registry exists. The SDKs exist. Developers are shipping servers. The conversation has moved from "should we have a standard?" to "how should the standard evolve?" That transition is the most important thing MCP has gotten right, because a protocol that nobody uses is not a protocol. It is a PDF.
The architectural critique that follows is offered in that spirit. MCP's adoption proves the problem is real and the solution is viable. The question is whether the solution's transport layer will scale to the environments that adoption is already creating: MQTT sensors, gRPC microservices, WebSocket browsers, NATS event streams, and the rest of the real-world infrastructure that a "USB-C for AI" will eventually need to plug into.
§1.1
1.1 Dropping Prompts, Adding Skills
MCP defines three server primitives: Tools, Resources, and Prompts. The first two are well-grounded. The third reflects where the industry started, not where it's going. In an agentic world, prompts are giving way to skills, and the protocol should evolve with that trajectory. Here's the case for keeping Tools and Resources, dropping Prompts, and replacing them with Skills.
Tools — Actions with Side Effects
Tools are verbs. The LLM calls a tool to do something: invoke an API, execute code, send an email, debit an account. The defining property of a tool is that calling it twice may produce different results. This is the CQRS command side: non-idempotent, non-cacheable, strongly consistent. The LLM drives the action. No justification needed: this is the core of what a tool protocol must support.
Resources — Observations of Independent State
Resources are nouns that live on their own schedule. A database view, a set of log files, a live sensor feed: these exist and evolve whether or not the LLM is watching. The LLM can list what's available, read the current state, or subscribe to changes. The defining property of a resource is that the LLM is an observer, not a driver. If you model "read the current git diff" as a tool call, you've made the LLM the instigator of an observation. But the diff exists without the LLM. The subscription semantic ("tell me when this changes") is fundamentally different from "call this function." Resources earn their place because they model a different relationship between the LLM and the data.
The 2025-11-25 specification formalizes resource subscriptions with resources/subscribe, resources/unsubscribe, and notifications/resources/updated. It also introduces resources/templates/list for parameterized URI templates. Both additions reinforce the architectural point: resources are a distinct interaction pattern (observe and be notified of changes), not a variation on tool invocation.
Why Prompts Don't Earn Their Place
MCP's Prompts are spec'd as "templated messages and workflows," but in practice they are parameterized text templates: fill in the blanks, assemble some context, hand it to the LLM. This is decorative, not architectural. A prompt template is just data. It could be a Resource (the LLM reads the template, fills it in itself) or a Tool (the server renders the template and returns the text). Giving Prompts their own first-class primitive is like giving HTML templates their own HTTP method: it confuses the medium with the message.
Skills — Procedural Knowledge, Not Decorative Text
A skill is a richer construct. Where a prompt says "here's how to ask the question," a skill says "here's how to solve this class of problem." Skills encode procedural knowledge: which tools to call, in what order, with what validation, producing what kind of output. A skill for "debug a failing test" isn't a text template. It's a workflow: read the test output, identify the failing assertion, read the relevant source files, propose a fix, run the test again, iterate. The skill isn't the text; it's the strategy.
This distinction matters architecturally because skills are composed from tools and resources. A skill orchestrates. A prompt decorates. The protocol needs orchestration, not decoration.
The three-construct model: Tools (commands), Resources (queries), Skills (workflows). Each earns a distinct place because each has a distinct semantic:
- Tools: side-effecting, non-idempotent, non-cacheable (commands)
- Resources: side-effect-free, cacheable, subscribable (queries)
- Skills: composed, procedural, orchestrating tools and resources (workflows)
The rest of this document uses "Skills" where MCP says "Prompts," and treats Skills as the orchestration primitive that Prompts never were.
Skills: Authoring and Transport
The emerging industry direction, exemplified by the Agent Skills specification, defines how humans author skills: a directory containing SKILL.md (metadata and instructions), scripts/ (executable code), references/ (documentation), and assets/ (templates and resources). This is the authoring layer. It's what a skill looks like on disk.
But a protocol needs more than an authoring format. It needs to define how skills travel over the wire: how a client discovers, retrieves, and trusts the content. The Agent Skills specification defines the first of these (the directory format). The protocol must define the rest.
The protocol defines two operations for Skills, and deliberately omits a third:

- skills/list: what skills are available? Returns metadata: name, description, version. No content, no code, no trust decision required.
- skills/get: give me the skill definition. Returns a manifest: an envelope listing the skill's parts with their content types, trust classifications, and content URLs. The client fetches content on demand after deciding to trust it.
- There is no skills/invoke. Skills execute on the client side. The LLM reads SKILL.md, follows the instructions, and calls tools and resources as the skill directs. Server-side execution is what Tools are for.
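On the wire, both operations are ordinary JSON-RPC messages. The method names are this essay's proposal rather than the current spec, and the payload shapes are illustrative.

```python
# Illustrative wire messages for the proposed skills/* operations.
skills_list_request = {"jsonrpc": "2.0", "id": 7, "method": "skills/list"}

# skills/list returns metadata only: no content, no code, no trust decision.
skills_list_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {
        "skills": [
            {
                "name": "pdf-processing",
                "description": "Extract PDF text, fill forms, merge files.",
                "version": "1.2.0",
            }
        ]
    },
}

# skills/get names one skill and gets back its manifest envelope.
skills_get_request = {
    "jsonrpc": "2.0",
    "id": 8,
    "method": "skills/get",
    "params": {"name": "pdf-processing"},
}
```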
The Skill Envelope
A skills/get response carries the skill definition as a structured envelope. Each part is individually addressable, typed, and trust-classified:
{
  "type": "skill-definition",
  "name": "pdf-processing",
  "description": "Extract PDF text, fill forms, merge files.",
  "version": "1.2.0",
  "parts": [
    {
      "path": "SKILL.md",
      "contentType": "text/markdown",
      "url": "/skills/pdf-processing/files/SKILL.md",
      "sha256": "b7c3e1a9d4f2...",
      "executable": false
    },
    {
      "path": "scripts/extract.py",
      "contentType": "text/x-python",
      "url": "/skills/pdf-processing/files/scripts/extract.py",
      "sha256": "a3f2b8c9d1e4...",
      "executable": true
    },
    {
      "path": "references/api.md",
      "contentType": "text/markdown",
      "url": "/skills/pdf-processing/files/references/api.md",
      "sha256": "c5d9e7f1a2b3...",
      "executable": false
    },
    {
      "path": "assets/template.html",
      "contentType": "text/html",
      "url": "/skills/pdf-processing/files/assets/template.html",
      "sha256": "e8f4d2a6b1c3...",
      "executable": false
    }
  ]
}
Content is never inlined in the envelope. Every part is a URL. The client fetches what it needs, when it decides to trust it. This is important for three reasons: multi-line YAML+Markdown inside a JSON string field is fragile and invites escaping bugs; deserializing code before deciding to trust it is a security problem; and content-addressable URLs enable caching and progressive disclosure (the client fetches SKILL.md on activation, references on demand, scripts only when execution is approved).
The client reconstructs the directory locally: create the directory structure, write each fetched part to its path, set the executable bit where executable: true. The result is a directory that conforms exactly to the Agent Skills specification. The envelope is the transport format between the authoring layer (how humans write skills) and the runtime layer (how clients use them).
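The reconstruction step can be sketched as follows. `install_part` is a hypothetical helper; fetching the bytes from the part's URL is assumed to have happened already.

```python
import hashlib
from pathlib import Path

def install_part(root, part, content):
    """Write one fetched part into the local skill directory.

    The sha256 check happens before anything touches disk; the
    executable bit is set only where the manifest says so.
    """
    if hashlib.sha256(content).hexdigest() != part["sha256"]:
        raise ValueError(f"hash mismatch for {part['path']}: reject")
    dest = Path(root) / part["path"]
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)
    if part.get("executable"):
        dest.chmod(dest.stat().st_mode | 0o111)
    return dest
```

Run once per fetched part, this yields exactly the on-disk layout the Agent Skills specification describes: SKILL.md, scripts/, references/, assets/.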
The 2025-11-25 MCP specification introduces outputSchema and structuredContent for tools, allowing tools to declare their output schema and return structured data alongside text. Tool annotations (audience, priority) and resource_link types in results add further structured metadata. This is the spec moving in the direction our envelope model already occupies: typed, structured, content-addressable outputs. The difference is that the spec's structured content is limited to individual tool results, while the envelope model applies to the full skill package (instructions, references, scripts, and their integrity hashes). Convergence is the right word. The spec is catching up to what Skills require; it has not yet reached what Skills deliver.
Trust Domains and Content Types
Every part in the envelope carries a content type and an executable flag. These aren't optional metadata. They're trust classifications that gate what the client is willing to receive and run.
The protocol defines three trust domains. Two appear directly in skill envelopes:

Information domain (the LLM reads these; no execution risk):
- text/markdown, text/plain, text/html
- application/json, application/yaml, text/csv
- image/png, image/svg+xml

Execution domain (these want to be run; explicit client opt-in required):
- text/x-python, text/x-shellscript, text/x-javascript
- application/x-python, application/x-sh

The third, the artifact domain (opaque binary content), never travels in the envelope at all: binary assets are referenced as separate Resources, as described below.
Every part in the execution domain must have executable: true. The flag is not a permission. It is a warning label. The client sees it and decides its own policy: display a prompt, refuse entirely, log and allow.
Content types are restricted to an allowlist. application/zip, application/x-tar, application/octet-stream, and other opaque container types are rejected at the protocol level. If a skill needs to carry a binary asset, it references it as a separate Resource. The envelope is for things the protocol needs to classify. Opaque containers bypass the trust model. New content types can be added through capability negotiation, but the default set is small, typed, and always inspectable.
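A sketch of the classification gate, using the allowlist above. `classify` is illustrative, not a spec-defined function.

```python
# Type lists mirror the allowlist described above.
INFORMATION = {"text/markdown", "text/plain", "text/html",
               "application/json", "application/yaml", "text/csv",
               "image/png", "image/svg+xml"}
EXECUTION = {"text/x-python", "text/x-shellscript", "text/x-javascript",
             "application/x-python", "application/x-sh"}

def classify(part):
    """Return a part's trust domain, or reject it at the protocol level."""
    ct = part["contentType"]
    if ct in EXECUTION:
        if not part.get("executable"):
            raise ValueError(f"{ct} must carry executable: true")
        return "execution"
    if ct in INFORMATION:
        return "information"
    # Opaque containers (zip, tar, octet-stream) land here: rejected.
    raise ValueError(f"content type {ct} is not on the allowlist")
```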
Integrity: Content-Addressable Parts
Every part carries a sha256 hash. This is not optional. The client must verify every fetched part against its manifest hash. Mismatch means tampering or corruption. Reject the part.
The hash chain provides four properties:
- Integrity: fetch the content, hash it, compare to the manifest. Mismatch, reject.
- Content-addressed caching: two skills sharing the same references/api.md share one cached copy, keyed by sha256. No redundant downloads.
- Version pinning: a client or organization pins a skill to a specific version, or to a specific manifest hash, or to a specific source. Pin by version ("accept 1.x, not 2.x"), pin by manifest hash ("accept exactly this build"), pin by source ("trust our internal registry").
- Reproducibility: a corporate registry signs the manifest. The signature covers the hashes. Any change to any part invalidates the signature. Supply chain integrity for skills.
SHA-256 is the mandated algorithm. MD5 and SHA-1 are broken for collision resistance. Algorithm negotiation is not offered. If quantum computing concerns emerge, that's a protocol version bump, not a per-message negotiation.
Skill Versioning
Three versioning axes operate at different scales:
- Protocol version (semver): "can we talk at all?" Governs interop. Major version mismatch means incompatible operations. Negotiated at initialization.
- Skill version (semver): "which release is this?" Governs release tracking. Independent of the protocol version. A skill can be republished with bug fixes (1.0.0 to 1.0.1) without any protocol change.
- Part hashes (sha256): "is this specific file exactly what the manifest says?" Governs content integrity. Changes when any byte in any part changes.
A compliance officer reads version numbers. A verification pipeline runs sha256. They serve different audiences and different purposes. The three form a chain: protocol version governs interop, skill version governs release tracking, part hashes govern content integrity.
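The three pinning modes reduce to three independent checks. The canonical-JSON manifest hashing convention here is an assumption for illustration; the spec would need to define the actual canonicalization.

```python
import hashlib
import json

def manifest_hash(manifest):
    """Digest over a canonical serialization; any part change bubbles up."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def pin_by_version(skill_version, major):
    """Accept any release within a major line: 1.x yes, 2.x no."""
    return int(skill_version.split(".")[0]) == major

def pin_by_hash(manifest, expected):
    """Accept exactly this build and nothing else."""
    return manifest_hash(manifest) == expected

def pin_by_source(url, trusted_prefixes):
    """Accept only manifests served from a trusted registry."""
    return any(url.startswith(p) for p in trusted_prefixes)
```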
Elicitation — Complementary, Not Competing
The 2025-11-25 MCP specification introduces Elicitation, a client-side feature that allows a server to request structured input from the user. Form mode presents a JSON Schema and collects validated responses. URL mode redirects the user to an out-of-band interaction for sensitive data (OAuth flows, credential collection).
Elicitation and Skills address different problems. Elicitation collects input. Skills deliver procedure. A skill may use elicitation to gather parameters (the equivalent of a tool's input schema), but the skill's value is in the orchestration logic, not the input collection. Form mode is a limited version of what Skills can do (Skills can include multi-step workflows, validation rules, and execution strategies that go far beyond a single form). URL mode is an out-of-band security escape hatch that raises trust concerns directly relevant to our trust model: the client is sending the user to an arbitrary URL controlled by the server.
The relationship is complementary. Elicitation solves the "ask the user a question" problem within a single turn. Skills solve the "here is how to solve this class of problems" problem across multiple turns. A skill that includes an elicitation step (confirm parameters, then proceed) is more capable than either construct alone. But the protocol should not conflate them. Elicitation is a client-side input mechanism. Skills are a server-side knowledge delivery mechanism. They meet at the point where procedural knowledge requires user input, and that intersection is where trust negotiation matters most.
Trust Model: Safe Defaults, Explicit Escalation
The out-of-the-box client experience should be conservative to the point of boring:
- Allow skills/list (browsing is safe; metadata only)
- Allow fetching information-domain content types (markdown, JSON, images)
- Refuse fetching execution-domain content types (scripts, shell, anything executable: true)
- Verify sha256 on every fetched part
- Refuse to set the filesystem executable bit on any written file
Code execution is opt-in, not opt-out. The default answer to "can I run this script?" is no.
Above that foundation, escalation tiers:
Trusted sources. A client maintains a registry of trusted skill sources. An organization might trust corp-registry.internal/* but refuse public registries. A personal user might trust agentskills.io/* after review. Sources not in the trust list can be browsed (metadata only) but their content and code URLs are blocked.
Content-type policy. The protocol defines the default allowlist (information-domain types). A corporate deployment can restrict it further (no images, no HTML) or expand it through capability negotiation (adding application/x-wasm-module because they have a WASM runtime). Expanded types are a client capability, not a server permission. The server advertises what it has; the client decides what it accepts.
Skills kill switch. skills.enabled: false. Some deployments don't want the LLM downloading and following instructions from external sources. Turn it off entirely. No listing, no fetching. The protocol operations are gone. This is the corporate governance position: "our agents use tools and resources, not skills."
Audit logging. Every skill fetch, every part downloaded, every file written to disk, every executable bit set. The client logs sha256, source URL, content type, and the trust decision made. Corporate compliance can review these logs. Incident response can trace exactly which skill's script ran on which date.
The framing matters. A model that assumes trust starts open and lets you close doors (blocklists, revocation) inevitably leaks. A model that assumes distrust starts closed and lets you open doors (trust lists, capability negotiation) is hard to misuse by default.
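The default-deny posture reduces to a small decision function. Field names (`skills_enabled`, `trusted_sources`, `allow_execution`) are illustrative stand-ins for the skills.enabled switch and trust list described above.

```python
DEFAULT_POLICY = {
    "skills_enabled": True,    # the kill switch
    "trusted_sources": [],     # empty by default: metadata browsing only
    "allow_execution": False,  # scripts refused unless explicitly opted in
}

def decide(action, part, source, policy):
    """Every decision starts at refuse; configuration opens doors."""
    if not policy["skills_enabled"]:
        return "refuse"        # kill switch: no listing, no fetching
    if action == "list":
        return "allow"         # metadata browsing is always safe
    if not any(source.startswith(s) for s in policy["trusted_sources"]):
        return "refuse"        # untrusted source: content URLs blocked
    if part.get("executable") and not policy["allow_execution"]:
        return "refuse"        # execution domain needs explicit opt-in
    return "allow"
```

Note the ordering: the open doors (listing) sit above the closed ones, and there is no branch that reaches "allow" without an explicit grant along the way.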
Spec Additions: Roots and Security
The 2025-11-25 specification introduces roots/list and notifications/roots/list_changed, which allow clients to declare filesystem boundaries to servers. A root is a trust boundary declaration: the client tells the server "you may access files within these paths." This complements our trust domains. Where our trust domains govern what content the client is willing to receive and execute (information vs. execution vs. artifact), Roots govern what locations the server is permitted to access. Both are boundary declarations. Both default to minimal access and require explicit opt-in.
The specification also introduces extensive security considerations: user consent requirements, data privacy protections, untrusted tool descriptions, LLM sampling controls, DNS rebinding prevention, token passthrough prohibitions, and phishing prevention. These address the connection layer: how the client and server authenticate and authorize each other. Our trust domains address the content layer: what the client is willing to receive, execute, and store. The spec secures the channel; our model secures the payload. Both are necessary. Neither is sufficient alone.
Spec Addition: Sampling with Tools
The 2025-11-25 specification allows servers to include tools and toolChoice in sampling requests, enabling multi-turn agentic loops within a single sampling interaction. The server can request that the client's LLM make tool calls, receive results, and continue reasoning, all within the context of a sampling request.
This changes what "sampling" means architecturally. Sampling was originally a simple request: "here's a prompt, generate a completion." With tools, it becomes a recursive agentic loop: the server asks the client's LLM to reason, the LLM calls tools, the server provides tool results, the LLM continues reasoning. This is convergent with our Skills concept, from the opposite direction. Skills give the client procedural knowledge from the server. Sampling with tools gives the server the ability to drive the client's reasoning with its own tools. Both are composable. Both require trust negotiation. Both create recursive execution patterns that need the actor-like supervision the Addendum describes.
§2
2. REST Conventions — The Yardstick We Should Be Using
REST isn't just GET /things. It's a set of architectural constraints that produced the web's scalability. Fielding's dissertation gave us six constraints. When a protocol tunnels RPC through HTTP, it opts out of every one of them.
2.1 Uniform Interface — The Big One
- Resource identification in requests: URLs identify nouns, not verbs. /tools/weather/call names a resource (the invocation of the weather tool), not an RPC method buried in a JSON body.
- Manipulation of resources through representations: the client holds enough state in the representation to modify or delete it. A JSON body that says {"method":"tools/call"} tells the transport nothing: the transport is blind.
- Self-descriptive messages: every HTTP request should carry enough metadata (method, headers, status) that intermediaries can act without parsing the body. JSON-RPC bodies require body-parsing to understand what's happening.
- HATEOAS: the server drives client state through links. MCP's capability advertisement at init is a step toward this, but it's a one-time handshake, not an ongoing navigable graph.
2.2 Statelessness
- Each request from client to server must contain all information needed to understand the request. The server doesn't store client context between requests.
- MCP's sessions (Mcp-Session-Id) are explicitly stateful: the server must remember context across requests. That's not inherently wrong, but it's a deliberate departure.
- REST's statelessness constraint exists for a reason: it enables horizontal scaling, failure recovery, and caching. MCP pays the cost without acknowledging the tradeoff.
2.3 Cacheability
- Responses must implicitly or explicitly label themselves as cacheable or non-cacheable.
- In MCP, every response tunnels through the same POST endpoint with the same URL: no CDN, no ETag, no Cache-Control granularity. A tool list that changes once a month gets fetched with the same POST as a live tool call.
- REST would give you GET /tools → Cache-Control: max-age=3600. MCP gives you POST /mcp → opaque body, no cache layer can help.
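What the cacheable alternative looks like, as a sketch: the route and handler are hypothetical, but the ETag, Cache-Control, and 304 mechanics are standard HTTP.

```python
import hashlib
import json

TOOLS = [{"name": "weather_forecast", "description": "Forecast by city."}]

def get_tools(if_none_match=None):
    """GET /tools handler sketch: content-derived ETag, 304 on a match."""
    body = json.dumps(TOOLS)
    etag = '"' + hashlib.sha256(body.encode()).hexdigest()[:16] + '"'
    if if_none_match == etag:
        return 304, {"ETag": etag}, ""   # client's cached copy is current
    return 200, {"ETag": etag, "Cache-Control": "max-age=3600"}, body
```

Every intermediary between client and server, CDN included, understands this exchange with no knowledge of MCP at all.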
2.4 Layered System
- A client cannot tell whether it's connected directly to the end server or to an intermediary. Proxies, gateways, and caches can be inserted transparently.
- MCP's single-endpoint design doesn't prevent layering, but it strips intermediaries of the metadata they need to be useful. A proxy sees POST /mcp and has no idea if this is a read, a write, or a stream negotiation.
2.5 Code on Demand (Optional)
The least exercised of Fielding's constraints, and the most interesting for MCP. Code on Demand means the server can extend client functionality at runtime by transferring executable code: JavaScript in the browser, stored procedures in a database, plugins in an editor.
MCP already does something analogous. It just does not have the vocabulary to say so.
A tool definition is code on demand by analogy. The client did not know how to call weather_forecast until the server told it about the tool's name, parameters, and return type. The server extended the client's capability set at runtime. This is not literally executable code the client runs, as Fielding originally defined Code on Demand, but the architectural pattern, server-ships-capability-to-an-unspecialized-client, is the same. A skill definition (as outlined in §1.1) goes further: the server ships a package containing instructions, context, and validation rules. The client reads the package and gains the ability to perform a task it could not perform before. The LLM interprets the instructions instead of a JavaScript runtime, but the architecture is the same: the client is a universal executor that becomes specialized by the code (or prompt, or procedural knowledge) the server sends.
The difference between "code on demand" and "prompt on demand" is the execution model, not the architectural pattern. In both cases, the server ships capability to a previously unspecialized client. In both cases, the client's behavior after receiving the payload differs from its behavior before. In both cases, trust is the central question: should the client execute what the server sent?
This is where Code on Demand connects to the trust model. Fielding's constraint assumes the client trusts the server enough to execute the code it receives. In the web, that trust is bounded by the browser's sandbox, the same-origin policy, and content security policies. In MCP, the trust boundary is undefined. A tool call can execute arbitrary code on the server. A skill definition can direct arbitrary behavior on the client. The protocol has no sandbox, no same-origin equivalent, and no capability-based trust boundary. It assumes the client-server relationship is already trusted.
This assumption works for local tools on your own machine (the stdio use case). It breaks for remote servers whose trustworthiness is unknown. Code on Demand, when done deliberately, includes trust negotiation. MCP has no such mechanism.
2.6 The Richardson Maturity Model — Where MCP Lands
- Level 0: Single URI, single method (RPC over HTTP). This is where MCP lives.
- Level 1: Multiple URIs, single method. Partial improvement.
- Level 2: Multiple URIs, multiple methods (GET/POST/PUT/DELETE with proper semantics). Where a convention-respecting MCP could be.
- Level 3: HATEOAS: navigable resource graph. Aspirational, but the capability model is half the bridge already built.
2.7 REST Is Not the Only Way — But It's the Default Way
- This isn't an argument that everything must be REST. gRPC, GraphQL, and WebSockets all have their place.
- The argument is: if you're going to run over HTTP, the burden of proof is on you to justify why you're not using HTTP's native semantics.
- MCP has reasons for tunneling (streaming, batching, bidirectional messaging), but none of them actually require abandoning REST. They require thoughtful design, not a tunnel.
2.8 Other Conventions MCP Leaves on the Table
REST isn't the only body of knowledge MCP bypasses. A protocol that respects the web's accumulated wisdom would also draw on:
Postel's Law (originally RFC 793, restated in RFC 1122): "Be conservative in what you send, liberal in what you accept." MCP's session rules are rigid: one header format, one Accept combo, fail hard on mismatch. A Postel-respecting transport would negotiate gracefully rather than 400 on any deviation.
Idempotency: HTTP distinguishes safe/unsafe methods. GET /tools is safe: retry all day. POST /tools/weather/call may not be: retrying could charge you twice. MCP's POST-everything makes every operation look identical to the transport layer. An Idempotency-Key header (as Stripe and others use) or native PUT semantics for idempotent operations would solve this.
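The idempotency-key pattern is small enough to sketch in full; the in-memory store is a stand-in for whatever durable store a real gateway would use.

```python
_results = {}  # in-memory stand-in for a durable idempotency store

def call_tool(idempotency_key, execute):
    """Run execute() once per key; replay the stored result thereafter."""
    if idempotency_key in _results:
        return _results[idempotency_key]  # safe retry: no double charge
    result = execute()
    _results[idempotency_key] = result
    return result
```

A client that times out can retry the same key indefinitely without risking a second side effect, which is exactly the property POST-everything denies to the transport layer.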
CQRS (Command Query Responsibility Segregation): Reads and writes have different scaling, caching, and consistency profiles. A tool list (query, cacheable, eventually consistent) vs. a tool invocation (command, non-cacheable, strongly consistent) shouldn't share the same endpoint and method. MCP conflates them entirely.
OpenAPI / AsyncAPI: Standard machine-readable API descriptions exist. MCP invents its own capability advertisement at init, which is fine, but it means no existing tooling (Swagger UI, API gateways, client generators) can consume an MCP server's capabilities without a translation layer. AsyncAPI could describe the streaming event contracts and MQTT topic structures that MCP might expose.
The Twelve-Factor App: Port binding (the server is self-contained), backing services as attached resources, dev/prod parity. MCP's transport spec doesn't address how an MCP server should be deployed. It assumes a process exists. Twelve-factor guidance on configurability (environment variables for the MCP endpoint URL, not hard-coded) and disposability (fast startup, graceful shutdown of streaming connections) would be natural additions.
Unix Philosophy: Do one thing well. Stdio transport respects this: stdin in, stdout out, no ceremony. Streamable HTTP violates it: one endpoint does reads, writes, streaming, session management, cancellation, and batching. A Unix-philosophy MCP would have small, composable endpoints with clear boundaries.
Fallacies of Distributed Computing: The eight fallacies are a checklist for protocol design. MCP trips on several: "the network is reliable" (SSE disconnection is an afterthought, resumability is optional), "latency is zero" (timeout guidance was added in the 2025-11-25 revision; earlier drafts had none), "topology doesn't change" (no service discovery, no retry-on-redirect), "there is one administrator" (custom headers mean every admin must know MCP's quirks).
Event Sourcing: Server-to-client notifications (tool results, resource changes, progress updates) are an event stream. SSE provides the transport but MCP doesn't define event types, versioning, or replay semantics beyond the raw id cursor. An event-sourcing approach would give clients a durable, re-playable log of what happened, not just a transient SSE firehose.
The Open/Closed Principle (Meyer): Software entities should be open for extension, closed for modification. MCP's transport model is hard-coded to two options: stdio and Streamable HTTP. Adding a third transport means amending the spec itself. A protocol designed for extension would define a transport interface (message framing, connection lifecycle, capability negotiation) and let the community implement it over MQTT, AMQP, WebSocket, gRPC, Unix sockets, or anything with bidirectional message exchange, without touching the core spec.
Authorization Is a Transport Concern: MCP's 2025-11-25 specification devotes over 40 pages to OAuth 2.1 framework details: RFC 9728 Protected Resource Metadata, RFC 8414 Authorization Server Discovery, OpenID Connect Discovery, PKCE, step-up authorization, token audience binding, and confused deputy mitigations. This is an extensive, well-considered authorization model. It is also a transport-layer concern elevated to the protocol layer.
The distinction matters. OAuth 2.1 is how HTTP handles authorization. It is the right answer for the HTTP binding. But MQTT handles authorization with TLS client certificates and username/password at the broker level. gRPC handles it with TLS mutual authentication and channel credentials. stdio handles it with Unix file permissions and process UID. A transport-agnostic protocol core should define what authorization means (the server needs to know who the client is and what it is allowed to do) but should let each binding express it in that transport's native idiom. The HTTP binding should use OAuth 2.1. The MQTT binding should use TLS mutual auth. The stdio binding should use process permissions. None of these is wrong. Each is correct for its transport.
MCP's approach of embedding OAuth 2.1 details in the protocol specification is the same pattern as embedding SSE details in the protocol: it is correct for one transport and opaque to every other.
§3
3. Transport Diversity — Why Only stdio and HTTP?
MCP defines exactly two standard transports: stdio (local subprocess) and Streamable HTTP (network). Both are defensible. Neither is sufficient. The protocol's ambition, a universal standard for LLM-tool integration, deserves a transport model that matches the diversity of the environments tools live in.
3.1 The MQTT-Shaped Hole
MQTT is the most conspicuous absence. It's the dominant protocol for IoT, sensor networks, and event-driven systems, precisely the domains where MCP tools are likely to live.
Why MQTT fits MCP better than HTTP does:
- Inherently bidirectional: MQTT clients publish AND subscribe. No SSE hack, no separate GET/POST paths for inbound vs. outbound messages. An MCP client subscribes to mcp/server/{session-id}/notifications and publishes to mcp/server/{session-id}/requests. Symmetric, clean, one connection.
- Topic-based routing maps to MCP's resource model: Everything in MCP has a natural MQTT topic hierarchy:
mcp/tools/list → request tool catalog
mcp/tools/weather/call → invoke the weather tool
mcp/resources/sensors/temperature → read a resource
mcp/skills/debug-failing-test → retrieve a skill definition
mcp/notifications/progress → server pushes progress updates
No routing logic needed: the broker does it.
- QoS levels solve reliability without reinventing it: QoS 0 (fire and forget), QoS 1 (at least once), QoS 2 (exactly once). Tool invocations that debit an account? QoS 2. Progress notifications? QoS 0. MCP's Streamable HTTP offers optional SSE resumability via Last-Event-ID, but this is application-level retry logic layered on top of a transport that was not designed for delivery guarantees. QoS is built into the protocol.
- Retained messages are initialization without an init handshake: A server publishes its tool list to mcp/tools/list with the retain flag. Any client that connects immediately gets the current catalog. No InitializeRequest/InitializeResult dance needed. This is capability advertisement as persistent state, not a one-time handshake.
- Last Will and Testament gives graceful disconnection semantics: MCP has no standard way to detect server death. Streamable HTTP's SSE streams can disconnect, but the client cannot distinguish a server crash from a network interruption or an intentional close. MQTT's LWT message fires on ungraceful disconnect: the client knows the server is gone, not just that a TCP socket closed.
- It's already running everywhere: Home Assistant, industrial IoT, building automation, sensor networks. The tools MCP wants to expose are often already on MQTT. An MCP-MQTT transport wouldn't require a new server; it would bridge the protocol.
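The topic and QoS conventions above can be sketched as two pure functions. The mapping is hypothetical (MCP defines no MQTT binding), but it shows how little glue the broker model would need:

```python
# Sketch: a hypothetical MCP-to-MQTT mapping. Topic scheme and QoS
# assignments are illustrative conventions, not part of any spec.
def method_to_topic(method, params=None):
    """Map a JSON-RPC method (plus params) onto a topic hierarchy."""
    params = params or {}
    if method == "tools/call":
        return f"mcp/tools/{params['name']}/call"
    if method == "resources/read":
        return f"mcp/resources/{params['uri']}"
    return "mcp/" + method  # e.g. "tools/list" -> "mcp/tools/list"

def qos_for(method):
    """Pick a delivery guarantee per operation class."""
    if method == "tools/call":
        return 2  # exactly-once: invocations may have side effects
    if method.startswith("notifications/"):
        return 0  # fire-and-forget: progress updates can be dropped
    return 1      # at-least-once: everything else

assert method_to_topic("tools/list") == "mcp/tools/list"
assert method_to_topic("tools/call", {"name": "weather"}) == "mcp/tools/weather/call"
assert qos_for("notifications/progress") == 0
```

The broker then does the routing: a subscriber to mcp/tools/# sees every tool operation with no dispatch code on either side.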
Why MCP probably skipped it: The LSP lineage. Language servers are subprocesses on a developer's machine. Stdio and HTTP cover that world completely. But MCP's scope is broader than LSP's. A temperature sensor in a factory doesn't run as a subprocess. A smart lock doesn't expose an HTTP endpoint. MQTT is the missing transport for the physical world that MCP claims to want to connect to.
3.2 NATS — The Lightweight Messaging Alternative
NATS occupies a sweet spot between MQTT's IoT focus and HTTP's request-response model. It's a publish-subscribe system designed for cloud-native, microservice, and edge deployments, precisely the environments where MCP servers are proliferating.
Why NATS fits MCP better than HTTP does:
- Subject-based routing, not URL routing: NATS subjects (mcp.tools.list, mcp.tools.weather.call) map to MCP's resource model the way MQTT topics do, but with wildcard subscriptions (mcp.tools.> catches all tool operations). A client subscribes to one pattern and gets every tool event: no SSE endpoint, no separate notification channel.
- JetStream for durable semantics: NATS' built-in persistence layer (JetStream) gives exactly-once delivery, replay from any offset, and durable queues. MCP tool invocations that charge money or side-effect the world get the durability guarantees that Streamable HTTP's optional SSE resumption cannot match, without bringing in a separate database or message broker.
- Request-reply is a first-class pattern: Unlike MQTT, NATS handles request-reply natively. You publish a request to mcp.tools.weather.call and the server replies on a unique reply subject. No correlation IDs to manage, no response topic conventions to invent. The protocol does it.
- Leaf nodes for edge deployments: NATS leaf nodes let isolated clusters (a factory floor, a home network) connect to a central hub on intermittent links. An MCP tool server behind a leaf node is reachable from anywhere in the hierarchy. MQTT has bridges; NATS has a built-in, operationally simpler version.
- Authentication and authorization are built-in: NATS supports decentralized JWT-based auth, account isolation, and per-subject permissions. MCP's current security model is "transport handles it," but HTTP's transport-level protections (TLS, CORS) don't help for pub/sub. NATS's do.
- It's already in the ecosystem: Cloudflare, Siemens, and hundreds of edge-computing deployments use NATS. These are the organizations building the tools MCP wants to connect. A NATS binding wouldn't be exotic. It would meet the infrastructure where it already lives.
Why MCP probably skipped it: Same reason as MQTT: the LSP heritage. Language servers don't need a message broker. But MCP is not LSP. Once you're connecting LLMs to IoT sensors, enterprise event streams, and distributed toolchains, a lightweight messaging substrate isn't exotic. It's essential.
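The wildcard-subscription claim is easy to make concrete. This sketch implements NATS subject matching (`*` matches exactly one token, `>` matches all remaining tokens) to show how a single pattern covers every tool operation:

```python
# Sketch: NATS subject matching semantics.
# '*' matches exactly one dot-separated token; '>' (last position only)
# matches one or more remaining tokens.
def subject_matches(pattern, subject):
    p = pattern.split(".")
    s = subject.split(".")
    for i, tok in enumerate(p):
        if tok == ">":
            return len(s) >= i + 1  # at least one token left to consume
        if i >= len(s):
            return False            # subject ran out of tokens
        if tok != "*" and tok != s[i]:
            return False            # literal token mismatch
    return len(s) == len(p)         # no trailing unmatched tokens

# One subscription, every tool event:
assert subject_matches("mcp.tools.>", "mcp.tools.weather.call")
assert subject_matches("mcp.tools.>", "mcp.tools.list")
assert subject_matches("mcp.*.list", "mcp.tools.list")
assert not subject_matches("mcp.tools.>", "mcp.resources.read")
```

An SSE-based client needs a dedicated notification channel per concern; a NATS client needs one subscription string.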
3.3 ZeroMQ — The Socket Toolkit That Disappears Into Your Code
ZeroMQ isn't a broker. It's a socket library that gives you messaging patterns (request-reply, pub-sub, push-pull, router-dealer) without deploying any infrastructure. You link the library, open a socket, and you have distributed communication. No broker process, no configuration file, no operational overhead.
Why ZeroMQ fits MCP's tooling model:
- Pattern primitives map to MCP's communication modes: ZeroMQ's socket types are the right abstraction for MCP's varied message flows:
REQ/REP — synchronous tool calls (invoke tools/call, get a response)
PUB/SUB — server-to-client notifications (progress updates, resource changes)
PUSH/PULL — task distribution (fan out tool invocations across workers)
ROUTER/DEALER — asynchronous bidirectional messaging (MCP's full protocol, with correlation built in)
MCP currently squeezes all of these through a single HTTP POST endpoint. ZeroMQ's socket types give each mode its own wire pattern: the right tool for each job.
- No broker to deploy: A stdio-based MCP server is appealing because it requires zero infrastructure: launch the process and talk to it. ZeroMQ extends this philosophy: no broker process, no configuration for simple cases. Bind to ipc://mcp-weather, connect, communicate. The operational simplicity of stdio with the topology flexibility of a network protocol.
- Zero-copy for high-throughput tools: Tools that return large payloads (image generation, document extraction, dataset queries) currently serialize through JSON in an HTTP body. ZeroMQ's zero-copy message passing means a tool server can send a 100MB response without copying it into a buffer first. For compute-heavy local tools, this is the difference between "works" and "works fast."
- Multicast and service discovery: ZeroMQ's RADIO/DISH pattern (still a draft API) and UDP beaconing via the companion Zyre library let tool servers announce themselves on the local network without a registry. An MCP client can discover available tools the way devices discover Bonjour services, without a centralized directory.
- The same codebase, every topology: ZeroMQ lets you change the deployment topology by changing the socket type and bind/connect direction. Same binary, same protocol logic. A developer tool can run as ipc:// for local use, then switch to tcp:// for a team server, then switch to a ROUTER/DEALER fan-out for production, all by changing the endpoint string. MCP's current transport model requires a completely different implementation for stdio vs. HTTP.
The trade-off: ZeroMQ has no broker, which means no built-in message persistence, no built-in replay, no management dashboard. It trades operational guarantees for library-level simplicity. For local MCP tool servers (the majority use case today), that's exactly the right trade. For distributed deployments, NATS or MQTT are better answers.
3.4 Shared Memory / Ring Buffer — The Zero-Copy Local Fast Path
When client and server are on the same machine, even stdio copies data through kernel buffers twice (write to pipe → kernel buffer → read from pipe). Shared memory ring buffers eliminate the copies entirely. The producer writes directly into a memory-mapped region; the consumer reads from the same region. No syscall on the hot path, no kernel transition for small messages, and, with a binary schema, no JSON serialization at all.
Why shared memory matters for MCP:
- Latency disappears: A tool invocation on shared memory can reach sub-microsecond latency in optimized designs. stdio through a pipe is 10-50 microseconds. HTTP over localhost is 100-1000 microseconds. When an LLM calls a tool dozens of times per inference step (tool chaining, multi-step reasoning), this adds up. Shared memory is the only transport that makes per-token tool calls viable.
- The ring buffer pattern is battle-tested: LMAX Disruptor (financial trading, 20M+ transactions/second), DPDK (network packet processing), and audio subsystems (JACK, PipeWire) all use lock-free ring buffers over shared memory. This isn't exotic. It's how systems that care about latency actually work.
- Structured data without serialization: With a schema-defined ring buffer (FlatBuffers, Cap'n Proto, or a hand-rolled struct layout), the client reads tool results directly from memory: no JSON parse, no allocation, no garbage collection pressure. The message is already in the format the consumer expects. MCP's JSON-RPC is convenient for humans; shared memory with a binary schema is convenient for machines.
- Complementary, not competitive: Shared memory doesn't replace stdio or HTTP. It augments them. A local tool server could offer three transports simultaneously: shared memory for latency-critical invocations on the same host, stdio for simple subprocess use cases, and HTTP for remote access. The protocol core is the same. The transport binding picks the right lane.
- The LLM-local sweet spot: The emerging architecture for LLM tool use is model and tool on the same machine, a local Ollama instance calling local Python tools. This is exactly the deployment where shared memory shines: both processes on the same host, calling each other at inference speed, with zero concern about network partitions, serialization overhead, or protocol framing. The ring buffer is the stdio of the future: same simplicity, orders of magnitude faster.
The trade-off: Shared memory is strictly local: same machine, same OS. No network, no remote access. And it requires coordination primitives (semaphores, memory barriers) that are platform-specific. But for the dominant MCP deployment pattern (LLM on localhost calling local tools), it's the fastest path that exists. A well-designed protocol would make this the first-class choice for same-host communication, with stdio as the fallback and HTTP for network access.
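The mechanics are simpler than they sound. Here is a minimal in-process sketch of a single-producer/single-consumer ring buffer carrying length-prefixed frames; a real transport would place the buffer in shared memory (mmap or SharedMemory) and add atomic indices and memory barriers, which this toy version omits:

```python
# Sketch: SPSC ring buffer with length-prefixed frames, the core
# mechanism of a shared-memory transport. In-process only; a real
# binding needs cross-process memory and synchronization primitives.
import struct

class RingBuffer:
    def __init__(self, size=1 << 16):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # consumer position (monotonic)
        self.tail = 0  # producer position (monotonic)

    def _write_bytes(self, data):
        for b in data:
            self.buf[self.tail % self.size] = b
            self.tail += 1

    def _read_bytes(self, n):
        out = bytes(self.buf[(self.head + i) % self.size] for i in range(n))
        self.head += n
        return out

    def push(self, msg):
        needed = 4 + len(msg)
        if self.size - (self.tail - self.head) < needed:
            return False  # buffer full: producer backs off
        self._write_bytes(struct.pack("<I", len(msg)))  # 4-byte length prefix
        self._write_bytes(msg)
        return True

    def pop(self):
        if self.tail - self.head < 4:
            return None  # nothing to read
        (length,) = struct.unpack("<I", self._read_bytes(4))
        return self._read_bytes(length)

rb = RingBuffer()
rb.push(b'{"jsonrpc":"2.0","method":"tools/list","id":1}')
assert rb.pop() == b'{"jsonrpc":"2.0","method":"tools/list","id":1}'
assert rb.pop() is None
```

Note that the message payload here is still JSON-RPC; swapping in a binary schema is what removes the final serialization cost.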
3.5 Other Transports Worth Considering
- WebSocket: Already in every browser and every language. Bidirectional, binary or text frames, widely supported by proxies and CDNs (unlike SSE). A natural upgrade path from SSE that preserves HTTP compatibility during the handshake.
- gRPC: Protobuf efficiency, generated clients in 12+ languages, bidirectional streaming, deadlines/timeouts, rich error model. If MCP wants high-performance tool invocation at scale, gRPC is the obvious answer. An OpenAPI-described REST binding plus gRPC would cover every deployment profile.
- AMQP / RabbitMQ: For enterprise deployments where tools span organizational boundaries. Message queues with guaranteed delivery, dead-letter handling, and routing exchanges that make MCP's batch model look primitive.
- Unix Domain Sockets: Same machine, no network stack, lower latency than TCP. The natural upgrade from stdio for multi-process architectures on the same host.
- D-Bus: On Linux, D-Bus is how system services talk to each other. An MCP-D-Bus transport would let systemd services, desktop applications, and hardware daemons expose themselves as MCP tools with zero network configuration.
3.6 The Architectural Lesson: Transport Agnosticism Is Earned, Not Declared
MCP's spec says it's "transport-agnostic" and that "custom transports" are permitted. But the spec also defines exactly two transports with very specific rules. That's not agnosticism: that's a duopoly. True transport agnosticism means:
- Defining the message layer independently of the transport layer (JSON-RPC messages exist regardless of how they're delivered)
- Defining a transport interface (connect, authenticate, send, receive, close), not a transport implementation
- Letting the community implement that interface over whatever channel makes sense for their domain
- Requiring only that all transports preserve message order, delivery semantics, and the initialization lifecycle
This is how HTTP itself works: it runs over TCP, QUIC, Unix sockets, and TLS tunnels without changing the spec. MCP should aspire to the same.
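What such a transport interface could look like, sketched in Python. The method names are illustrative, not drawn from the MCP spec; the point is that bindings implement the contract while the protocol core stays untouched:

```python
# Sketch: a transport interface in the spirit of §3.6. The core spec
# would define this contract; stdio, HTTP, MQTT, NATS, shared memory
# would each implement it in their own idiom. Names are hypothetical.
from abc import ABC, abstractmethod

class Transport(ABC):
    @abstractmethod
    def connect(self): ...                    # establish the channel
    @abstractmethod
    def authenticate(self, credentials): ...  # in the binding's native idiom
    @abstractmethod
    def send(self, message): ...              # one framed JSON-RPC message
    @abstractmethod
    def receive(self): ...                    # next message, order preserved
    @abstractmethod
    def close(self): ...                      # graceful shutdown

class InMemoryTransport(Transport):
    """Trivial loopback binding, useful for tests."""
    def __init__(self):
        self.queue = []
        self.open = False
    def connect(self):
        self.open = True
    def authenticate(self, credentials):
        pass  # same process: OS identity suffices, nothing to prove
    def send(self, message):
        self.queue.append(message)
    def receive(self):
        return self.queue.pop(0)  # FIFO preserves message order
    def close(self):
        self.open = False

t = InMemoryTransport()
t.connect()
t.send(b'{"jsonrpc":"2.0","method":"ping","id":1}')
assert t.receive() == b'{"jsonrpc":"2.0","method":"ping","id":1}'
```

The invariants the spec would require (ordering, delivery semantics, lifecycle) live in the contract's documentation; the wire details live entirely in the binding.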
§4
4. The Single-Endpoint POST-Everything Antipattern
MCP's Streamable HTTP transport routes client-to-server operations through a single URL. Tools/list, tools/call, resources/read, prompts/list, cancellation notifications, initialization handshakes: all POST to /mcp. Server-to-client streaming uses GET on the same endpoint; session termination may use DELETE. The HTTP method conveys almost zero semantic information. The URL conveys zero semantic information. The entire request is an opaque JSON-RPC envelope dropped into a single hole.
This is not a stylistic preference. It is an architectural decision with operational consequences that are invisible until something breaks, at which point they are everywhere.
What RESTful Routing Looks Like
Under REST conventions, the same operations would be expressed as distinct resources with appropriate methods:
GET /tools → list available tools
GET /tools/:name → describe a specific tool
POST /tools/:name/call → invoke a tool
GET /resources → list available resources
GET /resources/:uri → read a resource value
POST /resources/:uri/subscribe → subscribe to resource updates
GET /prompts → list available prompts
GET /prompts/:name → describe a specific prompt
POST /prompts/:name → render a prompt with arguments
DELETE /session → terminate the session
The URL and method together tell the full story. A load balancer can route /tools requests to one pool and /resources requests to another. A CDN can cache GET /tools responses. A firewall can allow GET and block POST. An observability pipeline can count tool calls per endpoint without parsing JSON-RPC envelopes.
None of this works when everything is POST /mcp.
What Is Lost
Caching. GET responses are cacheable by definition. Proxies, CDNs, and browsers can cache GET /tools because the method is safe and the URL identifies the resource. POST /mcp is not cacheable by any standard HTTP mechanism. A tools/list response that changes once an hour is fetched every time because the client has no way to express "give me this if it has not changed." No ETag. No Cache-Control. No conditional If-None-Match.
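What conditional revalidation would buy can be shown in a few lines. This is a hypothetical GET /tools handler, not anything MCP specifies: the ETag derives from the catalog's content, and an unchanged catalog costs a 304 with an empty body instead of a full refetch.

```python
# Sketch: ETag-based revalidation for a hypothetical GET /tools.
# Handler and helper names are illustrative only.
import hashlib, json

def etag_for(tools):
    canonical = json.dumps(tools, sort_keys=True).encode()
    return '"' + hashlib.sha256(canonical).hexdigest()[:16] + '"'

def get_tools(tools, if_none_match=None):
    tag = etag_for(tools)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""  # cache hit: no body resent
    body = json.dumps(tools).encode()
    return 200, {"ETag": tag, "Cache-Control": "max-age=3600"}, body

tools = [{"name": "weather"}, {"name": "search"}]
status, headers, body = get_tools(tools)
assert status == 200
status2, headers2, body2 = get_tools(tools, if_none_match=headers["ETag"])
assert status2 == 304 and body2 == b""
```

Every proxy and CDN between client and server understands this exchange without configuration; none of them can do it for POST /mcp.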
Observability. Every monitoring tool in the HTTP ecosystem understands methods, URLs, and status codes. A dashboard that shows "95th percentile latency for POST /mcp" tells you nothing. A dashboard that shows "95th percentile latency for POST /tools/search/call" tells you which tool is slow. When the URL is always /mcp and the method is always POST, the HTTP layer becomes opaque. You have to parse JSON-RPC envelopes in your logging pipeline to get visibility that REST gives you for free.
Firewall and gateway granularity. A WAF rule that allows GET /resources/* and blocks POST /tools/* expresses a security policy in terms the infrastructure understands. A rule that allows POST /mcp with a JSON-RPC body filter does not, because WAFs do not parse JSON-RPC. The same applies to API gateways, service meshes, and rate limiters that operate at the HTTP method and path level.
CDN and proxy support. CDNs are built to cache GET responses and pass through POST requests. When a read-only operation like listing available tools is expressed as a POST, CDNs cannot help. Proxies that buffer POST bodies differently than GET responses introduce latency and failure modes that do not exist when the method matches the operation's semantics.
The RPC-over-HTTP Trap
This is not a new pattern. SOAP routed all operations through a single endpoint, typically a lone /soap or /services URL, and encoded the operation in the XML body. XML-RPC did the same with a smaller envelope. Both were technically functional. Both were operationally opaque. Both were replaced by REST not because REST was more powerful but because REST was more legible. The infrastructure could see what was happening without being told.
MCP has reinvented the SOAP pattern with a JSON envelope. The operation is in the body, not in the method or the URL. The endpoint is a tunnel. The HTTP layer is a pipe that carries bytes without understanding them. This is exactly the criticism that the REST community leveled against SOAP twenty years ago, and it is valid for the same reasons.
The defense of RPC-over-HTTP is that it simplifies the client: one URL, one method, one pattern. This is true. It simplifies the client by offloading complexity to every piece of infrastructure between the client and the server. That tradeoff might be acceptable in a controlled environment where you own every hop. It becomes a liability in any environment where you do not.
The Stdio Exception
One transport does not suffer from this problem: stdio. On stdio, there are no URLs, no methods, and no intermediaries. JSON-RPC over stdio makes sense because there is no HTTP layer to violate. The method is the message. The envelope is the protocol. Stdio is not REST, and it does not need to be.
The problem is that MCP's HTTP binding chose to be JSON-RPC over HTTP instead of REST over HTTP. It treated HTTP as a pipe rather than a protocol. The result is a transport that works but that refuses to participate in the ecosystem that makes HTTP valuable in the first place.
§5
5. Content Negotiation as Transport Fork
In MCP's original HTTP transport, a client sends a POST to /mcp with Accept: application/json and receives a JSON response body. The same POST to the same URL with Accept: text/event-stream returns an SSE stream. Same endpoint. Same method. Radically different response protocols, selected by a content negotiation header. (The 2025-11-25 Streamable HTTP transport changed this: the client's POST must list both application/json and text/event-stream in Accept, and the server decides which to return. The architectural concern remains; the mechanism shifted from a binary client-side switch to a server-side decision.)
Content negotiation in HTTP is designed for selecting between representations of the same resource: JSON vs. XML, English vs. French, compressed vs. uncompressed. The resource is the same. The representation varies. Accept: text/event-stream is not asking for a different representation of the same resource. It is asking to switch the entire communication protocol from request-response to streaming server push. The body format changes. The connection semantics change. The client's parsing code changes. The failure handling changes. This is not content negotiation. It is a transport fork disguised as a media type preference.
The Dual Code Path Problem
Every MCP HTTP client must implement two entirely different response handlers:
- JSON path: Send request, receive 200 OK with application/json body, parse JSON-RPC response, done. One request, one response, connection closes or returns to pool.
- SSE path: Send request, receive 200 OK with text/event-stream body, open an event stream, parse SSE frames, correlate events by ID, handle reconnection with Last-Event-ID, detect stream termination, and only then consider the request complete.
These are not variations on the same code. They are two different clients living inside one if/else branch on the Accept header. The JSON path is a standard HTTP request-response cycle. The SSE path is a long-lived subscription with its own framing protocol, its own error handling, and its own lifecycle management. A client that gets the wrong content type by mistake (a misconfigured proxy stripping Accept) does not get a garbled response. It gets a completely different protocol that its JSON parser will choke on, producing errors that have nothing to do with the actual problem.
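The fork, reduced to its skeleton. The SSE handling here is deliberately simplified (data: lines, blank-line-delimited frames); a real client also tracks event IDs, retry hints, and reconnection, which is exactly the point about how much extra machinery the second branch carries:

```python
# Sketch: the dual code path an MCP HTTP client must carry. The same
# logical request needs two unrelated parsers depending on Content-Type.
import json

def handle_response(content_type, body):
    if content_type.startswith("application/json"):
        return [json.loads(body)]            # one message, request complete
    if content_type.startswith("text/event-stream"):
        messages = []
        for frame in body.split("\n\n"):     # SSE frames are blank-line separated
            data_lines = [line[5:].lstrip()
                          for line in frame.splitlines()
                          if line.startswith("data:")]
            if data_lines:
                messages.append(json.loads("\n".join(data_lines)))
        return messages                      # many messages, stream lifecycle
    raise ValueError("unexpected content type: " + content_type)

assert handle_response("application/json",
                       '{"jsonrpc":"2.0","result":{},"id":1}')[0]["id"] == 1
sse = ('data: {"jsonrpc":"2.0","method":"notifications/progress"}\n\n'
       'data: {"jsonrpc":"2.0","result":{},"id":1}\n\n')
assert len(handle_response("text/event-stream", sse)) == 2
```

A proxy that rewrites the Content-Type sends the body down the wrong branch, and the resulting parse error says nothing about the real cause.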
How Proxies Break This
Corporate proxies, API gateways, and content delivery networks routinely modify or strip Accept headers. A proxy that replaces Accept: text/event-stream with Accept: */* will cause the server to return JSON when the client expected a stream. A load balancer that buffers response bodies will hold SSE events until the stream closes, which may be never, turning a real-time notification channel into a silent timeout.
These are not hypothetical failures. They are the class of failures that disappear when transport semantics are expressed in the transport rather than in content negotiation headers. An MQTT client does not negotiate whether it wants publish/subscribe semantics via an Accept header. The protocol IS publish/subscribe. A WebSocket client does not negotiate whether it wants bidirectional framing via a content type. The protocol provides it. MCP's use of Accept to select between request-response and streaming server push puts a transport-level decision in a content-level field, and every intermediary that does not understand this specific convention will misinterpret it.
What Proper Content Negotiation Looks Like
HTTP content negotiation, as defined in RFC 9110, allows a client to express preferences for representations of a resource. Accept: application/json says "I prefer JSON." Accept-Language: en says "I prefer English." The server responds with Content-Type: application/json and Content-Language: en. The resource is the same. The representation varies.
MCP's use of Accept violates this contract because the response is not a different representation of the same resource. A JSON-RPC response and an SSE stream are different communication patterns applied to the same logical operation. The JSON response closes the request. The SSE stream keeps it open and pushes additional frames. These are transport behaviors, not content preferences. Encoding them in Accept conflates "what format is this data in" with "what protocol are we speaking," and the conflation makes every intermediate system that correctly implements HTTP content negotiation into a potential source of failure.
The Streamable HTTP Update
The 2025-11-25 revision of the MCP specification replaces the original HTTP+SSE transport with "Streamable HTTP." SSE is now optional: the server can return application/json directly. A GET to the endpoint opens a standalone SSE stream. A POST can return JSON or SSE depending on whether the server needs to push. The MCP-Protocol-Version header is now required on all requests. Session termination uses an explicit DELETE request.
This is an improvement. It removes the worst fork: the client no longer has to guess whether a POST will return JSON or SSE based on its Accept header. But the fundamental issue remains: the MCP endpoint is still a single URL that does everything, and the choice between request-response and streaming is still expressed in the response rather than determined by the request's structure. The server decides. The client adapts. The transport semantics are still emergent rather than declared.
§6
6. Status Codes That Lie
HTTP status codes are the first thing every monitoring tool, load balancer, and API gateway reads. They are the lingua franca of operational visibility: a 5xx spike means the server is failing, a 4xx cluster means clients are confused, a steady stream of 200s means everything is fine.
MCP undermines this by burying all signal inside the JSON-RPC envelope. The HTTP status code says one thing. The JSON-RPC response says another. The operator, the dashboard, and the PagerDuty alert all see the HTTP layer first, and the HTTP layer is lying.
The Mapping That Should Exist
Here is what MCP operations would look like if they used HTTP status codes honestly:
| MCP Operation | Honest HTTP Status |
| --- | --- |
| tools/list success | 200 OK |
| tools/call success | 200 OK |
| tools/call with invalid params | 400 Bad Request |
| tools/call method not found | 404 Not Found |
| resources/read for missing resource | 404 Not Found |
| resources/read for unauthorized resource | 403 Forbidden |
| initialize with incompatible version | 400 Bad Request or 501 Not Implemented |
| Valid request but internal error | 500 Internal Server Error |
| Valid request but server is shutting down | 503 Service Unavailable |
Instead, operation-level responses come as 200 OK even when they encode failures. The JSON-RPC layer carries all the semantic status, and the HTTP layer carries transport status that is almost always "fine."
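An MCP-aware gateway could restore the honest mapping by translating JSON-RPC error codes back into HTTP statuses before the response leaves the server. A sketch, using the standard JSON-RPC 2.0 error codes (the mapping choices are this essay's, not the spec's):

```python
# Sketch: translating JSON-RPC 2.0 error codes into the HTTP statuses
# they semantically correspond to, so 4xx/5xx alerts can fire.
JSONRPC_TO_HTTP = {
    -32700: 400,  # parse error        -> Bad Request
    -32600: 400,  # invalid request    -> Bad Request
    -32601: 404,  # method not found   -> Not Found
    -32602: 400,  # invalid params     -> Bad Request
    -32603: 500,  # internal error     -> Internal Server Error
}

def honest_status(jsonrpc_response):
    error = jsonrpc_response.get("error")
    if error is None:
        return 200
    return JSONRPC_TO_HTTP.get(error["code"], 500)  # unknown codes: assume server fault

assert honest_status({"jsonrpc": "2.0", "result": {}, "id": 1}) == 200
assert honest_status({"jsonrpc": "2.0",
                      "error": {"code": -32601, "message": "Method not found"},
                      "id": 1}) == 404
assert honest_status({"jsonrpc": "2.0",
                      "error": {"code": -32603, "message": "Internal error"},
                      "id": 1}) == 500
```

That this translation layer has to exist at all, in every deployment, is the cost of putting the status in the body.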
Everything Looks Like 200
The heading overstates slightly. The 2025-11-25 transport assigns 202 Accepted for notifications, 400 Bad Request for malformed input, 403 Forbidden for invalid origins, 404 Not Found for expired sessions, and 405 Method Not Allowed for unsupported methods. Transport-level failures use HTTP status codes correctly. The problem is that operation-level failures, the ones that matter most for debugging, are buried inside JSON-RPC.
The practical consequence: when you look at your APM dashboard, your Datadog metrics, your CloudWatch alarms, or your nginx access logs, most MCP requests appear as successful HTTP transactions at the operation level. A tool call that returns "code": -32601, "message": "Method not found" is a 200 OK at the HTTP layer. A resource read for a non-existent URI is a 200 OK. An internal server error that crashes the tool execution is a 200 OK.
Your 5xx alert does not fire because the error code is in the body, not in the status line. Your success rate metric reads 100% because every response has a 200 status code. Your error budget is unburned because the budget does not know about errors it cannot see.
This is the same mistake SOAP made. Every SOAP response is an HTTP 200. The fault is in the XML body. The monitoring infrastructure is blind. The operator has to write custom parsing logic to extract the real status from the envelope. MCP has replicated this pattern with JSON instead of XML, and it has the same operational consequences.
The Paradox of 404 for Expired Sessions
MCP uses 404 Not Found when a session has expired. But the endpoint exists. The /mcp URL is valid. The server is running. The session ID is stale. A 404 in this situation tells the operator "the resource does not exist," which is incorrect. The resource exists. The authorization has lapsed.
This should be 401 Unauthorized (the session is not valid) or 403 Forbidden (the session has expired and cannot be renewed). Instead, MCP chose the most misleading status code available: one that makes operations teams think their server is misconfigured when the actual problem is session lifecycle management.
The Monitoring Blindness
Consider an MCP server behind a reverse proxy. The proxy logs show:
POST /mcp 200 125ms
POST /mcp 200 89ms
POST /mcp 200 2340ms
POST /mcp 200 45ms
A slow request, sure. But all successful. Until someone parses the JSON-RPC body and discovers that the 2340ms request returned -32603 Internal error. The proxy did not know. The load balancer did not know. The rate limiter did not know. The only system that knew was the MCP client, which had to implement its own error tracking from scratch because every piece of infrastructural observability is blind to the actual status of the operation.
This is not a theoretical concern. It is a daily operational reality for any team running MCP servers in production behind standard HTTP infrastructure. They will build custom JSON-RPC parsing middleware for their monitoring, their alerting, and their dashboards. They will reinvent HTTP status codes, poorly, inside their own telemetry pipeline.
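The workaround teams converge on can be sketched in a few lines. This is an illustrative classifier, not any particular SDK's middleware, and the code-to-status mapping is an assumption: it is the mapping a convention-respecting HTTP binding might have used.

```python
import json

# Illustrative mapping from JSON-RPC error codes to the HTTP status class
# they would have carried in a convention-respecting binding.
JSONRPC_TO_HTTP = {
    -32700: 400,  # Parse error
    -32600: 400,  # Invalid request
    -32601: 404,  # Method not found
    -32602: 400,  # Invalid params
    -32603: 500,  # Internal error
}

def effective_status(http_status: int, body: str) -> int:
    """Return the status a metrics pipeline should record for an MCP response.

    The transport says 200; the truth is in the JSON-RPC envelope.
    """
    if http_status != 200:
        return http_status
    try:
        msg = json.loads(body)
    except ValueError:
        return 502  # an unparseable body behind a 200 is a gateway-class failure
    if isinstance(msg, dict) and "error" in msg:
        return JSONRPC_TO_HTTP.get(msg["error"].get("code"), 500)
    return 200
```

Every team running MCP behind standard observability ends up writing some version of this, which is precisely the reinvention the essay is describing.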
§7
7. Session Management via Custom Header
MCP's HTTP transport manages sessions with a custom header: Mcp-Session-Id. The server generates a session ID during initialization and returns it. The client includes it on every subsequent request. Session expiry is signaled by a 404 status code (which, as §6 argues, is a misleading choice). The 2025-11-25 revision adds an explicit DELETE request to terminate sessions.
HTTP already has session management mechanisms. Cookies (Set-Cookie / Cookie headers) are the standard session mechanism used by every web application in existence. The Authorization header carries authentication and identity information, which often includes or implies session context. Standard session patterns are understood by every proxy, load balancer, WAF, and monitoring tool in the HTTP ecosystem.
MCP chose to invent a new one.
What Happens at the Proxy
A corporate proxy that strips custom headers will remove Mcp-Session-Id. The server will see a request without a session, create a new one, and return a new Mcp-Session-Id in the response. The client will now have two session contexts. The old session will time out. The new session will be missing the history and state of the old one. The client will see mysterious resets. The server will see session churn.
A WAF that does not recognize Mcp-Session-Id will not apply session-based rate limiting to MCP requests. It will not correlate requests from the same session for anomaly detection. It will treat each request as independent, which is exactly what session management is designed to prevent.
An API gateway that needs to route sessions to the same backend (session affinity) will look for Cookie or Authorization headers. It will not find Mcp-Session-Id in its configuration, because no API gateway ships with knowledge of MCP's custom header. Every deployment will require custom configuration: an exemption from header stripping, a custom session affinity rule, a custom rate limiting policy. This is the reinvention tax.
How Other Transports Handle Sessions Naturally
The stdio transport handles sessions by process lifetime. The client launches the server. The session begins. The process exits. The session ends. No header. No ID. No expiry. The kernel enforces the lifecycle. This is the simplest possible session management, and it works because the transport is the session.
MQTT handles sessions with the Session Present flag in the CONNACK packet. The client connects with cleanSession=false. If the broker has a previous session for that client ID, it sets Session Present=true and resumes. If not, Session Present=false and the session starts fresh. The client does not need a custom header. The broker does not need to invent a session protocol. The transport specification already includes one.
gRPC manages RPC lifecycles with HTTP/2 streams. Each RPC is a stream. The connection is persistent. For cross-RPC session state (identity, contexts), you still need application-level metadata, but the transport naturally provides a per-call session boundary. No custom header required.
WebSocket handles sessions with the persistent connection itself. The connection is the session. Close the connection, close the session.
Every transport MCP could use already has a native session mechanism. The HTTP binding chose to invent one that works against, not with, the HTTP infrastructure it runs on.
The Streamable HTTP Improvement
The 2025-11-25 revision adds explicit DELETE for session termination, which is an improvement over relying on 404 to signal expiry. But Mcp-Session-Id remains a custom header. The proxy-stripping problem remains. The WAF-blindness problem remains. The reinvention tax remains. A DELETE /mcp with Mcp-Session-Id: abc123 is better than detecting session expiry from a 404, but it is still a custom mechanism that no standard HTTP infrastructure understands.
What a Transport-Agnostic Approach Would Do
A protocol that separates session semantics from transport bindings would define session lifecycle at the protocol level (initialize, resume, terminate) and let each binding express it in that transport's native idiom:
- HTTP: Cookie or Authorization header for session identity. Standard session patterns. Every proxy understands.
- MQTT: Session Present flag. Client ID is the session identifier. Every broker understands.
- stdio: Process lifetime. No header, no ID. The transport is the session.
- gRPC: HTTP/2 stream lifecycle per RPC. Cross-RPC sessions need application-level metadata.
- WebSocket: Connection lifetime. Persistent by definition.
Same protocol semantics. Different transport expressions. No custom headers. No reinvention tax. No proxy-stripping surprises.
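What "same semantics, native expression" means in code can be sketched as follows. The class and cookie names are invented for illustration; the point is that each binding fills the same two slots with its transport's own mechanism.

```python
# Sketch: one abstract session lifecycle, two transport expressions.
# Names (HttpCookieBinding, StdioBinding, "mcp_session") are illustrative.

class HttpCookieBinding:
    """Expresses protocol-level sessions with the standard Set-Cookie/Cookie pair."""
    COOKIE = "mcp_session"

    def open_response_headers(self, session_id: str) -> dict:
        # Server side of "initialize": a standard cookie, understood by every proxy.
        return {"Set-Cookie": f"{self.COOKIE}={session_id}; HttpOnly; Path=/mcp"}

    def request_headers(self, session_id: str) -> dict:
        # Client side of "operate": no custom header for middleboxes to strip.
        return {"Cookie": f"{self.COOKIE}={session_id}"}


class StdioBinding:
    """The transport is the session: process lifetime bounds it, nothing to carry."""

    def open_response_headers(self, session_id: str) -> dict:
        return {}

    def request_headers(self, session_id: str) -> dict:
        return {}
```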
§8
8. The SSE Stream Architecture — Clever but Fragile
The original MCP HTTP transport used Server-Sent Events (SSE) for server-to-client communication. A client GETs an SSE stream to receive server-initiated messages. A client POSTs JSON-RPC to send requests, and may receive an SSE response stream instead of a JSON body. Multiple SSE streams can be open simultaneously. Each SSE event carries an ID for resumption after disconnection.
This is clever. SSE is a well-understood technology with broad browser support and simple parsing rules. Using SSE to bridge the request-response gap over HTTP is the kind of creative engineering that looks elegant in a demo. In production, it is fragile in ways that compound.
The HTTP/1.1 Connection Limit
Most browsers limit HTTP/1.1 to six concurrent connections per origin (a browser implementation choice, not a protocol limit; RFC 2616 once recommended two, later HTTP specifications dropped the number, and browsers converged on six). Each SSE stream is a long-lived connection. A client with two active MCP servers is using two persistent connections for SSE, plus a third for any POST. Add a third server and a fourth for notifications, and the connection pool is nearly exhausted before any normal HTTP traffic.
This is not a theoretical limit. It is a practical constraint that affects every HTTP/1.1 client, which is most of them. MCP does not require HTTP/2, and many deployment environments do not support it. The spec is optimized for a protocol feature (SSE) that works against the transport constraint (connection limits) that most clients will encounter.
Proxy Buffering and Timeout Mismatches
Corporate proxies and load balancers are trained by two decades of HTTP traffic to treat long-lived connections as problems. A proxy that buffers the response body will hold SSE frames until the stream closes, which in MCP's case may be never, turning real-time notifications into a silent void. A proxy with a 30-second idle timeout will kill SSE streams that pause for 31 seconds between events. A CDN with caching rules will not cache SSE responses (correctly), but it may refuse to pass them through without custom configuration.
These are the same class of problems that afflicted early WebSocket deployments, except WebSocket has a well-defined handshake (Upgrade header) that tells every intermediary "this is not a normal HTTP response." SSE has no such signal. It looks like a normal HTTP response that happens to be very long, and intermediaries treat it accordingly.
Disconnection Is Not Cancellation
When an SSE stream disconnects, the MCP server does not know whether the client intended to cancel the request or simply lost connectivity. The spec requires an explicit CancelledNotification to cancel a request. A disconnected client cannot send one. A client that reconnects with Last-Event-ID may find that the server has continued processing an expensive operation it no longer needs.
This is a real operational problem. A tool call that spawns a subprocess, queries a database, or calls an external API will continue consuming resources after the client's connection drops. The server has no way to distinguish "the client went away" from "the client is reconnecting." The timeout is implementation-defined, not protocol-specified. MCP's ping utility allows a connected client to check liveness, but there is no standard mechanism for the server to ask "are you still there?" when the client is disconnected, leaving SSE disconnection, server death, and intentional closure operationally ambiguous.
SSE Resumption as a Custom Cursor Protocol
MCP's SSE resumption uses per-event IDs and the Last-Event-ID header to resume streams after disconnection. This is a custom cursor protocol layered on top of SSE, which is itself layered on top of HTTP. The cursor is opaque to every intermediary. A proxy that buffers responses drops the cursor. A load balancer that fails over to a different server instance has no way to share cursor state unless the application implements it.
HTTP already has a mechanism for resuming interrupted transfers: Range requests and If-Range headers. These are understood by every proxy, CDN, and load balancer in the ecosystem. MCP's Last-Event-ID is a custom rebuild of the same concept, but opaque to the infrastructure that would need to support it.
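What "custom cursor protocol" means concretely: the server must keep per-stream replay state that no intermediary can see or share. A minimal sketch, with invented names, of the state a server holds to honor Last-Event-ID resumption:

```python
from collections import deque

class SseReplayBuffer:
    """Sketch of the per-stream state an MCP server keeps for Last-Event-ID
    resumption. Names are illustrative; the point is that this cursor state
    lives in the application, invisible to every proxy and load balancer."""

    def __init__(self, capacity: int = 1000):
        self._events = deque(maxlen=capacity)  # (event_id, data) pairs
        self._next_id = 0

    def push(self, data: str) -> int:
        """Assign the next event ID and retain the event for replay."""
        self._next_id += 1
        self._events.append((self._next_id, data))
        return self._next_id

    def replay_after(self, last_event_id: int) -> list:
        """Events the reconnecting client missed. If the ID has already aged
        out of the buffer, the gap is silent: the client cannot detect it."""
        return [data for (eid, data) in self._events if eid > last_event_id]
```

Note the failure mode in `replay_after`: a bounded buffer plus an opaque cursor means a slow-to-reconnect client can silently lose events, and a failover to another server instance loses the buffer entirely unless the application replicates it.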
What a Transport-Agnostic Approach Would Do
WebSocket does not have any of these problems because it was designed for persistent, bidirectional communication. A WebSocket connection is explicitly upgraded from HTTP. Proxies know not to buffer it. The connection is persistent and bidirectional. A single connection carries many message types, though multiplexing of independent logical streams requires a subprotocol. Disconnection is disconnection. The framing is built into the protocol.
MQTT does not have these problems because publish/subscribe is its native communication pattern. A client subscribes. The broker delivers messages. Disconnection triggers the will mechanism. Session state is maintained by the broker with the Session Present flag.
STDIO does not have these problems because process lifetime is connection lifetime.
Each transport has a native way to solve the problems MCP solved with SSE. The SSE layer exists because HTTP, alone among MCP's transports, does not have a native server-push mechanism. Rather than treating this as a signal that HTTP might not be the right transport for server-push communication, MCP built a custom protocol on top of SSE on top of HTTP, adding layers of indirection that duplicate features other transports provide natively.
The Streamable HTTP Update
The 2025-11-25 revision addresses the worst of these problems by making SSE optional. A POST can now return application/json directly for simple request-response interactions. SSE streams are only opened when the server needs to push data. This fixes the "every response is an SSE stream" problem and reduces connection consumption.
But SSE resumption, proxy buffering, disconnection semantics, and the custom cursor protocol remain. The underlying fragility is reduced, not eliminated. The server still decides whether a response is JSON or SSE, not the client's request structure. The transport semantics are still emergent rather than declared. The fundamental critique holds: MCP has built a custom streaming protocol on top of HTTP because HTTP alone cannot do what the application needs, when other transports can do it natively.
§9
9. Batch Without Boundaries
JSON-RPC 2.0 supports batch requests: an array of request objects sent in a single POST body, answered by an array of responses. Earlier MCP revisions permitted this; the current 2025-11-25 Streamable HTTP transport narrows the POST body to a single JSON-RPC message, so this critique applies to the JSON-RPC specification itself and to implementations that still accept batches, not to the latest HTTP binding. Batching is efficient on the wire. It is also dangerous in practice, because the spec imposes no constraints on batch size, no ordering guarantees, and no partial failure semantics.
The Partial Failure Problem
Consider a batch of ten requests sent in a single POST:
[
  {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
  {"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {...}},
  {"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {...invalid...}},
  {"jsonrpc": "2.0", "id": 4, "method": "resources/read", "params": {...}},
  {"jsonrpc": "2.0", "id": 5, "method": "prompts/list"},
  ...
  {"jsonrpc": "2.0", "id": 10, "method": "tools/call", "params": {...}}
]
Request 3 has invalid parameters. What happens to requests 4 through 10?
The JSON-RPC 2.0 spec says the server should process each request independently and return a response for each. But "independently" is underspecified. Can the server process requests 4 through 10 while request 3 is failing? Can it short-circuit and return early errors for requests that depend on request 3's output? Can it reject the entire batch if any request is invalid?
MCP does not answer these questions. The spec says batching is supported, but it does not define partial failure semantics. Implementation-defined behavior is the protocol's way of saying "we have not thought about this." Every implementation will handle it differently, and clients that send batches will get different results from different servers.
Compare this to GraphQL, which has structured partial results. A query can succeed for some fields and fail for others, and the response includes both the successful data and the error details. The client knows exactly what worked and what did not. JSON-RPC batching has no equivalent. A batch response is an array of individual responses with no structural relationship between them. The client has to correlate by ID, check each response for errors, and decide what to do with the successful ones when some failed.
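The correlation work that falls on the client can be sketched as follows. This is an illustrative helper, not part of any SDK: match responses to requests by ID, then sort survivors from failures.

```python
import json

def correlate_batch(requests: list, raw_response: str):
    """Sketch of the client-side bookkeeping JSON-RPC batching forces.

    Returns (ok, failed, missing): successful results by ID, error objects
    by ID, and IDs with no response at all (which the spec leaves ambiguous:
    a notification, or a silently dropped request).
    """
    responses = {r["id"]: r for r in json.loads(raw_response) if "id" in r}
    ok, failed, missing = {}, {}, []
    for req in requests:
        resp = responses.get(req["id"])
        if resp is None:
            missing.append(req["id"])
        elif "error" in resp:
            failed[req["id"]] = resp["error"]
        else:
            ok[req["id"]] = resp.get("result")
    return ok, failed, missing
```

Notice that the hardest question — what to do with `ok` when `failed` is non-empty — is exactly the one the protocol declines to answer.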
No Flow Control
There is no maximum batch size in the spec. A client can send a batch of 10,000 requests in a single POST body. The server has no way to signal "I can handle 50 requests at a time, please send the rest later." There is no backpressure mechanism. There is no maximum payload size defined by the protocol (only by transport-level limits like HTTP body size, which are typically configured by infrastructure, not by the application).
A batch of 10,000 tool calls is not a theoretical concern. An agent orchestrating a multi-step workflow may batch all tool calls for a single turn into one request. A client connecting to multiple servers may batch requests for resource discovery across all servers. Without flow control, the server's only options are to process the entire batch (and risk resource exhaustion), reject the entire batch (and lose the valid requests), or process as many as it can and silently drop the rest (which is the worst option, because it is the least detectable).
No Ordering Guarantees
JSON-RPC 2.0 specifies that the server should process requests in a batch independently. This means the server can process them in any order, including concurrently. A batch where request 1 creates a resource and request 2 reads it has no ordering guarantee. Request 2 might execute before request 1. The client cannot assume that the array order determines processing order.
This is documented, but it is a trap for client developers who naturally assume that array order means execution order. Every other messaging convention they have encountered (command-line arguments, function call arguments, pipeline stages) implies sequence. JSON-RPC batching explicitly discards sequence while using the data structure (an array) that most strongly implies it.
What a Protocol-Level Solution Would Look Like
A protocol that takes batch semantics seriously could define:
- Maximum batch size: a capability negotiated at initialization, so the client knows the server's limits.
- Partial failure semantics: the server returns a batch-level status (e.g., partial_success) and includes per-request results with individual status codes.
- Ordering guarantees: explicit opt-in, so the client can request sequential processing when needed and accept concurrent processing when not.
- Backpressure: a flow control mechanism (credit-based, window-based, or simply a Retry-After in the response) that lets the server signal when the client should slow down.
None of these are novel. Message queue protocols, database batch APIs, and distributed task systems all solve these problems. MCP's batching inherits JSON-RPC 2.0's minimal semantics and adds nothing. The result is a feature that works for the happy path and is undefined for every failure mode.
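The first of these, a negotiated maximum batch size, costs the client a few lines once the capability exists. A sketch, assuming a hypothetical maxBatchSize capability advertised at initialization:

```python
def chunk_batch(requests: list, max_batch_size: int) -> list:
    """Split one oversized batch into server-sized chunks instead of forcing
    the server to accept everything or reject everything.

    max_batch_size is assumed to come from a hypothetical negotiated
    capability; no such field exists in MCP today.
    """
    if max_batch_size < 1:
        raise ValueError("server advertised a nonsensical batch limit")
    return [requests[i:i + max_batch_size]
            for i in range(0, len(requests), max_batch_size)]
```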
§10
10. A Transport-Agnostic Alternative — Designing the Protocol First, Binding Later
This section is not a proposal to replace MCP. It's a design exercise: what does the same protocol look like when you separate the what (message semantics, resource model, session lifecycle) from the how (transport binding)? When you design the protocol first and let transports express its semantics in their own native idioms, the result is simpler, more composable, and naturally convention-respecting, because each binding uses the conventions of its own medium.
10.1 The Protocol Core — Semantics Independent of Transport
The first principle: define what the protocol means before deciding how it travels. MCP's current design conflates semantics with transport. POST /mcp with {"method":"tools/list"} ties the request type to an HTTP method and URL. A transport-agnostic protocol separates these concerns entirely.
Messages — The protocol uses JSON-RPC 2.0 as its message envelope, but only at the content layer. A message is a JSON object with a method and optional params and id. Whether it's a request (expects a response), a notification (fire-and-forget), or a response (the result of a prior request), this is a property of the message, not the transport.
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list",
  "params": {}
}
This object means "list available tools" regardless of whether it arrives over HTTP POST, an MQTT topic, a WebSocket frame, or stdin.
Resources — The protocol models the world as nouns with operations:
| Resource | Operations | Safe? | Idempotent? |
| --- | --- | --- | --- |
| tools | list, get, call | list/get: yes, call: no | list/get: yes, call: no |
| resources | list, read, subscribe | all yes | all yes |
| skills | list, get | yes | yes |
| session | initialize, terminate | no | initialize: no, terminate: yes |
These are abstract operations. They don't mention URLs, methods, or topics. The protocol defines what "list tools" means. The transport decides how to express it.
Session lifecycle — A session is a logical relationship between client and server:
- Initialize: Client sends initialize with protocol version and capabilities. Server responds with its version, capabilities, and optional session identifier.
- Operate: Client sends requests and notifications; server responds and pushes notifications. Messages are correlated by session, not by transport connection.
- Terminate: Either side ends the session. Pending requests may be cancelled or drained.
The session is an abstract concept. Some transports bind session to connection (stdio: process exit = session end). Others separate them (HTTP: cookies persist across connections; MQTT: session survives broker disconnect).
Capability negotiation — At initialization, each side declares what features it supports:
Client → Server:
{
  "method": "initialize",
  "params": {
    "protocolVersion": "2026-05",
    "capabilities": {
      "streaming": true,
      "batching": true,
      "transports": ["http", "mqtt"]
    }
  }
}
Server → Client:
{
  "id": 1,
  "result": {
    "protocolVersion": "2026-05",
    "capabilities": {
      "tools": true,
      "resources": true,
      "skills": true,
      "streaming": true
    },
    "serverInfo": {"name": "weather-server", "version": "2.1"},
    "sessionId": "abc123"
  }
}
Capabilities are declared once. The transport binding is responsible for delivering these messages; the content is transport-independent, the binding is transport-specific.
Delivery guarantees (QoS) — Not all messages are equal:
| Level |
Meaning |
Use case |
| At-most-once |
Fire and forget. No retry. |
Progress notifications, telemetry |
| At-least-once |
Retry until acknowledged. Possible duplicates. |
Tool list fetch, resource read |
| Exactly-once (transport) |
Retry with protocol-level deduplication. Application-level exactly-once still requires idempotency keys. |
Tool invocation with side effects (send email, debit account) |
The protocol specifies the level. The transport binding implements it: MQTT uses QoS 0/1/2; HTTP uses Idempotency-Key for exactly-once; stdio assumes reliable delivery within the same machine.
Error model — Errors are structured, transport-independent:
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32602,
    "message": "Invalid params",
    "data": {
      "type": "https://mcp.example/errors/invalid-params",
      "detail": "Parameter 'expression' is required"
    }
  }
}
The error code and structure are part of the protocol. How the transport signals "this is an error" varies: HTTP uses 400 status; MQTT publishes to an error response topic; stdio writes to stdout. Same JSON, different framing.
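That division of labor can be sketched directly. The function and topic names below are invented for illustration; the error object follows JSON-RPC 2.0, and the code-to-status mapping is an assumed convention, not spec text.

```python
import json

# Illustrative mapping; error shapes follow JSON-RPC 2.0, topic names are invented.
_HTTP_STATUS = {-32700: 400, -32600: 400, -32601: 404, -32602: 400, -32603: 500}

def frame_error_http(req_id, error: dict):
    """HTTP binding: signal failure in the status line, carry detail in the body."""
    status = _HTTP_STATUS.get(error["code"], 500)
    body = json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    return status, body

def frame_error_mqtt(session_id: str, req_id, error: dict):
    """MQTT binding: the same JSON, published to an error response topic."""
    topic = f"mcp/{session_id}/errors"
    payload = json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})
    return topic, payload
```

Same protocol-level error object, two transport framings: the HTTP binding gets a truthful status line, the MQTT binding gets a routable topic, and no monitoring system has to parse envelopes to learn that something failed.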
10.2 Request Correlation and Distributed Tracing
The protocol core defines two cross-cutting concerns that every binding must express, regardless of transport: request correlation and distributed tracing. These are not HTTP-specific concepts. They are protocol-level requirements that each binding implements in its native idiom.
Request correlation (X-Request-ID) — A unique identifier attached to every request, allowing a single logical operation to be traced across logs, services, and tool boundaries. The protocol core requires that every request carry a correlation ID. The binding decides how.
Distributed tracing (traceparent, tracestate) — The W3C Trace Context standard provides a globally unique trace identifier and a parent span, enabling distributed tracing across services. A tool invocation that fans out across three MCP servers should be traceable as a single operation. The protocol core adopts the W3C trace context model; each binding maps it to its transport's metadata mechanism.
The protocol rule:
- Client:
  - May send a correlation ID.
  - Should send traceparent/tracestate only through tracing instrumentation.
  - Should not manually invent traceparent/tracestate values.
- Gateway:
  - Must guarantee a correlation ID exists (generate if missing, preserve if present, replace if malformed).
  - Should continue trace context if traceparent is valid.
  - Should create a new trace context if traceparent is missing or invalid.
  - Should preserve tracestate only when it belongs to valid trace context.
- Services:
  - Must log and forward the correlation ID.
  - Must propagate traceparent/tracestate when tracing is enabled.
  - Should rely on tracing instrumentation to create child spans.
  - Should include correlation ID and trace/span identifiers in log entries.
Transport-specific bindings:
The correlation ID and trace context are protocol concepts. Each binding carries them in the mechanism native to its transport:
- HTTP: X-Request-ID and traceparent/tracestate headers. Standard W3C propagation.
- MQTT 5: User properties on the CONNECT and PUBLISH packets. MQTT 5's Correlation Data field handles request-response correlation; user properties carry traceparent and tracestate.
- NATS: Headers in the message envelope (NATS supports headers natively). Reply subjects for correlation (NATS request-reply pattern), custom headers for trace context.
- gRPC: Metadata keys on the gRPC call. OpenTelemetry's gRPC propagation handles traceparent natively.
- WebSocket: First message or subprotocol negotiation carries the correlation ID and trace context.
- ZeroMQ: A property-delimited prefix frame or a ZAP-style metadata frame attached to the first message in a conversation.
- stdio: A JSON-RPC extension field in the message envelope (e.g., "meta": {"correlationId": "...", "traceparent": "..."}).
- Shared memory: A fixed-offset header in the ring buffer frame alongside the message type and length.
The protocol specifies what must be correlated and traced. Each binding specifies how. This separates observability (a protocol concern) from transport mechanics (a binding concern), which is exactly the separation the transport-agnostic model demands.
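The gateway rule above can be sketched directly. The correlation-ID format policy here is an invented example; the traceparent shape follows the W3C Trace Context header format.

```python
import re
import uuid

VALID_ID = re.compile(r"^[A-Za-z0-9._-]{8,128}$")  # illustrative format policy
VALID_TRACEPARENT = re.compile(
    r"^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")  # W3C Trace Context shape

def gateway_ingest(headers: dict) -> dict:
    """Apply the gateway rule: guarantee a correlation ID exists (generate if
    missing, preserve if present, replace if malformed), and drop tracestate
    when it has no valid traceparent to belong to."""
    out = dict(headers)
    if not VALID_ID.match(out.get("X-Request-ID", "")):
        out["X-Request-ID"] = uuid.uuid4().hex
    if not VALID_TRACEPARENT.match(out.get("traceparent", "")):
        out.pop("traceparent", None)
        out.pop("tracestate", None)  # meaningless without valid trace context
    return out
```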
10.3 Protocol Versioning — Semver for Change, API Versions for Compatibility
The protocol core carries a version identifier. The question is what kind. Two approaches serve different purposes:
Semantic versioning (MAJOR.MINOR.PATCH) communicates what changed. A breaking change bumps MAJOR (the protocol semantics are different). A new capability bumps MINOR (new features, backward-compatible). A bug fix bumps PATCH. Clients and servers inspect the version to know whether they can interoperate at all.
API versioning (v1, v2, v3) communicates which contract you're speaking. /v1/tools/list and /v2/tools/list are different endpoints that coexist. Clients choose which contract to use. Servers may support multiple contracts simultaneously. Migration is gradual, not cutover.
The protocol needs both, and each serves a distinct purpose:
- Semver is the protocol version. It appears in the initialize handshake (or the MQTT connect topic, or the gRPC metadata, or the shared memory header). It tells the other side whether the message semantics it's about to send are understood. Semver is for the protocol core: the resource model, the operation set, the session lifecycle rules. A client that speaks 2.1.0 can work with a server that speaks 2.3.0 (same major), but a server that speaks 3.0.0 has broken compatibility and the client must decide whether to upgrade or find another server.
- API versioning is the binding version. It appears in the HTTP URL path (/v1/tools/list), the MQTT topic prefix (mcp/v1/{session-id}/tools/list), the gRPC package name (mcp.v1.MCP), or the shared memory schema version field. API versioning is for the wire contract: the field layout, the topic structure, the endpoint shape. Two clients on different API versions can talk to the same server if it supports both. The server routes by version prefix; the protocol core handles the message.
The rule: Semver governs interop (can we talk at all?). API versioning governs coexistence (can we talk alongside each other?).
In practice:
HTTP binding:
/v1/tools/list ← API version in the URL path
X-MCP-Protocol-Version: 2.3.0 ← Protocol semver in a header
MQTT 5 binding:
Topic: mcp/v1/{session-id}/tools/list ← API version in topic prefix
User property: protocol-version=2.3.0 ← Protocol semver in metadata
gRPC binding:
Package: mcp.v1 ← API version in package name
Metadata: protocol-version=2.3.0 ← Protocol semver in call metadata
stdio binding:
JSON-RPC: {"method": "initialize", "params": {"protocolVersion": "2.3.0"}}
API version is implicit (the protocol defines the message shape; version is the semver)
Shared memory binding:
Ring buffer header: [version=1][protocol=2.3.0][type][length]
API version and protocol semver both in the frame header
The initialize handshake uses semver to negotiate. The binding uses API versioning to route. A server that supports v1 and v2 simultaneously can serve both without breaking existing clients. When v1 is deprecated, clients migrate at their own pace. The protocol core doesn't change; only the binding's wire format changes.
MCP today uses a single version string (2025-11-25 for the latest, 2024-11-05 for the original) that conflates protocol version and API version. The date-based versioning tells you when the spec was published, not whether a client can interoperate with a server. Two date strings across two revisions, and there is still no way to tell whether a 2025-11-25 client can talk to a 2024-11-05 server without reading the spec. Date-based versioning is a calendar, not a compatibility contract.
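The compatibility question that date strings cannot answer is one line of arithmetic under semver. A sketch of the interop rule as stated above:

```python
def can_interoperate(client_version: str, server_version: str) -> bool:
    """The semver rule in code: same MAJOR means compatible protocol
    semantics; a MAJOR mismatch means no interop. (Illustrative; MCP
    itself uses date strings, which cannot answer this question.)"""
    client_major = int(client_version.split(".")[0])
    server_major = int(server_version.split(".")[0])
    return client_major == server_major
```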
10.3b Long-Running Operations — Tasks at the Protocol Level
The 2025-11-25 MCP specification introduces Tasks as an experimental feature: durable state machines for operations that cannot complete within a single request-response cycle. A client submits a request with a task parameter. The server returns a task ID. The client polls tasks/get, tasks/result, or subscribes to status notifications. Tasks have a lifecycle: working, completed, failed, cancelled, input_required. They have TTLs and poll intervals.
This is a protocol-level concern, not a transport-level concern, and it belongs in the protocol core. A tool call that initiates a database migration, a batch rendering job, or a multi-step approval workflow is long-running regardless of which transport carries the request. The task lifecycle (submit, poll, receive result, cancel) is the same whether the request arrives over HTTP, MQTT, or stdio. The transport binding only affects how the client discovers the task state: HTTP via polling or SSE notifications, MQTT via retained messages on a status topic, gRPC via server streaming.
In a transport-agnostic design:
- HTTP binding: POST /tasks creates a task, returns 202 Accepted with a Location header for polling. GET /tasks/{id} checks status. DELETE /tasks/{id} cancels. SSE notifications push state changes.
- MQTT binding: The client publishes to mcp/{session-id}/tasks/submit. The server publishes status updates to mcp/{session-id}/tasks/{id}/status. Task results appear on mcp/{session-id}/tasks/{id}/result. Retained messages give immediate state without polling.
- stdio binding: The client sends a JSON-RPC request with a task parameter. The server responds with a task ID and optionally streams progress notifications over the same connection. Process lifetime bounds the task: if the server process exits, the task is cancelled.
The key design principle: the task lifecycle is a protocol concept. The delivery mechanism is a transport concern. HTTP polls, MQTT pushes, stdio streams. The client should not need to know which transport it is using to understand that a task is in progress, has completed, or has failed.
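The transport-independent part, the lifecycle itself, is small enough to write down. The state names follow the 2025-11-25 Tasks feature; the transition rules below are an illustrative reading, not spec text.

```python
# Task lifecycle as a transport-independent state machine.
# Transition rules are an illustrative reading of the task states.
TRANSITIONS = {
    "working":        {"completed", "failed", "cancelled", "input_required"},
    "input_required": {"working", "failed", "cancelled"},
    "completed":      set(),   # terminal
    "failed":         set(),   # terminal
    "cancelled":      set(),   # terminal
}

class Task:
    def __init__(self, task_id: str):
        self.id = task_id
        self.status = "working"

    def transition(self, new_status: str) -> None:
        """Advance the lifecycle, rejecting transitions out of terminal states."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status
```

Whether the status change then travels as an SSE event, a retained MQTT message, or a stdio notification is the binding's business, not the state machine's.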
10.3c Protocol Utilities — Cross-Cutting Concerns at the Binding Level
The 2025-11-25 MCP specification defines several cross-cutting protocol utilities: cancellation (notifications/cancelled), progress tracking (progressToken and progress notifications), ping (connection health), completion (argument autocompletion), logging (structured messages with severity levels), and pagination (cursor-based list traversal).
These are protocol-level concerns that any alternative must address. The question is whether they belong in the protocol core or in the binding layer. The same pattern applies here as with authorization, session management, and error signaling: the semantics are protocol-level, but the expression is transport-specific.
Cancellation. The protocol needs a way for a client to say "stop what you're doing." The transport decides how the stop signal arrives. HTTP sends DELETE /tasks/{id} or a cancellation notification on the SSE stream. MQTT publishes to mcp/{session-id}/tasks/{id}/cancel. gRPC cancels the stream. stdio sends a cancellation notification on the same channel. The client should not need to know which mechanism delivers the cancellation. It should say "cancel task X" and have the protocol core translate that into the transport's idiom.
Progress. Long-running operations need progress reporting. The protocol defines a progress token and a percentage. The transport delivers it. HTTP sends SSE events. MQTT publishes to a progress topic. gRPC streams partial responses. stdio sends notifications. Same data, different delivery.
Ping. Connection health checks are transport-native. HTTP has GET /health. MQTT has PINGREQ/PINGRESP. gRPC has health checking. stdio has process liveness. WebSocket has ping/pong frames. The protocol should not define its own ping mechanism; it should use the transport's native health check and translate it into a protocol-level liveness signal.
Completion. Argument autocompletion for prompts, tools, and resources is a semantic feature that maps cleanly across transports. HTTP gets GET /tools/{name}/complete?q=.... MQTT gets a request-reply on mcp/{session-id}/tools/{name}/complete. The protocol defines the completion request and response; the binding defines the routing.
Logging. Structured log messages with severity levels are useful for debugging. HTTP sends them as SSE notifications or a log endpoint. MQTT publishes to a log topic. stdio writes to stderr. The protocol defines what constitutes a log message. The binding defines how it reaches the consumer.
Pagination. Cursor-based list traversal is a data concern, not a transport concern. tools/list?cursor=abc in HTTP, mcp/{session-id}/tools/list with a cursor field in MQTT. The protocol defines the pagination model. The binding defines the encoding.
In each case, the pattern is the same: the protocol core specifies what the utility does (cancel a task, report progress, check liveness, complete an argument, emit a log, paginate a list). Each binding specifies how the utility is expressed in that transport's idiom. This is the transport-agnostic thesis applied to protocol utilities: define the semantic operation once, implement the transport expression many times.
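As a concrete instance of the pattern, the pagination utility can be defined once in the protocol core and carried by any binding unchanged: the cursor is opaque bytes to the transport, whether it rides in an HTTP query parameter or an MQTT payload field. A minimal sketch, where the cursor encoding and the page size are illustrative assumptions rather than anything the MCP spec mandates:

```python
import base64
import json

def paginate(items, cursor=None, page_size=2):
    """Return one page of items plus an opaque nextCursor (None at the end).

    The cursor is just a base64-encoded offset here; any binding can carry
    it unchanged because the protocol core never inspects its contents.
    """
    offset = json.loads(base64.b64decode(cursor))["offset"] if cursor else 0
    page = items[offset:offset + page_size]
    next_offset = offset + page_size
    next_cursor = None
    if next_offset < len(items):
        next_cursor = base64.b64encode(
            json.dumps({"offset": next_offset}).encode()).decode()
    return {"items": page, "nextCursor": next_cursor}

tools = ["weather", "email", "search", "calendar", "notes"]
page1 = paginate(tools)
page2 = paginate(tools, cursor=page1["nextCursor"])
```

The binding decides only where the cursor travels; the pagination model itself never changes.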
10.4 The Transport Binding Interface
Every transport binding implements five primitives. The protocol core doesn't care how, only that they exist.
| Primitive | Signature | Description |
|---|---|---|
| connect(target) | → session | Establish a session with a server at the given target |
| send(session, message, qos) | → ack | Deliver a message with the requested delivery guarantee |
| receive(session) | → message stream | Consume messages from the server |
| close(session) | → void | Terminate the session |
| capabilities() | → transport features | Declare what this binding supports (QoS levels, streaming, etc.) |
A server implements its protocol logic once. Transport bindings are adapters: they take protocol messages and map them to the transport's native primitives. Adding a new transport means implementing these five functions, not changing the protocol.
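The five-primitive contract can be written down as an abstract interface. The following is a hypothetical Python sketch of that contract, with a degenerate in-memory loopback binding standing in for a real transport; the class and field names are illustrative, not part of any spec:

```python
from abc import ABC, abstractmethod
from collections import deque

class TransportBinding(ABC):
    """The five primitives every binding implements for the protocol core."""

    @abstractmethod
    def connect(self, target): ...               # -> session
    @abstractmethod
    def send(self, session, message, qos=0): ... # -> ack
    @abstractmethod
    def receive(self, session): ...              # -> iterator of messages
    @abstractmethod
    def close(self, session): ...                # -> None
    @abstractmethod
    def capabilities(self): ...                  # -> transport feature dict

class LoopbackBinding(TransportBinding):
    """A toy binding: the 'server' echoes messages back in-process."""

    def connect(self, target):
        return {"target": target, "inbox": deque(), "open": True}

    def send(self, session, message, qos=0):
        session["inbox"].append(message)         # echo straight back
        return {"delivered": True, "qos": qos}

    def receive(self, session):
        while session["inbox"]:
            yield session["inbox"].popleft()

    def close(self, session):
        session["open"] = False

    def capabilities(self):
        return {"streaming": False, "caching": False, "qos": [0]}

binding = LoopbackBinding()
s = binding.connect("local")
binding.send(s, {"method": "tools/list"})
messages = list(binding.receive(s))
binding.close(s)
```

Swapping LoopbackBinding for an HTTP or MQTT adapter changes nothing above this interface, which is the point of the five primitives.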
10.5 HTTP Binding — RESTful Native Semantics
HTTP has a rich vocabulary. The binding uses it fully rather than tunneling through it.
Connect — POST /session with InitializeRequest body. Server returns 201 Created with Set-Cookie: session=...; HttpOnly; Secure; SameSite=Strict. The cookie is standard HTTP session management, not a custom header.
Send — Each send call maps to a native HTTP request:
| Protocol Operation | HTTP Method | URL | Status |
|---|---|---|---|
| tools/list | GET | /tools | 200 OK |
| tools/get | GET | /tools/{name} | 200 OK |
| tools/call | POST | /tools/{name}/call | 200 OK or 202 Accepted |
| resources/list | GET | /resources | 200 OK |
| resources/read | GET | /resources/{uri} | 200 OK |
| resources/subscribe | POST | /resources/{uri}/subscribe | 200 OK (SSE) |
| skills/list | GET | /skills | 200 OK |
| skills/get | GET | /skills/{name} | 200 OK (envelope with parts URLs) |
| session/terminate | DELETE | /session | 204 No Content |
Cacheability — GET requests are cacheable. The server sets Cache-Control and ETag. CDNs and browsers cache tool lists and skill catalogs without hitting the server. Tool invocations use Cache-Control: no-store.
```http
GET /tools HTTP/1.1

HTTP/1.1 200 OK
Cache-Control: max-age=3600, stale-while-revalidate=600
ETag: "v2.4-2026-05-01"

{"tools": [{"name": "weather", "description": "Get current weather"}]}
```
QoS via Idempotency — At-least-once: retry the request (GET and other safe methods are naturally idempotent). Exactly-once: include an Idempotency-Key header so the server can deduplicate retried unsafe requests.
```http
POST /tools/email/send HTTP/1.1
Idempotency-Key: 8f3a91b2-4c7d-4e1f-a9b6-c2d8e5f0a1b3

{"to": "user@example.com", "subject": "Hello"}
```
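Server-side, exactly-once via Idempotency-Key reduces to a cache of completed responses keyed by the header value. A minimal sketch under assumptions of my own (the cache never expires here, and the handler signature is hypothetical):

```python
class IdempotentExecutor:
    """Replay the stored response when the same Idempotency-Key arrives twice."""

    def __init__(self):
        self._done = {}      # idempotency key -> stored response
        self.executions = 0  # how many times the operation actually ran

    def handle(self, key, operation):
        if key in self._done:
            return self._done[key]   # duplicate request: replay, don't re-run
        self.executions += 1
        response = operation()
        self._done[key] = response
        return response

executor = IdempotentExecutor()
send_email = lambda: {"status": "sent"}
first = executor.handle("8f3a91b2", send_email)
retry = executor.handle("8f3a91b2", send_email)  # client retried the POST
```

A production version would bound the cache with a TTL and persist it across restarts, but the contract is exactly this: same key, same response, one execution.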
Streaming — receive() uses Server-Sent Events. The client requests streaming by setting Accept: text/event-stream. Without it, the response is a single JSON object. Streaming is opt-in, not a hidden fork.
```http
GET /stream HTTP/1.1
Accept: text/event-stream

HTTP/1.1 200 OK
Content-Type: text/event-stream

event: notification
data: {"method": "notifications/progress", "params": {"progress": 0.5}}
```
Batching — POST /batch with Content-Type: multipart/mixed. Each sub-request has its own HTTP semantics and status code. One failure doesn't poison the batch.
Sessions — Session identity travels via Cookie or Authorization: Bearer. Sessions expire naturally via Max-Age. 401 Unauthorized means "re-initialize," not "URL doesn't exist."
10.6 MQTT Binding — Topic-Based Native Semantics
MQTT's native idioms are publish/subscribe, topics, QoS levels, retained messages, and Last Will. The binding uses all of them.
Connect — The client connects to the MQTT broker. The server is also a client of the same broker. The connect primitive establishes the MQTT connection; session initialization follows as a protocol-level message on the topic.
Topics — The abstract resource model maps directly to MQTT topics. The client publishes requests and subscribes to responses. The server subscribes to requests and publishes responses.
Direction: Client → Server (requests)

```
mcp/{session-id}/tools/list
mcp/{session-id}/tools/{name}/get
mcp/{session-id}/tools/{name}/call
mcp/{session-id}/resources/list
mcp/{session-id}/resources/{uri}/read
mcp/{session-id}/skills/list
mcp/{session-id}/skills/{name}
mcp/{session-id}/session/initialize
mcp/{session-id}/session/terminate
```
Direction: Server → Client (responses and notifications)

```
mcp/{session-id}/tools/list/response
mcp/{session-id}/tools/{name}/get/response
mcp/{session-id}/tools/{name}/call/response
mcp/{session-id}/notifications/{type}
mcp/{session-id}/stream
```
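The mapping from abstract operation to topic is mechanical, which is exactly what makes it a binding concern rather than a protocol concern. A hypothetical sketch of how a binding might derive the topics above (the helper functions are mine, not part of any MQTT binding spec):

```python
def request_topic(session_id, method, name=None):
    """Map an abstract MCP operation onto the MQTT request-topic hierarchy."""
    parts = method.split("/")        # e.g. "tools/call" -> ["tools", "call"]
    if name is not None:
        parts.insert(1, name)        # "tools/{name}/call"
    return "/".join(["mcp", session_id] + parts)

def response_topic(session_id, method, name=None):
    """Responses live one level below the request topic."""
    return request_topic(session_id, method, name) + "/response"
```

The protocol core hands the binding ("tools/call", name="weather") and never sees the topic string; a NATS binding would run the same inputs through a dot-separated variant.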
QoS — MQTT QoS maps directly to protocol delivery guarantees:

| Protocol QoS | MQTT QoS | Behavior |
|---|---|---|
| At-most-once | QoS 0 | Publish, no ack |
| At-least-once | QoS 1 | Publish, wait for PUBACK |
| Exactly-once | QoS 2 | Four-way handshake: PUBLISH → PUBREC → PUBREL → PUBCOMP |
A tools/call that debits an account publishes with QoS 2 for transport-level exactly-once delivery (the application must still provide idempotency keys or deduplication for business-level guarantees). A progress notification publishes with QoS 0. The broker handles retry and deduplication; the protocol doesn't reinvent it.
Retained messages — The server publishes its tool catalog to mcp/{session-id}/tools/list with the retain flag. Any client that subscribes to that topic immediately receives the last published catalog. Capability discovery is always available without a round-trip initialize, though protocol version and client capability negotiation would still need a handshake.
Last Will — On connect, the server sets a Last Will message on mcp/{session-id}/status with payload "offline". If the server disconnects ungracefully, the broker publishes this message. The client gets definitive server-death detection, with no ambiguity between "SSE stream ended" and "server crashed."
Multiple transports simultaneously — A server can expose the same protocol over HTTP and MQTT. A web-based client uses HTTP. An IoT sensor uses MQTT. Both talk to the same protocol core. The server implements the core once; transport bindings are adapters.
10.7 stdio Binding — Process-Native Semantics
The simplest binding. Already correct in MCP, included here to show how it fits the transport-agnostic model.
Connect — The client launches the server as a subprocess. connect(target) spawns the process. No network, no authentication: the trust boundary is the OS process model.
Send — JSON-RPC messages over stdin, one per line. The send primitive writes to the process's stdin using newline-delimited JSON.
Receive — Read JSON-RPC responses and notifications from stdout. The receive primitive reads from stdout, line by line.
Close — Terminate the process. The session ends with the process.
Capabilities — Stdio is reliable (same machine, no network), ordered (pipe semantics), and connection-oriented (process lifetime = session lifetime). No streaming (the process is the stream). No caching (local process). No batching (send one message, get one response).
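The newline-delimited JSON framing takes only a few lines, which is part of why the stdio binding is elegant. A sketch using in-memory streams in place of a real subprocess's stdin and stdout:

```python
import io
import json

def write_message(stream, message):
    """Frame one JSON-RPC message as a single line on the pipe."""
    stream.write(json.dumps(message) + "\n")

def read_messages(stream):
    """Yield one JSON-RPC message per line until the pipe closes."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for the child process's stdin/stdout pipe.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "method": "notifications/progress"})
pipe.seek(0)
received = list(read_messages(pipe))
```

The same two functions work unchanged against `proc.stdin` and `proc.stdout` opened in text mode, because pipe semantics already provide the ordering and reliability the binding declares.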
10.8 gRPC Binding — Structured RPC with Native Streaming
gRPC gives generated clients, bidirectional streaming, and protobuf efficiency. The binding maps the protocol core to gRPC's service definition.
```protobuf
service MCP {
  // Tools
  rpc ListTools(Empty) returns (ToolList);
  rpc GetTool(GetRequest) returns (ToolDefinition);
  rpc CallTool(ToolCallRequest) returns (stream ToolCallEvent);

  // Resources
  rpc ListResources(Empty) returns (ResourceList);
  rpc ReadResource(ResourceRequest) returns (ResourceContent);
  rpc SubscribeResource(ResourceRequest) returns (stream ResourceUpdate);

  // Skills
  rpc ListSkills(Empty) returns (SkillList);
  rpc GetSkill(SkillRequest) returns (SkillDefinition);

  // Session
  rpc Initialize(InitRequest) returns (InitResult);
  rpc Terminate(Empty) returns (Empty);
}
```
Streaming — Each call/invoke/subscribe RPC returns a server stream. gRPC's native streaming replaces SSE without the fragility.
QoS — gRPC deadlines and retries handle exactly-once and at-least-once semantics natively.
Sessions — gRPC metadata carries session identity. Channel connectivity state (idle, connecting, ready, transient-failure, shutdown) provides per-RPC lifecycle signals, but cross-RPC session state requires application-level context propagation.
10.9 WebSocket Binding — Persistent Bidirectional Channels
WebSocket provides persistent, bidirectional communication over a single TCP connection, upgraded from HTTP. The binding uses this natively.
Connect — Client opens ws://server/mcp (or wss://). The InitializeRequest travels either as a query parameter on the HTTP upgrade request or as the first message on the opened socket.
Send — Client sends JSON-RPC messages as WebSocket text frames. Each frame is one message.
Receive — Client listens on the socket. Server-pushed messages arrive as text frames, no SSE required.
Sessions — The WebSocket is the session. When the socket closes, the session ends. gRPC and HTTP need session headers; WebSocket doesn't.
Streaming — Native to WebSocket. Server-to-client streaming is just sending multiple frames. No SSE, no separate endpoint.
QoS — WebSocket has no built-in QoS. The protocol binding must handle idempotency and retry:
- At-most-once: send and forget
- At-least-once: client tracks message IDs, resends on disconnect
- Exactly-once: idempotency keys in the JSON-RPC params.id field
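At-least-once over WebSocket means the client tracks unacknowledged message IDs and replays them after a reconnect. A minimal client-side sketch; the callback wiring and message shapes are illustrative assumptions:

```python
class AtLeastOnceSender:
    """Track in-flight JSON-RPC requests; resend unacked ones on reconnect."""

    def __init__(self, send_frame):
        self.send_frame = send_frame  # callable that writes one WS text frame
        self.inflight = {}            # id -> message still awaiting a response

    def send(self, message):
        self.inflight[message["id"]] = message
        self.send_frame(message)

    def on_response(self, response):
        self.inflight.pop(response["id"], None)  # acked: stop tracking

    def on_reconnect(self):
        for message in list(self.inflight.values()):
            self.send_frame(message)             # replay unacked requests

frames = []  # stand-in for the WebSocket: collect sent frames
sender = AtLeastOnceSender(frames.append)
sender.send({"jsonrpc": "2.0", "id": 1, "method": "tools/call"})
sender.send({"jsonrpc": "2.0", "id": 2, "method": "tools/call"})
sender.on_response({"jsonrpc": "2.0", "id": 1, "result": {}})
sender.on_reconnect()  # only the unacknowledged message 2 is replayed
```

Note this delivers duplicates by design; pairing it with server-side idempotency keys is what upgrades it to exactly-once.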
10.10 NATS Binding — Subject-Based Native Semantics
NATS subjects are the natural routing layer for MCP's resource model. The binding uses subject hierarchies, wildcard subscriptions, and JetStream for durable semantics.
Connect — Both client and server connect to the NATS server (or cluster). The connect primitive establishes the NATS connection; session initialization follows as a protocol-level message.
Subjects — The abstract resource model maps directly to NATS subjects:
Direction: Client → Server (requests)

```
mcp.{session-id}.tools.list
mcp.{session-id}.tools.{name}.get
mcp.{session-id}.tools.{name}.call
mcp.{session-id}.resources.list
mcp.{session-id}.resources.{uri}.read
mcp.{session-id}.skills.list
mcp.{session-id}.skills.{name}
```
Direction: Server → Client (responses and notifications)

```
mcp.{session-id}.tools.list.response
mcp.{session-id}.tools.{name}.call.response
mcp.{session-id}.notifications.{type}
```
Request-reply — NATS' built-in request-reply pattern maps directly to MCP's request-response semantics. No correlation IDs or reply subjects to manage by hand: NATS generates and tracks them.
JetStream for durability — MCP invocations that charge money or side-effect the world use JetStream streams for durable delivery with deduplication and replay. Progress notifications use core NATS for best-effort delivery.
Wildcard subscriptions — A client subscribes to mcp.{session-id}.tools.> and receives every tool event. No separate notification channel, no SSE endpoint.
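NATS wildcard semantics are simple enough to sketch: `*` matches exactly one token, `>` matches one or more trailing tokens. This hypothetical matcher (my own code, not the NATS server's) shows why `mcp.{session-id}.tools.>` catches every tool event and nothing else:

```python
def subject_matches(pattern, subject):
    """NATS-style matching: '*' = one token, '>' = all remaining tokens (>= 1)."""
    p_tokens, s_tokens = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i   # '>' must cover at least one token
        if i >= len(s_tokens):
            return False               # subject ran out of tokens
        if p != "*" and p != s_tokens[i]:
            return False               # literal token mismatch
    return len(p_tokens) == len(s_tokens)
```

One subscription replaces the separate notification channel: every publish under the session's tools hierarchy lands in the same handler.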
10.11 ZeroMQ Binding — Pattern-Native Semantics
ZeroMQ's socket types map directly to MCP's communication modes. The binding uses each socket pattern for what it's designed for.
Connect — The client creates the appropriate ZeroMQ sockets and connects (or binds) to the server's endpoint. No broker, no configuration file.
Socket patterns:
- REQ/REP — synchronous tool calls (invoke tools/call, get a response)
- PUB/SUB — server-to-client notifications (progress updates, resource changes)
- PUSH/PULL — task distribution (fan out tool invocations across workers)
- ROUTER/DEALER — asynchronous bidirectional messaging (MCP's full protocol, with correlation built in)
MCP currently squeezes all of these through a single HTTP POST endpoint. ZeroMQ's socket types give each mode its own wire pattern: the right tool for each job.
Topology flexibility — Same binary, same protocol logic. Change the endpoint string to change the topology: ipc:// for local, tcp:// for a team server, ROUTER/DEALER for production fan-out. No separate implementation per transport.
Sessions — ZeroMQ has no built-in session concept. The binding implements session lifecycle at the protocol level: an initialize message creates a session identity, a terminate message ends it. Connection state and session state are separate: a ROUTER socket can survive client disconnects.
10.12 Shared Memory / Ring Buffer Binding — Zero-Copy Native Semantics
When client and server share address space, the transport binding can use memory directly. The binding uses a lock-free ring buffer with a schema-defined frame layout.
Connect — One process creates the shared memory region (POSIX shm_open or Windows CreateFileMapping), the other opens it. The connect primitive maps the region into the process's address space. No network, no kernel transitions for data.
Send — The producer writes a frame into the ring buffer at the write cursor. A frame is: [length][message-type][json-bytes]. A memory barrier (or the ring buffer's sequence counter) signals the consumer.
Receive — The consumer reads frames from the read cursor. The same barrier/sequence ensures the consumer only sees committed frames. No polling needed on modern hardware: the consumer can use futex_wait or WaitOnAddress to sleep until signaled.
Close — Unmap the shared region. Session terminates.
Capabilities — Shared memory is: reliable (same machine, no network), ordered (sequential writes), zero-copy when paired with binary schemas such as FlatBuffers or Cap'n Proto (no serialization step), and sub-microsecond latency in optimized designs (the hot path can avoid syscalls entirely). Caching is irrelevant (the data is already local). Streaming is built-in (the ring buffer is the stream).
Schema flexibility — The ring buffer can carry JSON-RPC frames (for simplicity) or binary frames using FlatBuffers/Cap'n Proto (for zero-copy deserialization). The binding chooses at capability negotiation time.
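The frame layout is easy to prototype without real shared memory. A sketch that packs `[length][message-type][json-bytes]` frames into a flat byte region standing in for the mapped segment; the field widths (u32 length, u8 type) are assumptions for illustration:

```python
import json
import struct

HEADER = "<IB"  # little-endian u32 length + u8 message type = 5 bytes

def write_frame(buf, offset, msg_type, payload):
    """Pack [length][type][json-bytes] at offset; return the new write cursor."""
    body = json.dumps(payload).encode()
    buf[offset:offset + 5] = struct.pack(HEADER, len(body), msg_type)
    buf[offset + 5:offset + 5 + len(body)] = body
    return offset + 5 + len(body)

def read_frame(buf, offset):
    """Unpack one frame at offset; return (msg_type, payload, new read cursor)."""
    length, msg_type = struct.unpack_from(HEADER, buf, offset)
    body = bytes(buf[offset + 5:offset + 5 + length])
    return msg_type, json.loads(body), offset + 5 + length

region = bytearray(4096)  # stand-in for the mmap'd shared region
w = write_frame(region, 0, 1, {"method": "tools/call"})
w = write_frame(region, w, 2, {"progress": 0.5})
t1, m1, r = read_frame(region, 0)
t2, m2, _ = read_frame(region, r)
```

A real binding would add wraparound at the ring boundary and a sequence counter so the consumer only sees committed frames; the framing itself is this simple.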
10.13 The Composite Table — Eight Transports, Eight Concerns
| Concern | HTTP | MQTT | stdio | gRPC | WebSocket | NATS | ZeroMQ | Shared Memory |
|---|---|---|---|---|---|---|---|---|
| Bidirectional | SSE hack | Native | Full duplex | Native streams | Native | Native | Depends on pattern | Native |
| Streaming | SSE | QoS levels | Pipe | gRPC streams | Native frames | JetStream | PUB/SUB | Ring buffer |
| Caching | GET: yes | Retained | No | No | No | JetStream | No | Direct |
| Session | Custom header | Broker session | Process life | Metadata | Socket life | NATS connection | Protocol-level | Protocol-level |
| Discovery | URL | Topic | CLI args | Service config | URL | Subject | ZMQ_DISCOVERY | shm name |
| Auth | TLS + header | Broker auth | OS process | TLS + token | TLS + header | JWT + account | CurveZMQ | OS permissions |
| Batch | multipart/mixed | Topic (ordered) | Newline-delim | gRPC stream | Frame batch | JetStream | PUSH/PULL | Contiguous frames |
| Back-pressure | HTTP 429 | QoS / drop | Pipe buffer | Flow control | Close frame | JetStream limits | HWM | Ring size |
The table makes the point: no single transport covers all concerns. The protocol core is what stays constant; the binding adapts. A protocol that only defines two transports has blind spots the size of MQTT, gRPC, and shared memory.
§11
11. What MCP Gets Right (Revisited)
The criticism above is structural, not ideological. MCP identified a real problem and proposed a workable solution. The good parts deserve recognition even as the gaps are noted. But recognizing what works requires being honest about what makes it work, because the good parts and the gaps share the same root.
The Protocol Semantics Are Sound
The problem statement is correct. LLMs need a standard way to discover and invoke external capabilities. The three-primitive model (tools, resources, prompts/skills) is well-chosen. Each primitive does something genuinely different: tools execute, resources read, skills compose. Collapsing them into a single "capability" abstraction would lose information that matters to both the client and the human operating it.
JSON-RPC 2.0 is a defensible choice for the message envelope. Simple, widely implemented, language-agnostic. The message format is not the problem.
The capability model is sound. Declaring what you support at initialization, then operating within those bounds, is the right instinct. It prevents "try and fail" patterns and creates auditable session contracts.
Stdio transport for local tools is elegant. Process-launch, stdin/stdout, zero configuration. This is Unix philosophy done right.
These are the protocol semantics. They are the "what" of MCP: what the messages mean, what the primitives do, what the capability contract looks like. And they are good.
The Gaps Are All Transport
Every criticism in the preceding sections targets the same root cause: MCP's protocol semantics are coupled to transport decisions that do not earn their independence.
The single-endpoint antipattern (§4) is a transport decision that obscures the protocol's clear operation semantics behind a single URL. Content negotiation as a transport fork (§5) is a transport decision that uses an HTTP content-level mechanism to select between two fundamentally different communication patterns. Status codes that lie (§6) are a transport decision that buries protocol-level errors inside an opaque body. Custom session headers (§7) are a transport decision that reinvents a mechanism every other HTTP-based protocol already solved. The SSE architecture (§8) is a transport decision that builds a custom streaming protocol because HTTP alone cannot do what the application needs. Batching without boundaries (§9) is a protocol gap that would be easier to address if the transport provided native flow control.
The pattern is consistent. The protocol semantics are sound. The transport decisions create the problems. This is precisely what a transport-agnostic redesign would preserve and protect: the semantics stay, the transport decisions become pluggable bindings, and each binding expresses the protocol's meaning in its transport's native idiom.
The Stakes
The real world has MQTT sensors, gRPC microservices, WebSocket browsers, NATS event streams, ZeroMQ local meshes, and shared-memory local tools. MCP's protocol semantics could serve all of these environments. Its transport binding currently serves two. The gap is not theoretical. It is the distance between "a protocol that works for local tools and network servers" and "a protocol that works everywhere tools and context are needed."
§12
12. The Road Not Taken
Frost's poem is often misread. The narrator does not take the road less traveled. Both roads are "worn... really about the same." The difference is only that one is chosen and the other is not, and that choice makes all the difference.
MCP chose the road that LSP walked: stdio and HTTP, single endpoint, RPC semantics, custom sessions. It is a well-worn path. It works. It shipped. It crossed the coordination threshold that matters more than architectural purity. Acknowledging this is not hedging. It is being honest about what adoption requires.
The other road, the one not taken, is the transport-agnostic protocol core: message semantics defined once, expressed natively over HTTP, MQTT, gRPC, WebSocket, NATS, ZeroMQ, and shared memory. Each binding uses its medium's conventions. Each transport contributes what it is best at. The protocol stays the same.
The technology for this exists today. The architecture is well-understood. The missing piece is the specification work: defining the abstract operations, the message semantics, the QoS levels, and then writing bindings for each transport that preserve the protocol's meaning in the transport's native idiom. This is not theoretical work. The bindings in §10 show that it can be done. The question is whether the ecosystem will invest in doing it.
The practical path forward is not to replace MCP. It is to refactor it. The protocol semantics (tools, resources, capabilities, initialization) are sound. The transport bindings are where the coupling hurts. A specification effort that extracted the protocol core from the HTTP binding, defined the abstract operations and their QoS requirements, and then wrote a new HTTP binding that used RESTful semantics, an MQTT binding that used publish/subscribe, a gRPC binding that used protobuf and streaming, and so on, would be an incremental improvement that preserves everything MCP got right and fixes everything it got wrong.
This is the road not taken. It is still there. And the signal that it is worth taking is precisely the adoption that MCP has already achieved. A protocol that nobody uses cannot be improved. A protocol that millions of developers depend on can be. The opportunity to make MCP transport-agnostic exists because MCP succeeded. The road not taken is open because the road that was taken proved the destination was worth reaching.
ADDENDUM
Addendum: The Missing Middle — Actor Orchestration Above MCP and A2A
I decided to include this as an addendum mostly because it is a natural next step for this body of work. If I can get my thoughts together to flesh out the intersection of Actors and MCP/A2A, I will post a follow-up. In any event, below are my initial thoughts on the matter.
The preceding essay argues that MCP's protocol core should be separated from its transport bindings. Define the message semantics once, express them natively over HTTP, MQTT, stdio, gRPC, and the rest. The protocol stays the same. The transport adapts.
This addendum argues that there is a second separation worth making, one layer above the protocol. It is not about how messages travel. It is about how they are routed, supervised, composed, and trusted within an agent's execution context. Neither MCP nor Google's Agent-to-Agent (A2A) protocol addresses this layer. The actor model fills the gap.
The Problem Neither Protocol Solves
MCP connects a client to a server. A client calls a tool. A server provides resources. The interaction is point-to-point, request-response, and stateless between calls (session state is transport metadata, not protocol state).
A2A connects an agent to another agent. An agent advertises capabilities. Another agent discovers and invokes them. The interaction is peer-to-peer, potentially stateful, and negotiated.
Both protocols address connection. Neither addresses composition. A tool call that needs three sub-operations, each with different trust levels, different timeouts, and different failure modes, has no explicit model in either protocol. The composition is left to application code. Every agent framework reinvents it.
Consider what happens when an LLM decides it needs to:
- Read a file from a resource server
- Execute a shell command via a tool server
- Summarize the result using a remote agent
MCP handles steps 1 and 2 as independent, stateless calls to two different servers. A2A could handle step 3 as an agent-to-agent negotiation. But the orchestration, the routing of step 1's output into step 2's input, the supervision of step 2's failure mode, the trust boundary between reading a file and executing a command, that orchestration lives nowhere. It is application code. It is not protocol. It is not specified. It is not composable.
This is the missing middle.
The Layered Stack
A complete system for LLM-to-external-capability interaction has four layers:
```
┌─────────────────────────────────┐
│    Application / Agent          │  Claude, Cursor, your custom agent
├─────────────────────────────────┤
│   Actor Orchestration Layer     │  Routing, supervision, composition, trust inheritance
├───────────────┬─────────────────┤
│      MCP      │      A2A        │  Tool calling / Agent-to-agent
├───────────────┴─────────────────┤
│   Transport                     │  stdio, HTTP, MQTT, gRPC, NATS...
└─────────────────────────────────┘
```
The essay's core argument addresses the bottom layer: the transport should be a pluggable binding, not an inseparable part of the protocol. This addendum addresses the second layer: the orchestration above the protocols should be actor-model-based for the same reason the transport should be pluggable. The message is the unit of work. Where it executes is a routing decision, not a protocol concern.
Per-Task Actors
When a tool is invoked, an actor is spawned for that invocation. The actor is the task. Its lifecycle is the task's duration. When it finishes, it sends the result and terminates.
This eliminates MCP's session management problem. There is no Mcp-Session-Id header because there is no session that outlives the task. There is no custom expiration signaling because the actor's termination IS the expiration. Process death is task completion. The Erlang/OTP lesson applies: if your lifecycle model is "thing lives until it's done," you do not need a session protocol. The runtime IS the protocol.
Per-task actors also solve the "batch without boundaries" problem. A batch of ten requests becomes ten actors under one supervisor. If request 3 fails, the supervisor decides what happens to requests 4 through 10. Retry? Escalate? Cancel the remaining? The decision is in the supervisor's logic, not scattered across error handling in application code. The supervisor IS the boundary that MCP's batching lacks.
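A toy version of per-task actors under one supervisor, using asyncio tasks as the "actors." The supervision policy shown (isolate the failure, let the siblings finish) is one choice among several; the function names and result shapes are hypothetical:

```python
import asyncio

async def task_actor(request):
    """One actor per invocation: do the work, return the result, terminate."""
    if request == "bad":
        raise RuntimeError("tool failed")
    await asyncio.sleep(0)  # stand-in for real tool work
    return {"request": request, "status": "ok"}

async def supervise(requests):
    """The supervisor IS the batch boundary: one failure policy for all tasks."""
    tasks = [asyncio.create_task(task_actor(r)) for r in requests]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Policy: contain each failure instead of poisoning the batch.
    return [
        r if not isinstance(r, Exception)
        else {"status": "failed", "error": str(r)}
        for r in results
    ]

results = asyncio.run(supervise(["a", "bad", "c"]))
```

Changing the policy (cancel the siblings, retry the failure, escalate) means changing one function, not error handling scattered across application code.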
Per-User Actors
A longer-lived actor maintains user context across multiple tool calls. Its mailbox is the session state. No custom header required. The actor accumulates capability, preference, and trust context over its lifetime.
This is what MCP tries to solve with Mcp-Session-Id, except the actor model gives it to you by construction. An actor's state is its state. You do not need to serialize it into a header and deserialize it on the other side. You do not need to worry about session expiration because the actor's supervisor handles that. You do not need to reinvent session semantics per transport binding because sessions are not a transport concern. They are an execution concern, and the execution layer handles them.
Capability-Aware Routing
MCP routes by address. You call a server at its URL. A2A routes by capability discovery. You find an agent that advertises what you need. The actor orchestration layer routes by capability and context simultaneously.
The routing decision is not just "which actor handles this message type" but "which actor has the right capabilities and trust level for this user, at this point in the workflow, given what has already happened." A tool call for reading a file and a tool call for executing a shell command go to different actors not because the address is different but because the trust domain is different. Context-aware routing is what transforms a flat client-server call into a composed workflow.
This also means capability negotiation can happen at runtime, per-request, rather than once at initialization. A new actor type can register mid-session. A per-user actor can accumulate capabilities over time. The routing fabric adapts. MCP's initialize handshake declares everything up front and freezes it for the session. The actor model lets the capability space evolve as the work evolves.
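A sketch of routing on capability and trust domain together rather than on address. The registry, trust labels, and actor names are hypothetical, but the shape of the decision is the point: the same message goes to different actors because the trust domain differs, not because the address does.

```python
class Router:
    """Route a message to an actor by required capability AND trust domain."""

    def __init__(self):
        self.actors = []  # (capabilities, trust_domain, actor name)

    def register(self, name, capabilities, trust_domain):
        """Registration can happen mid-session; the routing fabric adapts."""
        self.actors.append((set(capabilities), trust_domain, name))

    def route(self, capability, trust_domain):
        for caps, trust, name in self.actors:
            if capability in caps and trust == trust_domain:
                return name
        raise LookupError(f"no actor for {capability!r} in {trust_domain!r}")

router = Router()
router.register("file-reader", ["read_file", "parse_json"], "information")
router.register("shell-runner", ["run_command"], "execution")

reader = router.route("read_file", "information")
runner = router.route("run_command", "execution")
```

Contrast this with MCP's initialize handshake: here a new actor type registers at any point, and the capability space evolves as the work evolves.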
Trust Inheritance
The essay's §1.1 defines three trust domains: information (markdown, JSON, CSV), execution (Python scripts, shell commands), and artifact (URL-referenced resources not in the envelope). In the actor model, trust flows down the supervision tree.
An actor spawned by an information-trusted parent inherits that trust domain. It can fetch markdown references, read CSV files, parse JSON schemas. It cannot execute shell scripts. It cannot delegate to an execution-trusted actor without explicit elevation.
An actor in an execution context can spawn sub-actors that execute code. But the elevation is explicit. It requires a trust boundary crossing that is logged, supervised, and revocable. The parent actor's supervisor approves the elevation. If the sub-actor fails or misbehaves, the supervisor contains the blast radius.
Trust flows down. Never up. A low-trust actor cannot escalate its own privileges. It can request elevation, but the decision belongs to its supervisor. This is the same principle as Unix's privilege separation, applied to the composition of tool calls within an agent's execution context.
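The flow-down rule can be enforced structurally rather than by convention. A sketch with a two-level trust ordering (a simplification of the essay's three domains, purely for illustration; a real system would model the artifact domain and log elevations too):

```python
# Ordered trust levels: higher index = more privilege.
TRUST_ORDER = ["information", "execution"]

class Actor:
    """Trust flows down the spawn tree; stepping up is refused structurally."""

    def __init__(self, trust, parent=None):
        self.trust, self.parent = trust, parent

    def spawn(self, requested_trust=None):
        child_trust = requested_trust or self.trust
        if TRUST_ORDER.index(child_trust) > TRUST_ORDER.index(self.trust):
            # Elevation is a supervisor decision, never self-service.
            raise PermissionError("child cannot exceed parent's trust")
        return Actor(child_trust, parent=self)

root = Actor("execution")
info_child = root.spawn("information")  # stepping down is always allowed
try:
    info_child.spawn("execution")       # stepping up is refused
    escalated = True
except PermissionError:
    escalated = False
```

The check lives in spawn itself, so a low-trust actor cannot escalate no matter what its code does; at most it can ask its supervisor to spawn on its behalf.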
Unification with A2A
If the routing fabric is capability-aware and location-transparent, the distinction between "calling a local tool" and "calling a remote agent" becomes invisible to the caller. The message is the same. The routing fabric decides the path.
An A2A message is just an actor message that crosses a cluster boundary. The orchestration layer does not need to know whether the recipient is a subprocess on the same machine, a tool server on the local network, or a remote agent in another cloud. The addressing scheme is uniform. The trust and supervision model is uniform. The failure handling is uniform.
This unification is not theoretical. It is the same location transparency that the essay's transport-agnostic core argues for, applied one layer up. Transport bindings should not leak into the protocol. Execution locations should not leak into the orchestration. In both cases, the principle is the same: the message is the unit of work, and where it goes is a routing decision.
The Erlang Question
Per-task actors, supervision trees, location transparency, message passing, trust inheritance via the process tree. This is Erlang/OTP's entire thesis, proven in production for three decades in telecommunications, messaging, and distributed databases.
Is the actor orchestration layer just Erlang with a different name?
The pattern is the same. The application is new. Erlang's actors manage phone switches and message routers. These actors manage LLM tool calls and agent workflows. The difference is not in the pattern but in the routing decision. An Erlang actor routes based on message type and process registry. An orchestration actor routes based on capability, trust domain, and contextual state. The "which actor handles this" question has a richer answer in the LLM orchestration domain because the capability space is richer, the trust boundaries are more nuanced, and the context of the calling agent matters in ways that a phone switch does not need to consider.
Whether this is convenient alignment, eventuality, or evolution is a question for implementers to answer. The pattern is sound regardless. The actor model has been proven at scale. Applying it to the LLM orchestration problem is not a speculative leap. It is a recognition that the problem has the same shape as problems the actor model has already solved.
Connections Back to the Essay
The essay argues that MCP's protocol core should be transport-agnostic. Define the message semantics once. Bind them to transports. The protocol stays the same. The transport adapts.
This addendum argues that the orchestration layer above it should be actor-model-based for the same reason. The message is the unit of work. Where it executes is a routing decision. How it is supervised is a supervision tree decision. What trust it carries is an inheritance decision. None of these are transport concerns. None of them are protocol concerns. They are orchestration concerns, and the actor model addresses them by construction rather than by convention.
The two arguments reinforce each other. Transport-agnostic protocol means the message format does not change when the transport changes. Actor orchestration means the routing and supervision do not change when the execution location changes. Together, they push all the variability down (transport bindings) and all the control up (supervision and trust), leaving the protocol core in the middle, clean and stable, doing what protocols should do: defining what the messages mean, not how they get there or who handles them.
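That split is easy to make concrete: one JSON-RPC message, two transport bindings. The framing below is deliberately simplified (the HTTP request is hand-assembled and omits most headers), so treat it as an illustration of the layering, not the exact MCP wire format.

```python
import json

# The protocol core: one JSON-RPC 2.0 message. This dict never changes,
# no matter which transport carries it.
message = {"jsonrpc": "2.0", "id": 1,
           "method": "tools/call",
           "params": {"name": "search", "arguments": {"q": "mcp"}}}

def frame_stdio(msg: dict) -> bytes:
    # stdio binding: newline-delimited JSON on a pipe.
    return json.dumps(msg).encode() + b"\n"

def frame_http(msg: dict) -> bytes:
    # HTTP binding: the same JSON as a POST body; framing moves to headers.
    body = json.dumps(msg).encode()
    return (b"POST /mcp HTTP/1.1\r\n"
            b"Content-Type: application/json\r\n"
            b"Content-Length: " + str(len(body)).encode()
            + b"\r\n\r\n" + body)

# Strip either framing and the identical message comes back out.
assert json.loads(frame_stdio(message).rstrip(b"\n")) == message
assert json.loads(frame_http(message).split(b"\r\n\r\n", 1)[1]) == message
```

Everything that varies lives in the `frame_*` functions; everything stable lives in `message`. That is the whole argument in miniature.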
APPENDIX
Appendix — Sources and References
Every external standard, specification, protocol, concept, and published work cited in this essay, with canonical URLs.
Specifications and RFCs
| Reference | URL | Sections |
| --- | --- | --- |
| JSON-RPC 2.0 | https://www.jsonrpc.org/specification | §1, §4, §6, §9, §10 |
| RFC 793 — Transmission Control Protocol (Postel's Law original) | https://www.rfc-editor.org/rfc/rfc793 | §2 |
| RFC 1122 — Requirements for Internet Hosts (restates Postel's Law) | https://www.rfc-editor.org/rfc/rfc1122 | §2 |
| RFC 6455 — WebSocket Protocol | https://www.rfc-editor.org/rfc/rfc6455 | §3, §10 |
| RFC 9110 — HTTP Semantics | https://www.rfc-editor.org/rfc/rfc9110 | §5, §8 |
| RFC 9111 — HTTP Caching | https://www.rfc-editor.org/rfc/rfc9111 | §5 |
| RFC 7636 — PKCE for OAuth | https://www.rfc-editor.org/rfc/rfc7636 | §1.1, §2 |
| RFC 8414 — OAuth 2.0 Authorization Server Metadata | https://www.rfc-editor.org/rfc/rfc8414 | §2 |
| RFC 9000 — QUIC | https://www.rfc-editor.org/rfc/rfc9000 | §3 |
| RFC 9113 — HTTP/2 | https://www.rfc-editor.org/rfc/rfc9113 | §3, §10 |
| RFC 6762 — Multicast DNS | https://www.rfc-editor.org/rfc/rfc6762 | §3, §10 |
| RFC 6763 — DNS-Based Service Discovery | https://www.rfc-editor.org/rfc/rfc6763 | §3, §10 |
| RFC 9728 — OAuth 2.0 Protected Resource Metadata | https://www.rfc-editor.org/rfc/rfc9728 | §2 |
| W3C Trace Context (traceparent, tracestate) | https://www.w3.org/TR/trace-context/ | §10.2 |
| Server-Sent Events (WHATWG HTML Living Standard) | https://html.spec.whatwg.org/multipage/server-sent-events.html | §5, §8 |
| OpenAPI Specification v3.1 | https://spec.openapis.org/oas/v3.1.0 | §2 |
| AsyncAPI Specification v3.0 | https://www.asyncapi.com/docs/reference/specification/v3.0.0 | §2 |
| GraphQL Specification | https://spec.graphql.org/ | §2, §9 |
| MQTT 5.0 Specification (OASIS) | https://docs.oasis-open.org/mqtt/mqtt/v5.0/mqtt-v5.0.html | §3, §10 |
| NATS Protocol | https://docs.nats.io/reference/reference-protocols/ | §3, §10 |
| ZeroMQ | https://zeromq.org/ | §3, §10 |
| gRPC | https://grpc.io/docs/ | §2, §3, §10 |
| OAuth 2.1 Draft | https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1 | §2 |
| OpenID Connect Discovery 1.0 | https://openid.net/specs/openid-connect-discovery-1_0.html | §2 |
| MCP Specification (2025-11-25) | https://modelcontextprotocol.io/specification/2025-11-25 | §1, §1.1, §2, §5, §7, §8 |
| LSP — Language Server Protocol | https://microsoft.github.io/language-server-protocol/ | §1 |
Protocols and Transport Technologies
| Reference | URL | Sections |
| --- | --- | --- |
| HTTP (RFC 9110-9114) | https://www.rfc-editor.org/rfc/rfc9110 | §2, §4, §5, §7, §8, §10 |
| MQTT v5 | https://docs.oasis-open.org/mqtt/mqtt/v5.0/mqtt-v5.0.html | §3, §7, §10 |
| NATS | https://nats.io/ | §3, §10 |
| ZeroMQ | https://zeromq.org/ | §3, §10 |
| gRPC | https://grpc.io/ | §2, §10 |
| WebSocket | https://www.rfc-editor.org/rfc/rfc6455 | §3, §10 |
| AMQP 1.0 Specification (OASIS) | https://docs.oasis-open.org/amqp/core/v1.0/amqp-core-overview-v1.0.html | §3 |
| Unix Domain Sockets | https://man7.org/linux/man-pages/man7/unix.7.html | §3 |
| D-Bus Specification | https://dbus.freedesktop.org/doc/dbus-specification.html | §3 |
| FlatBuffers | https://flatbuffers.dev/ | §10 |
| Cap'n Proto | https://capnproto.org/ | §10 |
Patterns, Concepts, and Frameworks
| Reference | URL | Sections |
| --- | --- | --- |
| Fielding's REST Dissertation | https://web.archive.org/web/2024/https://ics.uci.edu/~fielding/pubs/dissertation/top.htm | §2 |
| Richardson Maturity Model | https://martinfowler.com/articles/richardsonMaturityModel.html | §2 |
| The Twelve-Factor App | https://12factor.net/ | §2 |
| CQRS Pattern | https://martinfowler.com/bliki/CQRS.html | §1, §2 |
| Event Sourcing | https://martinfowler.com/eaaDev/EventSourcing.html | §2 |
| Fallacies of Distributed Computing | https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing | §2 |
| LMAX Disruptor | https://lmax-exchange.github.io/disruptor/ | §3 |
| Actor Model (Hewitt, 1973) | https://en.wikipedia.org/wiki/Actor_model | Addendum |
| Erlang/OTP | https://www.erlang.org/ | §10, Addendum |
| OpenTelemetry | https://opentelemetry.io/ | §10.2 |
| Idempotency-Key (Stripe) | https://docs.stripe.com/api/idempotent_requests | §2 |
| Agent Skills Specification | https://agentskills.io/specification | §1.1 |
| Google A2A Protocol | https://github.com/a2aproject/A2A | Addendum |
| DPDK | https://www.dpdk.org/ | §3 |
| JACK Audio Connection Kit | https://jackaudio.org/ | §3 |
| PipeWire | https://www.pipewire.org/ | §3 |
| systemd | https://systemd.io/ | §3 |
System Interfaces
| Reference | URL | Sections |
| --- | --- | --- |
| POSIX shm_open | https://pubs.opengroup.org/onlinepubs/9699919799/functions/shm_open.html | §3, §10 |
| Linux futex(2) | https://man7.org/linux/man-pages/man2/futex.2.html | §3, §10 |
| Windows CreateFileMappingW | https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-createfilemappingw | §3, §10 |
| Windows WaitOnAddress | https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress | §3, §10 |
Books and Papers
| Reference | URL | Sections |
| --- | --- | --- |
| Robert Frost, "The Road Not Taken" (1916) | https://www.poetryfoundation.org/poems/44272/the-road-not-taken | §12 |
| Bertrand Meyer, Object-Oriented Software Construction (Open/Closed Principle) | https://en.wikipedia.org/wiki/Object-Oriented_Software_Construction | §2 |
Acknowledgments
An LLM was used to help refine the language of this write-up and to assist with copy-editing.