I Built a Private MCP Server to Give Claude Memory Across Sessions. Here Is What Broke.

A private MCP server solves Claude’s blank-session problem by connecting claude.ai, Claude Desktop, and Claude Code to a shared knowledge base through a single, authenticated endpoint. I shipped codex-mcp v0.1 on 2026-05-17, 13 days ahead of schedule, using two separate serverless deployments behind OAuth 2.1: one for the RAG substrate, one for the private memory surface. All 10 acceptance criteria passed before ship; a pipeline test two hours later caught two bugs the smoke test couldn’t reach.

Every Claude session starts blank.

That is not a complaint about the product. It is an architecture constraint, and like most architecture constraints, the interesting question is not whether it exists but what you build around it.

I shipped codex-mcp v0.1 on 2026-05-17, 13 days ahead of the target date I had written into my planning node, and it is the most structurally useful thing I have built in the harness so far. Not because it is clever, but because it closes the most expensive failure mode in my agent stack: every agent operating from a blank context, reinventing what was already known three sessions ago.

This is the build log.

What Is the Problem With Stateless Claude Context?

When I ask Claude to comment on a new LinkedIn post, it does not know what I built last week. When I ask it to draft a blog post, it does not know what I wrote last month. When a scheduled automation runs overnight, it does not know what the Claude Code session decided yesterday.

The workaround most people use is pasting context manually: dumping notes into the conversation, uploading files, re-explaining preferences every session. This works. It is also exactly the kind of busywork that defeats the purpose of having an agent.

The harder version of the problem is that each agent surface (Claude Code at the terminal, claude.ai in the browser, scheduled automations) draws from a separate context. They are not just stateless between sessions; they are isolated across surfaces. A decision Claude Code made and logged to a local decisions ledger is invisible to the automation running six hours later.

I had been patching this with a shared context digest (~15 KB, refreshed on demand) that I fed into sessions manually. That reduced the problem. It did not solve it.

What MCP actually enables here

The Model Context Protocol is a standardized way for LLM-facing applications to query external data sources during inference. When a client (claude.ai, Claude Desktop, Claude Code) has an MCP server configured, it can call tools on that server mid-conversation without the user pasting anything.

The specific thing I needed was not just any MCP server. I needed one that:

Served my private knowledge base (321 nodes as of the v0.1 ship date), not a public corpus
Required authentication so the memory surface was not public
Ran on infrastructure I controlled, isolated from the main RAG substrate

That last constraint matters. My knowledge base includes operational details: decision logs, infrastructure notes, draft framing. Hosting that on the same deployment as a public search endpoint would mean one misconfiguration exposes the lot. Isolation is not paranoia; it is basic hygiene for anything that serves agent memory.

How Does the Two-Deployment MCP Architecture Work?

The architecture splits two concerns across two serverless deployments:

The original RAG substrate, which serves the automation stack’s search path
A dedicated MCP deployment, which serves only the MCP endpoint, behind OAuth 2.1

The separation is deliberate. If the MCP endpoint has a problem, it does not take down the automation stack. If the automation stack has a write spike, it does not affect MCP query latency. Blast radius containment through topology, not configuration.

The MCP endpoint exposes two primary tools: codex_search (semantic search across all nodes) and codex_fetch_node (direct retrieval by node ID). A connected claude.ai session can call these mid-conversation without any manual context-pasting.
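For a sense of what travels over the wire, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` messages a client would send for these two tools. The tool names come from this post; the argument names (`query`, `node_id`) and the example node ID are assumptions for illustration, not the server's confirmed schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC expects a distinct id per request


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names and the node ID below are hypothetical.
search_request = tool_call("codex_search", {"query": "codex-mcp v0.1 ship notes"})
fetch_request = tool_call("codex_fetch_node", {"node_id": "example-node-id"})
```

In practice the client builds these messages itself; the sketch only shows the shape of the request it sends mid-conversation.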

The OAuth 2.1 surface follows the standard MCP authorization spec: /.well-known/oauth-protected-resource returns the protected resource metadata, which names the authorization server, and the client handles the token exchange from there. The endpoint is not publicly browsable.
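The discovery step can be sketched roughly as follows. The field names (`resource`, `authorization_servers`, `bearer_methods_supported`) and the `resource_metadata` challenge parameter come from RFC 9728; every URL is a placeholder, since the real hostnames are not part of this post, and the two functions are hypothetical helpers rather than the server's actual handlers.

```python
# Placeholder URLs; the real deployment's hostnames are not given in this post.
RESOURCE = "https://mcp.example.com/mcp"
AUTH_SERVER = "https://auth.example.com"
METADATA_URL = "https://mcp.example.com/.well-known/oauth-protected-resource"


def protected_resource_metadata() -> dict:
    """Body served at /.well-known/oauth-protected-resource (RFC 9728)."""
    return {
        "resource": RESOURCE,
        "authorization_servers": [AUTH_SERVER],
        "bearer_methods_supported": ["header"],
    }


def unauthenticated_response() -> tuple[int, dict]:
    """Status and headers for a request that arrives without a valid token."""
    challenge = f'Bearer resource_metadata="{METADATA_URL}"'
    return 401, {"WWW-Authenticate": challenge}
```

A client that receives the 401 follows the `resource_metadata` URL, reads `authorization_servers`, and runs the OAuth flow against that server before retrying with a bearer token.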

The smoke test passed. The pipeline test found two bugs.

All 10 acceptance criteria for v0.1 passed before ship. That gave me enough confidence to write the node as verified rather than target.

Two hours later, I ran an end-to-end pipeline health test and found two bugs the smoke test did not catch.

Bug 1: codex_fetch_node returned empty for nodes authored in the same session as the test.

The root cause: newly authored nodes are pushed to the deployment by a sync script, but the MCP server caches the embedding index at startup. Nodes written after the last index refresh were invisible to codex_fetch_node. The smoke test only queried nodes that existed before startup; the pipeline test queried nodes authored during the test session. Same tool, two different index states, and only the second one exposed the bug.
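A simplified model of the stale-snapshot behavior, assuming an in-memory dict in place of the real embedding index. `NodeStore`, `McpServer`, and `refresh_index` are hypothetical names for illustration, not the codex-mcp code.

```python
class NodeStore:
    """Stand-in for the deployment's node storage that the sync script writes to."""

    def __init__(self):
        self.nodes = {}

    def push(self, node_id, text):
        self.nodes[node_id] = text


class McpServer:
    def __init__(self, store):
        self.store = store
        self.index = dict(store.nodes)  # snapshot taken once, at startup

    def fetch_node(self, node_id):
        # Reads the snapshot, so anything pushed after startup returns None.
        return self.index.get(node_id)

    def refresh_index(self):
        # One possible fix: rebuild the snapshot from the store.
        self.index = dict(self.store.nodes)


store = NodeStore()
store.push("node-old", "existed before startup")
server = McpServer(store)
store.push("node-new", "authored during the test session")

smoke = server.fetch_node("node-old")          # found: the smoke-test path
pipeline = server.fetch_node("node-new")       # None: the pipeline-test path
server.refresh_index()
after_refresh = server.fetch_node("node-new")  # found once the snapshot is rebuilt
```

When and how the refresh is triggered (after each sync, on a timer, or on a cache miss) is a design choice this sketch leaves open.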

Bug 2: Search-by-ID fallback was missing.

codex_search takes a query string and returns semantic matches. I had assumed it would also handle exact node ID strings. It did not. An ID matched only through semantic similarity, when the node text happened to contain the ID string; there was no direct lookup path. In practice this worked most of the time. In edge cases where a new node’s text had not yet been indexed, the ID search failed silently.
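An ID-first lookup path could look something like this. The word-overlap scorer is a deliberately crude stand-in for embedding search, and the node IDs and texts are invented for the example.

```python
import re

# Hypothetical nodes; the texts do not contain their own IDs.
NODES = {
    "node-0042": "OAuth 2.1 notes for the private memory deployment",
    "node-0043": "Sync script pushes new nodes to the deployment",
}


def semantic_search(query, nodes):
    """Stand-in scorer: ranks nodes by shared words. A real server would use embeddings."""
    words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for node_id, text in nodes.items():
        overlap = len(words & set(re.findall(r"\w+", text.lower())))
        if overlap:
            scored.append((overlap, node_id))
    return [node_id for _, node_id in sorted(scored, reverse=True)]


def search(query, nodes):
    """Try an exact node-ID match first, then fall back to semantic search."""
    if query in nodes:
        return [query]
    return semantic_search(query, nodes)
```

Here `semantic_search("node-0042", NODES)` returns an empty list because neither text contains the ID, which mirrors the silent failure, while `search("node-0042", NODES)` returns the node directly.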

Both bugs went into the v0.2 backlog. Neither is a regression: v0.1 never promised dynamic index refresh or ID-first lookup. But both would have caused friction in real sessions before I caught them.

The lesson is not that smoke tests are insufficient (they are sufficient for what they test). The lesson is that the pipeline test should be designed to stress the assumptions the smoke test makes, not to re-verify the same paths.

What changes when this is live

The behavioral difference is subtle from the outside. From the inside, it is significant.

When claude.ai has the MCP server configured, I can start a session without pasting anything and ask “what did I ship last week?” The server queries the knowledge base for nodes authored in the last seven days and returns them in context. The agent knows about codex-mcp before I mention it.
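The seven-day window amounts to a date filter over node metadata. A minimal sketch, assuming each node carries an `authored` date; that field name, the node IDs, and the `authored_since` helper are all hypothetical.

```python
from datetime import date, timedelta


def authored_since(nodes, today, days=7):
    """Return nodes authored within the last `days` days, newest first."""
    cutoff = today - timedelta(days=days)
    recent = [n for n in nodes if n["authored"] >= cutoff]
    return sorted(recent, key=lambda n: n["authored"], reverse=True)


# Hypothetical node records for illustration.
nodes = [
    {"id": "codex-mcp-v0.1", "authored": date(2026, 5, 17)},
    {"id": "older-node", "authored": date(2026, 4, 2)},
]
recent = authored_since(nodes, today=date(2026, 5, 20))
```

Passing `today` in explicitly keeps the filter deterministic, which makes it easy to cover in a pipeline test.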

The subtler version: when I ask for help with a problem I have already partially solved, the agent can retrieve the existing node and extend it rather than re-deriving from first principles. The difference between “I know you built this, here is what the node says” and “I don’t know your history, please explain” is not a small one across dozens of sessions per week.

The failure mode this does NOT solve: decisions made in one agent surface (Claude Code) are still not automatically visible to another (the automation stack). The MCP server makes the knowledge base queryable, not the decisions ledger. Cross-agent decision sync is the next architectural layer, and it is in the v0.2 backlog.

Why Is the Isolation Pattern a Reusable Principle?

Separate deployments for separate trust boundaries is not a new idea. It is new in the context of MCP server design because most MCP server tutorials start from “here is how to expose data” and skip the question of what trust model the server operates under.

The pattern that worked: one deployment per access control tier. The public-facing RAG substrate and the private-memory MCP server share the same serverless stack but run as separate deployments with separate deploy keys. Credentials for one do not grant access to the other.
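The same principle can be enforced at request time with an audience check, so that a token issued for one deployment is rejected by the other. This is a sketch under assumptions: it presumes the token's signature has already been verified, the audience URLs are placeholders, and `token_allows` is a hypothetical helper, not something this post's deployments are confirmed to use.

```python
def token_allows(claims: dict, deployment_audience: str) -> bool:
    """Accept a token only if it was issued for this deployment."""
    aud = claims.get("aud", [])
    # The `aud` claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return deployment_audience in audiences


MCP_AUDIENCE = "https://mcp.example.com/mcp"     # hypothetical
RAG_AUDIENCE = "https://rag.example.com/search"  # hypothetical

# Claims from an already-verified token issued for the RAG substrate.
rag_token = {"sub": "automation", "aud": RAG_AUDIENCE}
```

With these values, `token_allows(rag_token, RAG_AUDIENCE)` is true and `token_allows(rag_token, MCP_AUDIENCE)` is false, so a leaked automation credential does not open the private memory surface.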

If you are building an MCP server for agent memory and you are using the same deployment as a public or semi-public service, the question is not whether you will have a misconfiguration eventually. It is whether the blast radius of that misconfiguration includes your private context.

Isolation is cheaper to architect upfront than to retrofit after an incident.

What is next

The v0.1 MCP server is live. The two pipeline-test bugs are logged. The v0.2 scope adds dynamic index refresh (so new nodes are immediately queryable without a restart) and a direct node-ID lookup path.

The goal is not a clever MCP server. The goal is an agent stack where the context that matters is always present without manual work. The MCP server is one component of that. The decisions-ledger sync is another. The context digest refresh is a third.

Individually, each of these is a medium-difficulty build. Together, they are the difference between an agent that starts blank and an agent that starts from where you left off. The tools I’ve built on top of this architecture are available at chudi.dev/products.
