Why Claude users want persistent memory
Claude is genuinely useful for long-running projects: writing a book, managing a codebase, running a research workflow, handling customer support. In all of these, you build up context over time — preferences, decisions already made, background facts — that should not have to be re-explained from scratch every session.
Claude Desktop and the Claude API support attaching tools and context via the Model Context Protocol. A growing number of developers are building MCP servers that give Claude access to external capabilities: file systems, databases, web search. Memory is the most requested capability of all: developers want Claude to remember who you are, what you've already discussed, and what preferences you've expressed — across conversations, automatically.
This guide walks through building exactly that: a Python MCP memory server exposing remember and recall tools plus a memory://context resource, backed by the Kronvex persistent memory API. By the end, Claude Desktop will remember facts between sessions without any manual context pasting.
What is Model Context Protocol?
Model Context Protocol (MCP) is an open standard introduced by Anthropic in late 2024. It defines how AI models can interact with external tools, data sources, and services through a structured server/client architecture. An MCP server exposes a set of tools (functions the model can call) and optionally resources (data the model can read). The model's host application — Claude Desktop, a custom agent loop, or any MCP-compatible client — connects to these servers and makes their capabilities available to the model.
MCP servers communicate with the host over stdio (for local servers) or HTTP/SSE (for remote servers). Claude Desktop uses stdio by default: the server runs as a subprocess, and Claude communicates with it via JSON-RPC over stdin/stdout.
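As an illustration of that wire format, a single tool invocation from the host to the server looks roughly like this (a sketch of a JSON-RPC tools/call request; the id and argument values are examples):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "remember",
    "arguments": {
      "text": "The user prefers TypeScript over JavaScript",
      "memory_type": "preference"
    }
  }
}
```

The server replies with a matching-id JSON-RPC response containing the tool's result content. You never write these messages by hand — the SDK and the host exchange them for you — but seeing the shape helps when debugging a server that fails silently.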
MCP vs function calling: MCP servers are different from simple function calling. They are persistent processes with their own state and can serve multiple tool types. They also support resources — structured data that Claude can pull into its context window directly, not just call as a function.
The Python SDK for MCP (mcp on PyPI) makes it straightforward to define tools and resources with a few decorators. We'll use it here, alongside plain httpx calls to the Kronvex API, to wire Claude's memory to a durable, semantically searchable store.
The problem: Claude forgets between sessions
Every Claude conversation starts with a blank context window. There is no built-in mechanism for Claude to remember that you prefer TypeScript over Python, that your project uses a specific database schema, or that you already decided against a particular architecture last week. Each session is amnesia by design — context windows are stateless.
The common workarounds are all manual and fragile:
- System prompt stuffing — manually maintaining an "about me" block you paste into every conversation. Tedious, never up to date, and it eats tokens for static information.
- Copy-pasting conversation history — manually selecting relevant prior conversations and pasting them in. Does not scale beyond a few sessions and is error-prone.
- Maintaining external notes — keeping a Notion or markdown file that you reference. Requires discipline to maintain and does not auto-surface the right memories for the current task.
None of these approaches offers semantic search — the ability to automatically surface the memories relevant to the current conversation topic, rather than recall everything you've ever told Claude. What we need is a proper memory layer: persistent, searchable by meaning, and integrated at the protocol level so Claude uses it automatically.
The scale problem: Even if you manage manual context pasting today, it breaks down completely when you have hundreds of past conversations. Semantic search isn't a nice-to-have — it's the only way to retrieve the right 5 memories from a corpus of 500 without reading them all.
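To make "searchable by meaning" concrete, here is a toy sketch of cosine-similarity retrieval — the core operation behind semantic search. The three-dimensional vectors are made up for illustration; real embedding models such as text-embedding-3-small produce vectors with roughly 1,500 dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for stored memories.
memories = {
    "prefers TypeScript": [0.9, 0.1, 0.0],
    "uses PostgreSQL":    [0.1, 0.9, 0.1],
    "deploys on Railway": [0.0, 0.2, 0.9],
}
# Embedding of the query "what database does the user run?"
query = [0.2, 0.95, 0.05]

best = max(memories, key=lambda k: cosine(query, memories[k]))
print(best)  # prints "uses PostgreSQL"
```

The query vector points in nearly the same direction as the database memory, so that memory ranks first even though the query shares no keywords with it — which is exactly why this scales to hundreds of memories where keyword matching does not.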
Solution: a Kronvex-backed MCP memory server
The architecture is straightforward: we build an MCP server that wraps the Kronvex API. Claude calls the server's remember tool to store a fact, the recall tool to search memories by semantic similarity, and reads the context resource to get a pre-formatted memory block it can inject at the start of any session.
Kronvex handles everything behind those three operations: generating embeddings with text-embedding-3-small, storing them in pgvector, running cosine similarity search at recall time, and scoring results by a combination of semantic similarity, recency, and access frequency. The MCP server is just a thin adapter — under 100 lines of Python.
| MCP Tool / Resource | Maps to Kronvex | When Claude uses it |
|---|---|---|
| remember(text) | POST /api/v1/agents/{id}/remember | When Claude learns a new fact to keep |
| recall(query) | POST /api/v1/agents/{id}/recall | When Claude needs relevant past context |
| resource: context | POST /api/v1/agents/{id}/inject-context | Session start, to prime the context window |
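The blended ranking Kronvex applies at recall time can be pictured as a weighted sum over the three signals. The weights and decay shapes below are illustrative assumptions, not Kronvex's published formula:

```python
def score(similarity, days_old, access_count,
          w_sim=0.7, w_rec=0.2, w_freq=0.1):
    """Illustrative combined recall score (weights are made-up assumptions)."""
    recency = 1.0 / (1.0 + days_old)            # decays as the memory ages
    frequency = min(access_count / 10.0, 1.0)   # saturates after 10 accesses
    return w_sim * similarity + w_rec * recency + w_freq * frequency

# A highly similar, fresh, often-used memory scores near the maximum of 1.0.
print(score(similarity=1.0, days_old=0, access_count=10))  # prints 1.0
```

The useful intuition: semantic similarity dominates, but between two equally similar memories, the fresher and more frequently accessed one wins.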
Python MCP server implementation
Dependencies
You need the MCP Python SDK and an HTTP client. The server below calls the Kronvex REST API directly with httpx, so the Kronvex SDK is optional:
pip install mcp httpx
# or, with the Kronvex SDK as well:
pip install mcp kronvex httpx
The server
Create a file called memory_server.py. The full implementation below exposes remember and recall as MCP tools, and memory://context as a resource:
"""
Kronvex MCP Memory Server
Exposes remember/recall tools and a context resource for Claude Desktop.
"""
import os
import json
import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp import types
# Configuration
KRONVEX_API_KEY = os.environ["KRONVEX_API_KEY"]
KRONVEX_BASE_URL = "https://api.kronvex.io"
AGENT_ID = os.environ.get("KRONVEX_AGENT_ID", "claude-desktop-default")
app = Server("kronvex-memory")
# Shared httpx client (reused across calls). A synchronous client is fine here:
# a local stdio server handles one request at a time, so briefly blocking the
# event loop on an HTTP call is acceptable.
http = httpx.Client(
    base_url=KRONVEX_BASE_URL,
    headers={"X-API-Key": KRONVEX_API_KEY},
    timeout=10.0,
)
@app.list_tools()
async def list_tools() -> list[types.Tool]:
return [
types.Tool(
name="remember",
description=(
"Store a fact or piece of information in long-term memory. "
"Call this whenever the user shares something worth remembering "
"across future sessions: preferences, decisions, background facts, "
"or recurring context."
),
inputSchema={
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "The fact or information to remember.",
},
"memory_type": {
"type": "string",
"description": "Optional category: preference, fact, decision, context.",
"default": "fact",
},
},
"required": ["text"],
},
),
types.Tool(
name="recall",
description=(
"Search long-term memory for information relevant to the current topic. "
"Returns the most relevant memories ranked by semantic similarity, "
"recency, and access frequency."
),
inputSchema={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "What to search for in memory.",
},
"limit": {
"type": "integer",
"description": "Maximum number of memories to return (default 5).",
"default": 5,
},
},
"required": ["query"],
},
),
]
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
if name == "remember":
text = arguments["text"]
memory_type = arguments.get("memory_type", "fact")
payload = {"content": text, "memory_type": memory_type}
resp = http.post(f"/api/v1/agents/{AGENT_ID}/remember", json=payload)
resp.raise_for_status()
return [types.TextContent(
type="text",
text=f"Remembered: {text}",
)]
elif name == "recall":
query = arguments["query"]
limit = arguments.get("limit", 5)
payload = {"query": query, "limit": limit}
resp = http.post(f"/api/v1/agents/{AGENT_ID}/recall", json=payload)
resp.raise_for_status()
memories = resp.json().get("memories", [])
if not memories:
return [types.TextContent(type="text", text="No relevant memories found.")]
lines = []
for m in memories:
score = m.get("confidence_score", m.get("similarity", 0))
lines.append(f"[{score:.2f}] {m['content']}")
return [types.TextContent(type="text", text="\n".join(lines))]
raise ValueError(f"Unknown tool: {name}")
@app.list_resources()
async def list_resources() -> list[types.Resource]:
return [
types.Resource(
uri="memory://context",
name="Memory Context",
description="Pre-formatted memory block for the current agent. "
"Read this at the start of a session to prime your context.",
mimeType="text/plain",
)
]
@app.read_resource()
async def read_resource(uri: str) -> str:
if str(uri) == "memory://context":
resp = http.post(
f"/api/v1/agents/{AGENT_ID}/inject-context",
json={"query": "general context", "limit": 10},
)
resp.raise_for_status()
return resp.json().get("context", "No memories found.")
raise ValueError(f"Unknown resource: {uri}")
if __name__ == "__main__":
    import asyncio

    async def main():
        # stdio_server() yields the read/write streams Claude Desktop attaches
        # to; app.run() drives the JSON-RPC loop over them.
        async with stdio_server() as (read_stream, write_stream):
            await app.run(
                read_stream, write_stream, app.create_initialization_options()
            )

    asyncio.run(main())
Agent ID design: Use a stable, human-readable agent ID like claude-desktop-baptiste or claude-project-myapp. The Kronvex API isolates memories by agent ID — different IDs mean completely separate memory stores. This lets you have project-specific memory by changing KRONVEX_AGENT_ID per project.
Step-by-step setup with Claude Desktop
Step 1: Get your Kronvex API key
Sign up at kronvex.io/auth. The demo tier gives you a free key (kv-...) with 1 agent and 100 memories — more than enough to test. For ongoing use, the Starter plan at €49/month includes 15,000 memories.
Step 2: Save the server file
Save memory_server.py somewhere permanent. A good location is ~/.claude/mcp/memory_server.py:
mkdir -p ~/.claude/mcp
# Save the file there, then install dependencies:
pip install mcp kronvex httpx
Step 3: Configure Claude Desktop
Open Claude Desktop's configuration file. On macOS it's at ~/Library/Application Support/Claude/claude_desktop_config.json, on Windows at %APPDATA%\Claude\claude_desktop_config.json. Add the following entry under mcpServers:
{
"mcpServers": {
"kronvex-memory": {
"command": "python",
"args": ["/Users/you/.claude/mcp/memory_server.py"],
"env": {
"KRONVEX_API_KEY": "kv-YOUR_KEY_HERE",
"KRONVEX_AGENT_ID": "claude-desktop-default"
}
}
}
}
Step 4: Restart Claude Desktop and verify
Quit and reopen Claude Desktop. In a new conversation, you should see the memory tools available in the tool picker (the plug icon). Type:
Remember that I prefer TypeScript over JavaScript for all new projects,
and that my main database is PostgreSQL.
Claude will call the remember tool, typically once per fact. Now close the conversation, open a new one, and ask:
What do you know about my tech stack preferences?
Claude will call recall("tech stack preferences") and return your stored memories — across a completely fresh conversation window.
Pro tip: Add a system prompt to Claude Desktop that instructs it to automatically recall context at the start of each session and remember any new facts you share. In Claude Desktop settings, add: "At the start of each conversation, use the recall tool to retrieve relevant context. When the user shares preferences or facts worth keeping, use the remember tool."
Advanced: using inject-context as a resource
The memory://context resource uses Kronvex's inject-context endpoint, which returns memories formatted specifically for insertion into a system prompt. Unlike recall (which searches for a specific query), inject-context retrieves a broad set of your most relevant and recent memories, formatted as a cohesive context block.
The response from Kronvex looks like this:
{
"context": "## What I know about you\n\n- You prefer TypeScript over JavaScript for new projects\n- Your main database is PostgreSQL\n- You're building a B2B SaaS product targeting EU customers\n- You prefer concise code over commented code\n- Last week you decided to use Railway for deployment\n\nContext generated at 2026-03-22 · 5 memories · Kronvex",
"memory_count": 5,
"agent_id": "claude-desktop-default"
}
You can configure Claude Desktop to automatically read this resource at session start by adding it to your system prompt template. This is particularly powerful for ongoing projects where you want Claude to always have full context without explicit recall calls.
Query-specific context injection
For production use, you can extend the resource to accept a query parameter that targets the memory retrieval to the current conversation topic:
@app.read_resource()
async def read_resource(uri: str) -> str:
# Support memory://context?query=database+schema
from urllib.parse import urlparse, parse_qs
parsed = urlparse(str(uri))
params = parse_qs(parsed.query)
query = params.get("query", ["general context"])[0]
limit = int(params.get("limit", [10])[0])
if parsed.scheme == "memory" and parsed.netloc == "context":
resp = http.post(
f"/api/v1/agents/{AGENT_ID}/inject-context",
json={"query": query, "limit": limit},
)
resp.raise_for_status()
return resp.json().get("context", "No memories found.")
raise ValueError(f"Unknown resource: {uri}")
Deleting specific memories
You can add a forget tool to let Claude remove outdated facts from memory. This is important for preferences that change — you don't want stale memories overriding current ones:
types.Tool(
name="forget",
description="Remove a specific memory by searching for it and deleting the match.",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string", "description": "The memory to forget."}
},
"required": ["query"],
},
),
# In call_tool:
elif name == "forget":
query = arguments["query"]
# First recall to find the memory ID
recall_resp = http.post(
f"/api/v1/agents/{AGENT_ID}/recall",
json={"query": query, "limit": 1},
)
recall_resp.raise_for_status()
memories = recall_resp.json().get("memories", [])
if not memories:
return [types.TextContent(type="text", text="No matching memory found.")]
memory_id = memories[0]["id"]
del_resp = http.delete(f"/api/v1/agents/{AGENT_ID}/memories/{memory_id}")
del_resp.raise_for_status()
return [types.TextContent(type="text", text=f"Forgotten: {memories[0]['content']}")]
Conclusion
The Model Context Protocol gives Claude a clean extension point for persistent memory. Rather than building your own vector database, embedding pipeline, and recall logic, you can wire the Kronvex API into a minimal MCP server in under 100 lines of Python. Claude Desktop then handles the tool calls automatically — remembering facts when you share them, surfacing relevant context when you need it, and maintaining a growing knowledge base across every conversation.
The same server works for any MCP-compatible client: Claude Code, custom agent loops built with the Claude API, or other MCP hosts as they emerge. The memory store in Kronvex is the stable layer — swap the MCP client without losing any memories.
The full server code above is solid for personal use and small teams. For multi-user deployments where different users need isolated memory stores, use per-user agent IDs (e.g., claude-desktop-{user_id}) and a separate KRONVEX_AGENT_ID per session.
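One way to derive those per-user IDs is a small helper that normalizes whatever user identifier your deployment already has. The function name and normalization rules here are assumptions for illustration, not part of the Kronvex API:

```python
def agent_id_for(user_id: str, project: str = "claude-desktop") -> str:
    """Build an isolated Kronvex agent ID for one user of one project."""
    # Lowercase, and replace anything non-alphanumeric with a dash so the
    # result is a stable, URL-safe identifier.
    safe = "".join(c if c.isalnum() else "-" for c in user_id.lower())
    return f"{project}-{safe}"

print(agent_id_for("Alice Smith"))      # prints "claude-desktop-alice-smith"
print(agent_id_for("bob@example.com"))  # prints "claude-desktop-bob-example-com"
```

Because Kronvex isolates memories by agent ID, feeding this value into KRONVEX_AGENT_ID per session gives each user a fully separate memory store with no extra server code.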