CLAUDE.md Is Not Memory — Use a Persistent Memory API Instead

1. The file-based memory trap

Two kinds of posts keep going viral in AI developer communities. The first: someone sharing their elaborate Obsidian vault with /resume, /compress, and /preserve skills that hand off context between Claude Code sessions. The second: a developer showing their session-34 handoff file, one of 43 markdown documents carefully maintained to simulate continuity across conversations.

The ingenuity is real. Developers are building genuine systems here — numbered sessions, archive files, frontmatter metadata, bash scripts that auto-compress context at session end. These aren't hacks. They're engineering solutions to a real infrastructure problem.

The instinct is exactly right: AI agents need memory. Without it, every session starts from zero. You re-explain your stack, re-establish conventions, re-share the decisions you made three sprints ago. At 30 minutes of context-rebuilding per day, that's 2.5 hours a week — per developer.

But the implementation doesn't scale. File-based memory works for one person on one project. The moment you add a second project, a second developer, or want to ship a product where your users' agents need memory — the markdown approach collapses.

If you've ever built a /resume skill, a CLAUDE.md file longer than 50 lines, or an Obsidian vault for your AI agent — this article is for you.

2. Why CLAUDE.md breaks at scale

CLAUDE.md is read every session in full. It is a static context injection file — not a retrieval system. When Claude Code starts, the entire contents of CLAUDE.md are prepended to the context window before you type a single word. This is useful for stable project configuration, but it was never designed to be a memory layer.

Here's what breaks in practice:

No semantic search: you can't query "what did we decide about auth?" — the whole file is injected regardless of relevance to the current task
No growth without manual editing: someone has to maintain it — new decisions don't get added automatically
Context window cost: a 300-line CLAUDE.md wastes tokens on irrelevant info every single call, across every session
Single-user by design: one file serves one agent, with no multi-tenancy, so you cannot use this pattern to give each of your product's users their own memory
No recency: a decision from 6 months ago weighs exactly the same as one from yesterday — there is no temporal dimension
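
To put a rough number on the context-window cost, here is a back-of-the-envelope sketch. The 4-characters-per-token ratio is a common rule of thumb for English text rather than a real tokenizer count, and the line length and call volume are made-up inputs.

```python
# Rough estimate of tokens spent re-injecting a static file on every call.
# ASSUMPTION: ~4 characters per token, a rule of thumb, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def injected_tokens(file_text: str, calls: int) -> int:
    """Tokens consumed by injecting the full file on each of `calls` calls."""
    return estimate_tokens(file_text) * calls


# Hypothetical 300-line file, 80 characters per line (newline included).
sample = ("x" * 79 + "\n") * 300
print(estimate_tokens(sample))      # 6000
print(injected_tokens(sample, 50))  # 300000
```

Under these assumptions, 50 calls spend about 300,000 tokens on the file alone, whether or not any of it is relevant to the task.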

Feature	CLAUDE.md	Kronvex
Retrieval	Full file injected	Semantic search (cosine similarity)
Growth	Manual edits only	Auto-stores via API
Search	None	pgvector similarity + recency + frequency
Multi-agent	No	Yes
Multi-user	No	Yes, isolated per API key
Context cost	Always full file	Only relevant memories
Scale	1 developer, 1 project	Unlimited agents, unlimited users

3. The Obsidian approach — clever but limited

The Obsidian + Claude Code setups that circulate online are genuinely impressive. CLAUDE.md pointing to vault files, frontmatter for session metadata, /resume to load the last handoff, /compress to summarize at session end, /preserve to archive the important bits. Some developers have 10, 20, 30+ session files managed with discipline most teams don't apply to their actual codebase.

But they're solving an infrastructure problem with a note-taking tool.

The hard ceiling these approaches hit:

Requires local file access: the agent must have filesystem access to the vault — this is fine for a local dev tool, impossible for a cloud-deployed agent
No multi-user: one vault serves one person, so it cannot ship as a product where each user gets their own persistent memory
Keyword-only search: Obsidian's native search is not semantic — "auth token" won't find a note about "JWT expiry" unless the words match exactly
Brittle: handoff notes, numbered sessions, archive files — all require manual maintenance that breaks the moment you miss a step
Works for "my personal workflow" — fails for "my product's users": if you're building an AI agent for others to use, your users can't maintain an Obsidian vault on your behalf
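
The keyword-only limitation is easy to reproduce. The sketch below is a toy stand-in for literal keyword matching, not Obsidian's actual search implementation, and the note names and contents are invented for illustration.

```python
# Exact keyword matching: the kind of search that misses related wording.
# The note names and contents below are hypothetical.
notes = {
    "session-12.md": "JWT expiry reduced to 15 minutes after the security review.",
    "session-20.md": "Switched the CI runner to a larger instance.",
}


def keyword_search(notes: dict[str, str], query: str) -> list[str]:
    """Return names of notes containing every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [
        name
        for name, text in notes.items()
        if all(word in text.lower() for word in words)
    ]


print(keyword_search(notes, "auth token"))  # [] (the JWT note is missed)
print(keyword_search(notes, "jwt expiry"))  # ['session-12.md']
```

An embedding-based search would be expected to place "auth token" and "JWT expiry" close together, which is the gap semantic retrieval is meant to close.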

4. What real persistent memory looks like

Three API calls. No file system. No manual maintenance. Works for 1 user or 10,000.

Python

import os

import httpx

BASE = "https://api.kronvex.io/api/v1/agents/my-agent"
# Read the key from the environment instead of hardcoding it
HEADERS = {"X-API-Key": os.environ["KRONVEX_API_KEY"]}

# Store after every meaningful interaction
httpx.post(f"{BASE}/remember", headers=HEADERS,
    json={"content": "User prefers Python over Node. Always suggest async/await."}
).raise_for_status()

# Recall before the LLM call
r = httpx.post(f"{BASE}/recall", headers=HEADERS,
    json={"query": "language preferences", "top_k": 5})
r.raise_for_status()

# Or inject a formatted context block
ctx = httpx.post(f"{BASE}/inject-context", headers=HEADERS,
    json={"query": "current session", "max_tokens": 600})
ctx.raise_for_status()
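
How the recalled memories reach the model can be sketched as follows. The article does not specify the response shape of the recall endpoint, so `recall` and `llm` here are hypothetical caller-supplied functions: one returns a list of memory strings, the other wraps whatever model call you use.

```python
from typing import Callable


def build_prompt(memories: list[str], user_message: str) -> str:
    """Place recalled memories ahead of the user's message."""
    if not memories:
        return user_message
    context = "\n".join(f"- {m}" for m in memories)
    return f"Relevant memories:\n{context}\n\nUser: {user_message}"


def answer_with_memory(
    user_message: str,
    recall: Callable[[str], list[str]],  # hypothetical: wraps the recall call
    llm: Callable[[str], str],           # hypothetical: wraps your model call
) -> str:
    """Recall first, then call the model with the memories in context."""
    return llm(build_prompt(recall(user_message), user_message))


# Stand-ins so the sketch runs without network access.
def fake_recall(query: str) -> list[str]:
    return ["User prefers Python over Node."]


def fake_llm(prompt: str) -> str:
    return f"(model saw {len(prompt.splitlines())} lines)"


print(answer_with_memory("Which language should I use?", fake_recall, fake_llm))
```

Keeping the prompt assembly separate from the HTTP call makes it easy to test and to swap in the inject-context endpoint later.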

Under the hood, remember embeds your content with text-embedding-3-small and stores it in PostgreSQL + pgvector (EU-hosted). recall runs a cosine similarity search and ranks results by a composite confidence score:

Formula

confidence = similarity × 0.6 + recency × 0.2 + frequency × 0.2

Recency uses a sigmoid with a 30-day inflection point. Frequency is log-scaled by access count. This means the most relevant, most recently-confirmed, and most-frequently-accessed memories surface first — while stale or rarely-touched memories fade without being deleted. Your agent gets the right context, not the most recent or the largest.
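
A minimal sketch of the scoring described above, to make the weighting concrete. Only the 0.6/0.2/0.2 weights, the 30-day inflection point, and the log scaling come from the description; the sigmoid steepness and the frequency normalization (saturating at 100 accesses) are assumptions chosen for illustration, not Kronvex's actual parameters.

```python
import math


def recency_score(age_days: float, inflection_days: float = 30.0,
                  steepness: float = 0.15) -> float:
    """Sigmoid decay: near 1 when fresh, 0.5 at the inflection point, near 0 when old.

    ASSUMPTION: `steepness` is an illustrative value.
    """
    # Clamp the exponent so very old memories do not overflow math.exp.
    exponent = min(steepness * (age_days - inflection_days), 700.0)
    return 1.0 / (1.0 + math.exp(exponent))


def frequency_score(access_count: int, saturation: int = 100) -> float:
    """Log-scaled access count, normalized to [0, 1].

    ASSUMPTION: the score saturates at `saturation` accesses.
    """
    return min(1.0, math.log1p(access_count) / math.log1p(saturation))


def confidence(similarity: float, age_days: float, access_count: int) -> float:
    """Composite score: similarity x 0.6 + recency x 0.2 + frequency x 0.2."""
    return (0.6 * similarity
            + 0.2 * recency_score(age_days)
            + 0.2 * frequency_score(access_count))


# A fresh, often-used memory outranks an equally similar stale one.
print(confidence(0.8, age_days=2, access_count=20))
print(confidence(0.8, age_days=120, access_count=1))
```

Because similarity carries 60% of the weight, an old but highly relevant memory can still outrank a recent one that only loosely matches the query.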

5. Setting up Kronvex for Claude Code (10 min)

1 Get a free API key

Sign up at kronvex.io/auth. The demo tier gives you 3 agents and 100 memories, with no credit card required. Your key starts with kv-.

2 Set up the MCP server or call the REST API directly

The fastest integration for Claude Code is via MCP — see the full MCP guide for a step-by-step walkthrough. You can also call the REST API directly from any tool, script, or post-session hook.

3 Update your CLAUDE.md to bootstrap the workflow

Your CLAUDE.md becomes a lightweight bootstrap instruction instead of a growing knowledge dump:

Markdown

# Memory
Before each session, call inject-context with the current task.
After each session, store key decisions with remember.
API key: kv-... (from env KRONVEX_API_KEY)
Agent ID: my-project

The free demo tier gives you 100 memories — enough to migrate your existing CLAUDE.md content and test the workflow end-to-end.
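
For the post-session hook route mentioned in step 2, a script along these lines could store decisions without going through MCP. It is a sketch under assumptions: the DECISION: line marker is a hypothetical convention for flagging what to keep, and only the remember endpoint, the X-API-Key header, and the KRONVEX_API_KEY variable come from the examples in this article.

```python
import os

API_BASE = "https://api.kronvex.io/api/v1"  # base URL used in the article's examples
MARKER = "DECISION:"  # hypothetical convention for flagging lines worth keeping


def decisions_from_notes(notes: str) -> list[str]:
    """Extract the text of every line that starts with the marker."""
    decisions = []
    for line in notes.splitlines():
        line = line.strip()
        if line.startswith(MARKER):
            text = line[len(MARKER):].strip()
            if text:
                decisions.append(text)
    return decisions


def store_decisions(agent_id: str, decisions: list[str]) -> None:
    """POST each decision to the remember endpoint; raises on the first HTTP error."""
    import httpx  # imported here so the parsing helper works without httpx installed

    headers = {"X-API-Key": os.environ["KRONVEX_API_KEY"]}
    with httpx.Client(headers=headers, timeout=10.0) as client:
        for decision in decisions:
            resp = client.post(
                f"{API_BASE}/agents/{agent_id}/remember",
                json={"content": decision},
            )
            resp.raise_for_status()
```

A hook would then read the session notes from wherever your tooling writes them and call store_decisions("my-project", decisions_from_notes(notes)).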

6. Migration guide: markdown → API

If you have an existing CLAUDE.md or Obsidian vault, here's a script that reads your markdown and bulk-imports it into Kronvex. Each header-delimited section becomes a separate memory — small enough to retrieve precisely, large enough to carry meaning.

Python

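import os
import re

import httpx

# Parse your CLAUDE.md into logical chunks
with open("CLAUDE.md", encoding="utf-8") as f:
    content = f.read()

# Split at level 1-3 headers, keeping each header with its section;
# fragments of 50 characters or fewer are skipped
chunks = [c.strip() for c in re.split(r"(?m)^(?=#{1,3} )", content) if len(c.strip()) > 50]

# Bulk import, one memory per section
headers = {"X-API-Key": os.environ["KRONVEX_API_KEY"]}
with httpx.Client(headers=headers) as client:
    for chunk in chunks:
        resp = client.post(
            "https://api.kronvex.io/api/v1/agents/my-project/remember",
            json={"content": chunk},
        )
        resp.raise_for_status()  # stop on the first failed request
        print(f"Stored: {chunk[:60]}...")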

Run this once. Your CLAUDE.md can now shrink to 10 lines. The agent will retrieve the relevant sections semantically: only what matters for the current query, ranked by similarity, recency, and frequency. No more full-file injection, no more token waste.

7. What to store vs what not to store

Not everything deserves a persistent memory. The rule: store anything you'd have to repeat across sessions. Don't store things that belong to a single conversation.

✅ Good to store

Architecture decisions: "We use PostgreSQL, not MySQL — decided March 2026" — the what and the why
Coding conventions: "All React components are functional, no class components"
User preferences: "User Baptiste prefers concise responses, no preamble"
Recurring patterns: "Database migrations use Alembic, always test on staging first"
Resolved bugs: "Auth token expires in 15min not 1h — fixed 2026-03-15"

❌ Don't store

Ephemeral session context: current file contents, temporary variables — this belongs in conversation history
Large raw code blocks: use RAG/vector search over your codebase for that, not memory entries
Secrets, API keys, passwords: use environment variables, never memory
Highly volatile data: anything that changes every session — store the pattern, not the instance
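
The "never store secrets" rule can be enforced with a cheap check before anything is sent to remember. The patterns below are illustrative assumptions, not an exhaustive detector: the kv- pattern guesses at key length, and a dedicated secret scanner is a better fit for anything beyond a basic guard.

```python
import re

# ASSUMPTION: illustrative patterns only; tune them for your own key formats.
SECRET_PATTERNS = [
    re.compile(r"\bkv-[A-Za-z0-9_-]{8,}"),                  # keys with the kv- prefix
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),      # PEM private keys
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"),
]


def looks_like_secret(text: str) -> bool:
    """Return True if the text matches any of the secret-like patterns."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def safe_to_store(text: str) -> bool:
    """Guard to call before sending content to the memory API."""
    return not looks_like_secret(text)


print(safe_to_store("Auth token expires in 15min not 1h"))  # True
print(safe_to_store("password = hunter2"))                  # False
```

The guard deliberately lets through ordinary sentences that mention tokens or keys, and only blocks text that looks like an actual credential assignment.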

Replace your CLAUDE.md with real memory

100 free memories. No credit card. Works with Claude Code, Cursor, and any HTTP client.

