The amnesia problem
You've built an AI agent. It's smart, fast, and well-prompted. But there's a problem you can't prompt your way out of: it forgets everything between sessions.
Every conversation starts cold. The user has to re-explain who they are, what they're working on, how they like to communicate. Your agent — no matter how capable — behaves like someone with no long-term memory. It's technically impressive and experientially broken.
Here's what this looks like in practice:

> User: Can we pick up where we left off on pricing?
> Agent: I'm sorry, I don't have any record of a previous conversation. Could you share some context?
This isn't a model quality problem. It's an infrastructure problem. The model is fine — it just has no access to anything that happened before.
Why stuffing context into every prompt doesn't scale
The obvious workaround: dump everything into the system prompt. Load the user's profile, past conversations, preferences, history. Let the LLM figure it out.
This works at prototype scale. It breaks in production for three reasons:
- Cost. At GPT-4 pricing, a 50k-token context per call across thousands of users gets expensive fast. You're paying to process the same information on every single request.
- Latency. Large contexts mean slower time-to-first-token. Users notice anything above 2–3 seconds. A 50k-token context won't help.
- Relevance. Not all history is relevant to the current message. Dumping everything forces the model to attend to noise, which degrades quality rather than improving it.
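To make the cost point concrete, here's a back-of-envelope sketch. The per-token price and usage figures are illustrative assumptions, not current list prices:

```python
# Cost of stuffing a 50k-token context into every single call.
# PRICE_PER_1K_INPUT_TOKENS is an assumed, illustrative rate.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # $ per 1k input tokens (assumption)
CONTEXT_TOKENS = 50_000
CALLS_PER_USER_PER_DAY = 20
USERS = 1_000

cost_per_call = CONTEXT_TOKENS / 1000 * PRICE_PER_1K_INPUT_TOKENS
daily_cost = cost_per_call * CALLS_PER_USER_PER_DAY * USERS
print(f"${cost_per_call:.2f} per call, ${daily_cost:,.0f} per day")
# → $0.50 per call, $10,000 per day
```

Retrieving only the few relevant facts per request cuts that context, and the bill, by orders of magnitude.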
What memory-enabled agents actually do differently
A memory-enabled agent doesn't need the user to re-explain themselves. Here's the same interaction with Kronvex:

> User: Can we pick up where we left off on pricing?
> Agent: Of course. Last time you were weighing plans for your support team. The short version, in bullets: ...

The agent remembered the user is a CTO, in fintech, evaluating for their support team, and prefers concise answers. It didn't ask. It just knew.
The three memory types that matter
Not all memory is the same. Kronvex structures memory into three types that map to how humans actually remember things:
- Semantic memory — facts about the user or world that stay true over time. "User is a CTO at fintech." "Budget is €50k." "Prefers bullet points."
- Episodic memory — events and interactions. "On March 3 the user asked about pricing. On March 5 they requested a demo." This is your conversation history, structured.
- Procedural memory — how the user wants things done. "Always use numbered steps." "Don't ask follow-up questions." "Reply in French." These are persistent behavioral preferences.
Using the right type improves recall precision. If a user asks "what do I usually work on?", you want semantic. If they ask "what did we discuss last week?", you want episodic. If you want to auto-apply their formatting preferences at session start, fetch procedural.
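That routing can be sketched as a simple heuristic. The cue lists below are illustrative only, and passing the result as a `memory_type` filter on recall is an assumption worth confirming in the API reference:

```python
def memory_type_for(query: str) -> str:
    """Pick which memory type to query for a given user message.
    A naive keyword heuristic for illustration; production routing
    would typically let the LLM (or a small classifier) decide."""
    q = query.lower()
    if any(cue in q for cue in ("last week", "yesterday", "did we", "previously")):
        return "episodic"    # events and past interactions
    if any(cue in q for cue in ("always", "format", "style", "reply in")):
        return "procedural"  # persistent behavioural preferences
    return "semantic"        # default: stable facts about the user

memory_type_for("What did we discuss last week?")  # → "episodic"
memory_type_for("Always use numbered steps")       # → "procedural"
memory_type_for("What do I usually work on?")      # → "semantic"
```

The returned value could then be used to scope the recall call, assuming the endpoint accepts a type filter.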
How to add memory in 3 API calls
Here's a complete working example. This is all you need to get from amnesiac to context-aware.
```python
import requests

API_KEY = "kx_live_your_key"
BASE = "https://api.kronvex.io"
HEADERS = {"X-API-Key": API_KEY, "Content-Type": "application/json"}

# ── STEP 1: Create an agent (once, store the ID) ───────────
agent = requests.post(f"{BASE}/agents", headers=HEADERS,
                      json={"name": "support-bot"}).json()
agent_id = agent["id"]

# ── STEP 2: Store what you learn about the user ────────────
# After the first session, store key facts
user_id = "u_123"  # your application's own user identifier
requests.post(f"{BASE}/agents/{agent_id}/remember", headers=HEADERS, json={
    "content": "CTO at Series B fintech. Evaluating support tooling. Prefers concise answers.",
    "memory_type": "semantic",
    "session_id": f"user_{user_id}"
})

# ── STEP 3: Inject context before every LLM call ───────────
def chat(user_message):
    ctx = requests.post(f"{BASE}/agents/{agent_id}/inject-context",
                        headers=HEADERS,
                        json={"message": user_message, "top_k": 5}).json()
    system = ctx["context_block"] + "\n\nYou are a helpful assistant."
    # call_llm is your existing wrapper around your LLM provider
    return call_llm(system, user_message)
```
That's it. Three calls. Your agent now accumulates context across every session, per user, and retrieves only what's relevant at <80ms average latency.
What about RAG? Isn't that the same thing?
RAG (Retrieval-Augmented Generation) and agent memory solve different problems. RAG is about retrieving from a shared knowledge base — your docs, your product catalogue, your FAQ. It's the same for all users.
Agent memory is about per-user context — what this specific user told you, how they like to work, what happened in their past sessions. It's different for every user and grows over time.
Most production AI products need both. Kronvex handles the per-user memory layer. Your vector DB or search handles the shared knowledge base.
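Wiring the two layers together is mostly prompt assembly. A minimal sketch, where the per-user block would come from Kronvex's /inject-context and the shared passages from whatever RAG stack you already run (both inputs are stubbed here):

```python
def build_system_prompt(user_context: str, kb_passages: list[str]) -> str:
    """Merge per-user memory with shared knowledge into one system prompt.
    user_context: the context_block returned by /inject-context.
    kb_passages: passages retrieved from your own vector DB or search."""
    kb_block = "\n".join(f"- {p}" for p in kb_passages)
    return (
        "You are a helpful assistant.\n\n"
        f"What you know about this user:\n{user_context}\n\n"
        f"Relevant product knowledge:\n{kb_block}"
    )

prompt = build_system_prompt(
    "CTO at a Series B fintech. Prefers concise answers.",
    ["The Pro plan includes SSO.", "Support SLAs start at 4 hours."],
)
```

The two retrievals are independent, so they can run in parallel before the LLM call.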
The trust argument
Beyond the technical benefits, there's a product argument that often gets missed: memory is how users develop trust in an AI tool.
When a tool remembers you, it signals that your time and context matter. When it forgets you, it signals the opposite. Users form long-term relationships with tools that demonstrate continuity. They churn from tools that feel like they're talking to a stranger every time.
In B2B, this matters even more. A sales rep who has to re-brief an AI tool every morning won't use it for long. A support agent that starts every ticket cold will always feel less useful than a human who remembers the customer.
One endpoint worth highlighting is /inject-context. Instead of calling /recall and formatting results yourself, it returns a formatted string you drop directly into your system prompt. One call before every LLM request.

Real-world use cases
Customer support bots
A support bot with memory remembers previous tickets, the customer's subscription plan, recurring issues, and their preferred communication style. The result: faster resolution, less frustration, higher NPS. Store resolved tickets as episodic with a 180-day TTL, and account information as semantic with no expiration.
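As a sketch, the two kinds of writes might look like this. Note that `ttl_days` is a hypothetical name for the expiry parameter; verify the real field in the API reference before relying on it:

```python
def remember_payload(content: str, memory_type: str, user_id: str,
                     ttl_days=None) -> dict:
    """Build the JSON body for POST /agents/{id}/remember.
    ttl_days is an assumed expiry parameter (hypothetical name)."""
    payload = {
        "content": content,
        "memory_type": memory_type,
        "session_id": f"user_{user_id}",
    }
    if ttl_days is not None:
        payload["ttl_days"] = ttl_days
    return payload

# Resolved ticket: episodic, expires after 180 days
ticket = remember_payload("Ticket #4812 resolved: billing sync issue fixed.",
                          "episodic", "u_42", ttl_days=180)
# Account information: semantic, no expiry
plan = remember_payload("Customer is on the Enterprise plan.",
                        "semantic", "u_42")
```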
Personal AI assistants
A personal AI assistant compounds value on every interaction. It knows you prefer bullet-point summaries, that you're working on a specific project, and that you have an important meeting on Friday. Use semantic for long-lived preferences, episodic for recent tasks and events, and procedural for recurring workflows the assistant should automate.
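At session start, the assistant can pre-load procedural preferences before any user message arrives and fold them into the system prompt. A sketch of the formatting step, where the memories would come from a recall filtered to procedural (the filter parameter is an assumption):

```python
def preferences_block(procedural_memories: list[str]) -> str:
    """Format procedural memories into a standing-instructions block
    for the system prompt. Input: memory strings retrieved at
    session start (e.g. via a recall scoped to procedural memory)."""
    if not procedural_memories:
        return ""
    rules = "\n".join(f"- {m}" for m in procedural_memories)
    return f"Standing instructions from this user:\n{rules}"

block = preferences_block([
    "Always use numbered steps.",
    "Reply in French.",
])
```

Because procedural memory changes rarely, this block can be cached per user rather than re-fetched on every turn.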
AI sales agents
A sales agent with memory knows the full conversation history with each prospect, their past objections, which demos were already run, and where the deal is in the pipeline. It can pick up exactly where the last conversation ended, like an experienced human sales rep. Episodic memory tracks interactions; semantic memory encodes the prospect's profile and identified needs.
Memories are scoped to a session_id, so a user's data can be removed wholesale: delete everything stored under their session_id to implement the right to erasure.

Choosing the right memory architecture
Agent memory is not a binary problem. The right architecture combines several approaches depending on your needs:
- Short-term context — LLM context window (last N messages)
- Static knowledge — RAG on a document base (docs, FAQ, catalogue)
- Style and domain — base model fine-tuning
- Per-user personalised memory — dedicated memory API (Kronvex)
For the vast majority of production use cases in 2026 — support agents, personal assistants, AI salespeople — the winning combination is: a strong base LLM + persistent memory API. Fine-tuning and RAG come as complements, not replacements.
Memory is no longer optional. It is the baseline infrastructure for any AI agent that aims to be genuinely useful over time.
Getting started
Kronvex runs on three endpoints: /remember, /recall, /inject-context. One API key. EU-hosted. Free demo plan available.
The full API reference is at docs.kronvex.io. Or start with the use cases page to see how teams in sales, support, onboarding and dev tooling are implementing this today.
Your agents are capable enough. Give them a memory worth keeping.
Give your agent memory today
Free demo account. No credit card. Three endpoints.