TUTORIAL · COMPLETE WALKTHROUGH

Building a Production Support Agent with Memory

Mar 12, 2026 · 12 min read

By the end of this walkthrough you'll have a working customer support agent that knows your product docs, remembers every user, and gets better with every conversation. FastAPI backend, Kronvex memory, LLM of your choice.

Most AI support agents are stateless. They're fast, they know your docs, but every conversation starts from scratch. The user has to re-explain who they are, what plan they're on, what they tried last week. It feels like talking to a goldfish with a great knowledge base.

This walkthrough adds memory. By the end you'll have an agent that remembers every user across sessions, personalizes answers based on their history, and stores new context automatically after every conversation.

Stack: FastAPI · Kronvex · OpenAI GPT-4o · your existing vector DB for RAG. The LLM is swappable; the same memory patterns work with any model.


What we're building

1. USER — sends a message: "I have the same billing issue as last week"
   → POST /chat { user_id, message, session_id }
2. CONTEXT — two sources, merged:
   - RAG (product knowledge): top 3 relevant doc chunks from your knowledge base
   - Kronvex (user memory): inject_context() → past interactions, preferences, known issues
3. LLM — GPT-4o generates the response with full context: docs + user history + current message
4. RESPOND + STORE — answer sent · interaction stored as episodic memory

Prerequisites

- Python 3.9+ and pip
- A Kronvex API key (the free tier is enough for this tutorial)
- An OpenAI API key
- Optional: a vector DB with a search endpoint, for the RAG step

💡
Don't have a vector DB yet? Skip the RAG step for now — you can start with memory only and add RAG later. The agent is still 10x better than stateless even without docs.

Step-by-step

Install dependencies
One command gets you everything needed.
BASH
pip install fastapi uvicorn openai kronvex httpx python-dotenv
Configure environment
Create a .env file — never commit this.
ENV — .env
KRONVEX_API_KEY=kx_live_your_key_here
KRONVEX_AGENT_ID=agent_support_001
OPENAI_API_KEY=sk-your_openai_key
# Optional — your RAG endpoint
RAG_ENDPOINT=https://your-rag-api.example.com/search
Build the memory + context layer
This is the core module. It handles both RAG retrieval and Kronvex memory injection, then merges them into a single system prompt block.
PYTHON — context.py
import os
import httpx
from dotenv import load_dotenv
from kronvex import KronvexClient

load_dotenv()  # load keys from .env; without this, os.getenv() returns None

kx = KronvexClient(api_key=os.getenv("KRONVEX_API_KEY"))
AGENT_ID = os.getenv("KRONVEX_AGENT_ID")  # reserved for agent-level calls; user memories below key off user_id
RAG_ENDPOINT = os.getenv("RAG_ENDPOINT")


async def get_rag_chunks(query: str, top_k: int = 3) -> str:
    """Retrieve relevant doc chunks from your vector DB."""
    if not RAG_ENDPOINT:
        return ""
    async with httpx.AsyncClient() as client:
        r = await client.post(RAG_ENDPOINT, json={
            "query": query, "top_k": top_k
        }, timeout=4.0)
        r.raise_for_status()  # surface HTTP errors to the caller
        chunks = r.json().get("results", [])
        return "\n\n".join(c.get("text", "") for c in chunks)


async def build_context(user_id: str, message: str) -> str:
    """Build the full context block: RAG docs + user memories."""
    sections = []

    # 1. RAG — shared product knowledge
    try:
        docs = await get_rag_chunks(message)
        if docs:
            sections.append(f"[PRODUCT DOCUMENTATION]\n{docs}")
    except Exception:
        pass  # RAG failure is non-fatal

    # 2. Kronvex — user-specific memory (each end user gets their own
    #    memory namespace, so user_id is passed as the agent_id)
    try:
        ctx = kx.inject_context(
            message=message,
            agent_id=user_id,
            threshold=0.65,
            top_k=5
        )
        if ctx.memories_used > 0:
            sections.append(ctx.context_block)
    except Exception:
        pass  # Memory failure is non-fatal

    return "\n\n".join(sections)


async def store_interaction(
    user_id: str,
    user_message: str,
    agent_response: str,
    session_id: str
):
    """Store the interaction as episodic memory after responding."""
    # Only store substantive exchanges (skip "thanks", "ok", etc.)
    if len(user_message) < 20:
        return

    summary = (
        f"User said: {user_message[:200]}\n"
        f"Agent replied: {agent_response[:300]}"
    )
    kx.remember(
        content=summary,
        agent_id=user_id,
        memory_type="episodic",
        session_id=session_id,
        ttl_days=90
    )
Build the FastAPI endpoint
The main /chat route. It handles context injection, LLM generation, and memory storage — all in one request cycle.
PYTHON — main.py
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel
from openai import AsyncOpenAI
from dotenv import load_dotenv
from context import build_context, store_interaction
import os

load_dotenv()

app = FastAPI()
oai = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

SYSTEM_BASE = """You are a helpful customer support agent for Kronvex.
You have access to product documentation and the user's conversation history.
Be concise, direct, and personal. If you have past context about this user, use it.
Do NOT repeat information the user already knows."""


class ChatRequest(BaseModel):
    user_id: str
    message: str
    session_id: str = "default"
    conversation: list[dict] = []  # last N turns


class ChatResponse(BaseModel):
    reply: str
    memories_used: int = 0
    rag_used: bool = False


@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest, background_tasks: BackgroundTasks):
    # 1. Build context (RAG + memory)
    context_block = await build_context(req.user_id, req.message)

    # 2. Compose system prompt
    system = SYSTEM_BASE
    if context_block:
        system += f"\n\n{context_block}"

    # 3. Build messages array
    messages = [{"role": "system", "content": system}]
    messages.extend(req.conversation[-6:])  # last 3 user/assistant turns
    messages.append({"role": "user", "content": req.message})

    # 4. Call LLM
    completion = await oai.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=600,
        temperature=0.4
    )
    reply = completion.choices[0].message.content or ""

    # 5. Store the interaction after the response is sent.
    #    BackgroundTasks runs it once the reply is on the wire, and unlike a
    #    bare asyncio.create_task the task can't be garbage-collected mid-flight.
    background_tasks.add_task(
        store_interaction, req.user_id, req.message, reply, req.session_id
    )

    return ChatResponse(
        reply=reply,
        # heuristic: a non-empty block may also be RAG-only
        memories_used=1 if context_block else 0,
        rag_used="PRODUCT DOCUMENTATION" in context_block
    )


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
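The `[-6:]` slice in step 3 is the entire history-management strategy: with alternating user/assistant messages, six messages is three exchanges, which keeps the prompt small because long-term context comes from memory, not the transcript. A quick illustration of the windowing (message contents are made up):

```python
# Sketch: the history window used in step 3 of /chat.
# Six messages = three user/assistant exchanges.

def window(conversation: list[dict], system: str, message: str) -> list[dict]:
    messages = [{"role": "system", "content": system}]
    messages.extend(conversation[-6:])  # keep only the most recent turns
    messages.append({"role": "user", "content": message})
    return messages


history = [
    {"role": "user", "content": f"q{i}"} if i % 2 == 0
    else {"role": "assistant", "content": f"a{i}"}
    for i in range(10)  # 5 full exchanges
]
msgs = window(history, "You are a support agent.", "Still having that issue")
print(len(msgs))           # 1 system + 6 history + 1 new user message = 8
print(msgs[1]["content"])  # oldest retained message: "q4"
```

Older turns fall out of the prompt, but anything substantive was already captured by store_interaction, so the agent can still recall it in later sessions.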
Add user-specific memory at onboarding
When a user signs up or logs in for the first time, store their key facts. This gives the agent immediate context before any conversation happens.
PYTHON — onboarding.py
from context import kx  # reuse the shared Kronvex client


def store_user_profile(user_id: str, user_data: dict):
    """Call this on signup / first login."""
    facts = [
        f"User plan: {user_data['plan']}",
        f"User name: {user_data['name']}",
        f"Joined: {user_data['created_at']}",
        f"Company: {user_data.get('company', '—')}",
    ]
    for content in facts:
        kx.remember(
            content=content,
            agent_id=user_id,
            memory_type="semantic",  # stable facts → semantic memory
            pinned=True  # pinned = never expires, always recalled
        )
📌
Use pinned=True for critical facts — plan, name, company. These bypass TTL and are always returned first in any recall. Reserve pinned for facts that are always relevant, not just frequently mentioned.
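The kx.remember calls need live credentials, but the fact strings themselves are pure Python. A small sketch to sanity-check what onboarding will store, using made-up sample user data:

```python
# Sketch: the fact strings onboarding produces, without the API calls.

def build_profile_facts(user_data: dict) -> list[str]:
    """Mirror of the fact list in store_user_profile."""
    return [
        f"User plan: {user_data['plan']}",
        f"User name: {user_data['name']}",
        f"Joined: {user_data['created_at']}",
        f"Company: {user_data.get('company', '—')}",
    ]


alice = {"plan": "Starter", "name": "Alice", "created_at": "2026-02-01"}
print(build_profile_facts(alice))
# → ['User plan: Starter', 'User name: Alice', 'Joined: 2026-02-01', 'Company: —']
```

Storing each fact as its own memory (rather than one profile blob) lets recall surface only the relevant ones.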
Run it
Start the server and test with curl.
BASH
uvicorn main:app --reload
BASH — TEST
# First message — agent has no memory yet
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_alice",
    "session_id": "sess_001",
    "message": "I keep hitting rate limits on the /recall endpoint"
  }'

# New session — agent remembers from last time
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_alice",
    "session_id": "sess_002",
    "message": "Still having that issue"
  }'
# → Agent responds: "Still hitting the /recall rate limits?
#   Alice, you're on the Starter plan (1k req/day limit)..."

Production checklist

Before you ship this to real users, a few things to add:

- Auth on /chat: the endpoint trusts whatever user_id it receives. Validate it against your auth layer so users can't pull each other's memories.
- Timeouts and retries on the LLM call (the RAG call already has a 4-second timeout).
- Logging inside the silent except branches in context.py; non-fatal shouldn't mean invisible.
- A PII policy: message summaries are stored for 90 days, so make sure that matches your privacy commitments.
- Tuning of the 20-character "substantive message" threshold against your real traffic.


What the memory looks like after a few conversations

Here's what Kronvex stores for a user after 3 support sessions:

JSON — USER MEMORIES (alice)
[
  {
    "content": "User plan: Pro",
    "memory_type": "semantic",
    "pinned": true,
    "access_count": 14
  },
  {
    "content": "Session sess_001: User hit rate limits on /recall. On Starter plan at the time.",
    "memory_type": "episodic",
    "session_id": "sess_001",
    "ttl_days": 90,
    "access_count": 3
  },
  {
    "content": "Session sess_002: User upgraded to Pro. Rate limit issue resolved.",
    "memory_type": "episodic",
    "session_id": "sess_002",
    "ttl_days": 90,
    "access_count": 1
  },
  {
    "content": "User prefers bullet-point answers over long paragraphs.",
    "memory_type": "procedural",
    "pinned": true,
    "access_count": 8
  }
]

When Alice opens a new conversation, inject_context() retrieves the most relevant of these (by semantic similarity to her first message) and injects them as a system prompt block — automatically, before the LLM ever sees her message.
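Kronvex's actual ranking is internal to inject_context(), but a useful mental model, assuming the pinned-first behavior described earlier, is: pinned facts surface first, then the rest by similarity to the incoming message, cut off at the threshold. A toy version (the "score" field and the algorithm itself are illustrative, not the real implementation; 0.65 and 5 match the inject_context parameters used above):

```python
# Toy ranking sketch — NOT Kronvex's actual algorithm.
# Assumes: pinned memories always surface first; the rest are
# ordered by similarity score and cut off at a threshold.

def rank_memories(memories: list[dict], threshold: float = 0.65, top_k: int = 5):
    eligible = [m for m in memories if m["pinned"] or m["score"] >= threshold]
    ranked = sorted(eligible, key=lambda m: (not m["pinned"], -m["score"]))
    return ranked[:top_k]


mems = [
    {"content": "User plan: Pro", "pinned": True, "score": 0.40},
    {"content": "Hit /recall rate limits (sess_001)", "pinned": False, "score": 0.91},
    {"content": "Prefers bullet points", "pinned": True, "score": 0.30},
    {"content": "Asked about webhooks once", "pinned": False, "score": 0.50},
]
for m in rank_memories(mems):
    print(m["content"])
# → User plan: Pro
#   Prefers bullet points
#   Hit /recall rate limits (sess_001)
```

Note how the webhooks memory drops out: it's neither pinned nor similar enough to the query, which is exactly why "Still having that issue" pulls up the rate-limit history and nothing else.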


Next steps

Ready to build?

Get your free API key — 100 memories, 1 agent, no credit card. Under 5 minutes to first memory stored.

Get free API key →