The LangGraph memory problem

LangGraph ships with a built-in checkpointer called MemorySaver. It stores graph state so that execution can be resumed mid-flow — for example, if a human-in-the-loop step pauses and waits for approval. This is genuinely useful for building stateful multi-step agents.

But there is a fundamental limitation that trips up almost every team building production agents with LangGraph: MemorySaver stores state in process memory. The moment your Python process restarts — a new deployment, a container scaling event, a server reboot — every stored checkpoint is gone. Your agents wake up with no memory of anything that happened before.

The LangGraph documentation is clear about this, but it's easy to miss. The relevant warning reads: "MemorySaver is an in-memory checkpointer. This means it will be lost when the process restarts. It is not suitable for production use." Yet it is the default example in every tutorial, which leads developers to prototype with it and then discover the limitation only when they hit production.

The scope of the problem: MemorySaver also scopes state to a single thread ID. Even if your process never restarts, two separate conversations with the same user — each using a different thread ID — cannot share memory. There is no built-in mechanism for cross-thread or cross-session facts.

This affects two distinct use cases that are often conflated. The first is checkpointing: saving the exact execution state of a running graph so it can be resumed. The second is long-term memory: storing semantic facts about a user, a domain, or a context so that future conversations can benefit from past interactions. LangGraph solves checkpointing with SqliteSaver and PostgresSaver. It does not solve long-term semantic memory at all.

What "persistent" actually means in LangGraph context

When developers say they want "persistent memory in LangGraph," they typically mean one of two things, and it's worth being precise.

Thread-level persistence means the graph state is durable within a single conversation thread. If the process restarts mid-conversation, the agent can pick up exactly where it left off. LangGraph's SqliteSaver and PostgresSaver checkpointers provide this. They write graph checkpoints to a database, making them survive process restarts. This is the correct solution for long-running agent workflows, human approval steps, and fault tolerance.

Cross-session memory is different. It means the agent can recall facts from previous conversations that happened in different thread IDs, potentially days or weeks ago. "This user told me they prefer concise answers." "This customer's integration uses webhook authentication." "Last month's review flagged a compliance risk in module X." None of this fits in a graph checkpoint — checkpoints capture execution state, not semantic knowledge.

The two-layer model: Production LangGraph agents typically need both layers. Use PostgresSaver for checkpoint persistence (fault tolerance, resume-on-restart), and an external memory store like Kronvex for cross-session semantic memory (user preferences, historical facts, learned context).

The confusing part is that the LangGraph docs discuss both under the umbrella of "memory," but they serve completely different purposes. A graph checkpoint is a binary snapshot of Python objects. A semantic memory store is a queryable vector database that surfaces relevant facts based on meaning. You need both, and they should be different systems.

The solution: external memory store with Kronvex

Kronvex is a persistent memory API for AI agents. It exposes three endpoints that map cleanly onto the memory lifecycle: /remember to store new facts, /recall to run a semantic search over stored memories, and /inject-context to return a pre-formatted context block ready for a system prompt.

Each agent in your system gets its own isolated memory space, identified by a UUID you assign. For a multi-tenant SaaS product where you're building AI features for your customers, each of your end users becomes a separate Kronvex agent — their memories are completely isolated from one another.

The integration point in a LangGraph graph is a node. You add a memory_node to your graph that fires before the main LLM call. It queries Kronvex for relevant memories based on the user's latest message, then injects the results into the state as context. Another node (or a post-processing hook) fires after the response to extract and store new facts worth remembering.

INSTALL
pip install "kronvex[langgraph]" langchain-openai
# the [langgraph] extra pulls in langgraph itself

Code: LangGraph agent with Kronvex memory

Before: LangGraph with MemorySaver (memory lost on restart)

The standard LangGraph quickstart uses MemorySaver as a checkpointer. State persists within a thread, but dies with the process. There is no way to recall what happened in session 1 during session 2.

Python — Standard LangGraph (in-memory, non-persistent)
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
from langchain_openai import ChatOpenAI
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

llm = ChatOpenAI(model="gpt-4o-mini")

def call_model(state: AgentState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# Build graph
builder = StateGraph(AgentState)
builder.add_node("agent", call_model)
builder.set_entry_point("agent")
builder.add_edge("agent", END)

# MemorySaver: thread-level, in-process only
# All checkpoints vanish on process restart
memory = MemorySaver()
graph = builder.compile(checkpointer=memory)

# Session 1
config = {"configurable": {"thread_id": "user-001-session-1"}}
graph.invoke({"messages": [{"role": "user", "content": "I prefer Python over TypeScript"}]}, config)

# Session 2 — new thread ID (and possibly a new process after a restart)
# Agent has NO memory of session 1
config2 = {"configurable": {"thread_id": "user-001-session-2"}}
result = graph.invoke({"messages": [{"role": "user", "content": "What language should I use?"}]}, config2)
# Agent cannot recall the Python preference from session 1

After: LangGraph + Kronvex (cross-session persistent memory)

The pattern is simple: add two nodes to your graph. A memory_recall_node fires before the LLM call to inject relevant context. A memory_store_node fires after to persist new facts. The Kronvex API key and agent ID are the only configuration needed.

Python — LangGraph + Kronvex persistent memory
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated, Optional
from langchain_openai import ChatOpenAI
import operator
from kronvex.integrations.langgraph import make_recall_node, make_store_node

KRONVEX_API_KEY = "kv-your-key"
KRONVEX_AGENT_ID = "your-agent-id"

memory_recall_node = make_recall_node(KRONVEX_API_KEY, KRONVEX_AGENT_ID, top_k=5)
memory_store_node  = make_store_node(KRONVEX_API_KEY, KRONVEX_AGENT_ID)

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    memory_context: Optional[str]

llm = ChatOpenAI(model="gpt-4o-mini")

# ── Graph nodes ──────────────────────────────────────────────────

def call_model(state: AgentState):
    """Call the LLM, injecting memory context into the system prompt."""
    messages = list(state["messages"])
    if state.get("memory_context"):
        system = {
            "role": "system",
            "content": f"You are a helpful assistant.\n\n{state['memory_context']}"
        }
        messages = [system] + messages
    response = llm.invoke(messages)
    return {"messages": [response]}

# ── Build graph ──────────────────────────────────────────────────

builder = StateGraph(AgentState)
builder.add_node("recall", memory_recall_node)
builder.add_node("agent", call_model)
builder.add_node("store", memory_store_node)

builder.set_entry_point("recall")
builder.add_edge("recall", "agent")
builder.add_edge("agent", "store")
builder.add_edge("store", END)

# MemorySaver is fine for thread-level state in dev; use PostgresSaver in production
graph = builder.compile(checkpointer=MemorySaver())

# Session 1
config = {"configurable": {"thread_id": "user-001-session-1"}}
graph.invoke({
    "messages": [{"role": "user", "content": "I prefer Python over TypeScript for all my projects"}],
    "memory_context": None
}, config)

# Session 2 — different thread, different process, different day
# Kronvex recalls the Python preference automatically
config2 = {"configurable": {"thread_id": "user-001-session-2"}}
result = graph.invoke({
    "messages": [{"role": "user", "content": "What language should I use for the new microservice?"}],
    "memory_context": None
}, config2)
# Agent now answers: "Based on your past preference for Python..."

Production note: In the memory_store_node, replace the simplified "store everything" approach with an LLM extraction step. Prompt a fast model (gpt-4o-mini) with the conversation turn and ask it to extract discrete facts worth remembering. Store each fact as a separate memory. This keeps your recall results clean and avoids flooding the memory store with noisy context.
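The extraction step can be sketched in two small helpers. The prompt wording, the one-fact-per-line output format, and the function names below are illustrative assumptions, not part of the Kronvex API.

Python — Sketch of an LLM fact-extraction step (helper names are hypothetical)

```python
# Build the extraction prompt for a fast model (e.g. gpt-4o-mini).
def build_extraction_prompt(user_msg: str, assistant_msg: str) -> str:
    return (
        "Extract discrete, durable facts worth remembering from this exchange.\n"
        "Return one fact per line. Return NONE if there is nothing to store.\n\n"
        f"User: {user_msg}\nAssistant: {assistant_msg}"
    )

# Parse the model's line-separated output into individual facts.
def parse_facts(model_output: str) -> list[str]:
    lines = [ln.strip("- ").strip() for ln in model_output.splitlines()]
    return [ln for ln in lines if ln and ln.upper() != "NONE"]
```

Each string returned by `parse_facts` would then be stored as its own memory via /remember, giving recall one clean fact per result instead of a blob of conversation.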

Advanced: inject_context for automatic recall

The /recall endpoint returns structured memory objects with confidence scores. For many use cases, you want a simpler interface: give me a formatted context block I can drop directly into a system prompt, already deduplicated and ordered by relevance.

Kronvex's /inject-context endpoint does exactly this. You pass a query string and a maximum token budget, and it returns a ready-to-use context string. This is particularly useful in LangGraph because it eliminates the formatting logic from your node code.

Python — Using inject_context in a LangGraph node
import httpx

KRONVEX_API_KEY = "kv-your-key"
KRONVEX_AGENT_ID = "your-agent-id"

def inject_context(query: str, max_tokens: int = 800) -> str:
    """Return a formatted memory block ready for a system prompt."""
    r = httpx.post(
        f"https://api.kronvex.io/api/v1/agents/{KRONVEX_AGENT_ID}/inject-context",
        headers={"X-API-Key": KRONVEX_API_KEY},
        json={"query": query, "max_tokens": max_tokens, "threshold": 0.45},
        timeout=5.0,
    )
    r.raise_for_status()
    data = r.json()
    return data.get("context", "")  # Returns "" if no relevant memories

def memory_recall_node(state: AgentState):
    """Use inject_context to get a fully formatted memory block."""
    latest_message = state["messages"][-1]["content"]
    context = inject_context(latest_message)
    return {"memory_context": context if context else None}

def call_model(state: AgentState):
    """System prompt now includes pre-formatted memory context."""
    messages = list(state["messages"])
    system_content = "You are a helpful assistant."
    if state.get("memory_context"):
        system_content += f"\n\n## Long-term memory\n{state['memory_context']}"
    messages = [{"role": "system", "content": system_content}] + messages
    response = llm.invoke(messages)
    return {"messages": [response]}

The inject_context endpoint handles the deduplication that would otherwise cause problems when multiple related memories surface for the same query. It also respects the max_tokens budget, trimming and prioritizing by confidence score so you don't accidentally overflow your context window.
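If you use the raw /recall endpoint instead, you can reproduce a rough version of that behavior client-side. This is a sketch only: the field names ("content", "confidence") and the character-based budget are assumptions standing in for the endpoint's token budget.

Python — Client-side dedup and budget packing (illustrative fallback)

```python
# Pack recalled memories into a context block: highest confidence first,
# exact duplicates dropped, total size capped at a character budget.
def pack_context(memories: list[dict], max_chars: int = 2000) -> str:
    seen, lines, used = set(), [], 0
    for m in sorted(memories, key=lambda m: m["confidence"], reverse=True):
        text = m["content"].strip()
        if text in seen:
            continue  # drop exact duplicates
        if used + len(text) + 1 > max_chars:
            break  # respect the budget
        seen.add(text)
        lines.append(f"- {text}")
        used += len(text) + 1
    return "\n".join(lines)
```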

Architecture diagram

Here is how the two layers fit together in a production LangGraph deployment.

  User message
       │
       ▼
  ┌────────────────────────────────────────────────┐
  │             LangGraph Graph                    │
  │                                                │
  │  ┌──────────────────┐                          │
  │  │  memory_recall   │ ◄── Kronvex /recall      │
  │  │  node            │     (semantic search,    │
  │  │                  │      cross-session)      │
  │  └────────┬─────────┘                          │
  │           │ injects context into state         │
  │           ▼                                    │
  │  ┌──────────────────┐                          │
  │  │   agent node     │ ◄── LLM (GPT-4o, etc.)   │
  │  │   (LLM call)     │     system prompt =      │
  │  │                  │     base + memories      │
  │  └────────┬─────────┘                          │
  │           │                                    │
  │           ▼                                    │
  │  ┌──────────────────┐                          │
  │  │  memory_store    │ ──► Kronvex /remember    │
  │  │  node            │     (persist new facts)  │
  │  └────────┬─────────┘                          │
  │           │                                    │
  │  ┌────────▼─────────┐                          │
  │  │  PostgresSaver   │  ← Thread-level state    │
  │  │  checkpointer    │    (resume on restart)   │
  │  └──────────────────┘                          │
  └────────────────────────────────────────────────┘
       │
       ▼
  Response to user

  ─────────────────────────────────────────────────
  Kronvex (EU, pgvector)   PostgresSaver (your DB)
  Cross-session facts      Graph execution state
  Semantic search          Exact checkpoint replay
      

The key insight is that these two persistence layers are orthogonal. PostgresSaver (or SqliteSaver in development) handles the question: "where did I leave off in this execution?" Kronvex handles the question: "what do I know about this user or domain?" They solve different problems and should not be conflated.

Comparison: MemorySaver vs SqliteSaver vs Kronvex

Feature                        MemorySaver          SqliteSaver            Kronvex
Survives process restart       ✗                    ✓                      ✓
Cross-thread memory            ✗                    ✗                      ✓
Semantic similarity search     ✗                    ✗                      ✓
No infrastructure to manage    ✓ (in-process)       Local file only        Hosted API
Multi-tenant isolation         —                    Manual                 Agent ID scoped
EU data residency (GDPR)       Wherever deployed    Wherever deployed      Frankfurt, EU
Confidence scoring             ✗                    ✗                      Similarity + recency + frequency
Use case                       Dev / prototyping    Single-machine apps    Production agents, B2B SaaS
Free tier                      —                    —                      1 agent, 100 memories

Recommended stack: Use MemorySaver during local development (no setup required), switch to PostgresSaver for checkpoint persistence in production, and add Kronvex for cross-session semantic memory. This covers all three persistence requirements without over-engineering the early prototype phase.

FAQ

Does using Kronvex replace LangGraph's checkpointer?
No. They solve different problems. A LangGraph checkpointer (MemorySaver, SqliteSaver, PostgresSaver) saves the execution state of the graph — which nodes have run, the current values of state fields, pending tool calls. Kronvex stores semantic memories: facts, preferences, and events that should be recalled in future conversations. You use both together. The checkpointer handles fault tolerance and resume; Kronvex handles long-term knowledge.
What is the latency overhead of adding a Kronvex recall node?
A typical /recall call completes in 40–80ms (p50). This is acceptable because the LLM call itself (the dominant latency in any agent turn) typically takes 500ms–3s. The recall happens in parallel with any other pre-LLM processing you do, so in practice the wall-clock overhead is often under 20ms on the critical path. For latency-sensitive applications, you can also pre-fetch memories asynchronously using asyncio while other setup work runs.
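The pre-fetch pattern mentioned above can be sketched with asyncio.gather; the stub coroutines below stand in for the real Kronvex call and whatever other per-turn setup you run.

Python — Concurrent memory pre-fetch (stubbed, illustrative)

```python
import asyncio

async def fetch_memories(query: str) -> str:
    await asyncio.sleep(0.01)  # stands in for the ~40-80ms /recall call
    return f"memories for: {query}"

async def other_setup() -> str:
    await asyncio.sleep(0.01)  # e.g. loading tools, building the base prompt
    return "setup done"

async def prepare_turn(query: str):
    # Both awaitables run concurrently, so recall adds little wall-clock time
    return await asyncio.gather(fetch_memories(query), other_setup())

memories, setup = asyncio.run(prepare_turn("what language?"))
```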
How do I handle multi-tenant agents in LangGraph?
Assign one Kronvex agent ID per end user (or per tenant unit, depending on your isolation model). Pass the agent ID at request time through your graph state or through a dependency injection pattern. Each Kronvex agent has a fully isolated memory store — /recall on agent ID A never returns memories from agent ID B. This is enforced at the API level, not just by convention.
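Resolving the agent ID at request time can look like the sketch below. The mapping store and helper names are illustrative assumptions; in practice the tenant-to-agent mapping would live in your own database.

Python — Per-tenant agent ID resolution (hypothetical helpers)

```python
# Hypothetical tenant -> Kronvex agent ID mapping (would be a DB lookup)
AGENT_IDS = {"tenant-acme": "agent-uuid-a", "tenant-globex": "agent-uuid-b"}

def resolve_agent_id(tenant_id: str) -> str:
    try:
        return AGENT_IDS[tenant_id]
    except KeyError:
        raise ValueError(f"unknown tenant: {tenant_id}")

def make_config(tenant_id: str, thread_id: str) -> dict:
    # thread_id scopes the checkpointer; agent_id scopes Kronvex memory
    return {"configurable": {"thread_id": thread_id,
                             "agent_id": resolve_agent_id(tenant_id)}}
```

Memory nodes then read the agent ID out of the run config rather than a module-level constant, keeping one compiled graph serving all tenants.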
Can I use Kronvex with LangGraph's async graph execution?
Yes. Replace the httpx.post calls with await httpx.AsyncClient().post(...) (or use the async Kronvex Python SDK). LangGraph supports async node functions natively. Define your memory nodes as async def and use await for the Kronvex API calls. This allows the recall and store operations to be non-blocking, which is especially important in high-throughput multi-agent systems where many graphs may be running concurrently.
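The async node shape can be sketched as follows; the Kronvex call is stubbed here so the structure is clear without network access, with the real `httpx.AsyncClient` call noted in a comment.

Python — Async memory recall node (Kronvex call stubbed)

```python
import asyncio

async def recall_from_kronvex(query: str) -> str:
    # In production this would be:
    #   async with httpx.AsyncClient() as client:
    #       r = await client.post(..., headers={"X-API-Key": ...}, json={...})
    await asyncio.sleep(0)  # stub: pretend we hit the /inject-context endpoint
    return "## Long-term memory\n- user prefers Python"

async def memory_recall_node(state: dict) -> dict:
    latest = state["messages"][-1]["content"]
    context = await recall_from_kronvex(latest)
    return {"memory_context": context or None}

state = {"messages": [{"role": "user", "content": "hi"}]}
out = asyncio.run(memory_recall_node(state))
```

Because LangGraph accepts `async def` node functions directly, this node drops into `builder.add_node("recall", memory_recall_node)` unchanged.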