OpenAI Agents SDK Memory:
How to Add Persistent State
The OpenAI Agents SDK is designed to be stateless — clean, composable, and easy to reason about. But stateless means your agent forgets everything the moment a session ends. Here's how to wire in persistent memory without fighting the framework.
Why the OpenAI Agents SDK is stateless by design
The OpenAI Agents SDK (released in early 2025) is built around a straightforward execution model: each Runner.run() call starts from scratch. You pass in a list of messages, the agent processes them, and it returns a response. There is no built-in mechanism to persist state between calls.
This is a deliberate design choice. Stateless agents are easier to test, easier to scale horizontally, and easier to reason about in a distributed environment. When OpenAI introduced the Agents SDK alongside the Responses API, they separated the concern of what the agent knows right now (the input messages) from what it has learned over time (which they left to the developer).
The Responses API does provide a previous_response_id chain for a single conversation thread, but this only works within a session and doesn't survive process restarts or cross-user scenarios. It is threading, not memory.
What you lose without persistence
Without a memory layer, every session is a blank slate. The practical consequences compound quickly:
- Context window bloat — to approximate memory, developers stuff previous conversations into the system prompt. At ~$15/M tokens for GPT-4o, this adds up fast.
- Poor user experience — the agent asks the same onboarding questions on every session, forgets user preferences, and cannot reference past decisions.
- No learning curve — the agent cannot get better at serving a specific user over time, making it feel generic regardless of how many interactions have occurred.
- Broken workflows — multi-step processes that span days (e.g., a procurement agent managing an RFP) fall apart when state is lost between sessions.
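To put a rough number on the context-bloat point, the arithmetic can be sketched as follows. The token counts are assumptions chosen for illustration (about 2,000 tokens per past session, about 100 tokens per recalled snippet), and the price is simply the ~$15/M figure quoted above, not a verified rate.

```python
def prompt_cost_usd(chunks: int, tokens_per_chunk: int, usd_per_million: float) -> float:
    """Input cost of one request whose prompt carries `chunks` pieces of history."""
    total_tokens = chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million


# Assumption: 100 past sessions of about 2,000 tokens each, replayed in full.
full_history = prompt_cost_usd(chunks=100, tokens_per_chunk=2_000, usd_per_million=15.0)

# Assumption: 6 recalled snippets of about 100 tokens each instead.
top_k_recall = prompt_cost_usd(chunks=6, tokens_per_chunk=100, usd_per_million=15.0)

print(f"full history: ${full_history:.2f} per request")
print(f"top-6 recall: ${top_k_recall:.4f} per request")
```

Under these assumptions the full-history prompt costs about $3 per request, while the recalled snippets cost under one cent. The exact ratio depends entirely on the token counts you plug in.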
Integration architecture
The pattern is straightforward: intercept the agent's inputs and outputs to read and write to Kronvex. The SDK's hook system makes this clean.
    #  ┌─────────────────────────────────────┐
    #  │            User request             │
    #  └──────────────────┬──────────────────┘
    #                     │
    #           ┌─────────▼──────────┐
    #           │ recall(): fetch    │ ← Kronvex (pgvector)
    #           │ top-k memories     │
    #           └─────────┬──────────┘
    #                     │ inject as system context
    #           ┌─────────▼──────────┐
    #           │ OpenAI Agents SDK  │
    #           │ Runner.run()       │
    #           └─────────┬──────────┘
    #                     │
    #           ┌─────────▼──────────┐
    #           │ remember(): save   │ ← Kronvex (pgvector)
    #           │ exchange + facts   │
    #           └─────────┬──────────┘
    #                     │
    #           ┌─────────▼──────────┐
    #           │ Response to user   │
    #           └────────────────────┘
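The recall, run, remember loop in the diagram can be sketched in plain Python without either library. Everything here is hypothetical: `InMemoryStore` stands in for the memory backend, `stub_agent` stands in for the model call, and relevance is scored by shared words rather than by vector similarity as a pgvector-backed store would do.

```python
import re
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    """Lowercased word set used for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a memory backend, keyed by session_id."""
    memories: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, session_id: str, text: str) -> None:
        self.memories.setdefault(session_id, []).append(text)

    def recall(self, session_id: str, query: str, top_k: int = 6) -> list[str]:
        # Toy ranking: count shared words. A real store would compare embeddings.
        query_words = _words(query)
        scored = [
            (len(query_words & _words(m)), m)
            for m in self.memories.get(session_id, [])
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in scored[:top_k]]


def stub_agent(system_context: str, user_message: str) -> str:
    """Placeholder for the model call; it only reports the context it received."""
    return f"[context: {system_context or 'none'}] reply to: {user_message}"


def handle_request(store: InMemoryStore, session_id: str, user_message: str) -> str:
    recalled = store.recall(session_id, user_message)          # 1. recall
    context = " | ".join(recalled)                             # 2. inject as context
    reply = stub_agent(context, user_message)                  # 3. run the agent
    store.remember(session_id, f"user said: {user_message}")   # 4. remember
    return reply


store = InMemoryStore()
first = handle_request(store, "user-42", "I'm on the Pro plan, invoice #4821.")
second = handle_request(store, "user-42", "What plan am I on?")
print(first)
print(second)
```

The first call sees no context, while the second receives the earlier message because both mention the plan. Keying by `session_id` is what keeps one user's memories out of another user's prompt.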
Full code example
The OpenAI Agents SDK exposes lifecycle hooks via RunHooks. KronvexHooks implements on_agent_start to inject recalled context and on_agent_end to persist the exchange.
    pip install "kronvex[openai-agents]" openai-agents

    import asyncio
    import os

    from agents import Agent, Runner
    from kronvex.integrations.openai_agents import KronvexHooks

    agent = Agent(
        name="SupportAgent",
        instructions="You are a helpful customer support agent.",
        model="gpt-4o",
    )

    hooks = KronvexHooks(
        api_key=os.environ["KRONVEX_API_KEY"],
        agent_id=os.environ["KRONVEX_AGENT_ID"],
        session_id="user-42",  # scope memories per user
    )

    async def main() -> None:
        # Session 1 (Runner.run() takes the conversation as `input`)
        result = await Runner.run(
            agent,
            input=[{"role": "user", "content": "I'm on the Pro plan, invoice #4821."}],
            hooks=hooks,
        )
        print(result.final_output)

        # Session 2: new process, same user
        result2 = await Runner.run(
            agent,
            input=[{"role": "user", "content": "What plan am I on?"}],
            hooks=hooks,
        )
        print(result2.final_output)
        # → "You're on the Pro plan. I can also see your invoice #4821."

    asyncio.run(main())
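To make the hook flow concrete, here is a minimal sketch of what a memory hooks object might do at each lifecycle point. It does not import the SDK or Kronvex: `SketchMemoryHooks`, `run_with_hooks`, and the dict-based context are all hypothetical, and only the method names mirror on_agent_start and on_agent_end. The real RunHooks signatures and the KronvexHooks internals may differ.

```python
import asyncio


class SketchMemoryHooks:
    """Hypothetical hooks object; method names mirror on_agent_start / on_agent_end."""

    def __init__(self, session_id: str, store: dict[str, list[str]]) -> None:
        self.session_id = session_id
        self.store = store  # plain dict standing in for the memory service

    async def on_agent_start(self, context: dict, agent_name: str) -> None:
        # Before the run: load what this session stored earlier.
        context["memories"] = list(self.store.get(self.session_id, []))

    async def on_agent_end(self, context: dict, agent_name: str, output: str) -> None:
        # After the run: persist the exchange for the next session.
        exchange = f"{context['user_input']} -> {output}"
        self.store.setdefault(self.session_id, []).append(exchange)


async def run_with_hooks(hooks: SketchMemoryHooks, user_input: str) -> dict:
    """Tiny driver playing the runner's role: start hook, fake model, end hook."""
    context = {"user_input": user_input}
    await hooks.on_agent_start(context, "SupportAgent")
    output = f"seen {len(context['memories'])} earlier exchange(s)"
    await hooks.on_agent_end(context, "SupportAgent", output)
    return {"output": output, "memories": context["memories"]}


shared_store: dict[str, list[str]] = {}
sketch_hooks = SketchMemoryHooks("user-42", shared_store)
run1 = asyncio.run(run_with_hooks(sketch_hooks, "I'm on the Pro plan."))
run2 = asyncio.run(run_with_hooks(sketch_hooks, "What plan am I on?"))
print(run1["output"])
print(run2["output"])
```

Because the store outlives each run, the second run starts with the first exchange already loaded. That is the whole point of doing the read in the start hook and the write in the end hook.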
Before / after comparison
The difference shows up the first time a user returns:
| Scenario | Without memory | With Kronvex |
|---|---|---|
| User returns after 3 days | Treated as new user | Context instantly restored |
| User preference ("no bullet points") | Lost after session | Recalled on every session |
| Multi-session workflow | Breaks without full history in prompt | Seamless continuation |
| 100 sessions in context | ~200k tokens / $3 per request | Top-6 relevant snippets only |
Production considerations
EU hosting and GDPR
Kronvex stores all memory vectors in an EU-hosted PostgreSQL instance (Supabase Frankfurt region). For B2B agents handling personal data, this matters: Article 46 transfer mechanisms, DPA availability, and data residency SLAs are all covered. No extra configuration needed.
Rate limits and quotas
The Kronvex API enforces per-plan limits on remember calls (writes) and recall calls (reads) separately. For high-throughput agents, batch your remember calls or use the async client to avoid blocking the response path.
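One way to batch writes is a small client-side buffer that groups pending items before handing them off. This is a sketch under assumptions: `RememberBuffer` and the `send_batch` callback are hypothetical, and whether the API accepts several memories in one request is not stated here, so the callback is where you would put whatever write call your client actually supports.

```python
from typing import Callable


class RememberBuffer:
    """Collects writes and hands them to `send_batch` in groups of `batch_size`."""

    def __init__(self, send_batch: Callable[[list[str]], None], batch_size: int = 10) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._pending: list[str] = []

    def add(self, text: str) -> None:
        self._pending.append(text)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending; call this at shutdown so nothing is lost."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send_batch(batch)


sent_batches: list[list[str]] = []
buffer = RememberBuffer(send_batch=sent_batches.append, batch_size=3)
for i in range(7):
    buffer.add(f"fact {i}")
buffer.flush()  # push the final partial batch
print([len(batch) for batch in sent_batches])  # → [3, 3, 1]
```

Seven writes become three calls. The trade-off is that buffered items are lost if the process dies before `flush()` runs, so keep batches small for facts you cannot afford to drop.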
Memory hygiene
- Use memory_type="episodic" for conversation turns with ttl_days=30 to auto-expire stale exchanges
- Use memory_type="semantic" for durable user facts (plan, preferences, company) with no TTL
- Use memory_type="procedural" for agent-learned workflows and decision patterns
- Set top_k=6 as a starting point: more memories means larger prompts and higher latency
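The effect of these settings can be illustrated with a small local model of expiry and capping. The `Memory` dataclass and both helper functions are hypothetical and only reuse the parameter names from the list above. How the service itself enforces TTLs and ranks results is not shown here, and this sketch ranks by recency rather than relevance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Memory:
    text: str
    memory_type: str                # "episodic", "semantic", or "procedural"
    created_at: datetime
    ttl_days: Optional[int] = None  # None means the memory never expires


def is_expired(memory: Memory, now: datetime) -> bool:
    if memory.ttl_days is None:
        return False
    return now >= memory.created_at + timedelta(days=memory.ttl_days)


def live_memories(memories: list[Memory], now: datetime, top_k: int = 6) -> list[Memory]:
    """Drop expired entries, then cap the result at top_k (newest first)."""
    alive = [m for m in memories if not is_expired(m, now)]
    alive.sort(key=lambda m: m.created_at, reverse=True)
    return alive[:top_k]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
memories = [
    Memory("asked about invoice #4821", "episodic", now - timedelta(days=45), ttl_days=30),
    Memory("asked about export formats", "episodic", now - timedelta(days=3), ttl_days=30),
    Memory("user is on the Pro plan", "semantic", now - timedelta(days=400)),
]
kept = live_memories(memories, now)
print([m.text for m in kept])
```

The 45-day-old episodic entry is dropped, while the 400-day-old semantic fact survives because it has no TTL.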
When using Runner.run_streamed() or async tool calls, use the AsyncKronvex client to avoid blocking the event loop. Import it with from kronvex import AsyncKronvex.
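The non-blocking idea itself can be sketched with plain asyncio. This example does not use AsyncKronvex, whose method names are not shown here. `fake_remember` is a hypothetical stand-in for an async write, with a short sleep simulating network latency.

```python
import asyncio

saved: list[str] = []


async def fake_remember(text: str) -> None:
    """Stand-in for a network write; the sleep simulates latency."""
    await asyncio.sleep(0.05)
    saved.append(text)


async def answer(user_message: str) -> tuple[str, asyncio.Task]:
    reply = f"echo: {user_message}"
    # Schedule the write without awaiting it, so the reply is not delayed.
    write_task = asyncio.create_task(fake_remember(user_message))
    return reply, write_task


async def main() -> tuple[str, list[str], list[str]]:
    reply, write_task = await answer("What plan am I on?")
    saved_at_reply = list(saved)  # the write has not finished yet
    await write_task              # wait before shutdown so the write is not lost
    return reply, saved_at_reply, list(saved)


reply, saved_at_reply, saved_at_exit = asyncio.run(main())
print(reply, saved_at_reply, saved_at_exit)
```

The reply is available before the write completes. Keeping a reference to the task and awaiting it before exit matters, because a pending task can otherwise be cancelled or garbage-collected before it finishes.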
Quick start
Three steps to add memory to any OpenAI agent:
    # Sign up at https://kronvex.io (free plan, no credit card)
    # Create an agent in the dashboard, then copy the agent ID
    pip install "kronvex[openai-agents]" openai-agents
    export KRONVEX_API_KEY="kv-your-key"
    export KRONVEX_AGENT_ID="your-agent-id"
Then run the agent with the hooks attached:
    import asyncio
    import os

    from agents import Agent, Runner
    from kronvex.integrations.openai_agents import KronvexHooks

    agent = Agent(name="MyAgent", instructions="You are a helpful assistant.", model="gpt-4o")
    hooks = KronvexHooks(
        api_key=os.environ["KRONVEX_API_KEY"],
        agent_id=os.environ["KRONVEX_AGENT_ID"],
        session_id="user-42",
    )

    async def main() -> None:
        # Runner.run() takes the conversation as `input`
        result = await Runner.run(agent, input=[...], hooks=hooks)
        print(result.final_output)

    asyncio.run(main())