AutoGen's memory limitation

Microsoft's AutoGen is one of the most widely deployed multi-agent frameworks in production. ConversableAgent, its core building block, makes it straightforward to create agents that exchange messages, use tools, and collaborate on complex tasks. But AutoGen has a fundamental limitation that every serious deployment hits within weeks: agents reset completely between runs.

When you start an AutoGen chat, each agent begins with a fresh message history. Anything learned in a previous session — a user's preferences, a decision about architecture, a recurring fact about the domain — is gone. The only state that persists is what you explicitly hard-code into system prompts, and system prompts don't scale as a memory mechanism beyond a handful of static facts.

AutoGen v0.4 introduced the ChatMemory interface and some basic in-process memory patterns, but these still don't survive process restarts, don't support semantic search across past interactions, and aren't designed for multi-agent shared memory scenarios. For production use, you need an external persistent memory layer.

What "persistent memory" means in practice: An AutoGen agent with persistent memory can say "last time you ran this pipeline, you decided to skip the validation step for CSV files under 10MB" — even if that decision was made three weeks ago in a different process instance. This is the difference between a stateless tool and a collaborator that learns over time.

Two approaches to persistent memory in AutoGen

There are two main patterns for adding long-term memory to AutoGen agents. Each has different trade-offs depending on how much control you need over when memories are stored and retrieved.

Approach 1: Custom memory middleware (hook-based)

AutoGen v0.4+ supports message hooks — callbacks that run before or after each message is processed. You can use these hooks to automatically intercept outgoing messages and store facts, and to inject recalled memories into the system prompt before each agent turn. This approach is transparent to the agent: it doesn't need to explicitly call memory tools.

| Aspect | Hook-based (automatic) | Tool-based (explicit) |
| --- | --- | --- |
| Control | Automatic, no agent prompt changes needed | Agent decides when to remember/recall |
| Precision | Every message stored (may include noise) | Only what the agent explicitly deems worth keeping |
| Setup effort | Low — wrap existing agents | Medium — update system prompts with tool instructions |
| Best for | Logging all interactions, adding context to existing agents | Curated knowledge, preference tracking, decision storage |

Approach 2: External API as a registered tool

The more explicit approach gives the agent direct control: register remember and recall as AutoGen tools, and instruct the agent in its system prompt to use them when appropriate. The agent decides what to remember and when to search its memory — which produces cleaner, more curated memories at the cost of more careful system prompt engineering.

In practice, the two approaches are complementary. For production systems, we recommend combining them: hook-based storage for comprehensive interaction logging, plus explicit tool calls for the high-value facts the agent flags as worth remembering.

Implementing Kronvex memory in AutoGen agents

Installation

bash
pip install pyautogen kronvex
# or without the SDK:
pip install pyautogen httpx

Basic hook-based memory wrapper

The simplest integration wraps any ConversableAgent with memory hooks. Each outgoing assistant message is stored in Kronvex, and recalled context is injected into the system message when the agent is constructed:

Python — memory_wrapper.py
"""
Kronvex memory wrapper for AutoGen ConversableAgent.
Automatically stores and recalls memories around each agent turn.
"""
import os
import httpx
from autogen import ConversableAgent

KRONVEX_API_KEY = os.environ["KRONVEX_API_KEY"]
KRONVEX_BASE_URL = "https://api.kronvex.io"

_http = httpx.Client(
    base_url=KRONVEX_BASE_URL,
    headers={"X-API-Key": KRONVEX_API_KEY},
    timeout=10.0,
)


def remember(agent_id: str, content: str, memory_type: str = "interaction") -> None:
    """Store a piece of content in Kronvex memory."""
    _http.post(
        f"/api/v1/agents/{agent_id}/remember",
        json={"content": content, "memory_type": memory_type},
    ).raise_for_status()


def recall(agent_id: str, query: str, limit: int = 5) -> list[dict]:
    """Search Kronvex memory for relevant past context."""
    resp = _http.post(
        f"/api/v1/agents/{agent_id}/recall",
        json={"query": query, "limit": limit},
    )
    resp.raise_for_status()
    return resp.json().get("memories", [])


def inject_context(agent_id: str, query: str = "general context") -> str:
    """Get pre-formatted memory context for system prompt injection."""
    resp = _http.post(
        f"/api/v1/agents/{agent_id}/inject-context",
        json={"query": query, "limit": 8},
    )
    resp.raise_for_status()
    return resp.json().get("context", "")


class MemoryAgent(ConversableAgent):
    """
    ConversableAgent with Kronvex persistent memory.

    Automatically stores outgoing messages and injects recalled context
    into the system prompt before each turn.
    """

    def __init__(self, *args, agent_id: str, memory_query: str = "current task context", **kwargs):
        # Inject recalled context into the system message before initialization
        if kwargs.get("system_message"):
            ctx = inject_context(agent_id, memory_query)
            if ctx:
                kwargs["system_message"] = (
                    kwargs["system_message"]
                    + f"\n\n## Long-term memory\n{ctx}"
                )
        super().__init__(*args, **kwargs)
        self._kv_agent_id = agent_id
        self._kv_memory_query = memory_query

        # Hook: store every assistant message in memory
        self.register_hook(
            "process_message_before_send",
            self._kv_store_message,
        )

    def _kv_store_message(self, sender, message, recipient, silent):
        """Hook: called before each outgoing message. Stores it in Kronvex.

        AutoGen invokes process_message_before_send hooks with keyword
        arguments (sender, message, recipient, silent) and expects the
        (possibly modified) message to be returned.
        """
        if isinstance(message, dict):
            content = message.get("content", "")
        else:
            content = str(message)
        if content and len(content) > 20:  # Skip trivial messages
            try:
                remember(self._kv_agent_id, content, memory_type="interaction")
            except Exception:
                pass  # Never let memory failures break the agent
        return message

Using the memory wrapper

Python — basic usage
import os
from autogen import UserProxyAgent
from memory_wrapper import MemoryAgent

# The assistant agent now has persistent memory
assistant = MemoryAgent(
    name="Assistant",
    agent_id="autogen-assistant-user123",  # Unique per user/project
    memory_query="software engineering task context",
    system_message="You are a software engineering assistant. Use your long-term memory to maintain continuity across sessions.",
    llm_config={"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="ALWAYS",
    code_execution_config=False,
)

# Start a conversation — assistant has access to all past sessions
user_proxy.initiate_chat(
    assistant,
    message="What have we decided about the database architecture so far?",
)

Never fail on memory errors: The try/except in _kv_store_message is critical. Memory operations should never crash your agent. Kronvex has 99.9% uptime, but network hiccups happen — wrap all memory calls defensively and let the agent continue without memory rather than fail.
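The same defensive pattern can be generalized into a small retry wrapper. This is a sketch, not part of any SDK: `safe_call` is a hypothetical helper that retries transient failures with exponential backoff and returns a fallback value instead of raising, so the agent always continues.

```python
import time


def safe_call(fn, *args, attempts: int = 3, backoff: float = 0.5, default=None, **kwargs):
    """Run a memory operation defensively.

    Retries transient failures with exponential backoff (backoff, 2*backoff, ...)
    and returns `default` instead of raising, so the agent keeps running
    without memory rather than crashing.
    """
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt < attempts - 1:
                time.sleep(backoff * (2 ** attempt))
    return default
```

Usage looks like `memories = safe_call(recall, agent_id, "past decisions", default=[])`: on total failure the agent simply proceeds with no recalled context.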

Explicit tool-based memory

For more curated memory, register remember and recall as AutoGen tools that the agent can call explicitly when it decides something is worth preserving:

Python — tool-based memory
import os
from autogen import ConversableAgent, UserProxyAgent, register_function
from memory_wrapper import remember, recall

AGENT_ID = "autogen-assistant-user123"

assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a software engineering assistant with long-term memory.

Use the remember() tool to store:
- User preferences and constraints
- Architectural decisions and their rationale
- Important facts about the project domain
- Anything the user says is worth keeping

Use the recall() tool when you need context from past sessions.
Always recall before answering questions about past decisions or preferences.
""",
    llm_config={"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# Register memory tools on both agents (AutoGen needs a caller and an executor).
# Use plain annotated functions rather than bare lambdas: AutoGen builds each
# tool's JSON schema from the type hints, which lambdas don't provide.
def remember_tool(content: str, memory_type: str = "fact") -> str:
    remember(AGENT_ID, content, memory_type)
    return "Stored."


def recall_tool(query: str, limit: int = 5) -> list[str]:
    return [m["content"] for m in recall(AGENT_ID, query, limit)]


register_function(
    remember_tool,
    caller=assistant,
    executor=user_proxy,
    name="remember",
    description="Store a fact or decision in long-term memory. Use memory_type: preference, fact, decision, or context.",
)

register_function(
    recall_tool,
    caller=assistant,
    executor=user_proxy,
    name="recall",
    description="Search long-term memory for relevant past context. Returns a list of relevant memories.",
)

user_proxy.initiate_chat(
    assistant,
    message="I've decided we should use PostgreSQL instead of MongoDB for the main store. Remember that.",
)

Multi-agent scenario: shared memory across a team

The real power of external memory emerges in multi-agent setups. When a manager agent and multiple specialist agents all share the same Kronvex agent ID, they share a common memory store. Facts remembered by the researcher are visible to the writer; decisions made by the planner are available to the executor.

Here's a practical example: a content pipeline with a manager, a researcher, and a writer agent, all sharing memory across runs:

Python — multi-agent shared memory
import os
from autogen import GroupChat, GroupChatManager, ConversableAgent, UserProxyAgent, register_function
from memory_wrapper import remember, recall, inject_context

TEAM_ID = "content-pipeline-team-projectA"  # Shared by all agents in this team

def make_agent(name: str, role_description: str) -> ConversableAgent:
    """Create an agent with shared team memory."""
    memory_ctx = inject_context(TEAM_ID, f"{name} context and past work")
    system_msg = f"""{role_description}

## Team memory (shared across all runs)
{memory_ctx or 'No prior context available.'}
"""
    return ConversableAgent(
        name=name,
        system_message=system_msg,
        llm_config={"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]},
    )

manager = make_agent(
    "Manager",
    "You coordinate a content team. Delegate research to Researcher, writing to Writer. Track decisions in memory.",
)
researcher = make_agent(
    "Researcher",
    "You research topics and summarize findings. Store important facts and sources in team memory.",
)
writer = make_agent(
    "Writer",
    "You write articles based on research. Store style preferences and recurring guidelines in memory.",
)

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# Define the shared memory tools once, with type annotations so AutoGen can
# generate their JSON schemas, then register them on every agent in the team.
def remember_tool(content: str, memory_type: str = "fact") -> str:
    remember(TEAM_ID, content, memory_type)
    return "Stored in team memory."


def recall_tool(query: str, limit: int = 5) -> list[str]:
    return [m["content"] for m in recall(TEAM_ID, query, limit)]


for agent in [manager, researcher, writer]:
    register_function(
        remember_tool,
        caller=agent,
        executor=user_proxy,
        name="remember",
        description="Store a fact, decision, or finding in team shared memory.",
    )
    register_function(
        recall_tool,
        caller=agent,
        executor=user_proxy,
        name="recall",
        description="Search team memory for relevant past context.",
    )

group_chat = GroupChat(
    agents=[user_proxy, manager, researcher, writer],
    messages=[],
    max_round=20,
    speaker_selection_method="auto",
)

group_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config={"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]},
)

user_proxy.initiate_chat(
    group_manager,
    message="Write an article about vector databases for a technical audience. Use our past style guidelines.",
)

Shared vs. per-agent memory: Use the same agent_id when you want agents to share a memory pool (team knowledge, project context). Use different agent_id values when agents should have isolated memories (e.g., each serves a different user, or you want strict role separation). Kronvex enforces isolation at the API key level — no cross-key memory leaks.

Production tips: agent_id naming, memory_type tagging, TTL

Agent ID naming conventions

Your agent_id is the primary partition key for memory. Design it deliberately:

Python — agent_id patterns
# Per-user memory (each user gets isolated history)
agent_id = f"autogen-assistant-{user_id}"

# Per-project team memory (all agents in the project share context)
agent_id = f"autogen-team-{project_id}"

# Per-customer memory in a B2B SaaS (isolate by customer account)
agent_id = f"autogen-{customer_id}-support"

# Per-environment (don't mix production memories with dev/staging)
agent_id = f"autogen-{env}-{user_id}"  # env: prod, staging, dev

Memory type tagging

The memory_type field in Kronvex lets you categorize memories for better recall quality and filtering. Establish consistent taxonomy across your AutoGen system:

Python — memory type conventions
# User/customer preferences (long-lived, high recall priority)
remember(agent_id, "User prefers TypeScript over JavaScript", memory_type="preference")

# Architectural or strategic decisions (long-lived)
remember(agent_id, "Decided to use Railway for deployment instead of AWS ECS", memory_type="decision")

# Domain facts (medium-lived, project-specific)
remember(agent_id, "The main database has 3.2M users as of Q1 2026", memory_type="fact")

# Interaction context (shorter-lived, session summaries)
remember(agent_id, "Session 2026-03-22: completed database schema refactor", memory_type="context")

# Temporary working notes (should be cleaned up after task)
remember(agent_id, "Currently processing batch job #4421, resume at row 8000", memory_type="temporary")
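One way to enforce this taxonomy in code, rather than by convention, is a small string-valued Enum. This is a suggested pattern, not a Kronvex requirement: the API accepts any string, but centralizing the values catches typos like "prefrence" at call time.

Python — memory type enum

```python
from enum import Enum


class MemoryType(str, Enum):
    """Canonical memory_type values for this system, defined in one place."""
    PREFERENCE = "preference"  # long-lived user/customer preferences
    DECISION = "decision"      # architectural or strategic decisions
    FACT = "fact"              # domain facts
    CONTEXT = "context"        # session summaries
    TEMPORARY = "temporary"    # short-lived working notes


# Usage with the remember() helper from memory_wrapper.py:
# remember(agent_id, "User prefers TypeScript", memory_type=MemoryType.PREFERENCE.value)
```

Because the Enum subclasses `str`, `MemoryType.DECISION.value` serializes as the plain string `"decision"` in the JSON payload, so nothing changes on the wire.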

TTL for temporary facts

Not all memories should live forever. Temporary working state — "currently processing batch #4421" or "draft version 3 is in review" — becomes stale quickly and can confuse agents if recalled months later. The Kronvex API supports a ttl_days parameter that automatically expires memories after a set number of days:

Python — TTL on temporary memories
from memory_wrapper import _http  # reuse the shared httpx client defined earlier

def remember_with_ttl(
    agent_id: str,
    content: str,
    memory_type: str = "temporary",
    ttl_days: int = 7,
) -> None:
    """Store a memory that expires automatically after ttl_days."""
    _http.post(
        f"/api/v1/agents/{agent_id}/remember",
        json={
            "content": content,
            "memory_type": memory_type,
            "ttl_days": ttl_days,
        },
    ).raise_for_status()


# Temporary task state: expires in 1 day
remember_with_ttl(
    agent_id,
    "Currently processing invoice reconciliation for March 2026",
    memory_type="temporary",
    ttl_days=1,
)

# Short-term project context: expires in 30 days
remember_with_ttl(
    agent_id,
    "Sprint 12 focus: payment flow redesign, deadline April 5",
    memory_type="context",
    ttl_days=30,
)

# Long-term preference: no TTL (omit the parameter entirely)
remember(agent_id, "Always use British English spelling in documents", memory_type="preference")

Recall strategy by task type

Different tasks benefit from different recall queries. Train your agents (via system prompt) to recall with task-specific queries rather than generic ones:

Python — task-specific recall
# Before writing a technical document:
memories = recall(agent_id, "writing style preferences technical documentation tone", limit=5)

# Before making an architectural decision:
memories = recall(agent_id, "architecture decisions technology stack constraints", limit=8)

# Before engaging with a specific customer:
memories = recall(agent_id, "customer preferences history past issues", limit=6)

# Before resuming a long-running task:
memories = recall(agent_id, "task progress current state last completed step", limit=5)
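These per-task queries can live in a small lookup table instead of being scattered through prompts and call sites. `recall_query_for` is a hypothetical helper, and the query vocabulary is illustrative; adjust both to your own domain.

Python — recall query templates

```python
# Map each task type to a recall query template (illustrative wording).
RECALL_QUERIES: dict[str, str] = {
    "writing": "writing style preferences technical documentation tone",
    "architecture": "architecture decisions technology stack constraints",
    "customer": "customer preferences history past issues",
    "resume": "task progress current state last completed step",
}


def recall_query_for(task_type: str, extra_terms: str = "") -> str:
    """Build a task-specific recall query, falling back to a generic one
    for unknown task types."""
    base = RECALL_QUERIES.get(task_type, "general context recent decisions")
    return f"{base} {extra_terms}".strip()
```

A caller would then do `recall(agent_id, recall_query_for("resume", "batch job"))` rather than hand-writing the query each time.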

How Kronvex scores recall results: Each recalled memory gets a confidence score computed as similarity × 0.6 + recency × 0.2 + frequency × 0.2. Recency uses a sigmoid with a 30-day inflection point — memories from last week score significantly higher than memories from last year, even if semantically similar. This means your agents naturally surface recent context without you managing timestamps manually.
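To build intuition for the weighting, here is a small reconstruction of the documented formula. The 0.6/0.2/0.2 weights and the 30-day inflection come from the description above; the sigmoid steepness is an assumption, since Kronvex does not document the exact curve.

Python — confidence score sketch

```python
import math


def recency_score(age_days: float, inflection: float = 30.0, steepness: float = 0.15) -> float:
    """Sigmoid recency: near 1.0 for fresh memories, exactly 0.5 at the
    30-day inflection, approaching 0 for old ones. The steepness value
    is an assumed parameter, not a documented one."""
    return 1.0 / (1.0 + math.exp(steepness * (age_days - inflection)))


def confidence(similarity: float, age_days: float, frequency: float) -> float:
    """The documented weighting: similarity 0.6, recency 0.2, frequency 0.2."""
    return 0.6 * similarity + 0.2 * recency_score(age_days) + 0.2 * frequency


# A week-old memory outranks a year-old one with identical similarity:
recent = confidence(similarity=0.8, age_days=7, frequency=0.5)
old = confidence(similarity=0.8, age_days=365, frequency=0.5)
```

With these assumed parameters, the week-old memory scores roughly 0.77 versus roughly 0.58 for the year-old one, which is the "recent context surfaces first" behavior described above.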

Async support

For high-throughput pipelines running many agents concurrently, use async Kronvex calls to avoid blocking the event loop:

Python — async memory calls
import asyncio
import os

import httpx

_async_http = httpx.AsyncClient(
    base_url="https://api.kronvex.io",
    headers={"X-API-Key": os.environ["KRONVEX_API_KEY"]},
    timeout=10.0,
)

async def remember_async(agent_id: str, content: str, memory_type: str = "fact") -> None:
    resp = await _async_http.post(
        f"/api/v1/agents/{agent_id}/remember",
        json={"content": content, "memory_type": memory_type},
    )
    resp.raise_for_status()

async def recall_async(agent_id: str, query: str, limit: int = 5) -> list[dict]:
    resp = await _async_http.post(
        f"/api/v1/agents/{agent_id}/recall",
        json={"query": query, "limit": limit},
    )
    resp.raise_for_status()
    return resp.json().get("memories", [])

# Fan-out recall across multiple agents simultaneously
async def load_team_context(team_id: str, queries: list[str]) -> list[list[dict]]:
    return await asyncio.gather(*[
        recall_async(team_id, q) for q in queries
    ])
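An unbounded `asyncio.gather` over many queries can stampede the API. A small semaphore wrapper caps in-flight requests while preserving result order; `gather_limited` is a hypothetical helper, and the limit of 10 is an arbitrary default.

Python — concurrency-limited fan-out

```python
import asyncio


async def gather_limited(coros, max_concurrent: int = 10):
    """Run awaitables with at most `max_concurrent` in flight,
    returning results in the same order as the input."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```

Inside `load_team_context` you could then write `await gather_limited([recall_async(team_id, q) for q in queries], max_concurrent=5)` to keep fan-out polite under load.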

Conclusion

AutoGen is a powerful framework for multi-agent orchestration, but its stateless design means you need an external layer for anything that needs to persist across runs. The two patterns above — hook-based automatic memory and explicit tool-based memory — cover the full spectrum from passive interaction logging to deliberate knowledge curation.

The Kronvex API is particularly well-suited for AutoGen integrations because it maps cleanly to AutoGen's agent model: one agent_id per agent (or team), with remember, recall, and inject-context as the three operations that cover every memory use case. The confidence scoring — similarity, recency, frequency — means your agents surface the right memories without you managing ranking logic.

For production systems, combine both approaches: use hook-based memory for comprehensive logging, explicit tool calls for high-signal facts, consistent agent_id naming, memory_type tagging for recall quality, and TTLs for temporary state. With that in place, your AutoGen agents stop being amnesiac tools and start being genuine long-term collaborators.