## What Assistants API threads give you
When OpenAI launched the Assistants API, one of its headline features was persistent threads. Unlike the Chat Completions API where you manually manage and pass conversation history on every call, Assistants API threads are stored server-side. You create a thread once, add messages to it, and OpenAI handles the context window management automatically, truncating older messages when the thread grows beyond the model's context limit.
Concretely, threads give you:
- Automatic conversation history — all messages in a thread are available to the assistant without you passing them manually
- Managed context truncation — OpenAI truncates old messages when approaching the context limit, keeping the most recent exchanges
- File attachments per thread — documents, images, and code files can be attached and referenced in the conversation
- Built-in tools — Code Interpreter and file search are available out of the box without additional infrastructure
- Simple API surface — create thread, add message, create run, poll for completion — that's the full pattern
For many use cases, this is genuinely sufficient. A customer support bot that handles one ticket per thread, a coding assistant where each session is self-contained, or a document Q&A tool — all work well with threads alone.
## The limits of thread-based memory
The problems emerge as soon as your use case crosses any of these boundaries:
### No cross-thread memory
Each thread is a silo. A user who tells your assistant "I prefer concise answers" in Thread A will need to repeat that preference when Thread B starts. There is no mechanism in the Assistants API to share context between threads. In practice, most real-world users don't have one conversation — they come back repeatedly over days, weeks, and months.
### No semantic search across history
Thread context is injected linearly into the context window. There is no way to ask "what did this user tell me about their infrastructure stack across all past threads?" You can scroll through thread messages manually, but there is no programmatic semantic retrieval. At scale, with thousands of users and millions of messages, this becomes completely unworkable.
### Silent truncation loses important facts
When a thread grows beyond the model's context limit, OpenAI silently drops the oldest messages. This means facts established early in a long conversation — "my company name is Acme Corp", "we use Kubernetes in production" — can silently disappear from the assistant's context. The assistant has no way to warn the user or retrieve these facts once truncated.
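The runs API does expose some control here: a `truncation_strategy` parameter lets you pin the window to the last N messages instead of the default `auto` behavior. The policy is still purely positional, so it cannot protect specific facts established early in the conversation. A minimal sketch of building such a run (the helper name and IDs are illustrative):

```python
def run_params(thread_id: str, assistant_id: str, keep_last: int) -> dict:
    """Build run parameters that keep only the last `keep_last` messages.

    Note: this bounds *how much* is kept, not *what* is kept. An early
    fact like "my company is Acme Corp" is still dropped once it falls
    outside the window.
    """
    return {
        "thread_id": thread_id,
        "assistant_id": assistant_id,
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": keep_last,
        },
    }

params = run_params("thread_abc", "asst_123", keep_last=20)
# Pass with: client.beta.threads.runs.create_and_poll(**params)
```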
### US data residency only
All Assistants API data, including thread contents, is stored on OpenAI's US infrastructure. If you're building for EU users, thread contents may include personal data — names, preferences, business information — that is subject to GDPR. Storing this data in the US without appropriate safeguards (SCCs, adequacy decisions) creates a legal compliance risk.
GDPR Article 17 — Right to erasure: If a user requests deletion of their data, you need to delete their thread contents from OpenAI's servers. OpenAI's API allows thread deletion, but you have no control over retention in OpenAI's internal systems. For a compliant EU product, you need a memory solution where you have full control over deletion.
### No confidence scoring
Thread injection is all-or-nothing: either you include the thread history or you don't. There is no mechanism to surface which past facts are most relevant to the current query. A well-designed memory layer like Kronvex computes a confidence score combining semantic similarity, recency, and access frequency — so the most relevant memories surface at the top rather than forcing the model to scan through everything.
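The exact formula Kronvex uses is not shown here, but the idea can be sketched as a weighted blend of the three signals. The weights, decay constants, and function name below are illustrative assumptions, not Kronvex's actual parameters:

```python
import math
import time

def confidence(similarity: float, last_access_ts: float, access_count: int,
               w_sim: float = 0.6, w_rec: float = 0.25, w_freq: float = 0.15,
               half_life_days: float = 30.0) -> float:
    """Illustrative confidence score in [0, 1]: a weighted blend of
    semantic similarity, exponential recency decay, and log-scaled
    access frequency."""
    age_days = (time.time() - last_access_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)        # halves every 30 days
    frequency = 1 - 1 / (1 + math.log1p(access_count))  # saturates toward 1
    return w_sim * similarity + w_rec * recency + w_freq * frequency

# A fresh, frequently used, highly similar memory scores near 1.0;
# the same memory untouched for 90 days scores noticeably lower.
fresh = confidence(similarity=0.9, last_access_ts=time.time(), access_count=10)
stale = confidence(similarity=0.9, last_access_ts=time.time() - 90 * 86400,
                   access_count=10)
```

Ranking memories by a score like this, rather than injecting everything, is what lets the retrieval layer return only the top-k most relevant facts per query.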
| Feature | Assistants API Threads | Kronvex Memory |
|---|---|---|
| Scope | Single thread (conversation) | Across all sessions, all time |
| Retrieval | Linear, full history | Semantic similarity + confidence scoring |
| Truncation | Silent, oldest-first | Explicit TTL, never silently lost |
| Data location | US (OpenAI servers) | EU (Supabase Frankfurt) |
| Deletion control | Delete thread via API | Delete agent + all memories |
| Multi-agent sharing | ✗ | ✓ |
## When you need more
The trigger points that signal you've outgrown threads-only memory:
- Users return repeatedly and expect the assistant to "remember" them across sessions. Support tickets, coaching sessions, project management — any use case with returning users.
- Multiple agents share context about the same user. A sales agent and a support agent both need to know the customer's plan tier, preferences, and past issues — stored once, accessible everywhere.
- GDPR deletion requests arrive and you need to prove complete erasure of a user's data from your AI layer.
- Thread truncation is visibly degrading answer quality — users complain the assistant "forgot" something they mentioned early in a long conversation.
- EU users are your primary market and legal has flagged the US data residency issue.
- You need to query user context programmatically — e.g., "show me all users who mentioned PostgreSQL" for product analytics or proactive outreach.
## Hybrid approach: keep Assistants for conversation, add Kronvex for persistent facts
You don't have to choose between the Assistants API and a dedicated memory layer. The optimal architecture for most production use cases is hybrid:
- Assistants API threads handle the current conversation — the rolling context window of the active session, tool calls, file attachments, and structured outputs
- Kronvex handles persistent user facts — preferences, context, history that should survive across threads and sessions
The integration point is the system prompt. Before creating each thread run, you call Kronvex to retrieve the most relevant memories for the current query and inject them into the assistant's instructions. The assistant sees both the current thread context and the relevant long-term memory.
Why not just use a long system prompt? You could store user preferences in a long static system prompt per user. But this doesn't scale: with many users, updating and storing individual system prompts becomes unwieldy. Kronvex's semantic retrieval means you only inject the memories relevant to the current query, not the user's entire history, which keeps token usage efficient.
## Migration: extract facts from existing threads
If you have existing Assistants API threads with valuable conversation history, you can extract facts from them and store them in Kronvex. This is a one-time migration, not an ongoing sync — after migration, new facts are stored to Kronvex in real time.
The migration process:
- List all threads for a user via the Assistants API
- Fetch all messages from each thread
- Run an extraction LLM call to identify facts worth persisting
- Store extracted facts to Kronvex via `agent.remember()`
## Code examples
```python
import json

from openai import OpenAI
from kronvex import Kronvex

openai_client = OpenAI(api_key="sk-your-key")
kv = Kronvex(api_key="kv-your-key")

ASSISTANT_ID = "asst_your_assistant_id"


def run_with_memory(user_id: str, user_message: str) -> str:
    agent = kv.agent(user_id)

    # 1. Retrieve relevant long-term memories from Kronvex
    context = agent.inject_context(user_message, top_k=5)

    # 2. Build system prompt with injected memory
    system_instructions = f"""You are a helpful B2B assistant.

PERSISTENT MEMORY (facts about this user from previous sessions):
{context or "No prior memory available."}

Use the above memory to personalize your responses. Always refer to
what you know about the user when relevant."""

    # 3. Create or retrieve a thread for this session
    #    (in production, store thread_id per user in your DB and reuse it)
    thread = openai_client.beta.threads.create()

    # 4. Add the user message to the thread
    openai_client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=user_message,
    )

    # 5. Run the assistant with memory-augmented instructions
    run = openai_client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=ASSISTANT_ID,
        additional_instructions=system_instructions,
    )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status}")

    # 6. Get the latest assistant response
    messages = openai_client.beta.threads.messages.list(
        thread_id=thread.id, order="desc", limit=1
    )
    response_text = messages.data[0].content[0].text.value

    # 7. Store key facts to Kronvex (fire-and-forget in production)
    _extract_and_store(agent, user_message, response_text)

    return response_text


def _extract_and_store(agent, user_msg: str, assistant_reply: str):
    """Extract facts from the exchange and store them to Kronvex."""
    extraction_prompt = f"""Extract facts about the user worth remembering.
Return a JSON object of the form {{"facts": ["...", "..."]}}.
Return {{"facts": []}} if nothing is notable.
Only store user context, preferences, plans — not general knowledge.

User: {user_msg}
Assistant: {assistant_reply}"""

    result = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": extraction_prompt}],
        response_format={"type": "json_object"},
    )
    try:
        data = json.loads(result.choices[0].message.content)
        facts = data.get("facts", []) if isinstance(data, dict) else data
        for fact in (facts if isinstance(facts, list) else []):
            if fact.strip():
                agent.remember(fact)
    except Exception:
        pass  # never let extraction failures break the user-facing response
```
```python
import json


def migrate_threads_to_kronvex(user_id: str, openai_thread_ids: list[str]):
    """One-time migration: extract facts from existing threads into Kronvex."""
    agent = kv.agent(user_id)

    for thread_id in openai_thread_ids:
        # Fetch all messages from the thread, oldest first
        messages = openai_client.beta.threads.messages.list(
            thread_id=thread_id,
            order="asc",
            limit=100,
        )

        # Flatten to plain conversation text
        conversation = "\n".join(
            f"{msg.role.upper()}: {msg.content[0].text.value}"
            for msg in messages.data
            if msg.content and msg.content[0].type == "text"
        )
        if not conversation.strip():
            continue

        # Extract durable facts with an LLM call; truncate very long
        # threads to keep the prompt bounded
        extraction = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    "Extract all durable facts about the user from this "
                    "conversation. Return a JSON list of concise fact strings.\n\n"
                    f"Conversation:\n{conversation[:4000]}\n\nFacts (JSON list):"
                ),
            }],
        )
        try:
            facts = json.loads(extraction.choices[0].message.content)
            for fact in (facts if isinstance(facts, list) else []):
                agent.remember(fact)
            print(f"Thread {thread_id}: stored {len(facts)} facts")
        except Exception as e:
            print(f"Thread {thread_id}: extraction failed: {e}")


# Run the migration
migrate_threads_to_kronvex("user_42", ["thread_abc", "thread_def", "thread_ghi"])
```
```python
def erase_user_data(user_id: str, openai_thread_ids: list[str]):
    """
    Article 17 right to erasure: delete all user data from both
    Assistants API threads and Kronvex memory.
    """
    # 1. Delete the Kronvex agent and all of its memories
    agent = kv.agent(user_id)
    agent.delete()  # deletes the agent plus all associated memories
    print(f"Kronvex: deleted all memories for {user_id}")

    # 2. Delete the OpenAI threads
    for thread_id in openai_thread_ids:
        openai_client.beta.threads.delete(thread_id)
        print(f"OpenAI: deleted thread {thread_id}")

    print(f"Erasure complete for user {user_id}")
```
Document your erasure procedure. Under GDPR Article 17, you must be able to demonstrate that erasure was complete. Keep an audit log of deletion requests with timestamps. Kronvex's EU hosting means your memory data is in scope for EU data protection law; the Assistants API's US hosting is where the SCCs or DPA with OpenAI becomes important for your compliance documentation.
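A minimal shape for such an audit record might look like the following sketch. The field names are illustrative, and hashing the user ID keeps personal data out of the log itself:

```python
import hashlib
import json
from datetime import datetime, timezone

def erasure_audit_entry(user_id: str, deleted_thread_ids: list[str]) -> dict:
    """Build an append-only audit record for a completed erasure request.

    The user ID is stored as a SHA-256 hash so the audit log can prove an
    erasure happened without itself retaining personal data.
    """
    return {
        "event": "gdpr_art17_erasure",
        "user_ref": hashlib.sha256(user_id.encode()).hexdigest(),
        "deleted_threads": len(deleted_thread_ids),
        "kronvex_agent_deleted": True,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }

entry = erasure_audit_entry("user_42", ["thread_abc", "thread_def"])
print(json.dumps(entry))  # append this line to your audit log
```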