What Assistants API threads give you

When OpenAI launched the Assistants API, one of its headline features was persistent threads. Unlike the Chat Completions API, where you manage and resend the conversation history yourself on every call, Assistants API threads are stored server-side. You create a thread once, add messages to it, and OpenAI handles context window management automatically, truncating older messages when the thread grows beyond the model's context limit.

Concretely, threads give you:

  - Server-side storage of the full conversation history, so you never pass past messages on each call
  - Automatic context window management, with older messages truncated as the thread grows
  - A stable thread ID you can persist and reattach to across sessions

For many use cases, this is genuinely sufficient. A customer support bot that handles one ticket per thread, a coding assistant where each session is self-contained, or a document Q&A tool — all work well with threads alone.

The limits of thread-based memory

The problems emerge as soon as your use case crosses any of these boundaries:

No cross-thread memory

Each thread is a silo. A user who tells your assistant "I prefer concise answers" in Thread A will need to repeat that preference when Thread B starts. There is no mechanism in the Assistants API to share context between threads. In practice, most real-world users don't have one conversation — they come back repeatedly over days, weeks, and months.

No semantic search across history

Thread context is injected linearly into the context window. There is no way to ask "what did this user tell me about their infrastructure stack across all past threads?" You can scroll through thread messages manually, but there is no programmatic semantic retrieval. At scale, with thousands of users and millions of messages, this becomes completely unworkable.
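To make "semantic retrieval" concrete, here is a toy sketch of similarity search over messages drawn from several threads. The three-dimensional vectors are hand-picked stand-ins for real embeddings (in practice you would embed each message with an embedding model); only the ranking mechanic is the point.

```python
import math

# Toy sketch of semantic retrieval across past threads. The 3-dimensional
# vectors are hand-picked stand-ins for real embeddings; a real system
# would embed each message with an embedding model and store the vectors.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# (message, toy embedding) pairs collected from several past threads
history = [
    ("We use Kubernetes in production", [0.9, 0.1, 0.0]),
    ("My company name is Acme Corp",    [0.1, 0.9, 0.0]),
    ("I prefer concise answers",        [0.0, 0.1, 0.9]),
]

def search(query_embedding: list[float], top_k: int = 2) -> list[str]:
    scored = [(cosine(query_embedding, emb), text) for text, emb in history]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]

# A query embedded near "infrastructure" surfaces the Kubernetes fact
print(search([0.8, 0.2, 0.1], top_k=1))  # → ['We use Kubernetes in production']
```

This is exactly the query the Assistants API cannot answer across threads: the history list here spans conversations, not just one.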

Silent truncation loses important facts

When a thread grows beyond the model's context limit, OpenAI silently drops the oldest messages. This means facts established early in a long conversation — "my company name is Acme Corp", "we use Kubernetes in production" — can silently disappear from the assistant's context. The assistant has no way to warn the user or retrieve these facts once truncated.
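The failure mode is easy to simulate. The sketch below is a toy model of oldest-first truncation, not OpenAI's actual algorithm, and it approximates token counts by word counts:

```python
# Toy model of oldest-first truncation (not OpenAI's actual algorithm).
# "Tokens" are approximated by word counts for illustration.

def truncate_oldest_first(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose combined word count fits the budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = len(msg.split())
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

thread = [
    "My company name is Acme Corp",     # established early
    "We use Kubernetes in production",
    "Can you review this deployment manifest for me?",
    "Here is the updated manifest with the resource limits you suggested",
]

# With a tight budget, the early facts silently fall out of context
print(truncate_oldest_first(thread, budget=20))
```

Run with a budget of 20, both early facts (the company name and the Kubernetes detail) are dropped while the two recent manifest messages survive.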

US data residency only

All Assistants API data, including thread contents, is stored on OpenAI's US infrastructure. If you're building for EU users, thread contents may include personal data — names, preferences, business information — that is subject to GDPR. Storing this data in the US without appropriate safeguards (SCCs, adequacy decisions) creates a legal compliance risk.

GDPR Article 17 — Right to erasure: If a user requests deletion of their data, you need to delete their thread contents from OpenAI's servers. OpenAI's API allows thread deletion, but you have no control over retention in OpenAI's internal systems. For a compliant EU product, you need a memory solution where you have full control over deletion.

No confidence scoring

Thread injection is all-or-nothing: either you include the thread history or you don't. There is no mechanism to surface which past facts are most relevant to the current query. A well-designed memory layer like Kronvex computes a confidence score combining semantic similarity, recency, and access frequency — so the most relevant memories surface at the top rather than forcing the model to scan through everything.
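As an illustration of that idea, here is one way such a score could be combined. The weights, half-life, and saturation curve below are invented for this sketch; they are not Kronvex's documented formula.

```python
# Illustrative confidence score combining semantic similarity, recency,
# and access frequency. The weights and decay rate are invented for this
# sketch, not Kronvex's documented formula.

HALF_LIFE_DAYS = 30.0

def confidence(similarity: float, age_days: float, access_count: int,
               w_sim: float = 0.6, w_rec: float = 0.25, w_freq: float = 0.15) -> float:
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)      # halves every 30 days
    frequency = 1.0 - 1.0 / (1.0 + access_count)      # saturates toward 1.0
    return w_sim * similarity + w_rec * recency + w_freq * frequency

# An equally similar but fresher, more-used memory ranks higher
fresh = confidence(similarity=0.9, age_days=1, access_count=10)
stale = confidence(similarity=0.9, age_days=180, access_count=0)
print(f"fresh={fresh:.3f} stale={stale:.3f}")
```

The point of blending signals is that pure similarity would rank these two memories identically; recency and usage break the tie.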

Feature              Assistants API Threads        Kronvex Memory
Scope                Single thread (conversation)  Across all sessions, all time
Retrieval            Linear, full history          Semantic similarity + confidence scoring
Truncation           Silent, oldest-first          Explicit TTL, never silently lost
Data location        US (OpenAI servers)           EU (Supabase Frankfurt)
Deletion control     Delete thread via API         Delete agent + all memories
Multi-agent sharing  Not supported                 Supported

When you need more

The trigger points that signal you've outgrown threads-only memory:

  - Users return across multiple sessions and expect the assistant to remember them
  - You need to ask what a user said across all past conversations, not scroll one thread
  - Long-running threads silently truncate facts you cannot afford to lose
  - You serve EU users and need EU data residency and full control over deletion

Hybrid approach: keep Assistants for conversation, add Kronvex for persistent facts

You don't have to choose between the Assistants API and a dedicated memory layer. The optimal architecture for most production use cases is hybrid:

  - The Assistants API thread carries the current conversation: turn-by-turn context, tool calls, and file attachments
  - Kronvex carries durable facts about the user: preferences, company details, and plans, retrieved semantically for each query

The integration point is the system prompt. Before creating each thread run, you call Kronvex to retrieve the most relevant memories for the current query and inject them into the assistant's instructions. The assistant sees both the current thread context and the relevant long-term memory.

Why not just use a long system prompt? You could store user preferences in a long static system prompt per user. But this doesn't scale: with many users, updating and storing individual system prompts becomes unwieldy. Kronvex's semantic retrieval means you only inject the memories relevant to the current query, not the user's entire history, which keeps token usage efficient.
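A back-of-envelope comparison makes the token argument concrete. The sketch below uses the common rough heuristic of about four characters per token; it is not an exact tokenizer, and the history is synthetic.

```python
# Rough token comparison: injecting full history vs. top-k memories.
# Uses the common ~4 characters per token heuristic, not a real tokenizer.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# Stand-in for a long multi-session conversation history
full_history = "\n".join(
    f"Message {i}: an earlier conversation turn about something." for i in range(500)
)

# What a semantic memory layer would inject instead
top_k_memories = "\n".join([
    "Prefers concise answers",
    "Company name is Acme Corp",
    "Runs Kubernetes in production",
])

print(approx_tokens(full_history), "tokens for the full history")
print(approx_tokens(top_k_memories), "tokens for top-k memories")
```

Even with this crude estimate the gap is two orders of magnitude, and it grows with every session the user has.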

Migration: extract facts from existing threads

If you have existing Assistants API threads with valuable conversation history, you can extract facts from them and store them in Kronvex. This is a one-time migration, not an ongoing sync — after migration, new facts are stored to Kronvex in real time.

The migration process:

  1. List all threads for a user via the Assistants API
  2. Fetch all messages from each thread
  3. Run an extraction LLM call to identify facts worth persisting
  4. Store extracted facts to Kronvex via agent.remember()

Code examples

Python — Hybrid: Assistants + Kronvex
from openai import OpenAI
from kronvex import Kronvex

openai_client = OpenAI(api_key="sk-your-key")
kv = Kronvex(api_key="kv-your-key")

ASSISTANT_ID = "asst_your_assistant_id"

def run_with_memory(user_id: str, user_message: str) -> str:
    agent = kv.agent(user_id)

    # 1. Retrieve relevant long-term memories from Kronvex
    context = agent.inject_context(user_message, top_k=5)

    # 2. Build system prompt with injected memory
    system_instructions = f"""You are a helpful B2B assistant.

PERSISTENT MEMORY (facts about this user from previous sessions):
{context or "No prior memory available."}

Use the above memory to personalize your responses. Always refer to
what you know about the user when relevant."""

    # 3. Create or retrieve thread for this session
    # In production, store thread_id per user in your DB
    thread = openai_client.beta.threads.create()

    # 4. Add user message to thread
    openai_client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=user_message
    )

    # 5. Run assistant with memory-augmented instructions
    run = openai_client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=ASSISTANT_ID,
        additional_instructions=system_instructions
    )

    # 6. Get response
    messages = openai_client.beta.threads.messages.list(
        thread_id=thread.id, order="desc", limit=1
    )
    response_text = messages.data[0].content[0].text.value

    # 7. Store key facts to Kronvex (fire-and-forget in production)
    _extract_and_store(agent, user_message, response_text)

    return response_text


def _extract_and_store(agent, user_msg: str, assistant_reply: str):
    """Extract facts from conversation and store to Kronvex."""
    extraction_prompt = f"""Extract facts about the user worth remembering.
Return a JSON object of the form {{"facts": ["..."]}}. Return {{"facts": []}} if nothing notable.
Only store user context, preferences, plans — not general knowledge.

User: {user_msg}
Assistant: {assistant_reply}

Facts:"""

    result = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": extraction_prompt}],
        response_format={"type": "json_object"}
    )
    import json
    try:
        data = json.loads(result.choices[0].message.content)
        facts = data.get("facts", []) if isinstance(data, dict) else data
        for fact in (facts if isinstance(facts, list) else []):
            if isinstance(fact, str) and fact.strip():
                agent.remember(fact)
    except Exception:
        # Fire-and-forget: a failed extraction must never break the response
        pass
Python — Migrate existing threads to Kronvex
def migrate_threads_to_kronvex(user_id: str, openai_thread_ids: list[str]):
    """One-time migration: extract facts from existing threads into Kronvex."""
    agent = kv.agent(user_id)

    for thread_id in openai_thread_ids:
        # Fetch all messages from thread
        messages = openai_client.beta.threads.messages.list(
            thread_id=thread_id,
            order="asc",
            limit=100
        )

        # Build conversation text
        conversation = "\n".join([
            f"{msg.role.upper()}: {msg.content[0].text.value}"
            for msg in messages.data
            if msg.content and msg.content[0].type == "text"
        ])

        if not conversation.strip():
            continue

        # Extract facts with an LLM; truncate very long threads first
        truncated = conversation[:4000]
        extraction = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"""Extract all durable facts about the user from this conversation.
Return a JSON list of concise fact strings.

Conversation:
{truncated}

Facts (JSON list):"""
            }]
        )

        import json
        try:
            facts = json.loads(extraction.choices[0].message.content)
            if not isinstance(facts, list):
                raise ValueError("expected a JSON list of facts")
            stored = [f for f in facts if isinstance(f, str) and f.strip()]
            for fact in stored:
                agent.remember(fact)
            print(f"Thread {thread_id}: stored {len(stored)} facts")
        except Exception as e:
            print(f"Thread {thread_id}: extraction failed: {e}")


# Run migration
migrate_threads_to_kronvex("user_42", ["thread_abc", "thread_def", "thread_ghi"])
Python — GDPR erasure: delete user data from both systems
def erase_user_data(user_id: str, openai_thread_ids: list[str]):
    """
    Article 17 right to erasure: delete all user data from both
    Assistants API threads and Kronvex memory.
    """
    # 1. Delete Kronvex agent and all memories
    agent = kv.agent(user_id)
    agent.delete()  # Deletes agent + all associated memories
    print(f"Kronvex: deleted all memories for {user_id}")

    # 2. Delete OpenAI threads
    for thread_id in openai_thread_ids:
        openai_client.beta.threads.delete(thread_id)
        print(f"OpenAI: deleted thread {thread_id}")

    print(f"Erasure complete for user {user_id}")

Document your erasure procedure. Under GDPR Article 17, you must be able to demonstrate that erasure was complete. Keep an audit log of deletion requests with timestamps. Kronvex's EU hosting means your memory data is in scope for EU data protection law; the Assistants API's US hosting is where the SCCs or DPA with OpenAI becomes important for your compliance documentation.