## What Assistants API threads give you
When OpenAI launched the Assistants API, one of its headline features was persistent threads. Unlike the Chat Completions API where you manually manage and pass conversation history on every call, Assistants API threads are stored server-side. You create a thread once, add messages to it, and OpenAI handles the context window management automatically, truncating older messages when the thread grows beyond the model's context limit.
Concretely, threads give you:
- Automatic conversation history — all messages in a thread are available to the assistant without you passing them manually
- Managed context truncation — OpenAI truncates old messages when approaching the context limit, keeping the most recent exchanges
- File attachments per thread — documents, images, and code files can be attached and referenced in the conversation
- Built-in tools — Code Interpreter and file search are available out of the box without additional infrastructure
- Simple API surface — create thread, add message, create run, poll for completion — that's the full pattern
For many use cases, this is genuinely sufficient. A customer support bot that handles one ticket per thread, a coding assistant where each session is self-contained, or a document Q&A tool — all work well with threads alone.
## The limits of thread-based memory
The problems emerge as soon as your use case crosses any of these boundaries:
### No cross-thread memory
Each thread is a silo. A user who tells your assistant "I prefer concise answers" in Thread A will need to repeat that preference when Thread B starts. There is no mechanism in the Assistants API to share context between threads. In practice, most real-world users don't have one conversation — they come back repeatedly over days, weeks, and months.
### No semantic search across history
Thread context is injected linearly into the context window. There is no way to ask "what did this user tell me about their infrastructure stack across all past threads?" You can scroll through thread messages manually, but there is no programmatic semantic retrieval. At scale, with thousands of users and millions of messages, this becomes completely unworkable.
### Silent truncation loses important facts
When a thread grows beyond the model's context limit, OpenAI silently drops the oldest messages. This means facts established early in a long conversation — "my company name is Acme Corp", "we use Kubernetes in production" — can silently disappear from the assistant's context. The assistant has no way to warn the user or retrieve these facts once truncated.
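The runs API does expose some control here: a `truncation_strategy` parameter lets you pin the window to the last N messages instead of the default `auto` behavior. The policy is still purely positional, so it cannot protect specific facts established early in the conversation. A minimal sketch of building such a run (the helper name and IDs are illustrative):

```python
def run_params(thread_id: str, assistant_id: str, keep_last: int) -> dict:
    """Build run parameters that keep only the last `keep_last` messages.

    Note: this bounds *how much* is kept, not *what* is kept. An early
    fact like "my company is Acme Corp" is still dropped once it falls
    outside the window.
    """
    return {
        "thread_id": thread_id,
        "assistant_id": assistant_id,
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": keep_last,
        },
    }

params = run_params("thread_abc", "asst_123", keep_last=20)
# Pass with: client.beta.threads.runs.create_and_poll(**params)
```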
### US data residency only
All Assistants API data, including thread contents, is stored on OpenAI's US infrastructure. If you're building for EU users, thread contents may include personal data — names, preferences, business information — that is subject to GDPR. Storing this data in the US without appropriate safeguards (SCCs, adequacy decisions) creates a legal compliance risk.
GDPR Article 17 — Right to erasure: If a user requests deletion of their data, you need to delete their thread contents from OpenAI's servers. OpenAI's API allows thread deletion, but you have no control over retention in OpenAI's internal systems. For a compliant EU product, you need a memory solution where you have full control over deletion.
### No confidence scoring
Thread injection is all-or-nothing: either you include the thread history or you don't. There is no mechanism to surface which past facts are most relevant to the current query. A well-designed memory layer like Kronvex computes a confidence score combining semantic similarity, recency, and access frequency — so the most relevant memories surface at the top rather than forcing the model to scan through everything.
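The exact formula Kronvex uses is not shown here, but the idea can be sketched as a weighted blend of the three signals. The weights, decay constants, and function name below are illustrative assumptions, not Kronvex's actual parameters:

```python
import math
import time

def confidence(similarity: float, last_access_ts: float, access_count: int,
               w_sim: float = 0.6, w_rec: float = 0.25, w_freq: float = 0.15,
               half_life_days: float = 30.0) -> float:
    """Illustrative confidence score in [0, 1]: a weighted blend of
    semantic similarity, exponential recency decay, and log-scaled
    access frequency."""
    age_days = (time.time() - last_access_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)        # halves every 30 days
    frequency = 1 - 1 / (1 + math.log1p(access_count))  # saturates toward 1
    return w_sim * similarity + w_rec * recency + w_freq * frequency

# A fresh, frequently used, highly similar memory scores near 1.0;
# the same memory untouched for 90 days scores noticeably lower.
fresh = confidence(similarity=0.9, last_access_ts=time.time(), access_count=10)
stale = confidence(similarity=0.9, last_access_ts=time.time() - 90 * 86400,
                   access_count=10)
```

Ranking memories by a score like this, rather than injecting everything, is what lets the retrieval layer return only the top-k most relevant facts per query.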
| Feature | Assistants API Threads | Kronvex Memory |
|---|---|---|
| Scope | Single thread (conversation) | Across all sessions, all time |
| Retrieval | Linear, full history | Semantic similarity + confidence scoring |
| Truncation | Silent, oldest-first | Explicit TTL, never silently lost |
| Data location | US (OpenAI servers) | EU (Supabase Frankfurt) |
| Deletion control | Delete thread via API | Delete agent + all memories |
| Multi-agent sharing | ✗ | ✓ |
## When you need more
The trigger points that signal you've outgrown threads-only memory:
- Users return repeatedly and expect the assistant to "remember" them across sessions. Support tickets, coaching sessions, project management — any use case with returning users.
- Multiple agents share context about the same user. A sales agent and a support agent both need to know the customer's plan tier, preferences, and past issues — stored once, accessible everywhere.
- GDPR deletion requests arrive and you need to prove complete erasure of a user's data from your AI layer.
- Thread truncation is visibly degrading answer quality — users complain the assistant "forgot" something they mentioned early in a long conversation.
- EU users are your primary market and legal has flagged the US data residency issue.
- You need to query user context programmatically — e.g., "show me all users who mentioned PostgreSQL" for product analytics or proactive outreach.
## Hybrid approach: keep Assistants for conversation, add Kronvex for persistent facts
You don't have to choose between the Assistants API and a dedicated memory layer. The optimal architecture for most production use cases is hybrid:
- Assistants API threads handle the current conversation — the rolling context window of the active session, tool calls, file attachments, and structured outputs
- Kronvex handles persistent user facts — preferences, context, history that should survive across threads and sessions
The integration point is the system prompt. Before creating each thread run, you call Kronvex to retrieve the most relevant memories for the current query and inject them into the assistant's instructions. The assistant sees both the current thread context and the relevant long-term memory.
Why not just use a long system prompt? You could store user preferences in a long static system prompt per user. But this doesn't scale: with many users, updating and storing individual system prompts becomes unwieldy. Kronvex's semantic retrieval means you only inject the memories relevant to the current query, not the user's entire history, which keeps token usage efficient.
## Migration: extract facts from existing threads
If you have existing Assistants API threads with valuable conversation history, you can extract facts from them and store them in Kronvex. This is a one-time migration, not an ongoing sync — after migration, new facts are stored to Kronvex in real time.
The migration process:
- List all threads for a user via the Assistants API
- Fetch all messages from each thread
- Run an extraction LLM call to identify facts worth persisting
- Store extracted facts to Kronvex via `agent.remember()`
## Code examples
```python
import json

from openai import OpenAI
from kronvex import Kronvex

openai_client = OpenAI(api_key="sk-your-key")
kv = Kronvex(api_key="kv-your-key")

ASSISTANT_ID = "asst_your_assistant_id"


def run_with_memory(user_id: str, user_message: str) -> str:
    agent = kv.agent(user_id)

    # 1. Retrieve relevant long-term memories from Kronvex
    context = agent.inject_context(user_message, top_k=5)

    # 2. Build system prompt with injected memory
    system_instructions = f"""You are a helpful B2B assistant.

PERSISTENT MEMORY (facts about this user from previous sessions):
{context or "No prior memory available."}

Use the above memory to personalize your responses. Always refer to
what you know about the user when relevant."""

    # 3. Create or retrieve a thread for this session
    #    (in production, store thread_id per user in your DB and reuse it)
    thread = openai_client.beta.threads.create()

    # 4. Add the user message to the thread
    openai_client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=user_message,
    )

    # 5. Run the assistant with memory-augmented instructions
    run = openai_client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=ASSISTANT_ID,
        additional_instructions=system_instructions,
    )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status}")

    # 6. Get the latest assistant response
    messages = openai_client.beta.threads.messages.list(
        thread_id=thread.id, order="desc", limit=1
    )
    response_text = messages.data[0].content[0].text.value

    # 7. Store key facts to Kronvex (fire-and-forget in production)
    _extract_and_store(agent, user_message, response_text)

    return response_text


def _extract_and_store(agent, user_msg: str, assistant_reply: str):
    """Extract facts from the exchange and store them to Kronvex."""
    extraction_prompt = f"""Extract facts about the user worth remembering.
Return a JSON object of the form {{"facts": ["...", "..."]}}.
Return {{"facts": []}} if nothing is notable.
Only store user context, preferences, plans — not general knowledge.

User: {user_msg}
Assistant: {assistant_reply}"""

    result = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": extraction_prompt}],
        response_format={"type": "json_object"},
    )
    try:
        data = json.loads(result.choices[0].message.content)
        facts = data.get("facts", []) if isinstance(data, dict) else data
        for fact in (facts if isinstance(facts, list) else []):
            if fact.strip():
                agent.remember(fact)
    except Exception:
        pass  # never let extraction failures break the user-facing response
```
```python
import json


def migrate_threads_to_kronvex(user_id: str, openai_thread_ids: list[str]):
    """One-time migration: extract facts from existing threads into Kronvex."""
    agent = kv.agent(user_id)

    for thread_id in openai_thread_ids:
        # Fetch all messages from the thread, oldest first
        messages = openai_client.beta.threads.messages.list(
            thread_id=thread_id,
            order="asc",
            limit=100,
        )

        # Flatten to plain conversation text
        conversation = "\n".join(
            f"{msg.role.upper()}: {msg.content[0].text.value}"
            for msg in messages.data
            if msg.content and msg.content[0].type == "text"
        )
        if not conversation.strip():
            continue

        # Extract durable facts with an LLM call; truncate very long
        # threads to keep the prompt bounded
        extraction = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    "Extract all durable facts about the user from this "
                    "conversation. Return a JSON list of concise fact strings.\n\n"
                    f"Conversation:\n{conversation[:4000]}\n\nFacts (JSON list):"
                ),
            }],
        )
        try:
            facts = json.loads(extraction.choices[0].message.content)
            for fact in (facts if isinstance(facts, list) else []):
                agent.remember(fact)
            print(f"Thread {thread_id}: stored {len(facts)} facts")
        except Exception as e:
            print(f"Thread {thread_id}: extraction failed: {e}")


# Run the migration
migrate_threads_to_kronvex("user_42", ["thread_abc", "thread_def", "thread_ghi"])
```
```python
def erase_user_data(user_id: str, openai_thread_ids: list[str]):
    """
    Article 17 right to erasure: delete all user data from both
    Assistants API threads and Kronvex memory.
    """
    # 1. Delete the Kronvex agent and all of its memories
    agent = kv.agent(user_id)
    agent.delete()  # deletes the agent plus all associated memories
    print(f"Kronvex: deleted all memories for {user_id}")

    # 2. Delete the OpenAI threads
    for thread_id in openai_thread_ids:
        openai_client.beta.threads.delete(thread_id)
        print(f"OpenAI: deleted thread {thread_id}")

    print(f"Erasure complete for user {user_id}")
```
Document your erasure procedure. Under GDPR Article 17, you must be able to demonstrate that erasure was complete. Keep an audit log of deletion requests with timestamps. Kronvex's EU hosting means your memory data is in scope for EU data protection law; the Assistants API's US hosting is where the SCCs or DPA with OpenAI becomes important for your compliance documentation.
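A minimal shape for such an audit record might look like the following sketch. The field names are illustrative, and hashing the user ID keeps personal data out of the log itself:

```python
import hashlib
import json
from datetime import datetime, timezone

def erasure_audit_entry(user_id: str, deleted_thread_ids: list[str]) -> dict:
    """Build an append-only audit record for a completed erasure request.

    The user ID is stored as a SHA-256 hash so the audit log can prove an
    erasure happened without itself retaining personal data.
    """
    return {
        "event": "gdpr_art17_erasure",
        "user_ref": hashlib.sha256(user_id.encode()).hexdigest(),
        "deleted_threads": len(deleted_thread_ids),
        "kronvex_agent_deleted": True,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }

entry = erasure_audit_entry("user_42", ["thread_abc", "thread_def"])
print(json.dumps(entry))  # append this line to your audit log
```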