What LangSmith tracks

LangSmith is LangChain's observability platform. Every LLM call, every chain invocation, every tool use gets recorded as a trace. You can inspect input/output pairs, measure latency, compute token costs, and run evaluations against golden datasets. It answers the question: what happened in this run?

Concretely, LangSmith gives you:

  - A trace of every LLM call, chain invocation, and tool use
  - Input/output inspection for each step of a run
  - Latency and token-cost breakdowns
  - Evaluations against golden datasets

LangSmith is excellent at what it does. It is indispensable for teams shipping LangChain-based applications to production. But it is strictly an observability and evaluation tool. It does not store facts about your users. It does not inject context into future runs. Every run starts fresh.

LangFuse users: Everything in this article applies equally to LangFuse, the open-source alternative to LangSmith. The architecture pattern is identical — observability tool (LangSmith or LangFuse) for tracing, Kronvex for persistent memory. The code examples use LangSmith but the concepts map directly.

The gap: observability vs. memory

Consider a B2B SaaS AI assistant. A user tells the agent: "We're migrating our database to PostgreSQL next quarter. Please take that into account for future recommendations." LangSmith will record this statement as part of the trace for that run. Perfect — you can see it happened. But the next time that user opens a conversation, the agent has no idea about the PostgreSQL migration. The trace is archived, not remembered.

This is the fundamental distinction between observability and memory:

| Dimension | LangSmith (Observability) | Kronvex (Memory) |
|---|---|---|
| Purpose | Debug, evaluate, monitor | Persist, recall, inject |
| Query model | Structured filters (run ID, time, tags) | Semantic similarity search |
| Primary consumer | Developers, ML engineers | The LLM at inference time |
| Retention intent | Audit log, debugging history | Active context for future runs |
| GDPR deletion | Delete trace records | Delete agent memories (Article 17) |

LangSmith answers "what happened?" Kronvex answers "what does this agent know?" Both questions are important. Neither tool answers the other's question.
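The "query model" row is the crux of that distinction. As a toy illustration (fake in-memory data; the helper names are mine, not from either SDK): an observability store answers exact structured filters over run records, while a memory store ranks stored facts by embedding similarity to the current query.

```python
import math

# Toy stand-ins for trace records and embedded memories.
runs = [
    {"run_id": "r1", "tag": "prod", "latency_ms": 420},
    {"run_id": "r2", "tag": "dev", "latency_ms": 120},
]
memories = [
    {"text": "migrating to PostgreSQL", "vec": [0.9, 0.1]},
    {"text": "prefers dark mode", "vec": [0.1, 0.9]},
]

def structured_filter(records, **criteria):
    """Observability-style lookup: exact match on structured fields."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def semantic_recall(mems, query_vec, top_k=1):
    """Memory-style lookup: rank by cosine similarity to a query embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    return sorted(mems, key=lambda m: cos(m["vec"], query_vec), reverse=True)[:top_k]
```

Real systems replace the exact-match loop with LangSmith's run filters and the cosine loop with a vector index, but the shape of the two queries is the point: one is keyed on run metadata, the other on meaning.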

Complementary architecture

The architecture for a production LangChain agent that uses both tools looks like this:

  1. User sends a message. Your chain starts executing. LangSmith automatically traces every step via the LangChain callback system — no extra code needed if you've set LANGCHAIN_TRACING_V2=true.
  2. Before calling the LLM, call agent.inject_context(query) on Kronvex. This returns the top N semantically relevant memories for the current user's query, formatted as a ready-to-inject context block.
  3. Inject the context into the system prompt or as a human message prefix. The LLM now has access to what it "knows" about this user.
  4. LLM responds. LangSmith traces the full call including the injected context. This is valuable: you can see exactly what memories were injected for any given run.
  5. After the response, extract facts worth remembering from the conversation and call agent.remember(fact) on Kronvex. This embeds and stores the fact for future retrieval.
  6. LangSmith captures the full trace, including latency breakdown, token cost, and the injected context. You can run evals on whether the injected memories actually improved response quality.

Key insight: Because LangSmith traces include the injected context, you get full auditability of what your agent "remembered" for any given run. This is invaluable for debugging memory-related issues in production.

Code: LangChain chain with LangSmith tracing and Kronvex memory

Here is a complete working example of a LangChain chain that traces with LangSmith and stores/retrieves memory via Kronvex:

Python — Setup
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from kronvex import Kronvex

# LangSmith tracing is enabled via environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-agent-with-memory"

# Kronvex client
kv = Kronvex(api_key="kv-your-api-key")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
output_parser = StrOutputParser()
Python — Chain with memory injection
def run_agent(user_id: str, user_message: str) -> str:
    agent = kv.agent(user_id)

    # 1. Retrieve relevant memories (semantic search, top 5)
    context = agent.inject_context(user_message, top_k=5)

    # 2. Build prompt with injected memory
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a helpful B2B assistant.

{memory_context}

Use the above context to personalize your response. If no context is
provided, respond normally."""),
        ("human", "{user_message}")
    ])

    # 3. Build and run chain — LangSmith traces this automatically
    chain = prompt | llm | output_parser
    response = chain.invoke({
        "memory_context": context or "No prior context available.",
        "user_message": user_message
    })

    # 4. Extract and store key facts from this interaction
    # In production, you'd use an LLM to extract facts selectively
    # Here we store the user message as a simple example
    if len(user_message) > 20:  # skip trivial messages
        agent.remember(f"User said: {user_message}")

    return response


# Usage
response = run_agent("user_42", "We're migrating to PostgreSQL next quarter")
print(response)

# Next run — agent will remember the PostgreSQL migration
response2 = run_agent("user_42", "What database should I use for this new service?")
print(response2)  # Will reference PostgreSQL migration context
Python — Smarter fact extraction with LLM
from langchain_core.prompts import PromptTemplate

EXTRACT_FACTS_PROMPT = PromptTemplate.from_template("""
Extract key facts worth remembering from this conversation.
Return a JSON list of strings. Return an empty list [] if nothing is worth storing.
Only extract facts about the user's context, preferences, or plans — not general knowledge.

User message: {user_message}
Assistant response: {assistant_response}

Facts (JSON list):
""")

import json

def extract_and_store_facts(agent, user_message: str, response: str):
    chain = EXTRACT_FACTS_PROMPT | llm | output_parser
    raw = chain.invoke({
        "user_message": user_message,
        "assistant_response": response
    })
    try:
        facts = json.loads(raw.strip())
    except json.JSONDecodeError:
        return  # LLM didn't return valid JSON; skip this turn
    if not isinstance(facts, list):
        return  # guard against a bare string or object
    for fact in facts:
        if isinstance(fact, str) and fact.strip():
            agent.remember(fact)
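A plain `json.loads` on raw model output also fails when the model wraps its JSON in a markdown fence, which happens often in practice. A more defensive parser might look like this (a sketch; `parse_fact_list` is my name, not part of any SDK):

```python
import json

def parse_fact_list(raw: str) -> list[str]:
    """Defensively parse the extractor's output into a list of fact strings.

    Models sometimes wrap JSON in ```json fences or return a bare string;
    normalize those cases instead of silently dropping facts.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Strip a ```json ... ``` fence if present
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
        text = text.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []
    if isinstance(parsed, str):
        parsed = [parsed]  # bare string -> single-fact list
    if not isinstance(parsed, list):
        return []
    return [f.strip() for f in parsed if isinstance(f, str) and f.strip()]
```

You can then replace the `json.loads` block in `extract_and_store_facts` with a single call to this helper.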

Using LangSmith evals to test memory quality

One of LangSmith's most powerful features is the ability to run automated evaluations against a golden dataset. You can use this to measure whether your memory integration actually improves response quality — not just assume it does.

The eval setup for memory quality testing:

Python — LangSmith eval for memory quality
from langsmith import Client
from langsmith.evaluation import evaluate, LangChainStringEvaluator

ls_client = Client()

# Create an eval dataset: questions that require prior context to answer well
eval_dataset = ls_client.create_dataset("memory-quality-evals")
ls_client.create_examples(
    inputs=[
        {"user_id": "eval_user_1", "message": "What database did I say we're migrating to?"},
        {"user_id": "eval_user_1", "message": "What's our tech stack preference?"},
        {"user_id": "eval_user_2", "message": "What project were we discussing last time?"},
    ],
    outputs=[
        {"expected": "PostgreSQL"},
        {"expected": "Python, FastAPI"},
        {"expected": "The CRM integration project"},
    ],
    dataset_id=eval_dataset.id
)

# Seed memories for eval users
kv.agent("eval_user_1").remember("We are migrating our database to PostgreSQL next quarter")
kv.agent("eval_user_1").remember("Our tech stack is Python with FastAPI")
kv.agent("eval_user_2").remember("We are working on a CRM integration project")

# Define the chain to evaluate
def agent_with_memory(inputs):
    return {"output": run_agent(inputs["user_id"], inputs["message"])}

# Run evaluation
results = evaluate(
    agent_with_memory,
    data=eval_dataset.name,
    evaluators=[LangChainStringEvaluator("qa")],
    experiment_prefix="memory-injection-v1"
)
print(results)

This gives you a LangSmith experiment you can compare across versions. When you tweak top_k, change your fact extraction prompt, or update memory weighting, you can quantitatively measure the impact on answer quality for memory-dependent queries.

What to measure: Create two variants of your eval dataset — one where the correct answer requires prior memory (memory-dependent questions) and one where it doesn't (general knowledge). Compare both variants to confirm memory improves the first without degrading the second.
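One way to run that comparison is to tag each eval example as memory-dependent or not, then average scores per bucket once the experiment finishes. A minimal sketch (the helper name and the flattened-results shape are my assumptions — you would assemble these dicts from the experiment's per-example feedback):

```python
def summarize_by_dependency(results):
    """Average eval scores separately for memory-dependent vs. general questions.

    `results` is a list of dicts like
    {"memory_dependent": bool, "score": float}.
    """
    buckets = {True: [], False: []}
    for r in results:
        buckets[r["memory_dependent"]].append(r["score"])
    return {
        "memory_dependent_avg": (
            sum(buckets[True]) / len(buckets[True]) if buckets[True] else None
        ),
        "general_avg": (
            sum(buckets[False]) / len(buckets[False]) if buckets[False] else None
        ),
    }
```

A healthy memory integration should raise `memory_dependent_avg` across versions while leaving `general_avg` flat.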

Production setup

When moving from prototype to production with this stack, there are several configuration decisions to make explicit:

Environment variables

Shell — .env
# LangSmith
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__your_key
LANGCHAIN_PROJECT=your-project-name
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com

# Kronvex
KRONVEX_API_KEY=kv-your_key

# OpenAI (used by both LangChain and Kronvex for embeddings)
OPENAI_API_KEY=sk-your_key

Selective memory storage

Not every user message should be stored as a memory. Storing everything creates noise that degrades recall quality. In production, use a lightweight LLM call after each conversation turn to decide what's worth storing. A good heuristic: store facts about user context, preferences, and plans — not questions, greetings, or ephemeral requests.
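Before paying for the LLM extraction call at all, a cheap heuristic pre-filter can discard turns that cannot contain durable facts. A sketch under the heuristic above (`worth_storing` is my name; the thresholds and keyword lists are illustrative, not tuned):

```python
def worth_storing(message: str) -> bool:
    """Cheap pre-filter before the LLM fact-extraction call.

    Skips very short messages, greetings, and bare general-knowledge
    questions, since those rarely contain user context, preferences, or plans.
    """
    text = message.strip().lower()
    if len(text) < 20:  # trivial messages
        return False
    greetings = ("hi", "hello", "hey", "thanks", "thank you", "ok", "okay")
    if text.startswith(greetings) and len(text) < 40:
        return False
    # Bare questions without first-person references rarely state user facts
    if text.endswith("?") and not any(k in text for k in ("we", "our", "i ", "my")):
        return False
    return True
```

Messages that pass the filter still go through the LLM extractor; the filter just avoids an extraction call on turns that are obviously not worth storing.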

Async memory writes

Memory writes don't need to block the response. Use the async Kronvex client to fire-and-forget the remember() call after sending the response to the user. This eliminates any latency impact of memory storage on the user-facing response time.

Python — Async memory write (non-blocking)
import asyncio
from kronvex import AsyncKronvex

async_kv = AsyncKronvex(api_key="kv-your-key")

async def handle_message(user_id: str, message: str) -> str:
    agent = async_kv.agent(user_id)

    # Memory recall blocks (we need the context before calling LLM)
    context = await agent.inject_context(message, top_k=5)

    # ... build prompt, call LLM ...
    response = "LLM response here"

    # Memory write is fire-and-forget — don't await, don't block.
    # (In production, hold a reference to the task; bare create_task results
    # can be garbage-collected before they finish.)
    asyncio.create_task(agent.remember(f"User said: {message}"))

    return response

Tag LangSmith runs with memory metadata

Add metadata to your LangSmith traces to indicate how many memories were injected per run. This lets you correlate memory injection count with response quality in your LangSmith dashboards:

Python — LangSmith run metadata
from langchain_core.tracers.context import collect_runs

memories = agent.recall(user_message, top_k=5)

with collect_runs() as cb:
    response = chain.invoke({...})
    # Tag the run with memory count
    run_id = cb.traced_runs[0].id
    ls_client.create_feedback(
        run_id,
        key="memory_count",
        score=len(memories)
    )