The confusion: reinventing the wheel

Every week, someone posts a thread about how they built agent memory using Pinecone, Weaviate, or Qdrant. The architecture is always roughly the same: generate an embedding for each user message, upsert it into the vector database with some metadata, and at the start of each new conversation, query for the top-K most similar past messages and paste them into the system prompt.

It works. But it's also missing everything that makes memory useful at scale: recency weighting (a memory from 3 years ago ranks the same as one from yesterday), frequency tracking (things the user mentions constantly should surface more than one-offs), and a ready-to-use context injection step that doesn't require you to write custom formatting code on every project.

The result is that teams using raw vector databases for agent memory end up writing the same ~200 lines of infrastructure code on every project. The vector DB is the right foundation — but it's infrastructure, not a memory API. The confusion between the two is costing developers weeks of engineering time.

The key distinction: A vector database answers the question "what is most similar to this query?" A memory API answers "what should my agent know right now, given what it has learned, how recently it learned it, and how often it's been relevant?" These are different questions with different answers.

What a vector database actually does

Vector databases (Pinecone, Weaviate, Qdrant, Milvus, or PostgreSQL + pgvector) do one core thing very well: approximate nearest-neighbor (ANN) search over high-dimensional embedding spaces. You give them a vector, they return the top-K closest vectors in the index.

What they are deliberately not responsible for:

- Recency weighting — when a vector was stored plays no role in ranking.
- Frequency tracking — how often a vector has been retrieved plays no role either.
- Context formatting — turning raw matches into a system-prompt block is left to you.
- Quota and tenancy logic — per-customer limits and agent isolation are application concerns.

None of these are criticisms. A vector database is correct not to handle these concerns — they're application logic, and the vector database is infrastructure. The problem arises when developers treat the vector DB as a complete memory solution and discover, six weeks in, that they've been writing the application logic that every agent memory system needs.

Python — raw pgvector memory recall
# What you write when using pgvector directly
# (assumes an asyncpg connection pool `db` and an embedding
# helper `get_embedding` are defined elsewhere):
import asyncpg
import openai  # used inside get_embedding

async def recall(agent_id: str, query: str, n: int = 5):
    embedding = await get_embedding(query)  # OpenAI call

    # Raw cosine similarity — no recency, no frequency
    rows = await db.fetch("""
        SELECT content, 1 - (embedding <=> $1::vector) as similarity
        FROM memories
        WHERE agent_id = $2
        ORDER BY embedding <=> $1::vector
        LIMIT $3
    """, embedding, agent_id, n)

    # You still need to:
    # 1. Filter by minimum similarity threshold
    # 2. Add recency weighting
    # 3. Add frequency weighting
    # 4. Format results for the system prompt
    # 5. Handle empty results gracefully
    # 6. Increment access_count for retrieved memories
    # That's another 40-60 lines of code.

    return [row['content'] for row in rows]  # Raw, unweighted
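
For illustration, steps 1, 4, and 5 from the comment above — threshold filtering, prompt formatting, and empty-result handling — might look like this minimal sketch. The 0.5 threshold, the row keys, and the block layout are assumptions, not anything pgvector prescribes:

```python
def format_memory_block(rows: list[dict], min_similarity: float = 0.5) -> str:
    """Filter raw similarity rows and format them as a system-prompt block.

    Assumes each row has 'content' and 'similarity' keys, matching the
    SELECT above. Threshold and layout are illustrative choices.
    """
    kept = [r for r in rows if r["similarity"] >= min_similarity]
    if not kept:
        return ""  # step 5: nothing relevant — inject no memory block at all
    lines = "\n".join(f"- {r['content']}" for r in kept)
    return f"Relevant memories about this user:\n{lines}"
```

And that still leaves recency weighting, frequency weighting, and access-count bookkeeping to write.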

What a memory API adds on top

A memory API is what you get when you take the vector database foundation and wrap it with all the application logic that every agent memory system needs. The core additions are:

1. Recency scoring

A memory from yesterday is more relevant than one from a year ago, even if they have identical semantic similarity to your query. Recency scoring applies a time decay function — typically a sigmoid — that boosts recent memories and penalizes stale ones. The inflection point and slope are tunable.

2. Frequency scoring

A memory that has been recalled and found useful 50 times is more reliably relevant than one retrieved once. Frequency scoring tracks access counts per memory and applies a log-scaled boost. This creates a positive feedback loop: memories that keep proving useful rank higher over time.

3. inject-context

The most time-consuming step in DIY memory is turning raw vector matches into a usable system prompt block. inject-context automates this: it runs the full retrieval pipeline (embed query → similarity search → confidence scoring → formatting) and returns a ready-to-use string you can drop directly into your system prompt. No custom formatting code, no edge case handling, no prompt engineering for the memory block itself.

4. Quota management

Production B2B products need to enforce per-customer limits on memory storage. A memory API handles this natively: each API key has a plan with defined agent and memory limits, enforced at the API level. With raw pgvector, you write this quota logic yourself and debug it when a customer hits the ceiling.

Python — Kronvex memory API (same result, 7 lines)
from kronvex import Kronvex

client = Kronvex(api_key="kv-...")
agent = client.agent("user-123")

# Store
await agent.remember("User prefers dark mode and compact layouts")

# Recall with confidence scoring (similarity + recency + frequency)
memories = await agent.recall("display preferences")

# inject-context: ranked + formatted, ready for system prompt
context = await agent.inject_context("What are this user's UI preferences?")

The confidence scoring formula

Kronvex uses a composite confidence score to rank recalled memories. The formula weights three signals:

confidence = similarity × 0.6 + recency × 0.2 + frequency × 0.2

- 0.6 — similarity weight: cosine similarity via pgvector
- 0.2 — recency weight: sigmoid, 30-day inflection
- 0.2 — frequency weight: log-scaled access count

Similarity (0.6 weight): The base signal. Cosine distance between the query embedding and the memory embedding, computed via pgvector. This accounts for 60% of the score — semantic relevance is the primary signal.

Recency (0.2 weight): A sigmoid function with a 30-day inflection point. A memory created today scores near 1.0. A memory from 30 days ago scores 0.5. A memory from 6 months ago scores near 0. The sigmoid shape means recency doesn't decay linearly — there's a plateau of "recent enough" and a cliff when memories become stale.

Frequency (0.2 weight): Log-scaled access count. A memory retrieved 10 times scores higher than one retrieved once, but the marginal boost decreases logarithmically so that heavily-accessed memories don't completely dominate. Each time a memory is retrieved via recall, its access count increments automatically.

Why not use LLM reranking? Some memory systems use a secondary LLM call to rerank results after initial retrieval. This produces higher-quality rankings in some cases, but adds 200–500ms latency and ~$0.002 per recall operation. Kronvex's formula-based approach is deterministic, sub-50ms, and has zero additional LLM cost in the recall path.

When raw pgvector is enough

Not every use case needs a full memory API. Raw pgvector (or Pinecone, or Qdrant) is the right choice in these scenarios:

- Document RAG over a mostly static corpus, where recency and frequency don't affect ranking.
- Custom recommendation or semantic search systems with their own ranking logic.
- Stateless similarity lookups that don't carry per-user memory across sessions.

If your use case is any of the above, use pgvector directly. It is a mature, reliable technology, and you'll build exactly what you need without unnecessary abstraction.

If, on the other hand, you are building an agent that needs to remember things about users across sessions — and surface those memories intelligently on future turns — then you're building a memory system, not a vector search system. The distinction matters in terms of what code you'll end up writing and maintaining.

Rule of thumb: If you find yourself writing code to handle "this memory is from 6 months ago, should I still surface it?" — you're building a memory system and a memory API will save you significant time.

Migration: from raw pgvector to Kronvex (3 steps)

If you have an existing agent memory implementation using raw pgvector (or Pinecone, Weaviate, or Qdrant), migrating to Kronvex is straightforward. The data model maps directly and the API surface is minimal.

1

Export existing memories and re-ingest via remember()

Query all existing memories from your vector store. For each memory, call agent.remember(text) on the corresponding Kronvex agent. Kronvex will re-embed the text server-side — you don't need to pass vectors. Batch ingest using the async SDK to parallelise.
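
The parallel re-ingest is safer with a concurrency cap, so a large export doesn't flood the API with simultaneous requests. Here `ingest` stands in for any async call such as `agent.remember`, and the cap of 10 is an arbitrary illustrative choice:

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def bounded_ingest(items: Iterable[str],
                         ingest: Callable[[str], Awaitable[None]],
                         max_concurrency: int = 10) -> None:
    """Run `ingest(item)` for every item, at most `max_concurrency` at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(item: str) -> None:
        async with sem:
            await ingest(item)

    await asyncio.gather(*(one(i) for i in items))
```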

2

Replace recall() calls

Replace your SELECT ... ORDER BY embedding <=> query_vec LIMIT n queries with agent.recall(query). The return shape is a list of memory objects with content and confidence fields. If you were previously doing threshold filtering manually, Kronvex applies a default minimum similarity threshold for you.

3

Replace system prompt injection with inject-context()

Replace your custom prompt-building function with a single call to agent.inject_context(query). This returns a formatted string ready to prepend to your system prompt. You can delete your formatting logic entirely.

Python — migration in practice
# Step 1: Export from pgvector and re-ingest
import asyncio
from kronvex import AsyncKronvex

client = AsyncKronvex(api_key="kv-...")

async def migrate_user(user_id: str, memories: list[str]):
    agent = client.agent(user_id)
    await asyncio.gather(*[agent.remember(m) for m in memories])

# Step 2: Replace recall
# BEFORE:
# results = await db.fetch(
#     "SELECT content FROM memories WHERE agent_id=$1
#      ORDER BY embedding <=> $2 LIMIT 5", agent_id, query_vec
# )

# AFTER:
agent = client.agent("user-123")
results = await agent.recall("user display preferences")
# results[0] = {"content": "...", "confidence": 0.87, ...}

# Step 3: Replace inject-context
# BEFORE: custom_prompt = build_memory_prompt(results)
# AFTER:
context_block = await agent.inject_context("user display preferences")
# Use context_block directly in your system prompt

Conclusion

Vector databases and memory APIs are not the same thing, and treating them as interchangeable is the single most common architectural mistake in agent development today.

A vector database gives you approximate nearest-neighbor search. That's infrastructure. A memory API gives you confidence scoring, recency weighting, frequency tracking, inject-context, quota management, and agent isolation on top of that infrastructure. That's the application layer.

If you're building document RAG or custom recommendation systems, use a vector database directly — pgvector is excellent and free if you already run PostgreSQL. If you're building agent memory and you find yourself writing the same recency and frequency scoring code for the third time, that code should be a service, not a custom implementation.

The confidence formula — similarity × 0.6 + recency × 0.2 + frequency × 0.2 — is not magic. You can implement it yourself in about 40 lines. The question is whether you want to be in the business of maintaining that implementation across every project, or whether you want to make one agent.inject_context() call and move on.

Try Kronvex free

Demo key in 30 seconds. EU-hosted, GDPR-native, no infrastructure to manage. 100 memories free — no credit card.

Get your free API key →