Vector Databases for AI Agents:
pgvector vs Pinecone vs Weaviate
Not all vector databases are equal when the use case is agent memory rather than RAG retrieval. The requirements are different — you need structured metadata, recency weighting, access frequency tracking, and SQL joins. Here's a detailed comparison of the main options.
Why AI agents need vector search
A traditional database answers the question: "Does this record exist?" A vector database answers a different question: "What is most similar to this query?" For agent memory, that distinction is fundamental.
Consider a support agent that has stored thousands of past interactions. When a user asks "I'm having trouble with my invoice", an exact-match search finds nothing useful. But a semantic vector search finds: "User couldn't download their receipt", "billing portal access issue", "payment method update failed" — all relevant, none containing the word "invoice".
This is the core problem that vector search solves for agents: intent matching at recall time, not keyword matching. The user's current query is embedded into the same vector space as stored memories, and cosine similarity finds the nearest neighbors regardless of exact wording.
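To make "nearest neighbors in vector space" concrete, here is a toy sketch of cosine similarity with hypothetical 4-dimensional embeddings (real models like text-embedding-3-small produce 1536 dimensions, and the numbers below are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Normalized dot product: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-d embeddings; real ones come from an embedding model.
invoice_query = [0.8, 0.1, 0.5, 0.2]
receipt_memory = [0.7, 0.2, 0.6, 0.1]   # semantically close
weather_memory = [0.1, 0.9, 0.0, 0.4]   # unrelated

print(cosine_similarity(invoice_query, receipt_memory) >
      cosine_similarity(invoice_query, weather_memory))  # True
```

Because the "invoice" query and the "receipt" memory point in nearly the same direction, they score high even though they share no keywords.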
Comparison table
Evaluated specifically for the agent memory use case — not generic document retrieval:
| Feature | pgvector | Pinecone | Weaviate | Qdrant |
|---|---|---|---|---|
| Fully managed | Via Supabase/RDS | Yes | Cloud + self-host | Cloud + self-host |
| Self-hostable | Yes | No | Yes | Yes |
| EU region available | Yes (Frankfurt) | EU-West only | Yes | Yes |
| SQL joins on metadata | Native SQL | No | GraphQL only | Payload filters |
| ACID transactions | Yes | No | No | No |
| Hybrid search (vector + keyword) | With tsvector | Yes | Yes (BM25) | Yes (sparse) |
| p99 latency (10k vectors) | <5ms local | ~20ms managed | ~15ms managed | ~10ms managed |
| Estimated cost (1M vectors) | ~$25/mo (Supabase) | ~$70/mo | ~$45/mo | ~$35/mo |
| Native recency/frequency weighting | SQL computed columns | No | No | No |
| Co-located with structured data | Same DB | Separate service | Separate service | Separate service |
Why pgvector wins for agent memory
The comparison above shows that Pinecone, Weaviate, and Qdrant are excellent for their primary use case — pure vector retrieval at scale. But agent memory is not pure vector retrieval.
1. Structured data is already in PostgreSQL
Your api_keys table, your agents table, your usage quota tracking — all of this already lives in PostgreSQL. When the memory store is the same database, a single query can join vectors with structured metadata:
```sql
SELECT m.content,
       m.memory_type,
       m.access_count,
       m.created_at,
       1 - (m.embedding <=> $1) AS similarity
FROM memories m
JOIN agents a ON m.agent_id = a.id
WHERE a.api_key_id = $2
  AND m.session_id = $3
  AND (m.expires_at IS NULL OR m.expires_at > now())
ORDER BY similarity DESC
LIMIT 20;
```
Doing this with Pinecone requires two round trips: first query the vector index, then fetch structured data from a separate datastore. With pgvector, it is one query, one network hop, one transaction.
2. Recency and frequency are SQL expressions
The Kronvex confidence score formula is:
```sql
-- confidence = similarity × 0.6 + recency × 0.2 + frequency × 0.2
-- recency: sigmoid decay with a 30-day inflection (recent memories score higher)
-- frequency: log-scaled access count
SELECT content,
       (similarity * 0.6)
       + (1 / (1 + EXP((EXTRACT(EPOCH FROM (now() - created_at)) / 86400 - 30) / 10))) * 0.2
       + (LN(1 + access_count) / LN(100)) * 0.2 AS confidence
FROM candidates
ORDER BY confidence DESC
LIMIT 6;
```
This kind of composite scoring — blending vector similarity with temporal decay and usage frequency — cannot be expressed in Pinecone's metadata filter system or Weaviate's GraphQL. It requires a real expression language. SQL provides that.
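To see why an expression language matters, here is the same shape of composite score in plain Python: a sketch, not the production Kronvex implementation, assuming the 0.6/0.2/0.2 weights, a recency sigmoid that decays with age around a 30-day inflection, and a log-scaled frequency term:

```python
import math

def confidence(similarity: float, age_days: float, access_count: int) -> float:
    # Recency: sigmoid that decays with age, inflection at 30 days.
    recency = 1 / (1 + math.exp((age_days - 30) / 10))
    # Frequency: log-scaled access count, saturating near 100 accesses.
    frequency = math.log(1 + access_count) / math.log(100)
    return similarity * 0.6 + recency * 0.2 + frequency * 0.2

# A fresh, frequently used memory outranks an old, untouched one
# even when its raw similarity is slightly lower.
fresh = confidence(similarity=0.80, age_days=2, access_count=20)
stale = confidence(similarity=0.85, age_days=200, access_count=0)
print(fresh > stale)  # True
```

The point is not the specific weights but that all three signals combine in one ranking expression, which SQL can evaluate inside the database while metadata-filter APIs cannot.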
3. ACID guarantees matter for agent state
When a user upgrades their plan and you update both the api_keys table and delete old memories (to free up quota), that operation needs to be atomic. If it partially succeeds, your agent's state is inconsistent. PostgreSQL gives you full ACID transactions. Purpose-built vector databases generally do not.
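A minimal sketch of that atomic plan change, assuming hypothetical plan and memory_quota columns on api_keys alongside the memories and agents tables used earlier:

```sql
BEGIN;

UPDATE api_keys
SET plan = 'pro', memory_quota = 100000   -- hypothetical columns
WHERE id = $1;

DELETE FROM memories
WHERE agent_id IN (SELECT id FROM agents WHERE api_key_id = $1)
  AND expires_at < now();

COMMIT;  -- both statements apply, or neither does
```

If the DELETE fails, the UPDATE rolls back with it; with a separate vector service, the plan change and the memory cleanup can never be wrapped in one transaction.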
How Kronvex uses pgvector + confidence scoring
Kronvex is built on top of Supabase PostgreSQL with the pgvector extension. Each memory is stored as a 1536-dimensional vector (OpenAI text-embedding-3-small) alongside its structured metadata.
When recall() is called, the query text is embedded in real time, and the database performs an approximate nearest-neighbor (ANN) search using the IVFFlat index. The top-20 candidates are then re-ranked using the confidence score formula above, and the top-k results are returned.
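The index behind that ANN search can be sketched in pgvector DDL (the index name and list count here are assumptions; pgvector's rough guidance is one list per ~1,000 rows):

```sql
-- Cosine-distance IVFFlat index; build it after the table has data,
-- since IVFFlat derives its cluster centroids from existing rows.
CREATE INDEX memories_embedding_idx
ON memories USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- At query time, more probes means better recall at higher latency.
SET ivfflat.probes = 10;
```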
Code example: raw pgvector vs Kronvex API
To illustrate what Kronvex abstracts away, here is the same "store and recall a memory" operation implemented directly against pgvector, then with the Kronvex SDK:
```python
import asyncpg, openai, json
from datetime import datetime, timezone

openai_client = openai.AsyncOpenAI()

async def embed(text: str) -> list[float]:
    r = await openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return r.data[0].embedding

async def remember_raw(pool, agent_id: str, content: str):
    vec = await embed(content)
    await pool.execute("""
        INSERT INTO memories (agent_id, content, embedding, created_at)
        VALUES ($1, $2, $3::vector, $4)
    """, agent_id, content, json.dumps(vec), datetime.now(timezone.utc))

async def recall_raw(pool, agent_id: str, query: str, top_k: int = 5):
    # No confidence scoring — pure cosine similarity only.
    # Ordering by the distance operator (rather than the aliased sim)
    # lets pgvector's ANN index be used.
    vec = await embed(query)
    rows = await pool.fetch("""
        SELECT content, 1 - (embedding <=> $1::vector) AS sim
        FROM memories
        WHERE agent_id = $2
        ORDER BY embedding <=> $1::vector
        LIMIT $3
    """, json.dumps(vec), agent_id, top_k)
    return [r["content"] for r in rows]

# Missing: session scoping, TTL, access_count updates,
# confidence scoring, memory types, quota enforcement...
```
```python
from kronvex import Kronvex

kv = Kronvex("kv-your-key")
agent = kv.agent("your-agent-id")

# Store — embedding + metadata handled automatically
agent.remember(
    "User prefers formal tone, no bullet points",
    memory_type="semantic",
    session_id="user-42",
)

# Recall — cosine sim + recency + frequency combined
result = agent.recall(
    query="how should I format my response?",
    top_k=5,
    session_id="user-42",
)
for mem in result.memories:
    print(f"{mem.confidence:.2f} — {mem.content}")
# 0.87 — User prefers formal tone, no bullet points
```