Overview

The vector store is a deployment decision, not a quality decision. Recall and latency depend more on chunking, embeddings, and reranking than on which database holds the vectors. Pick the store that fits the operational shape of the rest of the stack. For retrieval rules, see rag-retrieval. For ChromaDB specifics, see chromadb. For the head-to-head between the two most-asked options, see pinecone-vs-pgvector.

Pick Pinecone for managed simplicity

Pinecone is the “just give me a vector API” choice. Serverless billing, no nodes to size, hybrid built in.

  • Strong fit: small team, no infra people, bursty traffic.
  • Weak fit: steady high QPS where the bill bites, strict residency or on-prem requirements.
  • Namespaces partition tenants cheaply; use them instead of separate indexes.

Pinecone is right when the time you would spend tuning Qdrant is worth more than the Pinecone bill.
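
A minimal sketch of the namespace pattern with the Python client (the v3+ "pinecone" package); the index name, dimension, IDs, and tenant names are placeholders, and the serverless index is assumed to already exist:

    from pinecone import Pinecone

    pc = Pinecone(api_key="...")
    index = pc.Index("docs")  # assumed pre-created serverless index

    vec = [0.01] * 1536  # placeholder embedding

    # One namespace per tenant instead of one index per tenant.
    index.upsert(
        vectors=[{"id": "doc-1#chunk-0", "values": vec, "metadata": {"source": "handbook"}}],
        namespace="tenant-acme",
    )

    # Queries stay inside the tenant's namespace.
    res = index.query(vector=vec, top_k=5, include_metadata=True, namespace="tenant-acme")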

Pick Qdrant for self-host plus performance

Qdrant is the strongest self-hosted vector store in 2026 for filtered ANN. Written in Rust, payload-aware index, predictable p99.

  • Strong fit: tens of millions of vectors, heavy metadata filters, latency budget under 50 ms p99.
  • Payload indexes are the unique advantage. A keyword index on tenant_id plus a vector index gives true pre-filtered ANN.
  • Managed Qdrant Cloud is available when self-host is the blocker.

Qdrant wins when filtered recall under load is the requirement.
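
A minimal sketch of pre-filtered ANN with qdrant-client (query_points needs a 1.10+ client); the collection name, dimension, and tenant values are placeholders, and a local instance is assumed:

    from qdrant_client import QdrantClient
    from qdrant_client.models import (
        Distance, FieldCondition, Filter, MatchValue,
        PayloadSchemaType, PointStruct, VectorParams,
    )

    client = QdrantClient(url="http://localhost:6333")  # assumed local instance

    client.create_collection(
        collection_name="chunks",
        vectors_config=VectorParams(size=1536, distance=Distance.DOT),
    )
    # Keyword payload index on tenant_id: the filter is applied inside the ANN search.
    client.create_payload_index(
        collection_name="chunks",
        field_name="tenant_id",
        field_schema=PayloadSchemaType.KEYWORD,
    )

    client.upsert(
        collection_name="chunks",
        points=[PointStruct(id=1, vector=[0.01] * 1536, payload={"tenant_id": "acme"})],
    )

    hits = client.query_points(
        collection_name="chunks",
        query=[0.01] * 1536,
        query_filter=Filter(must=[FieldCondition(key="tenant_id", match=MatchValue(value="acme"))]),
        limit=5,
    )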

Pick Weaviate for hybrid search by default

Weaviate ships dense plus BM25 hybrid as a first-class query, with built-in tokenization and language analyzers.

  • Strong fit: workloads where hybrid retrieval is the baseline (technical docs, code, mixed-vocabulary corpora). See rag-retrieval.
  • Modules add embedding providers, rerankers, and generative steps inside the database; useful for prototypes.

Weaviate is the pick when the brief is “give me a hybrid search engine, not a vector primitive.”
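
A minimal sketch with the v4 Python client, assuming a local instance and a Document collection that already has a vectorizer module configured; the collection and property names are placeholders:

    import weaviate
    from weaviate.classes.query import MetadataQuery

    client = weaviate.connect_to_local()  # assumed local instance
    docs = client.collections.get("Document")  # assumed collection with a vectorizer module

    # alpha blends the two arms: 0 = pure BM25, 1 = pure vector, 0.5 = even split.
    res = docs.query.hybrid(
        query="rotate api keys",
        alpha=0.5,
        limit=5,
        return_metadata=MetadataQuery(score=True),
    )
    for obj in res.objects:
        print(obj.properties.get("title"), obj.metadata.score)

    client.close()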

Pick pgvector when Postgres is already in the stack

pgvector turns the database you already operate into a vector store. One backup story, one connection pool, one set of access controls.

  • Strong fit: existing Postgres, vector count under tens of millions per index. See postgres.
  • Build HNSW indexes with CREATE INDEX ... USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64);.
  • Use SET LOCAL hnsw.ef_search = 100; per query to tune recall against latency.
  • Pair with full-text search for the BM25 arm of hybrid retrieval. See postgres-full-text-search.

pgvector is the lowest-friction option when consolidation matters more than peak QPS.
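
A minimal sketch of the per-query knob with psycopg 3 and the pgvector helper package, assuming the HNSW index above is already built; the table, column, and dimension are placeholders:

    import numpy as np
    import psycopg
    from pgvector.psycopg import register_vector

    conn = psycopg.connect("dbname=app")  # assumed existing database with the vector extension
    register_vector(conn)  # adapts numpy arrays to the vector type

    query_embedding = np.random.rand(1536).astype(np.float32)  # placeholder

    with conn.transaction():
        with conn.cursor() as cur:
            # SET LOCAL scopes the recall/latency knob to this transaction only.
            cur.execute("SET LOCAL hnsw.ef_search = 100")
            cur.execute(
                "SELECT id, content FROM chunks ORDER BY embedding <=> %s LIMIT 5",
                (query_embedding,),
            )
            rows = cur.fetchall()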

Pick ChromaDB for local-first dev and small services

ChromaDB is the right embedded vector store for prototypes, internal tools, and single-process apps. See chromadb for the full ruleset.

  • Strong fit: notebooks, CLIs, small SaaS apps, on-device assistants.
  • Weak fit: multi-process workloads on the same data file, high-QPS production.

When scale demands it, move up to Qdrant, Weaviate, or pgvector. The migration is a Parquet dump and a bulk insert.
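
A minimal sketch of the embedded path, assuming nothing beyond a local directory; the path, collection name, and toy three-dimensional vectors are placeholders:

    import chromadb

    # PersistentClient is embedded: one process, one local directory, no server to run.
    client = chromadb.PersistentClient(path="./chroma")
    notes = client.get_or_create_collection("notes", metadata={"hnsw:space": "ip"})

    notes.add(
        ids=["n1"],
        embeddings=[[0.1, 0.2, 0.3]],
        documents=["rotate api keys quarterly"],
        metadatas=[{"tenant_id": "acme"}],
    )

    hits = notes.query(
        query_embeddings=[[0.1, 0.2, 0.3]],
        n_results=3,
        where={"tenant_id": "acme"},
    )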

Use cosine for normalized vectors, dot product for speed

Once embeddings are unit-normalized, cosine equals dot product, and dot product is faster on every engine.

  • Normalize at write and query time. Never mix normalized and unnormalized vectors in one index.
  • Set the metric to dot or ip after normalization on Pinecone, Qdrant, Weaviate, pgvector, and ChromaDB.
  • Euclidean adds nothing over dot product on unit vectors: squared L2 distance is 2 minus 2 times the dot product, so rankings are identical and magnitudes are noise after normalization.

See embeddings for the normalization rule and the Matryoshka truncation pattern.
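
A small numpy sketch of the rule; the dimensions and random vectors are placeholders:

    import numpy as np

    def l2_normalize(x: np.ndarray) -> np.ndarray:
        """Scale each row to unit length so dot product equals cosine similarity."""
        norms = np.linalg.norm(x, axis=-1, keepdims=True)
        return x / np.clip(norms, 1e-12, None)

    docs = l2_normalize(np.random.rand(4, 1536).astype(np.float32))
    query = l2_normalize(np.random.rand(1, 1536).astype(np.float32))

    scores = docs @ query.T  # identical to cosine similarity on the normalized vectors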

Tune HNSW with two knobs, not ten

HNSW is the default ANN index across every store. Three parameters do the real work, and only two of them need per-workload tuning; m rarely moves off its default.

  • m (graph degree): 16 is a sane default; raise to 32 for higher recall and a bigger index.
  • ef_construction (build-time accuracy): 64 to 200; raise for quality, slower builds.
  • ef_search (query-time accuracy): 50 to 200 per query; raise for recall, lower for speed.

Build with high ef_construction once, sweep ef_search per workload, pick the smallest ef_search that clears the eval bar. See rag-eval.
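
An engine-agnostic sketch of the sweep; search_fn, the eval-set shape, and the 0.95 bar are assumptions, and the knob's name varies by store (hnsw.ef_search in pgvector, hnsw_ef in Qdrant's search params, and so on):

    def recall_at_k(retrieved_ids, relevant_ids):
        """Fraction of the labeled relevant chunks that made the top k."""
        return len(set(retrieved_ids) & set(relevant_ids)) / max(len(relevant_ids), 1)

    def pick_ef_search(search_fn, eval_set, k=10, candidates=(50, 75, 100, 150, 200), bar=0.95):
        """search_fn(query, ef, k) -> list of ids; eval_set is [(query, relevant_ids), ...]."""
        for ef in candidates:  # smallest first: stop at the cheapest setting that clears the bar
            recall = sum(
                recall_at_k(search_fn(q, ef, k), rel) for q, rel in eval_set
            ) / len(eval_set)
            if recall >= bar:
                return ef, recall
        return candidates[-1], recall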

Plan the migration before you need it

Every vector store has an exit. The cost of switching is re-indexing the existing vectors; re-embedding only enters the bill if the model changes at the same time.

  • Dump vectors, IDs, and metadata to Parquet on a schedule. The dump is the migration artifact.
  • Run the new store in shadow mode: write to both, query the old, diff recall@k daily.
  • Flip query traffic when the new store matches or beats the old on the eval suite.

The cheapest way to ship a vector-store migration is to plan the dump on day one.
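
A sketch of the dump and the shadow-mode check with pyarrow; the column names and the top-k overlap metric are illustrative, not a prescribed schema:

    import pyarrow as pa
    import pyarrow.parquet as pq

    def dump_to_parquet(rows, path):
        """rows: iterable of (id, vector, metadata_json) pulled from the current store."""
        ids, vectors, metadata = zip(*rows)
        table = pa.table({"id": list(ids), "vector": list(vectors), "metadata": list(metadata)})
        pq.write_table(table, path)

    def topk_overlap(old_ids, new_ids, k=10):
        """Shadow-mode check: how much of the old store's top k the new store reproduces."""
        return len(set(old_ids[:k]) & set(new_ids[:k])) / k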