Overview
ChromaDB is an embedded vector database optimized for ease of use at moderate scale. It is the right choice for prototypes, internal tools, and production workloads with a few hundred thousand vectors. Past that, query latency grows, memory pressure increases, and operational risk rises. This page gives the specific signals that indicate you have outgrown ChromaDB and the migration path to Qdrant or pgvector.
Watch four metrics to detect the scale boundary early
ChromaDB does not fail loudly when it reaches its limits. It slows down and returns lower-quality results.
- Query p99 latency above 200ms at n_results=10 on a filtered query. On a well-tuned instance, that number should stay below 50ms up to a few hundred thousand vectors.
- Memory RSS above 4 GB for a single collection. HNSW graphs are kept in memory; at 1M vectors with 1024-dim float32, the graph alone is several gigabytes.
- Index rebuild time above 10 minutes. ChromaDB rebuilds the HNSW index on startup; long rebuild means the process cannot restart quickly after a crash.
- Recall@10 below 0.85 on your golden retrieval set. As the collection grows, HNSW approximate search can miss good results without visible errors.
Set up monitoring on all four before you hit the limit, not after.
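The latency metric can be probed with a small script against a fixed set of queries. A minimal sketch, assuming a chromadb-style collection object and a batch of probe embeddings drawn from real traffic (`measure_query_p99` and the probe setup are illustrative, not a ChromaDB API):

```python
import math
import time

def measure_query_p99(collection, probe_embeddings, n_results=10, where=None):
    """p99 query latency in milliseconds over a batch of probe queries.

    `collection` is any object with a chromadb-style .query() method;
    `where` is an optional metadata filter, matching production traffic.
    """
    latencies = []
    for emb in probe_embeddings:
        start = time.perf_counter()
        collection.query(query_embeddings=[emb], n_results=n_results, where=where)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    # nearest-rank p99: the value at rank ceil(0.99 * n)
    idx = max(0, math.ceil(0.99 * len(latencies)) - 1)
    return latencies[idx]
```

Run it against the same golden query set each time so numbers are comparable run to run.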
At roughly 500K vectors, plan the migration
ChromaDB works comfortably up to a few hundred thousand vectors for most workloads. The upper bound is not a hard number; it depends on dimension, filter selectivity, and query concurrency. Use 500K as the planning trigger.
- Below 100K vectors: ChromaDB is almost always fine.
- 100K to 500K vectors: watch the four metrics above; optimize filters and query patterns.
- Above 500K vectors: plan migration before you need it. A forced migration under production load is a multi-day incident.
The upper bound drops faster when filters are highly selective or when concurrent query load is high.
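The tiers above can be encoded as a periodic check against the collection size. A minimal sketch, assuming a chromadb-style collection exposing `.count()` (the tier names are illustrative):

```python
def migration_tier(n_vectors: int) -> str:
    """Map collection size to the planning tiers above."""
    if n_vectors < 100_000:
        return "fine"            # ChromaDB is almost always fine
    if n_vectors <= 500_000:
        return "watch"           # monitor the four metrics, tune filters
    return "plan-migration"      # migrate before you are forced to

def check_collection(collection) -> str:
    # chromadb collections expose count(); any object with the
    # same method works here.
    return migration_tier(collection.count())
```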
Move to Qdrant when you need filtered ANN at scale
Qdrant is the strongest replacement when the reason for migration is filtered ANN performance. Its payload indexes are purpose-built for the metadata-plus-vector query pattern.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
client = QdrantClient(host="qdrant", port=6333)
client.create_collection(
collection_name="docs_support",
vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
points = [
PointStruct(id=i, vector=vec, payload=meta)
for i, (vec, meta) in enumerate(zip(vectors, metadatas))
]
client.upsert(collection_name="docs_support", points=points)

Qdrant stores payload indexes on disk, scales horizontally across replicas, and supports on-disk quantization for collections exceeding available RAM. See rag-vector-databases for the full comparison.
Move to pgvector when you already run Postgres
If Postgres is already in your stack, pgvector with an HNSW index is often the right long-term home. You eliminate a second database, gain SQL joins between vectors and relational data, and inherit your existing backup and monitoring infrastructure.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE docs_support (
id TEXT PRIMARY KEY,
document TEXT,
tenant_id TEXT,
created_at BIGINT,
embedding vector(1024)
);
CREATE INDEX ON docs_support USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
SELECT id, document, 1 - (embedding <=> $1::vector) AS score
FROM docs_support
WHERE tenant_id = 'acme'
ORDER BY embedding <=> $1::vector
LIMIT 10;

For the trade-off analysis between managed Pinecone and pgvector, see pinecone-vs-pgvector. For Postgres-specific performance tuning, see postgres-indexes.
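From application code, the query above runs over any DB-API connection (psycopg, psycopg2). A minimal sketch; the literal-formatting helper and function names are illustrative, not part of pgvector:

```python
def to_pgvector_literal(vec):
    """Format a Python sequence as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{x:g}" for x in vec) + "]"

def search_tenant_sql(conn, query_vec, tenant_id, limit=10):
    """Filtered cosine search against the docs_support table above."""
    vec = to_pgvector_literal(query_vec)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, document, 1 - (embedding <=> %s::vector) AS score "
            "FROM docs_support WHERE tenant_id = %s "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vec, tenant_id, vec, limit),
        )
        return cur.fetchall()
```

Passing the vector as a parameterized text literal avoids SQL injection and works without any driver-side vector type registration.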
Migrate with a parallel write period, not a big-bang cutover
The safest migration pattern: write to both databases during transition, verify recall parity, then cut over reads.
1. Export the full ChromaDB collection to Parquet. See chromadb-persistence for the export code.
2. Bulk-load into Qdrant or pgvector.
3. Dual-write new documents to both stores.
4. Run your golden retrieval eval set against both; compare recall@10.
5. Switch read traffic when the new store matches or exceeds ChromaDB recall.
6. Stop writing to ChromaDB; archive the data directory.
Do not skip step 4. Model behavior changes can hide in a 3-percent recall drop that only surfaces in production after the old store is gone.
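The recall comparison in step 4 can be a short script over the golden set. A minimal sketch; the function names and eval-set shape are assumptions, not a library API:

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant ids that appear in the top-k results."""
    rel = set(relevant_ids)
    if not rel:
        return 1.0
    return len(set(retrieved_ids[:k]) & rel) / min(k, len(rel))

def compare_recall(eval_set, search_old, search_new, k=10):
    """Mean recall@k for both stores over (query, relevant_ids) pairs.

    search_old / search_new are callables returning ranked id lists,
    e.g. wrappers around the ChromaDB and new-store query APIs.
    """
    old = sum(recall_at_k(search_old(q), rel, k) for q, rel in eval_set) / len(eval_set)
    new = sum(recall_at_k(search_new(q), rel, k) for q, rel in eval_set) / len(eval_set)
    return old, new
```

Gate the read cutover on the new store matching or exceeding the old number; a silent 3-percent drop then fails this check instead of surfacing in production.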
Avoid these migration anti-patterns
- Re-embedding during migration: migrate vectors as-is. Re-embedding mixes two tasks; if the new embedding fails, you lose both the migration and the re-embed.
- Migrating on a live collection without a staged dry run: stage the migration on a recent snapshot, then do a short dual-write window in production.
- Assuming Qdrant or pgvector is a drop-in replacement: both have different filter syntax and different tuning knobs. Budget a day for integration testing even when the migration script is simple.
- Skipping the backup before migration: the ChromaDB data directory is your rollback. Copy it before you touch anything.