Overview
ChromaDB is an embedded vector database that fits early-stage RAG workloads: a few collections, a few million vectors, retrieval from a single application process. It is the right pick for prototypes, internal tools, and small production apps. This page covers collection design, metadata, retrieval, persistence, and the signals that you have outgrown it.
One collection per use case, not per tenant
A collection in ChromaDB pins an embedding model and dimension. Use that boundary deliberately.
- One collection per retrieval task: `docs_support`, `docs_marketing`, `code_snippets`. Each may use a different embedding model.
- Do not create one collection per customer or per user. Multi-tenant separation belongs in metadata (`tenant_id`), filtered at query time.
- A new collection per tenant explodes the collection count, fragments indexes, and breaks bulk queries.
```python
import chromadb

client = chromadb.PersistentClient(path="./.chroma")
collection = client.get_or_create_collection(
    name="docs_support",
    metadata={"embedding_model": "text-embedding-3-small", "dimension": 1536},
)
```
Type and pin metadata fields
Metadata filters are the second axis of retrieval. Treat the schema as fixed.
- Decide the metadata keys before ingest. Common keys: `source`, `tenant_id`, `created_at`, `lang`, `chunk_index`.
- Use one type per key forever. Once `created_at` is a unix timestamp, do not mix in ISO strings later.
- Keep values flat. Chroma does not query into nested objects.
- Index-worthy keys are the ones you filter on. Throw the rest into a `payload` JSON string and parse client-side.
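The payload pattern is an application convention, not a Chroma feature. A sketch, with `payload` as an illustrative key name: pack the fields you never filter on into one JSON string, and keep the filtered keys flat and typed.

```python
import json

extra = {"author": "jdoe", "ticket_url": "https://example.com/t/123"}  # never filtered on
metadata = {
    "tenant_id": "t_123",          # filtered: flat, typed, one type forever
    "created_at": 1736294400,      # unix timestamp, always an int
    "payload": json.dumps(extra),  # everything else: one JSON string, parsed client-side
}
```

The filtered keys go straight into `metadatas` on ingest: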
```python
collection.add(
    ids=["doc-1"],
    documents=["..."],
    metadatas=[{"tenant_id": "t_123", "source": "zendesk", "created_at": 1736294400}],
    embeddings=[embedding],
)
```
Pin the embedding model and dimension at collection creation
The embedding model is part of the collection’s identity. Mixing models inside one collection produces meaningless similarity scores.
- Record the model name and dimension in the collection’s `metadata` (Chroma stores it for you).
- Switching models means a new collection. Re-embed everything; do not migrate in place.
- Normalize vectors if your model recommends it (most modern OpenAI and BGE models are already normalized). Use cosine distance for normalized vectors and L2 otherwise.
- See embeddings for the model selection rules.
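The distance function is also fixed at creation. In Chroma it is set through the reserved `hnsw:space` metadata key; a sketch, with `docs_support_v2` as an illustrative name:

```python
# Cosine distance for normalized embeddings; Chroma's default is L2 ("l2").
collection = client.get_or_create_collection(
    name="docs_support_v2",
    metadata={
        "hnsw:space": "cosine",  # one of "l2", "ip", "cosine"
        "embedding_model": "text-embedding-3-small",
        "dimension": 1536,
    },
)
```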
Retrieve with top-k plus a metadata filter
The default retrieval pattern is query with where clauses. Use both axes.
```python
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=10,
    where={"$and": [{"tenant_id": "t_123"}, {"lang": "en"}]},
    where_document={"$contains": "refund"},
)
```
- Always pass a `where` filter when one applies. Filtering after the fact wastes the top-k slots on irrelevant rows. Note that multiple conditions must be combined with `$and`; Chroma rejects a `where` clause with more than one top-level key.
- `n_results` between 5 and 20 is the usable range. More than that and you are doing the LLM’s reranking job.
- For hybrid retrieval, combine `where_document` keyword filters with vector similarity, then rerank in the application layer, as sketched below.
- See rag for the chunking and reranking patterns that feed Chroma.
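A minimal sketch of that hybrid pattern, assuming an application-supplied `rerank` function (a cross-encoder or similar; it is not part of Chroma) and an illustrative over-fetch factor:

```python
def hybrid_search(collection, query_embedding, keyword, tenant_id, k=10):
    # Over-fetch on both axes so the reranker has candidates to cut from.
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=k * 3,
        where={"tenant_id": tenant_id},
        where_document={"$contains": keyword},
    )
    ids, docs = results["ids"][0], results["documents"][0]
    # rerank() is application code, e.g. a cross-encoder scoring (query, doc) pairs.
    return rerank(ids, docs, top_k=k)
```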
Use PersistentClient locally and server mode for shared access
Persistence mode is a deployment decision.
```python
# Local, single-process. File lives at ./.chroma
client = chromadb.PersistentClient(path="./.chroma")

# Shared, multi-process. Run `chroma run --path ./.chroma` and connect over HTTP.
client = chromadb.HttpClient(host="chroma", port=8000)
```
- `PersistentClient` is correct for CLIs, scripts, single-worker services, and notebooks.
- Multi-process workloads need server mode. Two `PersistentClient` instances on the same path will corrupt the index.
- For Docker, run the `chromadb/chroma` container and mount a volume for the data directory.
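One way to keep both modes behind a single code path, as a sketch; the `CHROMA_HOST` and `CHROMA_PORT` variables are illustrative, not a Chroma convention:

```python
import os

import chromadb

def make_client():
    # Server mode when CHROMA_HOST is set (e.g. inside Docker); local file otherwise.
    host = os.environ.get("CHROMA_HOST")
    if host:
        port = int(os.environ.get("CHROMA_PORT", "8000"))
        return chromadb.HttpClient(host=host, port=port)
    return chromadb.PersistentClient(path="./.chroma")
```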
Outgrow ChromaDB when scale or features demand it
ChromaDB stops being the right pick at specific signals. Switch before the migration becomes an outage.
- Throughput above a few hundred queries per second per node. Move to Qdrant or Weaviate.
- Vectors past tens of millions. Recall and latency degrade; pgvector with `HNSW` on Postgres gets you back to predictable performance.
- Filtered ANN with sub-100ms p99 at scale. Qdrant’s payload indexes are the strongest option.
- You already run Postgres and do not want a second database. Use pgvector and consolidate.
- You need on-disk quantization, multi-vector retrieval, or cluster-level replication. Chroma does not have those.
Migration is a dump of vectors, ids, and metadata to Parquet, then a bulk load into the new store. Plan one migration, not three.
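A minimal export sketch, assuming pandas with pyarrow installed; the batch size and file name are illustrative:

```python
import json

import pandas as pd

def export_collection(collection, path="export.parquet", batch_size=1000):
    rows, offset = [], 0
    while True:
        # Page through the collection; ids are always returned.
        page = collection.get(
            include=["embeddings", "documents", "metadatas"],
            limit=batch_size,
            offset=offset,
        )
        if not page["ids"]:
            break
        for i, doc_id in enumerate(page["ids"]):
            rows.append({
                "id": doc_id,
                "embedding": list(page["embeddings"][i]),
                "document": page["documents"][i],
                # Serialize metadata so Parquet does not need a fixed struct schema.
                "metadata": json.dumps(page["metadatas"][i]),
            })
        offset += batch_size
    pd.DataFrame(rows).to_parquet(path)
```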