Overview

ChromaDB is an embedded vector database that fits early-stage RAG workloads: a few collections, a few million vectors, retrieval from a single application process. It is the right pick for prototypes, internal tools, and small production apps. This page covers collection design, metadata, retrieval, persistence, and the signals that you have outgrown it.

One collection per use case, not per tenant

A collection in ChromaDB pins an embedding model and dimension. Use that boundary deliberately.

  • One collection per retrieval task: docs_support, docs_marketing, code_snippets. Each may use a different embedding model.
  • Do not create one collection per customer or per user. Multi-tenant separation belongs in metadata (tenant_id), filtered at query time.
  • A new collection per tenant explodes the collection count, fragments the indexes, and breaks cross-tenant bulk queries.
import chromadb
client = chromadb.PersistentClient(path="./.chroma")
collection = client.get_or_create_collection(
    name="docs_support",
    metadata={"embedding_model": "text-embedding-3-small", "dimension": 1536},
)
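
Because the collection records its model, a startup guard can catch a process that embeds with the wrong one before it writes mixed vectors. A minimal sketch, assuming the embedding_model metadata key from the snippet above; check_collection_model and EXPECTED_MODEL are illustrative names, not Chroma API:

```python
# Illustrative guard: fail fast if this process embeds with a different
# model than the one the collection was created with.
EXPECTED_MODEL = "text-embedding-3-small"  # assumption: matches collection metadata

def check_collection_model(collection, expected=EXPECTED_MODEL):
    recorded = (collection.metadata or {}).get("embedding_model")
    if recorded != expected:
        raise RuntimeError(
            f"collection {collection.name!r} was embedded with {recorded!r}, "
            f"but this process uses {expected!r}; create a new collection instead"
        )
```

Call it once at startup, right after get_or_create_collection, so a misconfigured worker dies loudly instead of polluting the index.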

Type and pin metadata fields

Metadata filters are the second axis of retrieval. Treat the schema as fixed.

  • Decide the metadata keys before ingest. Common keys: source, tenant_id, created_at, lang, chunk_index.
  • Use one type per key forever. Once created_at is a unix timestamp, do not mix in ISO strings later.
  • Keep values flat. Chroma does not query into nested objects.
  • Index-worthy keys are the ones you filter on. Throw the rest into a payload JSON string and parse client-side.
collection.add(
    ids=["doc-1"],
    documents=["..."],
    metadatas=[{"tenant_id": "t_123", "source": "zendesk", "created_at": 1736294400}],
    embeddings=[embedding],
)
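
The payload pattern from the last bullet can be sketched in a few lines. This is an application-side convention, not a Chroma feature; FILTER_KEYS and split_metadata are illustrative names, and the key set is an assumption:

```python
import json

# Keep only the keys you filter on as top-level metadata; serialize the rest
# into one JSON string and parse it client-side after the query.
FILTER_KEYS = {"tenant_id", "source", "created_at", "lang", "chunk_index"}  # assumption

def split_metadata(raw):
    meta = {k: v for k, v in raw.items() if k in FILTER_KEYS}
    extras = {k: v for k, v in raw.items() if k not in FILTER_KEYS}
    if extras:
        meta["payload"] = json.dumps(extras)  # opaque to Chroma, parsed client-side
    return meta

meta = split_metadata({
    "tenant_id": "t_123",
    "source": "zendesk",
    "author": "agent-7",
    "ticket_tags": ["refund", "priority"],  # lists/nested values are not filterable
})
```

Pass the result as the metadatas entry in collection.add; the payload string rides along without bloating the filterable schema.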

Pin the embedding model and dimension at collection creation

The embedding model is part of the collection’s identity. Mixing models inside one collection produces meaningless similarity scores.

  • Record the model name and dimension in the collection’s metadata (Chroma stores it for you).
  • Switching models means a new collection. Re-embed everything; do not migrate in place.
  • Normalize vectors if your model recommends it (most modern OpenAI and BGE models are already normalized). Use cosine distance for normalized vectors and L2 otherwise.
  • See embeddings for the model selection rules.
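
The normalization and distance choice above can be sketched as follows; l2_normalize is an illustrative helper, and the hnsw:space metadata key is the knob Chroma exposes for the distance metric (assumption: default is l2):

```python
import math

def l2_normalize(vec):
    """Return vec scaled to unit length; cosine and L2 then rank identically."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# When switching models, create a new collection and pin the metric to match:
# client.get_or_create_collection(
#     name="docs_support_v2",                  # new collection, not in-place migration
#     metadata={"hnsw:space": "cosine"},       # "l2" (default), "ip", or "cosine"
# )

v = l2_normalize([3.0, 4.0])  # unit length: [0.6, 0.8]
```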

Retrieve with top-k plus a metadata filter

The default retrieval pattern is a vector query combined with a where filter. Use both axes: similarity and metadata.

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=10,
    # Multiple conditions must be combined with $and (or $or); a where dict
    # with more than one top-level key is rejected.
    where={"$and": [{"tenant_id": "t_123"}, {"lang": "en"}]},
    where_document={"$contains": "refund"},
)
  • Always pass a where filter when one applies. Filtering after the fact wastes the top-k slots on irrelevant rows.
  • n_results between 5 and 20 is the usable range. More than that and you are doing the LLM’s reranking job.
  • For hybrid retrieval, combine where_document keyword filters with vector similarity, then rerank in the application layer.
  • See rag for the chunking and reranking patterns that feed Chroma.
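
The application-layer rerank mentioned above can be as small as a keyword boost over the returned distances. A minimal sketch, assuming query results with include=["documents", "distances"]; rerank and the boost value are illustrative, not a Chroma API:

```python
# Nudge hits that contain the keyword ahead of pure-vector neighbors.
def rerank(results, keyword, boost=0.15):
    """Return ids sorted by distance, with a fixed bonus for keyword matches."""
    ids = results["ids"][0]
    docs = results["documents"][0]
    dists = results["distances"][0]
    scored = [
        (dist - (boost if keyword in doc else 0.0), doc_id)
        for doc_id, doc, dist in zip(ids, docs, dists)
    ]
    scored.sort()  # lower adjusted distance wins
    return [doc_id for _, doc_id in scored]

# Toy result in the shape collection.query returns (one query, two hits).
results = {
    "ids": [["a", "b"]],
    "documents": [["no match here", "we issue a refund"]],
    "distances": [[0.20, 0.30]],
}
order = rerank(results, "refund")  # ["b", "a"]: the keyword hit moves up
```

A cross-encoder reranker slots into the same place once a fixed boost stops being enough.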

Use PersistentClient locally and server mode for shared access

Persistence mode is a deployment decision.

# Local, single-process. File lives at ./.chroma
client = chromadb.PersistentClient(path="./.chroma")
 
# Shared, multi-process. Run `chroma run --path ./.chroma` and connect over HTTP.
client = chromadb.HttpClient(host="chroma", port=8000)
  • PersistentClient is correct for CLIs, scripts, single-worker services, and notebooks.
  • Multi-process workloads need server mode. Two PersistentClient instances on the same path will corrupt the index.
  • For Docker, run the chromadb/chroma container and mount a volume for the data directory.

Outgrow ChromaDB when scale or features demand it

ChromaDB stops being the right pick at specific signals. Switch before the migration becomes an outage.

  • Throughput above a few hundred queries per second per node. Move to Qdrant or Weaviate.
  • Vectors past tens of millions. Recall and latency degrade; pgvector with HNSW on Postgres gets you back to predictable performance.
  • Filtered ANN with sub-100ms p99 at scale. Qdrant’s payload indexes are the strongest option.
  • You already run Postgres and do not want a second database. Use pgvector and consolidate.
  • You need on-disk quantization, multi-vector retrieval, or cluster-level replication. Chroma does not have those.

Migration is a dump of vectors, ids, and metadata to Parquet, then bulk-load. Plan one migration, not three.
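
The dump step can be sketched as a flattening pass over a get() batch. A sketch under assumptions: the batch below mimics the dict shape collection.get(include=["embeddings", "documents", "metadatas"]) returns, to_rows is an illustrative helper, and the Parquet write is left to pandas:

```python
import json

# Stand-in for one collection.get() batch (page with limit/offset for real data).
batch = {
    "ids": ["doc-1", "doc-2"],
    "embeddings": [[0.1, 0.2], [0.3, 0.4]],
    "documents": ["refund policy ...", "shipping ..."],
    "metadatas": [{"tenant_id": "t_123"}, {"tenant_id": "t_456"}],
}

def to_rows(batch):
    """Flatten a Chroma get() batch into one record per vector, ready for Parquet."""
    return [
        {
            "id": i,
            "embedding": e,
            "document": d,
            "metadata": json.dumps(m),  # serialize so the Parquet schema stays flat
        }
        for i, e, d, m in zip(
            batch["ids"], batch["embeddings"], batch["documents"], batch["metadatas"]
        )
    ]

rows = to_rows(batch)
# pandas.DataFrame(rows).to_parquet("docs_support.parquet")  # bulk-load input
```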