Overview

ChromaDB offers two filter axes on every query: where operates on metadata fields, and where_document operates on the raw document text. Used together they form a hybrid retrieval pattern that is more precise than pure vector similarity. The production failure mode is rarely over-filtering; it is almost always under-filtering: filters too loose, context window filled with off-topic chunks, model hallucinating from irrelevant text.

Always pass a where filter when a metadata constraint applies

A vector query without a filter searches the entire collection. For multi-tenant collections or collections with multiple content types, that is almost always wrong.

results = collection.query(
    query_embeddings=[query_vec],
    n_results=10,
    where={"tenant_id": "acme", "lang": "en"},
)
  • If you know the tenant, always filter by tenant_id. A missing filter leaks rows across tenants and wastes top-k slots on irrelevant results.
  • Filters reduce the candidate set before ANN scoring, not after. The n_results count applies to the post-filter population.
  • When the filter is highly selective (few matching rows), raise n_results to compensate for the smaller candidate pool; see the sketch after this list.
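
A minimal sketch of that adjustment, using a hypothetical query_with_floor helper; it counts the filtered rows with collection.get at query time, which is fine for a sketch but worth caching or precomputing in production:

def query_with_floor(collection, query_vec, where, n_results=10,
                     selective_below=500, floor=30):
    # collection.count() takes no filter, so a get() with the same where clause
    # stands in as the match count here.
    matched = len(collection.get(where=where)["ids"])
    # Highly selective filter: widen n_results so the ANN pass still has room to rank.
    if matched and matched < selective_below:
        n_results = max(n_results, floor)
    return collection.query(
        query_embeddings=[query_vec],
        n_results=n_results,
        where=where,
    )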

Use the operator syntax for range and set filters

ChromaDB supports comparison operators in where clauses. Use them instead of post-processing in Python.

# Range: last 90 days only
where={"created_at": {"$gte": 1736294400}}
 
# Set inclusion
where={"source": {"$in": ["zendesk", "confluence"]}}
 
# Exclusion
where={"lang": {"$ne": "de"}}
 
# Compound AND (ChromaDB expects a single top-level key, so wrap multiple conditions in $and)
where={"$and": [{"tenant_id": "acme"}, {"lang": "en"}]}
 
# Explicit OR
where={"$or": [{"source": "zendesk"}, {"source": "confluence"}]}

Available operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or. Do not reach for Python-side filtering when an operator covers the case; post-filtering defeats the purpose of the pre-filter pass.
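
For example, a range plus a set constraint stays entirely inside the where clause; because ChromaDB expects a single top-level operator, the two conditions are wrapped in $and:

# Last 90 days AND from an allowed source, in one pre-filter pass
where={
    "$and": [
        {"created_at": {"$gte": 1736294400}},
        {"source": {"$in": ["zendesk", "confluence"]}},
    ]
}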

Use where_document for keyword constraints, not as the primary filter

where_document runs a substring check ($contains) against the stored document text. It is slower than metadata filters and not indexed.

results = collection.query(
    query_embeddings=[query_vec],
    n_results=20,
    where={"tenant_id": "acme"},
    where_document={"$contains": "refund policy"},
)
  • Use where_document to enforce that a rare keyword appears (error codes, product identifiers, exact phrases the vector may not surface).
  • Combine it with a where metadata filter to keep the candidate set small before the text scan.
  • Do not rely on where_document alone as the primary retrieval mechanism; it has no ranking and returns arbitrary matches.

Retrieve a wider candidate set, then rerank in the application layer

ANN with a tight filter can return too few candidates because the filter reduces the searchable pool before scoring. The correct pattern is to over-fetch and rerank.

# Fetch more than you need
raw = collection.query(
    query_embeddings=[query_vec],
    n_results=50,
    where={"tenant_id": "acme"},
)
 
# Rerank top 50 down to top 5 with a cross-encoder
from sentence_transformers import CrossEncoder
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query_text, doc) for doc in raw["documents"][0]]
scores = model.predict(pairs)
top5 = sorted(zip(scores, raw["ids"][0], raw["documents"][0]), reverse=True)[:5]

This pattern is described in rag-reranking. Reranking on 50 candidates costs milliseconds; sending 50 candidates to the LLM costs tokens.

Combine metadata filters with full-text BM25 for hybrid retrieval

ChromaDB does not have a native BM25 index. For hybrid retrieval, run a BM25 pass in parallel with the ChromaDB query and merge the two ranked lists with reciprocal rank fusion.

from rank_bm25 import BM25Okapi
 
def reciprocal_rank_fusion(bm25_ids, chroma_ids, k=60):
    # Each ranked list contributes 1 / (k + rank + 1); documents present in both lists score higher.
    scores = {}
    for rank, doc_id in enumerate(bm25_ids):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    for rank, doc_id in enumerate(chroma_ids):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
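
A minimal sketch of the two parallel passes feeding that fusion step, assuming tokenized_corpus, corpus_ids, query_text, and query_vec are already in scope:

# BM25 pass over the same corpus
bm25 = BM25Okapi(tokenized_corpus)
bm25_scores = bm25.get_scores(query_text.lower().split())
bm25_ids = [doc_id for _, doc_id in
            sorted(zip(bm25_scores, corpus_ids), reverse=True)[:50]]

# Vector pass through ChromaDB with the usual metadata pre-filter
chroma_ids = collection.query(
    query_embeddings=[query_vec],
    n_results=50,
    where={"tenant_id": "acme"},
)["ids"][0]

# Merge the two ranked lists
fused_ids = reciprocal_rank_fusion(bm25_ids, chroma_ids)[:10]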

Use BM25 as the parallel path when queries contain rare tokens: product codes, error messages, proper nouns. See embeddings-hybrid-search for the full hybrid retrieval pattern and rag-retrieval for the retrieval loop design.

Validate filter correctness with an explain step

A filter that silently returns zero results is a worse failure than a filter error. Add a validation step in development.

def validate_filter(collection, where):
    count = collection.count()
    sample = collection.get(where=where, limit=1)
    if not sample["ids"]:
        raise ValueError(
            f"Filter {where} matches zero documents in {collection.name}. "
            f"Total docs: {count}"
        )

Zero-result filters are the most common cause of “the LLM said it couldn’t find anything” bugs. Surface them early rather than letting an empty context window pass silently to the model.
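
A minimal wiring for development, assuming an ENV environment variable distinguishes environments:

import os

where = {"tenant_id": "acme"}
if os.environ.get("ENV") != "production":
    # Fail fast in development instead of passing an empty context to the model.
    validate_filter(collection, where)

results = collection.query(query_embeddings=[query_vec], n_results=10, where=where)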