Overview
Normalization is the most frequently skipped step in embedding pipelines, and the most reliably harmful to skip. L2 normalization maps every vector to the unit sphere, after which cosine similarity reduces to dot product, and dot product is the fastest metric on every major vector engine. The cost of normalization is one divide-per-vector. The cost of skipping it is subtle, hard-to-diagnose retrieval regressions. For context on how dimensions interact with normalization, see embeddings-dimensionality.
L2-normalize every vector at write time and at query time
Normalize before storing in the index and normalize the query vector before each search. Both must be normalized, or the dot product no longer equals cosine similarity.
```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # zero vector; leave as-is
    return v / norm
```

For batches:

```python
def l2_normalize_batch(vecs: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.where(norms > 0, vecs / norms, vecs)
```

Apply normalization immediately after the embed call, before any caching step, so cached vectors are always in normalized form.
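As a sketch of that ordering, here is a hypothetical cache wrapper; `embed_fn` and the in-memory `_cache` dict are illustrative stand-ins, not part of any specific client library. The point is that normalization happens before the cache write, so the cache can only ever hold unit vectors:

```python
import numpy as np

def l2_normalize_batch(vecs: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.where(norms > 0, vecs / norms, vecs)

_cache: dict[str, np.ndarray] = {}  # illustrative in-memory cache

def embed_normalized(texts: list[str], embed_fn) -> np.ndarray:
    """Embed, normalize immediately, then cache, so the cache only ever
    holds unit vectors regardless of what embed_fn returns."""
    missing = [t for t in texts if t not in _cache]
    if missing:
        raw = np.asarray(embed_fn(missing), dtype=np.float64)
        for text, vec in zip(missing, l2_normalize_batch(raw)):
            _cache[text] = vec
    return np.stack([_cache[t] for t in texts])
```

With this shape, swapping the embedding provider or changing the normalization tolerance never leaves stale unnormalized vectors in the cache.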
After normalization, dot product and cosine similarity are mathematically identical
For two unit vectors u and v:
dot(u, v) == cosine_similarity(u, v) == u · v / (|u| * |v|) == u · v
Because |u| = |v| = 1, the denominator drops out. This means you can configure the vector index to use inner product (IP / dot product), which is faster than computing cosine explicitly on every comparison. Set the metric to dot or ip after normalizing, never before.
- pgvector: use the `<#>` (inner product, negated) or `<=>` (cosine) operator; after normalization both produce the same ranking.
- ChromaDB: set `metadata={"hnsw:space": "ip"}` or `"cosine"`.
- Pinecone: set `metric="dotproduct"` at index creation.
- Faiss: use `IndexFlatIP` or `METRIC_INNER_PRODUCT` in HNSW.
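On any of these engines, the equivalence can be sanity-checked offline with plain NumPy; a sketch with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=384)            # stand-in for a real query embedding
corpus = rng.normal(size=(100, 384))    # stand-in for a real corpus

def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

ip_scores = l2n(corpus) @ l2n(query)    # inner product on unit vectors
cos_scores = (corpus @ query) / (
    np.linalg.norm(corpus, axis=1) * np.linalg.norm(query)
)

# Same scores, hence the same ranking, so the cheaper IP metric is safe.
assert np.allclose(ip_scores, cos_scores)
assert np.array_equal(np.argsort(-ip_scores), np.argsort(-cos_scores))
```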
Unnormalized vectors distort similarity by vector magnitude
Without normalization, longer documents tend to produce higher-magnitude embeddings. A dot product between an unnormalized query and an unnormalized corpus vector rewards length over relevance. A short, highly relevant document can rank below a verbose but weakly relevant one solely because of magnitude imbalance. This is a silent bug; the index does not error, results just degrade.
The classic symptom: recall on your golden set drops for short queries against long documents. If you see that pattern, check normalization first.
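A two-dimensional toy example makes the distortion concrete (the vectors are invented for illustration, not real embeddings):

```python
import numpy as np

query = np.array([1.0, 0.0])
short_relevant = np.array([0.9, 0.1])  # nearly parallel to the query, small magnitude
long_verbose = np.array([3.0, 4.0])    # large magnitude, weaker direction match

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Raw dot product rewards magnitude: the verbose doc wins (3.0 vs 0.9).
assert query @ long_verbose > query @ short_relevant

# Cosine (equivalently, dot product on unit vectors) ranks by direction:
# the relevant doc wins (~0.99 vs 0.6).
assert cos(query, short_relevant) > cos(query, long_verbose)
```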
sklearn normalizes by default; sentence-transformers, Voyage, and OpenAI do not
Library defaults vary:
- `sklearn.preprocessing.normalize`: normalizes by default (`norm="l2"`).
- `sentence_transformers`: applies normalization when `normalize_embeddings=True` is passed to `encode()`; not the default in older versions.
- Voyage AI client: returns raw (unnormalized) vectors.
- OpenAI Python client: returns raw vectors.
- Hugging Face `transformers` with mean pooling: unnormalized.
Always check the output norm of the first batch before building an index:
```python
vecs = np.array(embed(sample_texts))
norms = np.linalg.norm(vecs, axis=1)
print(f"min norm: {norms.min():.4f}, max norm: {norms.max():.4f}")
# If not approx 1.0, normalize before indexing.
```

Re-normalize after Matryoshka truncation
Truncating a unit vector to fewer dimensions no longer produces a unit vector. The truncated prefix must be re-normalized before indexing or comparison. See embeddings-dimensionality for the truncation workflow.
```python
vec_full = l2_normalize(raw_embedding)
vec_trunc = vec_full[:512]
vec_trunc = l2_normalize(vec_trunc)  # required: truncation broke unit norm
```

Store normalized vectors; verify on read
Cache and persist normalized vectors. Normalizing at query time from a stored unnormalized vector wastes compute and risks inconsistency if the normalization code changes. Add a sanity assertion at index load time:
```python
def assert_normalized(vecs: np.ndarray, tol: float = 1e-4) -> None:
    norms = np.linalg.norm(vecs, axis=1)
    assert np.allclose(norms, 1.0, atol=tol), f"Vectors not unit-norm: {norms[:5]}"
```

This check runs in milliseconds for any reasonable index sample and catches the most common pipeline misconfiguration before it reaches production.