Overview
Normalization is the most frequently skipped step in embedding pipelines, and the most reliably harmful to skip. L2 normalization maps every vector to the unit sphere, after which cosine similarity reduces to dot product, and dot product is the fastest metric on every major vector engine. The cost of normalization is one divide-per-vector. The cost of skipping it is subtle, hard-to-diagnose retrieval regressions. For context on how dimensions interact with normalization, see embeddings-dimensionality.
L2-normalize every vector at write time and at query time
Normalize before storing in the index and normalize the query vector before each search. Both must be normalized, or the dot product no longer equals cosine similarity.
```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # zero vector; leave as-is
    return v / norm
```

For batches:

```python
def l2_normalize_batch(vecs: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.where(norms > 0, vecs / norms, vecs)
```

Apply normalization immediately after the embed call, before any caching step, so cached vectors are always in normalized form.
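As a sketch of that ordering, here is a hypothetical cache wrapper; `embed_fn` and the in-memory `_cache` dict are illustrative stand-ins, not part of any specific client library. The point is that normalization happens before the cache write, so the cache can only ever hold unit vectors:

```python
import numpy as np

def l2_normalize_batch(vecs: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return np.where(norms > 0, vecs / norms, vecs)

_cache: dict[str, np.ndarray] = {}  # illustrative in-memory cache

def embed_normalized(texts: list[str], embed_fn) -> np.ndarray:
    """Embed, normalize immediately, then cache, so the cache only ever
    holds unit vectors regardless of what embed_fn returns."""
    missing = [t for t in texts if t not in _cache]
    if missing:
        raw = np.asarray(embed_fn(missing), dtype=np.float64)
        for text, vec in zip(missing, l2_normalize_batch(raw)):
            _cache[text] = vec
    return np.stack([_cache[t] for t in texts])
```

With this shape, swapping the embedding provider or changing the normalization tolerance never leaves stale unnormalized vectors in the cache.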
After normalization, dot product and cosine similarity are mathematically identical
For two unit vectors u and v:
dot(u, v) == cosine_similarity(u, v) == u · v / (|u| * |v|) == u · v
Because |u| = |v| = 1, the denominator drops out. This means you can configure the vector index to use inner product (IP / dot product), which is faster than computing cosine explicitly on every comparison. Set the metric to dot or ip after normalizing, never before.
- pgvector: use the `<#>` (inner product, negated) or `<=>` (cosine) operator; after normalization both produce the same ranking.
- ChromaDB: set `metadata={"hnsw:space": "ip"}` or `"cosine"`.
- Pinecone: set `metric="dotproduct"` at index creation.
- Faiss: use `IndexFlatIP` or `METRIC_INNER_PRODUCT` in HNSW.
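On any of these engines, the equivalence can be sanity-checked offline with plain NumPy; a sketch with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=384)            # stand-in for a real query embedding
corpus = rng.normal(size=(100, 384))    # stand-in for a real corpus

def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

ip_scores = l2n(corpus) @ l2n(query)    # inner product on unit vectors
cos_scores = (corpus @ query) / (
    np.linalg.norm(corpus, axis=1) * np.linalg.norm(query)
)

# Same scores, hence the same ranking, so the cheaper IP metric is safe.
assert np.allclose(ip_scores, cos_scores)
assert np.array_equal(np.argsort(-ip_scores), np.argsort(-cos_scores))
```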
Unnormalized vectors distort similarity by vector magnitude
Without normalization, longer documents tend to produce higher-magnitude embeddings. A dot product between an unnormalized query and an unnormalized corpus vector rewards length over relevance. A short, highly relevant document can rank below a verbose but weakly relevant one solely because of magnitude imbalance. This is a silent bug; the index does not error, results just degrade.
The classic symptom: recall on your golden set drops for short queries against long documents. If you see that pattern, check normalization first.
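A two-dimensional toy example makes the distortion concrete (the vectors are invented for illustration, not real embeddings):

```python
import numpy as np

query = np.array([1.0, 0.0])
short_relevant = np.array([0.9, 0.1])  # nearly parallel to the query, small magnitude
long_verbose = np.array([3.0, 4.0])    # large magnitude, weaker direction match

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Raw dot product rewards magnitude: the verbose doc wins (3.0 vs 0.9).
assert query @ long_verbose > query @ short_relevant

# Cosine (equivalently, dot product on unit vectors) ranks by direction:
# the relevant doc wins (~0.99 vs 0.6).
assert cos(query, short_relevant) > cos(query, long_verbose)
```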
sklearn normalizes by default; sentence-transformers, Voyage, and OpenAI do not
Library defaults vary:
- `sklearn.preprocessing.normalize`: normalizes by default (`norm="l2"`).
- `sentence_transformers`: applies normalization when `normalize_embeddings=True` is passed to `encode()`; not the default in older versions.
- Voyage AI client: returns raw (unnormalized) vectors.
- OpenAI Python client: returns raw vectors.
- Hugging Face `transformers` with mean pooling: unnormalized.
Always check the output norm of the first batch before building an index:
```python
vecs = np.array(embed(sample_texts))
norms = np.linalg.norm(vecs, axis=1)
print(f"min norm: {norms.min():.4f}, max norm: {norms.max():.4f}")
# If not approx 1.0, normalize before indexing.
```

Re-normalize after Matryoshka truncation
Truncating a unit vector to fewer dimensions no longer produces a unit vector. The truncated prefix must be re-normalized before indexing or comparison. See embeddings-dimensionality for the truncation workflow.
```python
vec_full = l2_normalize(raw_embedding)
vec_trunc = vec_full[:512]
vec_trunc = l2_normalize(vec_trunc)  # required: truncation broke unit norm
```

Store normalized vectors; verify on read
Cache and persist normalized vectors. Normalizing at query time from a stored unnormalized vector wastes compute and risks inconsistency if the normalization code changes. Add a sanity assertion at index load time:
```python
def assert_normalized(vecs: np.ndarray, tol: float = 1e-4) -> None:
    norms = np.linalg.norm(vecs, axis=1)
    assert np.allclose(norms, 1.0, atol=tol), f"Vectors not unit-norm: {norms[:5]}"
```

This check runs in milliseconds for any reasonable index sample and catches the most common pipeline misconfiguration before it reaches production.