Overview

This page is the atomic definition of the term. The full RAG retrieval pipeline is covered at rag-retrieval.

Definition

A reranker (cross-encoder) is a model that takes a query and a candidate document as a single combined input and outputs a relevance score. This contrasts with bi-encoder embedding models, which encode query and document independently and compare their vectors. Cross-encoders are typically more accurate because they can attend to the interaction between query tokens and document tokens. They are slower because every query-candidate pair needs its own forward pass at query time, whereas bi-encoder document vectors can be computed ahead of time.

The standard RAG pipeline runs reranking as a second stage: the first stage retrieves a broad candidate set (50-200 documents) using fast vector-similarity search; the reranker then scores each candidate and returns the top 5-10 for insertion into the LLM context. Popular reranking models include Cohere Rerank, Jina Reranker v2, Mixedbread mxbai-rerank, and the cross-encoders from the Sentence Transformers library.
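The two-stage flow can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `toy_pair_score` is a hypothetical word-overlap stand-in for a real cross-encoder, and the documents and embedding vectors are made up. In a real setup the scoring function would call a reranking model, for example the `predict` method of a Sentence Transformers `CrossEncoder`, which accepts a list of (query, document) pairs.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_stage(query_vec: Vector, doc_vecs: List[Vector], n_candidates: int) -> List[int]:
    """Bi-encoder stage: rank documents by similarity of precomputed vectors."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:n_candidates]


def rerank(
    query: str,
    docs: List[str],
    candidate_ids: List[int],
    score_pair: Callable[[str, str], float],
    top_k: int,
) -> List[int]:
    """Cross-encoder stage: score each (query, document) pair jointly."""
    scored = [(score_pair(query, docs[i]), i) for i in candidate_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_k]]


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Made-up corpus and embeddings, for illustration only.
docs = [
    "how to reset a forgotten password",
    "password policy for new employees",
    "resetting the office router",
]
doc_vecs = [[0.8, 0.3], [0.9, 0.1], [0.2, 0.9]]
query = "reset forgotten password"
query_vec = [1.0, 0.2]

candidates = first_stage(query_vec, doc_vecs, n_candidates=2)
top = rerank(query, docs, candidates, toy_pair_score, top_k=1)
print("first-stage candidates:", candidates)
print("after reranking:", top)
```

In this toy data the vector search places the policy document first, and the pairwise scorer promotes the document that actually matches the query. Passing the scorer in as a function keeps the pipeline independent of any particular reranking model or API.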

When it applies

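Add a reranker when retrieval precision is insufficient, meaning the top-k results include many irrelevant documents. A reranker only reorders the candidate set, so it cannot recover relevant documents that the first stage missed. Measure precision@k, recall@k, and mean reciprocal rank (MRR) at the final cutoff on an evaluation set before and after adding the reranker to confirm the improvement justifies the latency cost.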
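A before/after comparison of this kind can be sketched as follows. The evaluation set, document IDs, and rankings are hypothetical placeholders; in practice the "before" ranking would come from the first-stage retriever and the "after" ranking from the reranker, both judged against labeled relevant documents.

```python
from typing import Dict, List, Sequence, Set


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / k


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top-k results."""
    return sum(1 for d in ranked_ids[:k] if d in relevant) / len(relevant)


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# Hypothetical evaluation set: labeled relevant IDs plus both rankings per query.
eval_set = [
    {"relevant": {"d1"},
     "before": ["d7", "d3", "d1", "d9"],
     "after": ["d1", "d7", "d3", "d9"]},
    {"relevant": {"d2", "d5"},
     "before": ["d2", "d8", "d6", "d5"],
     "after": ["d2", "d5", "d8", "d6"]},
]

K = 2  # final cutoff: how many passages reach the LLM
results: Dict[str, Dict[str, float]] = {}
for stage in ("before", "after"):
    results[stage] = {
        "mrr": mean([reciprocal_rank(q[stage], q["relevant"]) for q in eval_set]),
        "precision": mean([precision_at_k(q[stage], q["relevant"], K) for q in eval_set]),
        "recall": mean([recall_at_k(q[stage], q["relevant"], K) for q in eval_set]),
    }
    print(stage, results[stage])
```

Both rankings contain the same documents, so only their order differs. Any gain at the cutoff comes purely from reordering, which is the effect a reranker is supposed to deliver.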

Example

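First-stage retrieval returns 100 candidates via pgvector cosine search in 30 ms. Cohere Rerank scores all 100 in 80 ms and returns the 5 most relevant, for about 110 ms of retrieval latency in total. The LLM receives 5 focused passages instead of 100, which cuts the retrieved-passage tokens in the context by roughly 95%, assuming passages of similar length.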
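The arithmetic behind this example can be checked with a short calculation. The latencies and counts are the ones given above; `tokens_per_passage` is an assumed average chosen only for illustration, and the relative reduction does not depend on its value as long as passages are of similar length.

```python
# Figures from the example above.
first_stage_ms = 30
rerank_ms = 80
candidates = 100
kept = 5

# Assumption for illustration: every passage has about the same token count.
tokens_per_passage = 200

retrieval_latency_ms = first_stage_ms + rerank_ms
tokens_without_rerank = candidates * tokens_per_passage
tokens_with_rerank = kept * tokens_per_passage
reduction = 1 - tokens_with_rerank / tokens_without_rerank

print(f"retrieval latency: {retrieval_latency_ms} ms")
print(f"passage tokens: {tokens_without_rerank} -> {tokens_with_rerank}")
print(f"reduction in passage tokens: {reduction:.0%}")
```

The trade is 80 ms of extra retrieval latency against a much smaller prompt; whether that pays off depends on per-token pricing and on how much answer quality improves with the cleaner context.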

Citing this term

See Reranker (llmbestpractices.com/glossary/reranker).