Overview

This page is the atomic definition. The deep-dive lives at embeddings.

Definition

An embedding is a fixed-length numeric vector (typically 384, 768, 1536, or 3072 dimensions) that represents text, images, audio, or other inputs in a shared semantic space. Semantically similar inputs map to vectors that are close under cosine similarity or dot product. Embedding models include OpenAI text-embedding-3-large, Voyage voyage-3, Cohere embed-v4, and open-source options like BGE and Nomic. Embeddings power semantic search, clustering, classification, deduplication, and the retrieval stage of RAG. Vectors are stored and queried in vector databases (pgvector, ChromaDB, Pinecone, Qdrant).
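A minimal sketch of the similarity computation, in pure Python. The three-dimensional vectors here are toy stand-ins (real models emit the 384+ dimensions noted above); only the formula is load-bearing:

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|), in [-1, 1]; 1 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a query and two candidate documents.
query = [0.1, 0.9, 0.2]
doc_close = [0.2, 0.8, 0.1]  # points in a similar direction -> high score
doc_far = [0.9, 0.1, 0.0]    # points elsewhere -> low score

print(cosine_similarity(query, doc_close))  # near 1.0
print(cosine_similarity(query, doc_far))    # much lower
```

Note that many models ship unit-length vectors, in which case dot product and cosine similarity give the same ranking.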

When it applies

Use embeddings whenever similarity matters more than exact keyword match: semantic search, recommendation, deduplication, content moderation, and RAG retrieval. Pair with BM25 keyword search for queries with rare identifiers or error codes.
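One common way to pair the two retrievers is reciprocal rank fusion (RRF), which merges ranked lists without needing to calibrate their scores against each other. This is a sketch, not the only fusion method; the document IDs are hypothetical:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: ranked lists of doc IDs, best first.
    # Each doc scores 1 / (k + rank) per list it appears in; k=60 is a
    # conventional default that damps the influence of top-rank outliers.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["cache-control", "cdn-setup", "headers"]   # semantic matches
bm25_hits = ["error-e1234", "cache-control", "headers"]   # keyword matches
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # "cache-control" wins: it ranks well in both lists
```

Documents found by both retrievers rise to the top, while a rare identifier like "error-e1234" still surfaces via its BM25 rank alone.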

Example

A docs site embeds every page heading with text-embedding-3-large. A query like “how do I cache static assets” returns the cache-control documentation page even though its heading never mentions “static assets” — the embeddings capture the semantic relationship that keyword matching would miss.

Citing this term

See Embedding (llmbestpractices.com/glossary/embedding).