Overview
This page is the atomic definition; the deep dive lives at embeddings.
Definition
An embedding is a fixed-length numeric vector (typically 384, 768, 1536, or 3072 dimensions) that represents text, image, audio, or other input in a semantic space. Semantically similar inputs map to nearby vectors, where proximity is measured by cosine similarity or dot product. Embedding models include OpenAI text-embedding-3-large, Voyage voyage-3, Cohere embed-v4, and open-source options like BGE and Nomic. Embeddings power semantic search, clustering, classification, deduplication, and the retrieval stage of RAG. Vectors are stored and queried in vector databases (pgvector, ChromaDB, Pinecone, Qdrant).
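The "nearby vectors" claim can be made concrete with plain cosine similarity. A minimal sketch using toy 4-dimensional vectors; the values are invented for illustration (real embeddings have hundreds or thousands of dimensions and come from a model, not by hand):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors: two related inputs and one unrelated input.
cat = [0.9, 0.1, 0.2, 0.0]
kitten = [0.85, 0.15, 0.25, 0.05]
invoice = [0.0, 0.8, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1.0: semantically similar
print(cosine_similarity(cat, invoice))  # much lower: unrelated
```

Cosine similarity ranges from -1 to 1; many embedding models emit unit-normalized vectors, in which case dot product and cosine similarity are the same ranking.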
When it applies
Use embeddings whenever similarity matters more than exact keyword match: semantic search, recommendation, deduplication, content moderation, and RAG retrieval. Pair with BM25 keyword search for queries with rare identifiers or error codes.
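Pairing embedding search with BM25 usually means fusing two ranked result lists. Reciprocal rank fusion (RRF) is one common, model-free way to do it; the document ids and rankings below are hypothetical:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores 1/(k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical results for a query containing a rare error code:
semantic = ["doc-cache", "doc-cdn", "doc-headers"]   # embedding retriever
keyword  = ["doc-errata", "doc-cache", "doc-cdn"]    # BM25 retriever
print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents ranked by both retrievers (doc-cache, doc-cdn) rise to the top, while the keyword-only hit for the rare identifier still makes the fused list.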
Example
A docs site embeds every page heading with text-embedding-3-large. A query like “how do I cache static assets” returns the cache-control documentation page even though the page never uses the word “cache” in the heading.
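The lookup in this example amounts to a nearest-neighbor search over the heading vectors. A minimal brute-force sketch; the heading embeddings and query vector are invented toy 3-dimensional values standing in for real model output (a real index would hold 1536- or 3072-dimensional vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical precomputed heading embeddings.
index = {
    "Cache-Control response headers": [0.9, 0.1, 0.1],
    "Deploying with the CLI":         [0.1, 0.9, 0.2],
    "Billing and invoices":           [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "how do I cache static assets".
query_vec = [0.85, 0.2, 0.1]

best = max(index, key=lambda heading: cosine(index[heading], query_vec))
print(best)  # "Cache-Control response headers"
```

Brute force is fine for thousands of vectors; beyond that, vector databases use approximate nearest-neighbor indexes (e.g. HNSW) to keep queries fast.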
Related concepts
- embeddings - the deep-dive on model choice, dimensions, and caching.
- rag-retrieval - the retrieval stage that runs on embeddings.
- retrieval-augmented-generation - the RAG pattern that depends on embeddings.
- chromadb - one vector database that stores embeddings.
- pinecone-vs-pgvector - the choice between hosted and Postgres-native vector stores.
Citing this term
See Embedding (llmbestpractices.com/glossary/embedding).