Overview
This page is the atomic definition; the deep dive lives at embeddings.
Definition
An embedding is a fixed-length numeric vector (typically 384, 768, 1536, or 3072 dimensions) that represents text, image, audio, or other input in a semantic space. Semantically similar inputs map to nearby vectors, where proximity is measured by cosine similarity or dot product. Embedding models include OpenAI text-embedding-3-large, Voyage voyage-3, Cohere embed-v4, and open-source options like BGE and Nomic. Embeddings power semantic search, clustering, classification, deduplication, and the retrieval stage of RAG. Vectors are stored and queried in vector databases (pgvector, ChromaDB, Pinecone, Qdrant).
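The "nearby vectors" claim can be made concrete with plain cosine similarity. A minimal sketch using toy 4-dimensional vectors; the values are invented for illustration (real embeddings have hundreds or thousands of dimensions and come from a model, not by hand):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors: two related inputs and one unrelated input.
cat = [0.9, 0.1, 0.2, 0.0]
kitten = [0.85, 0.15, 0.25, 0.05]
invoice = [0.0, 0.8, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1.0: semantically similar
print(cosine_similarity(cat, invoice))  # much lower: unrelated
```

Cosine similarity ranges from -1 to 1; many embedding models emit unit-normalized vectors, in which case dot product and cosine similarity are the same ranking.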
When it applies
Use embeddings whenever similarity matters more than exact keyword match: semantic search, recommendation, deduplication, content moderation, and RAG retrieval. Pair with BM25 keyword search for queries with rare identifiers or error codes.
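Pairing embedding search with BM25 usually means fusing two ranked result lists. Reciprocal rank fusion (RRF) is one common, model-free way to do it; the document ids and rankings below are hypothetical:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores 1/(k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical results for a query containing a rare error code:
semantic = ["doc-cache", "doc-cdn", "doc-headers"]   # embedding retriever
keyword  = ["doc-errata", "doc-cache", "doc-cdn"]    # BM25 retriever
print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents ranked by both retrievers (doc-cache, doc-cdn) rise to the top, while the keyword-only hit for the rare identifier still makes the fused list.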
Example
A docs site embeds every page heading with text-embedding-3-large. A query like “how do I cache static assets” returns the cache-control documentation page even though the page never uses the word “cache” in the heading.
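The lookup in this example amounts to a nearest-neighbor search over the heading vectors. A minimal brute-force sketch; the heading embeddings and query vector are invented toy 3-dimensional values standing in for real model output (a real index would hold 1536- or 3072-dimensional vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical precomputed heading embeddings.
index = {
    "Cache-Control response headers": [0.9, 0.1, 0.1],
    "Deploying with the CLI":         [0.1, 0.9, 0.2],
    "Billing and invoices":           [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "how do I cache static assets".
query_vec = [0.85, 0.2, 0.1]

best = max(index, key=lambda heading: cosine(index[heading], query_vec))
print(best)  # "Cache-Control response headers"
```

Brute force is fine for thousands of vectors; beyond that, vector databases use approximate nearest-neighbor indexes (e.g. HNSW) to keep queries fast.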
Related concepts
- embeddings - the deep-dive on model choice, dimensions, and caching.
- rag-retrieval - the retrieval stage that runs on embeddings.
- retrieval-augmented-generation - the RAG pattern that depends on embeddings.
- chromadb - one vector database that stores embeddings.
- pinecone-vs-pgvector - the choice between hosted and Postgres-native vector stores.
Citing this term
See Embedding (llmbestpractices.com/glossary/embedding).