Overview

Postgres has serviceable full-text search. tsvector stores tokenized, stemmed, language-aware terms; tsquery expresses the search; GIN indexes make it fast. The trigram extension pg_trgm adds fuzzy match and similarity ranking. This page covers the pipeline, the indexing rules, weighting, ranking, and when to switch to a dedicated engine. The umbrella rules live in postgres.

Store searchable text as a generated tsvector column

A tsvector is a parsed document: tokens, positions, and weights. Store it as a generated column so it stays in sync with the source.

ALTER TABLE articles ADD COLUMN search tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

to_tsvector('english', ...) runs the English configuration: it lowercases, strips stop words, and stems words (running → run). Pick the language per column or per row; an articles table with multilingual content stores the language in a regconfig column and computes the tsvector against it.
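A per-row-language variant of the generated column can look like this (a sketch: the lang column and the search_i18n name are illustrative, not part of the schema above; to_tsvector(regconfig, text) is immutable, so it is allowed in a generated column):

-- Each row carries its own text search configuration.
ALTER TABLE articles ADD COLUMN lang regconfig NOT NULL DEFAULT 'english';
ALTER TABLE articles ADD COLUMN search_i18n tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector(lang, coalesce(title, '')), 'A') ||
    setweight(to_tsvector(lang, coalesce(body, '')), 'B')
  ) STORED;

Queries against this column must build the tsquery with the same per-row configuration, e.g. search_i18n @@ websearch_to_tsquery(lang, $1).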

Build GIN indexes on the tsvector

GIN is the right index type for tsvector. It is slower to build than a GiST index but much faster to query, and it scales to millions of rows.

CREATE INDEX articles_search_gin ON articles USING GIN (search);

For very large corpora consider RUM (an extension) which stores positions in the index and ranks faster, at the cost of larger index size. See postgres-indexes for the broader index decision tree.
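With the rum extension installed, the index definition looks like this (a sketch; rum_tsvector_ops is the opclass the extension provides for tsvector):

CREATE EXTENSION rum;
CREATE INDEX articles_search_rum ON articles USING rum (search rum_tsvector_ops);

Because positions live in the index, ranked queries can be served without fetching each heap tuple to recompute ts_rank.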

Query with tsquery and the @@ operator

tsquery is the parsed query. Use plainto_tsquery for user input that should match all words (terms are ANDed); phraseto_tsquery for ordered phrase matches; websearch_to_tsquery for Google-style input with quoted phrases and minus signs.

SELECT id, title
FROM articles
WHERE search @@ websearch_to_tsquery('english', 'postgres -mysql "full text"')
LIMIT 50;

The @@ operator returns true when the query matches the document. Always pair to_tsvector and to_tsquery with the same language; mismatched dictionaries silently miss results.
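The difference between the constructors is easiest to see by inspecting their output directly (results as produced by the stock english configuration):

SELECT plainto_tsquery('english', 'quick brown fox');
-- 'quick' & 'brown' & 'fox'
SELECT phraseto_tsquery('english', 'quick brown fox');
-- 'quick' <-> 'brown' <-> 'fox'
SELECT websearch_to_tsquery('english', '"quick brown" -fox');
-- 'quick' <-> 'brown' & !'fox'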

Weight fields with setweight

setweight tags tokens with one of four weights (A, B, C, D). ts_rank uses the weights to prefer matches in the title over matches in the body.

-- title weight A, body weight B, comments weight D
search = setweight(to_tsvector('english', title), 'A')
      || setweight(to_tsvector('english', body), 'B')
      || setweight(to_tsvector('english', comments), 'D');

Default weights are {0.1, 0.2, 0.4, 1.0} for {D, C, B, A}. Override them by passing a weights array as the optional first argument to ts_rank. Rules of thumb: title is A, primary body is B, metadata is C, low-signal content is D.
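Overriding the weights array looks like this (a sketch against the articles schema above; the specific weight values are illustrative):

-- Custom {D, C, B, A} weights: boost title matches, demote comments further
SELECT id, title,
       ts_rank('{0.05, 0.1, 0.3, 1.0}', search, query) AS rank
FROM articles, websearch_to_tsquery('english', $1) query
WHERE search @@ query
ORDER BY rank DESC;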

Rank with ts_rank or ts_rank_cd

ts_rank produces a relevance score per row. ts_rank_cd (cover density) also accounts for term proximity.

SELECT id, title,
       ts_rank_cd(search, query, 32) AS rank
FROM articles, websearch_to_tsquery('english', $1) query
WHERE search @@ query
ORDER BY rank DESC
LIMIT 50;

The third argument is a normalization bitmask: 32 divides the rank by rank + 1, scaling scores into the 0–1 range; pass 16 instead to divide by 1 + log(unique words in the document), which keeps long documents from dominating. Ranking is the part most teams outgrow first; if “best result on top” is the product, switch engines (see below).

Add fuzzy matching with pg_trgm

pg_trgm decomposes strings into three-character chunks (trigrams) and indexes them. It supports similarity matching and ranking, and it makes ILIKE '%foo%' queries fast.

CREATE EXTENSION pg_trgm;
CREATE INDEX users_name_trgm ON users USING GIN (name gin_trgm_ops);
 
SELECT id, name, similarity(name, $1) AS score
FROM users
WHERE name % $1
ORDER BY score DESC
LIMIT 10;

Use pg_trgm for autocomplete, typo-tolerant name search, and SKU lookups. It does not stem; pair it with tsvector when both fuzzy and stemmed matching matter.
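Pairing the two can be a single query (a sketch assuming the articles.search column from above plus a pg_trgm GIN index on title; the planner can combine both indexes with a BitmapOr):

SELECT id, title
FROM articles
WHERE search @@ websearch_to_tsquery('english', $1)  -- stemmed match
   OR title % $1                                     -- trigram fuzzy match
ORDER BY GREATEST(
    ts_rank(search, websearch_to_tsquery('english', $1)),
    similarity(title, $1)
  ) DESC
LIMIT 20;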

Configure dictionaries for language-aware stemming

The bundled english dictionary handles stop words and stemming. Other languages ship with their own configs; check \dF in psql. For domain-specific vocabulary, build a custom configuration.

CREATE TEXT SEARCH DICTIONARY english_stem (TEMPLATE = snowball, language = 'english');
CREATE TEXT SEARCH CONFIGURATION app_search (COPY = pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION app_search
  ALTER MAPPING FOR asciiword, word WITH english_stem;

Run SELECT to_tsvector('app_search', 'running quickly') to confirm the tokenization. A search config bug shows up as zero results; debug by running ts_debug on a known document.
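ts_debug shows, token by token, which dictionary handled each word and which lexemes it produced (app_search here is the configuration created above):

SELECT alias, token, dictionaries, lexemes
FROM ts_debug('app_search', 'The runners ran quickly');

A token with an empty lexemes array was treated as a stop word; a NULL dictionary means no dictionary in the mapping recognized it.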

Switch to a dedicated engine when search is the product

Postgres full-text is right when search is a feature inside a transactional system. Switch engines when search becomes the product.

  • Move to Elastic, OpenSearch, Meilisearch, or Typesense when you need: synonyms, language-specific analyzers per field, BM25 ranking, faceting beyond a few columns, geo-relevance, or sub-50 ms latency at hundreds of millions of documents.
  • Move to a vector store (see rag and pgvector) when the relevance signal is semantic, not lexical.
  • A hybrid pattern works: keep Postgres as the source of truth, stream changes via logical replication or CDC into the search engine, query the search engine for ranking and the database for hydration.

Validate the planner picks the GIN index with postgres-explain before assuming Postgres is the bottleneck; most “slow search” tickets are missing indexes.
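A quick planner check, sketched against the articles table and index defined above:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM articles
WHERE search @@ websearch_to_tsquery('english', 'postgres');
-- Expect a Bitmap Index Scan on articles_search_gin. A Seq Scan with a
-- Filter on the tsvector means the index is missing or not being used.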