Overview

Read the plan before you tune. Postgres 17 exposes the planner’s decisions through EXPLAIN; the planner picks indexes, scan types, and join orders from statistics, so reading the plan is the only honest answer to “why is this slow?”. This page covers the difference between EXPLAIN and EXPLAIN ANALYZE, how to read a plan, when each scan type is right, and how pg_stat_statements feeds the loop. The umbrella rules live in postgres.

Run EXPLAIN ANALYZE with BUFFERS

EXPLAIN returns the planner’s estimate without running the query. EXPLAIN ANALYZE runs the query and reports actual rows, actual time, and loop counts. BUFFERS adds the cache hit and disk read counts.

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT id, total_cents FROM orders
WHERE user_id = $1 AND created_at > now() - interval '30 days'
ORDER BY created_at DESC LIMIT 50;

Use plain EXPLAIN when the query mutates data. For everything else, ANALYZE plus BUFFERS is the default. Postgres 17 also reports WAL bytes per node when you add WAL to the option list.
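When you do need actual numbers for a statement that writes, the standard workaround is to run it inside a transaction and roll back. The update below is illustrative (the predicate value 42 is invented); the orders table reuses the example above.

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, WAL)
UPDATE orders SET total_cents = total_cents + 100
WHERE user_id = 42;
ROLLBACK;  -- the update runs and reports real timings, but its effects are discarded
```

Note that WAL is still generated for the rolled-back work, which is exactly why the WAL option is useful here.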

Read the plan top-down, execute bottom-up

The output is a tree. The root is what the client receives; the leaves are scans on tables. The executor produces rows from the leaves first, so debugging starts at the deepest node.

  • Indentation marks parent-child. A nested loop with two children executes the left child once and the right child once per outer row.
  • actual rows=... is the row count the node produced; multiply by loops=... for the total.
  • Buffers: shared hit=N read=M distinguishes warm cache from cold reads. A large read means the heap or index page was not in the buffer cache.

Start at the node with the highest actual time. Fix that. Re-run.
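A schematic of the shape helps. This is not real output; the node names, timings, and the users join are invented for illustration:

```
Limit  (actual time=0.05..19.80 rows=50 loops=1)          <- root: what the client gets
  ->  Nested Loop  (actual time=0.05..19.60 rows=50 loops=1)
        ->  Index Scan on users                            -- outer child: runs once
        ->  Index Scan using orders_user_id_idx on orders  -- inner child: once per outer row
                                                           --   (loops= counts those runs)
```

Debugging starts at the deepest indented node and works upward.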

Compare estimated rows to actual rows

The planner picks a plan from row estimates. When estimates and reality diverge by more than ten times, the plan is suspect.

Index Scan using orders_user_id_idx on orders
  (cost=0.43..2.50 rows=1 width=20) (actual time=0.04..18.23 rows=4200 loops=1)

That node estimated one row and produced 4,200. The parent join chose a nested loop on that estimate; the real shape wanted a hash join. Run ANALYZE orders to refresh statistics, or ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 1000 for skewed columns. See postgres-vacuum for the autovacuum analyze schedule.
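Concretely, for the misestimated orders scan above (assuming user_id is the skewed column):

```sql
-- 1000 widens the sampled histogram for this column; the default is 100
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE orders;  -- rebuild statistics with the new target, effective immediately
```

Re-run the EXPLAIN afterward and check that estimated rows moved toward actual rows.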

Know when each scan type is right

Three scan types cover most plans.

  • Sequential scan. Reads every page. Right when the table is small, when the query returns most rows, or when no useful index exists. Wrong when the table is large and the predicate is selective.
  • Index scan. Walks the index, then fetches the heap. Right for selective predicates. Wrong when the index returns most rows; the random heap reads cost more than a seq scan.
  • Index-only scan. Walks the index and skips the heap when every projected column lives in the index (or the page is all-visible). Right for hot read paths covered by INCLUDE; see postgres-indexes.

Bitmap heap scan sits between sequential and index scans. The planner builds a bitmap of matching pages from one or more indexes, then visits each page once. It wins when an index is moderately selective and the rows are scattered.
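To check whether the planner picked the fastest scan, disable one scan type for the session and compare timings. enable_seqscan is a real planner setting; turning it off biases costs rather than hard-forbidding the scan. The predicate value 42 is illustrative:

```sql
SET enable_seqscan = off;   -- make sequential scans look prohibitively expensive
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total_cents FROM orders WHERE user_id = 42;
RESET enable_seqscan;       -- always restore the default afterward
```

If the forced plan is slower, the planner was right; this is a diagnostic, not a production setting.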

Read costs as ratios, not absolutes

The cost=A..B numbers are arbitrary planner units. They are useful as ratios: a node with cost=1000 is roughly ten times the work the planner expects for a node with cost=100. They are not seconds.

  • actual time=A..B reports milliseconds: A is time to first row, B is time to last row.
  • A node with low cost and high actual time usually signals a row-estimate problem: the planner thought it was cheap.
  • Planning Time and Execution Time print at the bottom. Planning under a millisecond is normal; over ten milliseconds suggests prepared statements would help.
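When planning time dominates, a prepared statement caches the parse and, after a few executions, can switch to a reusable generic plan. A sketch using the query from the first section (the statement name and the value 42 are invented):

```sql
PREPARE recent_orders (bigint) AS
  SELECT id, total_cents FROM orders
  WHERE user_id = $1 AND created_at > now() - interval '30 days'
  ORDER BY created_at DESC LIMIT 50;

-- EXPLAIN works on EXECUTE, so you can watch the cached plan directly
EXPLAIN (ANALYZE, BUFFERS) EXECUTE recent_orders(42);
```

Compare Planning Time on the first and later executions; it should collapse once the plan is cached.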

Use BUFFERS to detect cold reads

BUFFERS reports cache hits and disk reads per node. A query that runs in 5 ms warm and 500 ms cold is an IO problem, not a plan problem. Run the query twice and compare; if read drops to zero on the second run, the working set does not fit in shared_buffers. Either raise shared_buffers, add a covering index that shrinks the working set, or partition the cold history; see postgres-partitioning.
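To see cache behavior beyond a single query, the cumulative hit ratio per database comes from pg_stat_database (blks_hit and blks_read are its counter columns):

```sql
SELECT datname,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY blks_read DESC;  -- the database doing the most physical reads first
```

A ratio well below roughly 0.99 on a hot OLTP database is the same signal as a large read count in BUFFERS output.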

Feed pg_stat_statements into the loop

pg_stat_statements records the normalized text, call count, total time, and rows for every query since the last reset. Add it to shared_preload_libraries, restart, then run CREATE EXTENSION pg_stat_statements in the target database.

SELECT calls, mean_exec_time, total_exec_time, query
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;

Top queries by total_exec_time are the right tuning targets; mean time hides queries that are cheap but run a million times an hour. Pair with observability and the slow-query log to attribute pain back to routes.
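Counters accumulate until reset, so old load can drown out a recent regression. To attribute load to a specific window, zero the counters first; pg_stat_statements_reset() is the function the extension ships:

```sql
SELECT pg_stat_statements_reset();  -- zero all counters, then let a window of traffic accumulate
-- later: re-run the top-queries report; it now covers only that window
```

Resetting requires superuser or the appropriate monitoring privileges, so coordinate resets on shared clusters.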