Overview

Read the plan before you tune. EXPLAIN ANALYZE runs the query and prints the chosen plan, with actual rows, time, and (with BUFFERS) cache and disk counts. This card is the lookup for what each node type means and when each is the right choice. The conceptual rules live in postgres-explain.

Run plans with EXPLAIN (ANALYZE, BUFFERS, VERBOSE) .... Always use ANALYZE to see real numbers; always use BUFFERS to see whether the heap is in cache.
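A minimal invocation, using a hypothetical orders table and predicate (substitute your own query):

```sql
-- ANALYZE executes the statement; BUFFERS adds cache/disk page counts.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT id, total
FROM   orders
WHERE  customer_id = 42
ORDER  BY created_at DESC
LIMIT  20;
```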

Scan nodes

The planner picks one scan method per table reference in the query. The right one depends on selectivity, ordering, and index shape.

| Node | When the planner picks it |
| --- | --- |
| Seq Scan | No usable index, or returning a large fraction (roughly >5 to 20%) of rows. |
| Index Scan | Selective predicate on an indexed column; row order matters. |
| Index Only Scan | All requested columns covered by the index; visibility map allows it. |
| Bitmap Heap Scan + Bitmap Index Scan | Medium selectivity; gather many index hits, then sort the heap accesses. |
| Tid Scan | Direct ctid lookup; rare in app code. |
| Function Scan | SELECT ... FROM jsonb_to_recordset(...) and similar. |
| Values Scan | Inline VALUES (...) lists. |
| Subquery Scan | Wraps a subselect that needs renaming. |

A Seq Scan on a 10-million-row table is usually wrong unless the query asks for most of the rows. Add an index, or rewrite the predicate to be sargable.
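A sketch of the sargable rewrite, with a hypothetical created_at column: wrapping the column in a function hides a plain btree index, while an equivalent range predicate keeps it usable.

```sql
-- Not sargable: the function on the column defeats an index on created_at.
SELECT * FROM orders
WHERE  date_trunc('day', created_at) = '2024-01-15';

-- Sargable: same rows, but the planner can walk the index.
SELECT * FROM orders
WHERE  created_at >= '2024-01-15'
AND    created_at <  '2024-01-16';
```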

Join nodes

Three join algorithms; each shines in a different shape.

| Node | Picked when | Notes |
| --- | --- | --- |
| Nested Loop | One side is tiny, or both sides are. | Best with an indexed inner side. |
| Hash Join | Inner side fits in work_mem. | Builds a hash table once; scans outer once. |
| Merge Join | Both inputs already sorted on the join key. | Cheap after an Index Scan returns sorted rows. |

A Nested Loop over a million-row outer with a Seq Scan inner is the classic foot-gun. The plan usually wants a Hash Join.
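Postgres has no hint syntax, but you can test whether a Hash Join would win by disabling nested loops for the session and re-reading the plan. A diagnostic sketch with hypothetical tables, not a production setting:

```sql
-- Makes the planner treat Nested Loop as prohibitively expensive.
SET enable_nestloop = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM   orders o
JOIN   customers c ON c.id = o.customer_id;

RESET enable_nestloop;  -- restore normal planning
```

If the Hash Join plan is faster, fix the root cause (statistics, indexes, earlier filtering) rather than leaving the flag off.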

Sort, group, aggregate

These nodes are where work_mem matters most.

| Node | Effect |
| --- | --- |
| Sort | Sort rows in memory or with an on-disk merge. Watch for Sort Method: external merge. |
| Incremental Sort | Sort within already-sorted prefixes; cheap when a partial order is present. |
| Hash Aggregate | Hashed grouping; bounded by work_mem. |
| Group Aggregate | Streaming aggregate over presorted input. |
| WindowAgg | Window functions; usually wraps a Sort. |
| Materialize | Cache the inner side of a Nested Loop for reuse. |
| Memoize | Per-key cache of inner results (Postgres 14+). |
| Gather / Gather Merge | Workers used; parallel plan. |

Sort Method: external merge Disk: 65536kB (Postgres reports the spill in kB) means work_mem was too small for the sort; bumping it (per-session, not globally) often eliminates the spill.
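A per-session bump, sized to the spill the plan reported (the value and query here are illustrative):

```sql
-- Session-local; resets on disconnect, never touches postgresql.conf.
SET work_mem = '128MB';

-- Or scoped to a single transaction with SET LOCAL:
BEGIN;
SET LOCAL work_mem = '128MB';
SELECT customer_id, sum(total)
FROM   orders
GROUP  BY customer_id
ORDER  BY 2 DESC;
COMMIT;
```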

Cost and timing fields

With plain EXPLAIN, every node prints three estimates (cost, rows, width); ANALYZE adds the actual time, actual rows, and loops.

| Field | Meaning |
| --- | --- |
| cost=0.42..1.85 | Estimated startup cost to first row, then total cost. Arbitrary units; only relative comparison matters. |
| rows=42 | Estimated row count out of this node. |
| width=24 | Estimated bytes per row. |
| actual time=0.012..1.230 | Wall-clock ms to first row, then last row. With loops>1, this is the per-loop average. |
| actual rows=128 loops=12 | Real row count per loop and how many times the node executed; multiply them for the total. |
| Buffers: shared hit=12 read=3 | Cached pages vs disk pages, in 8KB blocks. |

If actual rows differs from estimated rows by more than 10x at any node, statistics are stale or the column needs a different distribution model. Run ANALYZE <table> or raise default_statistics_target.
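Refreshing statistics for a hypothetical orders table, and raising the sample size for one skewed column rather than globally:

```sql
-- Rebuild this table's rows in pg_statistic.
ANALYZE orders;

-- Per-column override when the default histogram is too coarse
-- (default_statistics_target defaults to 100).
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;  -- re-sample with the larger target
```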

What to worry about

These are the patterns that warrant action.

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Seq Scan on a large table for a selective predicate | Missing or unused index | Add an index; check column collation and operator class. |
| Row estimate off by 10x+ | Stale statistics or skewed data | ANALYZE the table; consider CREATE STATISTICS. |
| Sort Method: external merge Disk: N | work_mem too small for the sort | Raise work_mem per session; or sort fewer rows. |
| loops=N with N large under a Nested Loop | Outer side too big | Test a Hash Join (Postgres has no index hints; SET enable_nestloop = off for diagnosis), or filter the outer earlier. |
| Buffers: read= large vs hit= small | Cold cache or huge result set | Warm the cache, narrow the query, or shrink the index. |
| Filter: removes most rows after a scan | Predicate not sargable | Rewrite to allow index use (avoid functions on indexed columns). |
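When two columns are correlated, per-column histograms multiply their selectivities as if they were independent, and estimates collapse. Extended statistics tell the planner otherwise; a sketch with hypothetical city and zip columns:

```sql
-- Tell the planner that city and zip move together.
CREATE STATISTICS addr_city_zip (dependencies, ndistinct)
    ON city, zip FROM addresses;

ANALYZE addresses;  -- extended statistics are populated by ANALYZE
```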

Common gotchas

  • EXPLAIN without ANALYZE shows estimates only. The estimate can lie; always confirm with ANALYZE.
  • EXPLAIN ANALYZE runs the query, including writes. Wrap data-modifying statements in BEGIN; ... ROLLBACK;.
  • Per-node timing has real overhead on systems with slow clock sources (as does track_io_timing); expect a 5 to 20% slowdown when measuring, or use EXPLAIN (ANALYZE, TIMING OFF) when row counts are enough.
  • A plan that looks bad on a cold cache may be fine on a warm one. Read Buffers before judging.
  • The planner reads pg_statistic. Run ANALYZE manually after bulk loads; autovacuum will not get to the table soon enough.
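The rollback wrapper from the second bullet, sketched with a hypothetical DELETE:

```sql
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM orders WHERE created_at < now() - interval '5 years';
ROLLBACK;  -- ANALYZE actually ran the DELETE; this undoes it
```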