Overview
Read the plan before you tune. EXPLAIN ANALYZE runs the query and prints the chosen plan, with actual rows, time, and (with BUFFERS) cache and disk counts. This card is the lookup for what each node type means and when each is the right choice. The conceptual rules live in postgres-explain.
Run plans with EXPLAIN (ANALYZE, BUFFERS, VERBOSE) .... Always use ANALYZE to see real numbers; always use BUFFERS to see whether the heap is in cache.
Scan nodes
The planner picks one scan per table per query. The right one depends on selectivity, ordering, and index shape.
| Node | When the planner picks it |
|---|---|
Seq Scan | No usable index, or returning a large fraction (roughly >5 to 20%) of rows. |
Index Scan | Selective predicate on an indexed column; row order matters. |
Index Only Scan | All requested columns covered by the index; visibility map allows it. |
Bitmap Heap Scan + Bitmap Index Scan | Medium selectivity; gather many index hits then sort heap accesses. |
Tid Scan | Direct ctid lookup; rare in app code. |
Function Scan | SELECT ... FROM jsonb_to_recordset(...) and similar. |
Values Scan | Inline VALUES (...) lists. |
Subquery Scan | Wraps a subselect that needs renaming. |
A Seq Scan on a 10-million-row table is usually wrong unless the query asks for most of the rows. Add an index, or rewrite the predicate to be sargable.
Join nodes
Three join algorithms; each shines in a different shape.
| Node | Picked when | Notes |
|---|---|---|
Nested Loop | One side is tiny, or both sides are tiny. | Best with an indexed inner side. |
Hash Join | Inner side fits in work_mem. | Builds a hash table once; scans outer once. |
Merge Join | Both inputs already sorted on the join key. | Cheap after Index Scan returns sorted rows. |
A Nested Loop over a million-row outer with a Seq Scan inner is the classic foot-gun. The plan usually wants a Hash Join.
Sort, group, aggregate
These nodes are where work_mem matters most.
| Node | Effect |
|---|---|
Sort | Sort rows in memory or with on-disk merge. Watch for Sort Method: external merge. |
Incremental Sort | Sort within already-sorted prefixes; cheap when partial order is present. |
Hash Aggregate | Hashed grouping; work_mem-bounded. |
Group Aggregate | Streaming aggregate over presorted input. |
WindowAgg | Window functions; usually wraps a Sort. |
Materialize | Cache an inner side of a Nested Loop for reuse. |
Memoize | Per-key cache of inner results (Postgres 14+). |
Gather / Gather Merge | Workers used; parallel plan. |
Sort Method: external merge Disk: 64MB means work_mem was too small for the sort; bumping it (per-session, not globally) often eliminates the spill.
Cost and timing fields
Every node prints two numbers and, with ANALYZE, two more.
| Field | Meaning |
|---|---|
cost=0.42..1.85 | Estimated startup cost to first row, then total cost. Arbitrary units; only relative comparison matters. |
rows=42 | Estimated row count out of this node. |
width=24 | Estimated bytes per row. |
actual time=0.012..1.230 | Wall-clock ms to first row, then last row. |
actual rows=128 loops=12 | Real row count and how many times the node executed. |
Buffers: shared hit=12 read=3 | Cached pages vs disk pages, in 8KB blocks. |
If actual rows differs from estimated rows by more than 10x at any node, statistics are stale or the column needs a different distribution model. Run ANALYZE <table> or raise default_statistics_target.
What to worry about
These are the patterns that warrant action.
| Symptom | Likely cause | Fix |
|---|---|---|
Seq Scan on a large table for a selective predicate | Missing or unused index | Add an index; check column collation and operator class. |
| Row estimate off by 10x+ | Stale statistics or skewed data | ANALYZE table; consider CREATE STATISTICS. |
Sort Method: external merge Disk: N | work_mem too small for sort | Raise work_mem per session; or sort fewer rows. |
loops=N with N large under Nested Loop | Outer side too big | Force a Hash Join (drop the index hint), or filter the outer earlier. |
Buffers: read= large vs hit= small | Cold cache or huge result set | Warm the cache, narrow the query, or shrink the index. |
Filter: removes most rows after a scan | Predicate not sargable | Rewrite to allow index use (avoid functions on indexed columns). |
Common gotchas
EXPLAINwithoutANALYZEshows estimates only. The estimate can lie; always confirm withANALYZE.EXPLAIN ANALYZEruns the query, including writes. Wrap data-modifying statements inBEGIN; ... ROLLBACK;.- Timing overhead from
track_io_timingis real on systems with cheap clocks; expect 5 to 20% slowdown when measuring. - A plan that looks bad on a cold cache may be fine on a warm one. Read
Buffersbefore judging. - The planner uses
pg_statistic.ANALYZEafter bulk loads or autovacuum will not run soon enough.