Overview

This page is the atomic definition. PostgreSQL storage internals live at postgres.

Definition

Write amplification is the ratio of bytes the storage layer physically writes to the logical bytes of application data that changed. In PostgreSQL, a single UPDATE to one column can cause: the old tuple version to be marked dead (MVCC), a complete new tuple version to be written to a heap page, a new entry to be written to every index on the table unless the update qualifies as a heap-only tuple (HOT) update (not only the indexes that cover the changed column), and a WAL (write-ahead log) record for each of those modifications. If a table has five indexes, a single-byte column update may write hundreds of bytes.

Write amplification is also a concern for SSDs: flash storage has a limited number of program/erase cycles, and high write amplification accelerates wear. In databases using LSM-tree storage engines (RocksDB, Cassandra), compaction is the main driver of write amplification: background threads repeatedly merge and rewrite sorted runs. PostgreSQL’s heap-based storage instead incurs write amplification from MVCC tuple versioning, index maintenance, and WAL. Autovacuum and manual VACUUM reclaim dead tuple space but themselves generate I/O.
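The definition above can be sketched as a small calculation. This is an illustrative model only: the class name UpdateCost and every byte size below are hypothetical values chosen for the arithmetic, not measurements from PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class UpdateCost:
    """Bytes written for one logical change. All sizes are caller-supplied assumptions."""
    logical_bytes: int      # bytes the application actually changed
    heap_tuple_bytes: int   # size of the new tuple version written to the heap
    index_entry_bytes: int  # size of one new index entry
    n_indexes: int          # number of indexes that receive a new entry
    wal_bytes: int          # total WAL generated for the heap and index changes

    def physical_bytes(self) -> int:
        # Heap write + one entry per index + WAL for all of them.
        return (self.heap_tuple_bytes
                + self.n_indexes * self.index_entry_bytes
                + self.wal_bytes)

    def write_amplification(self) -> float:
        # Ratio of physical bytes written to logical bytes changed.
        return self.physical_bytes() / self.logical_bytes


# Hypothetical sizes for an 8-byte column change on a table with five indexes.
cost = UpdateCost(logical_bytes=8, heap_tuple_bytes=120,
                  index_entry_bytes=16, n_indexes=5, wal_bytes=300)
print(cost.physical_bytes())       # 120 + 5*16 + 300 = 500
print(cost.write_amplification())  # 500 / 8 = 62.5
```

Real figures depend on row width, index types, and checkpoint behavior, so treat the output as a way to reason about the ratio rather than a prediction.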

When it applies

Monitor write amplification when write throughput is high, SSD wear is a concern, PostgreSQL’s I/O is unexpectedly high relative to the application write rate, or autovacuum is running continuously. Reduce it by dropping unused indexes, using partial indexes, lowering fillfactor on frequently updated tables so that pages keep free space, and keeping updates eligible for HOT (Heap-Only Tuple) handling, which PostgreSQL applies automatically when no indexed column changes and the new tuple version fits on the same page.
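The HOT condition can be expressed as a simple predicate. The sketch below is a simplified model, not PostgreSQL's implementation: the function name is_hot_eligible, the column names, and the byte figures are hypothetical, and it treats "indexed" as any column referenced by any index on the table.

```python
def is_hot_eligible(changed_columns, indexed_columns,
                    page_free_bytes, new_tuple_bytes):
    """Return True if an update could be handled as a heap-only tuple update.

    Simplified rule: no changed column is indexed, and the page that holds
    the old tuple has room for the new version.
    """
    touches_index = bool(set(changed_columns) & set(indexed_columns))
    fits_on_page = page_free_bytes >= new_tuple_bytes
    return (not touches_index) and fits_on_page


indexed = {"id", "email"}  # hypothetical indexed columns

# Only a non-indexed column changes and the page has room: no index writes needed.
print(is_hot_eligible({"last_seen"}, indexed, page_free_bytes=400, new_tuple_bytes=120))

# An indexed column changes: index entries must be written.
print(is_hot_eligible({"email"}, indexed, page_free_bytes=400, new_tuple_bytes=120))

# The page is full: the new version goes to another page, so index entries are written.
print(is_hot_eligible({"last_seen"}, indexed, page_free_bytes=50, new_tuple_bytes=120))
```

The third case is why a lower fillfactor helps: reserving free space on each page raises the chance that the new tuple version fits next to the old one.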

Example

A table with 10 indexes receives 10,000 row updates per second. Each non-HOT update writes a new heap tuple, 10 new index entries, and WAL for all of them. Effective disk writes may be 15-20x the raw data change rate. Dropping 6 unused indexes cuts write amplification to roughly 4-5x.
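The record counts behind this example can be checked with a few lines of arithmetic. This sketch assumes every update is non-HOT and counts heap and index records only; it excludes WAL and counts records rather than bytes, so its output is not directly comparable to the byte ratios quoted above. The function name records_per_second is hypothetical.

```python
def records_per_second(updates_per_second: int, n_indexes: int) -> int:
    """Heap tuples plus index entries written per second, assuming no HOT updates."""
    heap_tuples = updates_per_second
    index_entries = updates_per_second * n_indexes
    return heap_tuples + index_entries


before = records_per_second(10_000, n_indexes=10)  # 10 indexes
after = records_per_second(10_000, n_indexes=4)    # after dropping 6 unused indexes

print(before, after)                    # 110000 50000
print(before // 10_000, after // 10_000)  # 11 and 5 records per update
```

Each of these records is also logged to WAL, which is why the byte-level amplification is higher than the per-update record count alone suggests.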

  • write-ahead-log - every change is written twice: once to the WAL, once to the heap or index page.
  • btree-index - each additional index adds another entry write to every non-HOT update.
  • autovacuum - reclaims dead tuples but contributes its own I/O.
  • vacuum - manual vacuuming as a control mechanism.
  • hot-cold-storage - separating cold data reduces write amplification on hot tables.

Citing this term

See Write Amplification (llmbestpractices.com/glossary/write-amplification).