Overview
Vacuum reclaims space and refreshes statistics. Postgres uses multi-version concurrency control; every UPDATE and DELETE leaves a dead tuple that vacuum clears. Autovacuum runs the work on a schedule; the defaults are conservative and wrong for hot tables. This page covers tuning, bloat detection, online cleanup, and transaction ID wraparound. The umbrella rules live in postgres and the production playbook in postgres-prod.
Tune autovacuum per table on hot writes
Global defaults trigger autovacuum when twenty percent of a table is dead. On a one-billion-row events table that means autovacuum runs every two hundred million updates. Set per-table thresholds for high-churn tables.
```sql
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01,
  autovacuum_vacuum_cost_limit = 2000
);
```

Scale factor is the fraction of the table that must be dead before autovacuum fires; threshold (autovacuum_vacuum_threshold) is an absolute count added to it, so the trigger point is threshold + scale_factor × reltuples. Cost limit controls how much work autovacuum does per round before it sleeps; raise it on hosts with idle IO. Confirm the table is being vacuumed often enough by querying pg_stat_user_tables.last_autovacuum.
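A recency check along these lines works; the table name is illustrative:

```sql
-- How recently has autovacuum touched the hot table, and how far behind is it?
SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'events';
```

If last_autovacuum lags the write rate by hours, the per-table scale factor above is still too high.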
Detect bloat before it bites
Bloat is the gap between live tuples and the on-disk size of the table or index. Bloated indexes plan worse; bloated tables read slower. Two signals catch most of it.
```sql
SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0), 3) AS dead_ratio,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_ratio DESC NULLS LAST;
```

Treat a dead ratio above 0.2 on a hot table as an alert. Use the pgstattuple extension for accurate index bloat numbers. Tie the metric into observability so the on-call surface is dashboards, not ad-hoc queries.
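A sketch of the pgstattuple check; the index and table names are illustrative, and both functions scan the full relation, so run them off-peak:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density well below fill factor on a btree signals bloat worth repacking
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('events_created_at_idx');

-- Exact tuple-level stats for the table itself
SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('events');
```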
Treat VACUUM FULL as a last resort
VACUUM FULL rewrites the table into a new file, then swaps. It reclaims every byte; it also takes an ACCESS EXCLUSIVE lock for the duration. On a 200 GB table that is an hours-long outage.
- Do not run VACUUM FULL in production unless the database is offline.
- Routine bloat does not need a rewrite; tuning autovacuum and running regular VACUUM (no FULL) is enough.
- A truncate is faster when the rows are not needed. TRUNCATE resets the heap and indexes in one transaction.
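The non-blocking alternatives from the list above, with illustrative table names:

```sql
-- Plain VACUUM marks dead tuples reusable without an exclusive lock;
-- it does not shrink the file, but it stops bloat from growing
VACUUM (VERBOSE) events;

-- When the rows are disposable, TRUNCATE resets heap and indexes instantly
-- (it takes an ACCESS EXCLUSIVE lock too, but holds it only briefly)
TRUNCATE staging_events;
```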
Use pg_repack for online cleanup
pg_repack rewrites a bloated table or index online. It writes to a shadow table, captures concurrent changes through triggers, then swaps under a brief lock. Install the extension on the primary; the binary runs from a client.
```shell
pg_repack -h db.internal -U postgres -d app_prod -t events
pg_repack -h db.internal -U postgres -d app_prod --only-indexes -t events
```

pg_repack needs free disk roughly equal to the table size during the rewrite. Schedule it during low traffic and watch replication lag; see postgres-replication for the lag rule. pg_squeeze is an alternative that runs as a background worker with no client process.
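One way to watch lag from the primary while a repack runs; a sketch using the standard pg_stat_replication view:

```sql
-- Byte lag per standby; pause or abort the repack if this keeps growing
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```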
Prevent transaction ID wraparound
Postgres uses a 32-bit transaction ID counter; once it wraps, every row becomes invisible. Autovacuum runs an emergency anti-wraparound vacuum at autovacuum_freeze_max_age (default 200 million). Wraparound has caused multi-hour outages at scale.
```sql
SELECT datname,
       age(datfrozenxid) AS xid_age,
       2^31 - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```

Alert at 800 million; the database refuses writes at around 2.1 billion. Lower autovacuum_freeze_max_age to 100 million on systems that approach the limit; raise autovacuum_vacuum_cost_limit so anti-wraparound vacuums complete before backlog accumulates. The same advice covers multixact wraparound; monitor both.
Raise cost limits on dedicated workers
Autovacuum throttles itself with a cost budget. The default autovacuum_vacuum_cost_limit = 200 is set for shared hosts; modern instances with dedicated IO can run much higher.
```sql
ALTER SYSTEM SET autovacuum_max_workers = 6;
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_naptime = '15s';
SELECT pg_reload_conf();
```

Note that autovacuum_max_workers only takes effect after a server restart; the cost limit and naptime apply on reload. Verify with pg_stat_progress_vacuum that vacuums finish in minutes, not hours. A vacuum that runs longer than its own table's churn rate falls behind permanently.
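A progress check along these lines works against the standard pg_stat_progress_vacuum view:

```sql
-- One row per vacuum in flight; phase and block counts show how far along it is
SELECT p.pid, c.relname, p.phase,
       p.heap_blks_scanned, p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```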
Vacuum after large backfills
Autovacuum does not see a bulk load until it crosses the threshold. After a backfill or a large delete, run VACUUM ANALYZE by hand.
```sql
VACUUM (ANALYZE, VERBOSE) events;
```

ANALYZE refreshes the statistics the planner reads; stale statistics produce the bad plans you debug in postgres-explain. After deleting one percent of rows from a huge table, force index cleanup with VACUUM (INDEX_CLEANUP ON).
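The forced index cleanup mentioned above, spelled out; the table name is illustrative:

```sql
-- Vacuum the indexes even when the dead-tuple fraction is small enough
-- that PostgreSQL would otherwise skip index cleanup for this run
VACUUM (INDEX_CLEANUP ON, ANALYZE) events;
```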
Partition cold history to keep vacuum cheap
Vacuum cost is proportional to table size, not row churn. A one-terabyte append-only events table vacuums slowly even when only the last day changes.
- Range-partition by month. Old partitions are static; autovacuum visits them rarely.
- Drop the oldest partition with DROP TABLE; it is instant and reclaims space without a vacuum.
- See postgres-partitioning for the partition layout.
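A minimal sketch of that layout; the column names and partition bounds are illustrative:

```sql
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Retire cold history instantly, with no vacuum debt
DROP TABLE events_2024_01;
```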