Overview
Postgres ships two replication models. Streaming replication copies the write-ahead log to a binary replica; logical replication decodes the log into row-level changes for selected tables. Each solves a different problem. This page covers the choice, the lag rules, the safety boundary, and the sync mode tradeoff. The umbrella rules live in postgres and the production playbook in postgres-prod.
Use streaming replication for read scale and warm standbys
Streaming replication is the default. The primary ships WAL records to one or more replicas; each replica applies them in order. The replica is byte-for-byte identical and stays within seconds of the primary.
# primary postgresql.conf
wal_level = replica
max_wal_senders = 10
wal_keep_size = 2GB
Set up a replica with pg_basebackup --pgdata=/var/lib/postgresql/data --write-recovery-conf --slot=replica1. Use a replication slot so the primary retains the WAL the replica needs; without a slot, a disconnected replica can fall off the WAL retention window and require a re-clone.
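The slot must exist before pg_basebackup references it (or pass --create-slot to create it in the same run). A minimal sketch, assuming the slot name replica1 from the command above:
-- on the primary: create the physical slot, then watch it
SELECT pg_create_physical_replication_slot('replica1');
-- active flips to t once the replica connects; restart_lsn marks the retained WAL
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
Drop abandoned slots with pg_drop_replication_slot; a slot nobody consumes retains WAL forever and fills the disk.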
Reach for streaming replication when you need a read replica, a failover candidate, or a base for backups. It carries every change including DDL.
Use logical replication for selective copies
Logical replication uses publications and subscriptions. The primary publishes a set of tables; the subscriber consumes the changes through logical decoding (the built-in pgoutput plugin). It crosses major versions, sends only the rows you care about, and supports writable subscribers.
-- primary
CREATE PUBLICATION analytics FOR TABLE orders, order_items;
-- subscriber
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=primary dbname=app user=replica password=...'
    PUBLICATION analytics;
Use logical replication for zero-downtime major-version upgrades, for shipping a subset of tables to a reporting warehouse, and for sharding migrations. Limits: DDL does not replicate, sequences do not replicate, and large transactions spill to disk on the publisher while they are decoded.
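Because DDL does not replicate, schema changes and newly published tables need a manual step on the subscriber. A short sketch of the routine checks, assuming the analytics_sub subscription above:
-- on the subscriber: apply any new DDL by hand, then pick up added tables
ALTER SUBSCRIPTION analytics_sub REFRESH PUBLICATION;
-- confirm the apply worker is keeping up
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;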
Treat replicas as read scale, not as backups
A replica that lags by ten seconds is not a backup. A DROP TABLE on the primary replays on the replica within seconds; a corrupted page replicates as corruption.
- Run logical dumps and WAL archives for recovery; see postgres-prod for the PITR setup and the archiver check after this list.
- Use replicas for read offload (reporting, search indexing, analytics) and as failover candidates.
- Promote a replica to recover from a primary outage, not to recover from operator error.
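A quick sanity check that the WAL archive, not the replica, is the recovery path; a sketch assuming archiving is already configured per postgres-prod:
-- on the primary: a rising failed_count means the backup story is broken
SELECT archived_count, last_archived_wal, last_archived_time, failed_count
FROM pg_stat_archiver;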
Monitor replica lag in seconds and bytes
Replica lag is the gap between what the primary has written and what the replica has applied. Two metrics matter.
-- on the primary
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
Alert when replay_lag exceeds the staleness budget for read replicas, or when lag_bytes exceeds 100 MB for failover candidates; lag_bytes on a failover candidate approximates the data lost if the primary dies right now. Wire the metric into observability so the on-call dashboard surfaces it before the first stale-read incident.
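As a sketch, the alert condition expressed as a query; the 100 MB threshold is this page's example, not a universal default:
-- on the primary: list failover candidates past the lag budget
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) > 100 * 1024 * 1024;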
Pick async by default, sync for critical writes
Async is the default; the primary commits without waiting. Sync waits for at least one replica to acknowledge before commit.
- Async: low write latency, possible data loss on primary failure equal to the lag.
- Sync: zero data loss on failover, write latency includes the round-trip to the replica.
Set sync per transaction for the few writes that need it, not globally; with synchronous_standby_names set, a global synchronous_commit = on stalls every commit when the sync replica disconnects.
BEGIN;
SET LOCAL synchronous_commit = remote_apply;  -- SET LOCAL only applies inside a transaction block
INSERT INTO payments (...) VALUES (...);
COMMIT;
remote_write is fastest and waits only for the replica to receive the record; on waits for the replica to flush it to disk; remote_apply is strongest and waits until the change is visible to replica reads. Tie the choice to the durability budget per write.
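Sync mode only engages when the primary names at least one synchronous standby. A minimal sketch, assuming the standby's application_name is replica1:
-- on the primary: wait for any one of the named standbys
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica1)';
SELECT pg_reload_conf();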
Cascade replicas when one primary fans out far
A primary that streams to a dozen replicas burns network and CPU. Cascade through one tier of intermediate replicas; the intermediate restreams WAL to its downstream consumers. Cascading adds one apply latency per hop and offloads bandwidth from the primary. Most managed providers expose this as a configuration option; self-hosted setups point primary_conninfo on the downstream replicas at the intermediate.
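A sketch of the downstream side, with hypothetical hostnames; primary_conninfo is reloadable on PostgreSQL 13 and later, while older versions need a restart:
-- on the downstream replica: stream from the intermediate, not the primary
ALTER SYSTEM SET primary_conninfo = 'host=intermediate1 user=replica application_name=leaf1';
SELECT pg_reload_conf();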
Tune hot_standby_feedback against vacuum
A long-running query on a read replica needs row versions that vacuum on the primary may be ready to remove; by default the primary vacuums anyway and the replica cancels the conflicting query. hot_standby_feedback = on makes the replica report its oldest snapshot to the primary, which delays vacuum until the query finishes and shows up as bloat. Leave it off and tolerate query cancellation on the replica, or turn it on and accept the bloat risk; pair with postgres-vacuum. Long analytical queries belong on a dedicated replica with feedback on; OLTP read replicas should keep it off.
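A sketch of the dedicated-analytics-replica setup, plus the number to watch for the bloat cost:
-- on the analytics replica only
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();
-- on the primary: dead tuples vacuum cannot yet remove, i.e. the bloat cost
SELECT relname, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;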
Drill the failover before the incident
Replication is half the story. The failover playbook lives in postgres-prod: promote with pg_ctl promote or the provider’s button, repoint the connection pool, then re-clone the old primary as a new replica. Run the drill quarterly. A replication topology that has never been failed over is theoretical.
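For the drill itself, promotion can also be driven from SQL instead of pg_ctl; a sketch assuming PostgreSQL 12 or later for pg_promote:
-- on the chosen replica
SELECT pg_promote();        -- returns true once promotion completes
SELECT pg_is_in_recovery(); -- false confirms this node now accepts writes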