Overview
Run Postgres on a managed provider until you have a concrete reason not to. The provider handles backups, patching, replication, and pager duty; you handle schema and queries. This page covers production decisions downstream of postgres: hosting, backups, PITR, failover, pooling, replicas, secrets, and observability.
Default to a managed provider
Pick from Neon, Supabase, AWS RDS or Aurora, Google Cloud SQL, or Crunchy Bridge. Each ships automated backups, minor-version patching, monitoring, and a one-click read replica.
- Neon: branchable Postgres with separated storage and compute. Good for preview environments.
- Supabase: Postgres plus auth, storage, edge functions. Good when the app needs more than the database.
- RDS or Cloud SQL: boring, durable, expensive at the high end. Pick when you already live in AWS or GCP.
Self-host only when the managed bill is provably worse at scale, residency laws pin you to a region no provider serves, or you need an extension the provider blocks. See hostinger-vps for the hardening baseline.
Backups are useless until you have restored one
A backup you have not restored is a hope. Run a full restore drill quarterly and after any backup config change.
# Nightly logical dump, parallel, directory format (required for --jobs)
pg_dump --jobs=4 --format=directory --file=/backups/app-$(date +%F) app_prod
# Restore drill: into a scratch instance, end-to-end timed
pg_restore --jobs=4 --dbname=app_restore /backups/app-2026-05-14
Push dumps off-box with restic or the provider’s snapshot tooling; see hostinger-vps. Keep seven daily, four weekly, and twelve monthly snapshots. The runbook records wall-clock time to first query after restore.
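A minimal off-box push with restic, assuming the repository and RESTIC_PASSWORD are already configured; the retention flags mirror the schedule above, the paths are illustrative.
# Push the dump directory to the restic repository, then apply retention
restic backup /backups/app-$(date +%F)
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune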
Layer WAL archiving for point-in-time recovery
Logical dumps give you a daily RPO. Pair them with continuous WAL archiving when you need minutes.
- Managed providers ship PITR by default. Confirm the retention window (Neon: 7 to 30 days; RDS: up to 35) and the latest restorable timestamp.
- Self-hosted: configure archive_command to ship WAL segments to S3. pgBackRest or wal-g wrap this with retention, encryption, and parallel restore; sketch below.
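A minimal self-hosted sketch with wal-g, assuming wal-g is installed and its storage settings (for example WALG_S3_PREFIX and S3 credentials) are already configured; the data directory and service name are illustrative.
# Ship every completed WAL segment through wal-g
psql -c "ALTER SYSTEM SET archive_mode = on;"
psql -c "ALTER SYSTEM SET archive_command = 'wal-g wal-push %p';"
# archive_mode needs a restart, not just a reload
sudo systemctl restart postgresql
# Take a fresh base backup so replay has a recent starting point
wal-g backup-push /var/lib/postgresql/16/main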
RPO is the WAL archive interval, typically under a minute. RTO is dominated by replay since the last base backup; size that against your incident budget.
Drill the failover before the incident
A failover playbook exists when someone has timed it end to end. Run the drill quarterly.
- Managed: trigger the provider’s failover button. Time endpoint propagation, pool reconvergence, app reconnect.
- Self-hosted: promote a replica with pg_ctl promote or Patroni; update the connection string at the pool. Sketch below.
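A sketch of the self-hosted promotion step; the data directory and Patroni config path are assumptions, so substitute your own layout.
# Plain streaming replica: promote it to primary
pg_ctl promote -D /var/lib/postgresql/16/main
# Patroni cluster: run an interactive planned switchover instead
patronictl -c /etc/patroni.yml switchover
# Then repoint the pool at the new primary and time the first successful query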
Write the steps, the expected duration, and the rollback path. The playbook lives next to the runbook.
Pool connections, do not multiply them
Postgres is one process per connection. A hundred app workers each holding ten connections starve the server. Use PgBouncer in transaction mode for any service with more than a few workers; Supabase and Neon ship their own pooler. See pooling rules in postgres, Prisma flags in prisma, and the rollout discipline in migrations.
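A minimal transaction-mode PgBouncer config, as a sketch; the database name, auth file, and pool sizes are illustrative and need tuning against core count and workload.
# Write /etc/pgbouncer/pgbouncer.ini; the app connects to port 6432 instead of 5432
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
app_prod = host=127.0.0.1 port=5432 dbname=app_prod
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 500
EOF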
Read replicas are for read scale, not safety
A replica that lags by ten seconds is not a backup. Use replicas to offload read-heavy queries (reporting, search indexing) and to keep a failover candidate warm. Do not point the nightly backup job at a replica; back up from primary or a dedicated backup-only node. See postgres-replication for the streaming-vs-logical trade-offs.
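A quick lag check to run on the replica before trusting it for reads or as the failover candidate, as a sketch:
# On the replica: how far behind the primary is replay right now?
psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;"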
Rotate secrets out of plain env vars
Never store the production database URL in a .env committed to history. Use a secret manager (AWS Secrets Manager, GCP Secret Manager, Doppler, 1Password) and rotate on a schedule. Prefer short-lived IAM credentials where supported (RDS IAM auth, Cloud SQL IAM). fastapi and other clients read the URL from the store at startup, not from disk.
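A sketch of resolving the URL at process start from AWS Secrets Manager; the secret name and the entrypoint hand-off are illustrative.
# Entrypoint wrapper: fetch the connection string at startup, never write it to disk
export DATABASE_URL="$(aws secretsmanager get-secret-value \
  --secret-id prod/app/database-url \
  --query SecretString --output text)"
exec "$@"   # hand off to the real app server command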
Wire observability before you need it
Three signals catch most production pain.
- pg_stat_statements: enable in shared_preload_libraries; review top time-consumers weekly (query after this list). Feed it into postgres-explain for the plan-level read, and into postgres-indexes for the index decisions.
- Slow query log: log statements over 250 ms. Ship to the same log pipeline as the app.
- Autovacuum: alert on n_dead_tup ratios and last_autovacuum age for hot tables (query after this list). Bloat is the slowest, most preventable production decay. See postgres-vacuum for the per-table tuning.
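Two starting queries for the weekly review and the autovacuum alert, as a sketch; the limits are illustrative and the column names assume PostgreSQL 13 or newer.
# Top time-consumers since the last stats reset (needs the pg_stat_statements extension)
psql app_prod -c "SELECT round(total_exec_time) AS total_ms, calls, left(query, 60) AS query
                  FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"
# Dead-tuple counts and last autovacuum per table; feed these into the alert rule
psql app_prod -c "SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
                  FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"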
Pair with host metrics from hostinger-vps (CPU, memory, disk, IO wait) and app-side traces and SLOs from observability. Alert on disk above 80 percent and replication lag above the RPO target.