Disaster Recovery: Best Practices

Overview

An untested backup is not a backup. Disaster recovery requires knowing what to protect, automating the capture, storing it outside the blast radius, and running a full restore drill before the real disaster strikes.

Define RPO and RTO before choosing tools

Set these numbers first; they determine every tool and schedule decision downstream.

Recovery Point Objective (RPO): the maximum acceptable data loss, measured as time. An RPO of one hour means losing up to one hour of writes is acceptable. An RPO of five minutes rules out nightly dumps: it requires continuous WAL streaming or backups taken at least every five minutes.

Recovery Time Objective (RTO): the maximum acceptable downtime, measured as time from incident to restored service. An RTO of four hours usually gives the team time to restore from a daily pg_dump, depending on database size. An RTO of fifteen minutes requires a warm standby or a fast PITR restore.

A mismatch between the stated RPO/RTO and what the backup strategy can actually deliver is the most common failure mode. Write both numbers in the runbook and re-verify them during each quarterly drill.
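The mismatch check can be sketched as simple arithmetic. This is a minimal illustration, not a prescribed tool: the function name `rpo_fits` is hypothetical, and it assumes that the worst-case loss for a periodic backup is roughly the interval between runs plus the time one run takes.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool named in this document).
# Assumption: worst-case data loss for periodic backups is approximately
# the interval between runs plus the duration of one run.
# Usage: rpo_fits INTERVAL_SECONDS DURATION_SECONDS RPO_SECONDS
rpo_fits() {
  worst_case=$(( $1 + $2 ))
  if [ "$worst_case" -le "$3" ]; then
    echo "OK: worst-case loss ${worst_case}s <= RPO ${3}s"
  else
    echo "MISMATCH: worst-case loss ${worst_case}s > RPO ${3}s"
    return 1
  fi
}

# A nightly dump (86400 s apart, about 10 minutes per run) checked
# against a one-hour RPO.
rpo_fits 86400 600 3600 || true
```

A nightly dump alone fails this check for a one-hour RPO, which is the case where continuous WAL-based PITR has to cover the gap.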

Back up databases with Supabase PITR and pg_dump

Use two complementary strategies: continuous WAL-based PITR and periodic logical backups.

Supabase PITR streams WAL segments to object storage, allowing restore to any second within the retention window. Enable it under Dashboard > Settings > Add-ons. The Pro plan default retention is 7 days, purchasable up to 28 days in 7-day increments at $100/month each; the Team and Enterprise plans support the same add-on. To restore in place, go to Dashboard > Project > Settings > Database > Backups > Point-in-Time, select the target timestamp, and confirm. The project is inaccessible during the restore; plan for downtime proportional to database size. Supabase also offers a “Restore to a New Project” option that creates a separate copy without downtime on the source project. (Supabase PITR docs)

pg_dump for logical backups complements PITR by producing portable snapshots you can move to a separate account. Run a nightly dump and compress the output:

pg_dump "$DATABASE_URL" \
  --format=custom \
  --compress=9 \
  --file="backup-$(date +%Y%m%d).dump"

Retain daily dumps for 30 days and weekly dumps for 90 days. Store them in a bucket under a separate cloud account; see hostinger-vps-backups for the restic-to-B2 pattern, which applies equally here.
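The retention rule above could be enforced on a local staging directory with a sketch like the following. The `daily/` and `weekly/` layout and the function name `prune_dumps` are assumptions made for illustration; the document does not say how weekly dumps are distinguished from daily ones. Offsite copies would be pruned by the storage tool's own retention policy instead.

```shell
#!/bin/sh
# Hypothetical pruning helper. Assumed layout (not from the source):
#   ROOT/daily/backup-YYYYMMDD.dump   kept for 30 days
#   ROOT/weekly/backup-YYYYMMDD.dump  kept for 90 days
# Usage: prune_dumps ROOT
prune_dumps() {
  # -mtime +30 matches files last modified more than 30 full days ago.
  find "$1/daily" -type f -name 'backup-*.dump' -mtime +30 -print -delete
  find "$1/weekly" -type f -name 'backup-*.dump' -mtime +90 -print -delete
}

# Example (path is a placeholder): prune_dumps /var/backups/postgres
```

Each deleted path is printed before removal, so the job log records what was pruned.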

For self-hosted Postgres, pg_basebackup captures a physical binary snapshot for faster large-database restores alongside logical pg_dump exports. See postgres-prod and postgres for connection-string patterns and restore commands.

Back up secrets and configuration

A restored database is useless without the environment configuration that connects your application to it. Store all secrets in a secrets manager, not in Git. See secrets-and-env for the full pattern. Before a disaster, confirm that:

Every secret has a named owner and a documented rotation procedure.
Infrastructure-as-code (Terraform, Pulumi) or a config snapshot can recreate the deployment environment without manual steps.
API keys and OAuth credentials have backup or rotation paths that do not depend on the primary account being intact.
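One small piece of this confirmation can be automated: checking that the restored environment actually has the configuration values the application needs. The sketch below is illustrative only; `missing_vars` is a hypothetical helper, and the list of required names has to come from your own deployment.

```shell
#!/bin/sh
# Hypothetical preflight helper: prints the names of any unset or empty
# variables and returns non-zero if at least one is missing.
# Usage: missing_vars NAME [NAME...]
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted input).
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "${missing# }"
  [ -z "$missing" ]
}

# Example preflight; both names appear in this document's own snippets.
if ! unset_names=$(missing_vars DATABASE_URL HEALTHCHECK_PING_URL); then
  echo "missing configuration: $unset_names" >&2
fi
```

This only proves the values are present, not that they are correct; the smoke tests in the restore drill cover the latter.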

Store backups in a separate account and region

A single account compromise or provider outage takes backups with it if they live in the same account. Offsite storage means:

A separate cloud account (different login, separate billing) for backup buckets.
A separate geographic region from the primary deployment.
Immutable or versioned bucket policies that prevent deletion by a compromised key.

Apply the same principle to Supabase PITR: the PITR archive is managed by Supabase’s infrastructure and is logically isolated, but your logical pg_dump copies should live in a bucket you control in a separate account.

Automate and alert on backup jobs

Backups that fail silently are the same as no backups. Wrap each job in monitoring:

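# Cron example: nightly dump at 02:00 UTC.
# Ping on success so the dead-man's switch sees a check-in; ping /fail on error.
0 2 * * * /usr/local/bin/run-backup.sh && curl -fsS -m 10 "$HEALTHCHECK_PING_URL" || curl -fsS -m 10 "$HEALTHCHECK_PING_URL/fail"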

Use a dead-man’s-switch service (Healthchecks.io or similar) that pages when the job does not check in. Log the output and the exit code on each run.
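Logging the output and exit code can be done with a thin wrapper around the backup command. The following is a minimal sketch; `run_logged` is a hypothetical name, and the log path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: appends a start line, the command's combined
# stdout/stderr, and its exit code to a log file, then returns that code.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
  log_file=$1
  shift
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) start: $*" >>"$log_file"
  status=0
  "$@" >>"$log_file" 2>&1 || status=$?
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) exit code: $status" >>"$log_file"
  return "$status"
}

# Example (paths are placeholders):
#   run_logged /var/log/backup.log /usr/local/bin/run-backup.sh
```

Because the wrapper returns the command's own exit code, it can sit in front of the monitoring ping without changing which branch fires.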

Run a restore drill on a schedule

Run a full restore drill at least once per quarter. A drill means:

Pick a backup from the previous 24 hours.
Restore it to a staging environment, not the production one.
Run the application smoke-test suite against the restored environment.
Measure the actual time from “incident declared” to “service verified.” Record the measured RTO and RPO.
Fix any gap between the measured result and the target numbers before the next drill.
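The timing step of the drill can be recorded with a sketch like this one. `drill_result` is a hypothetical helper, and the timestamps are assumed to be Unix epoch seconds captured by hand or with `date +%s` at "incident declared" and "service verified".

```shell
#!/bin/sh
# Hypothetical helper: compares the measured restore time with the target RTO.
# Usage: drill_result DECLARED_EPOCH VERIFIED_EPOCH TARGET_RTO_SECONDS
drill_result() {
  measured=$(( $2 - $1 ))
  if [ "$measured" -le "$3" ]; then
    verdict=PASS
  else
    verdict=FAIL
  fi
  echo "measured_rto=${measured}s target_rto=${3}s result=${verdict}"
  [ "$verdict" = PASS ]
}

# Example: a drill that took one hour against a four-hour target.
drill_result 1700000000 1700003600 14400 || true
```

The single output line is meant to be pasted into the incident log, so results from successive drills stay comparable.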

Document the drill outcome in the incident log. A successful drill also validates that secrets and config restore paths work end-to-end; a database restore without the application config is an incomplete test.

Related

hostinger-vps-backups - restic backup setup, retention policy, and restore drill for VPS workloads
incident-response - escalation, communication, and postmortem workflow
postgres - pg_dump, pg_restore, and connection-string patterns
postgres-prod - managed vs self-hosted Postgres, PITR, and failover playbook
secrets-and-env - secret storage, rotation, and environment configuration
supabase - Supabase project setup, RLS, and platform features
