Overview

This page is the atomic definition. Citus and horizontal scaling options for PostgreSQL live at postgres.

Definition

Sharding (horizontal partitioning) splits a database table across multiple independent database servers, called shards, using a partition-key to determine which shard holds each row. Unlike vertical scaling (bigger server) or read replicas (copies of the full dataset), sharding distributes both data and write load. Each shard is a full, independent database instance; no shard has access to another shard’s data. The shard key selection is critical: a poor choice creates hot shards (uneven distribution) while a good choice spreads load evenly. Queries that touch rows on a single shard execute as fast as a single-node query. Queries that span multiple shards require a scatter-gather operation that the application or a sharding proxy must coordinate. Cross-shard joins and transactions are expensive and often avoided by denormalizing data. PostgreSQL does not shard natively; Citus (now a PostgreSQL extension) is the most common solution. Sharding introduces significant operational complexity and should be used only after exhausting single-node vertical scaling, read replicas, and caching.

When it applies

Consider sharding when: a single PostgreSQL instance cannot handle write throughput even with optimized hardware; table sizes exceed terabytes and vacuum/index maintenance become impractical; or multi-region data residency requirements mandate geographic data separation. Most applications never need sharding.

Example

A multi-tenant SaaS shards by tenant_id. Each of 10 shards holds data for ~10% of tenants. A query filtered by tenant_id routes to one shard. A cross-tenant analytics query must fan out to all 10 shards and aggregate results.

  • partition-key - the column that determines shard assignment; must distribute load evenly.
  • eventual-consistency - cross-shard consistency often requires relaxing to eventual consistency.
  • replication-lag - each shard typically has its own replica set with independent lag.
  • connection-pool - each shard needs its own connection pool; total connections multiply by shard count.
  • postgres - Citus extension for PostgreSQL sharding.

Citing this term

See Sharding (llmbestpractices.com/glossary/sharding).