Database Replication — Primary-Replica, Failover & Read Scaling
A single database server is a single point of failure. One hardware fault, one crashed process, and your application is down. Replication copies your data to multiple servers so that if one fails, another takes over — and while everything is running, it lets you distribute the read load across multiple machines.
Why Replication?
Two goals, one mechanism:
- High availability: if the primary database fails, a replica can take over (failover) and keep the application running
- Read scaling: direct read queries to replicas, freeing the primary for writes
Primary-Replica Architecture
The primary (historically called master) accepts all writes. The replicas (historically called slaves) receive a stream of changes from the primary and apply them, maintaining a copy of the data.
How it works:
- A write arrives at the primary
- Primary writes to its write-ahead log (WAL in PostgreSQL) or binary log (binlog in MySQL)
- The log is streamed to each replica
- Replicas apply the log entries in order, maintaining a consistent copy
Synchronous Replication
The primary waits for at least one replica to confirm the write before acknowledging success to the client.
Pros:
- Zero data loss: if primary crashes after acknowledging, the replica already has the data
- Strong consistency: replicas are never behind
Cons:
- Higher write latency: every write waits for the network round-trip to at least one replica
- Availability risk: if the replica is slow or unreachable, writes block
PostgreSQL Synchronous Replication
Asynchronous Replication
The primary writes to its log and immediately acknowledges success. It streams the log to replicas in the background without waiting for confirmation.
Pros:
- Low write latency: no waiting for network round-trips
- Replica failures don't affect write availability
Cons:
- Data loss on primary failure: writes acknowledged but not yet on replica are lost
- Replication lag: replicas may be seconds or more behind
When Async is Acceptable
For most web applications, asynchronous replication is the right default. Losing a few seconds of data is an acceptable trade-off for write performance. Use synchronous replication for financial transactions or any data where loss is unacceptable.
Semi-Synchronous Replication
A middle ground: the primary waits for at least one replica to receive (but not necessarily apply) the data.
If no replica responds within the timeout, it falls back to asynchronous mode automatically. This prevents data loss in the most common failure scenario (primary dies before replicating) while tolerating replica failures gracefully.
Replication Lag
Replication lag is the delay between when a write is committed on the primary and when it appears on a replica. It's measured in seconds (ideally milliseconds).
How to Measure
Consequences of Lag
Stale reads: a user updates their profile and immediately reads it — the read hits a lagged replica and returns old data. This violates read-your-writes consistency.
Solutions:
- Route reads-after-writes to the primary
- Use session-level consistency: after a write, track a "read timestamp" and only use replicas that have caught up
- Add a brief sleep/retry before reading back
Failover: Promoting a Replica
When the primary fails, one replica must be promoted to become the new primary.
Manual Failover (PostgreSQL)
Automatic Failover
Tools like Patroni (PostgreSQL), MHA (MySQL), or cloud-managed databases handle this automatically:
- Health check detects primary is unreachable
- Consensus algorithm (Raft, etcd) elects the most up-to-date replica
- Elected replica is promoted
- Other replicas repoint to the new primary
- Application connection pool is notified (via DNS update or proxy layer)
Failover Challenges
- Split-brain: both old primary and new primary accept writes simultaneously → data divergence. Fencing mechanisms (STONITH — Shoot The Other Node In The Head) prevent this.
- Data loss window: with async replication, some writes on the old primary may not have reached the promoted replica.
Multi-Primary (Multi-Master) Replication
Every node can accept writes. Changes are replicated to all other nodes.
Advantage: write availability even if one region goes down. No single point of failure.
Problem: write conflicts. Two clients update the same row on different nodes simultaneously. Conflict resolution strategies:
- Last-write-wins (LWW): the write with the later timestamp wins. Can silently drop data.
- Application-level resolution: conflicts are flagged and the application resolves them.
- CRDTs (Conflict-free Replicated Data Types): data structures designed to merge without conflicts (e.g., counters, sets).
Multi-primary is complex to operate correctly. Use it only when geography-distributed writes are required.
Sharding vs Replication
These solve different problems and are often used together:
| Concept | What it does | Problem solved |
|---|---|---|
| Replication | Copies data to multiple nodes | High availability, read scaling |
| Sharding | Splits data across multiple nodes | Write scaling, storage scaling |
A production system often does both: data is sharded across N nodes, and each shard has its own primary-replica set.
Consistency Models
| Model | Guarantee | Common Use |
|---|---|---|
| Strong consistency | Every read returns the most recent write | Financial systems, coordination |
| Eventual consistency | All replicas converge eventually | Social media likes, DNS |
| Read-your-writes | A client always sees its own writes | User profile updates |
| Monotonic reads | A client never reads older data after reading newer data | Feed readers |
| Causal consistency | Causally related operations appear in order | Messaging |
Cloud Read Replicas
AWS RDS Read Replicas
- Up to 5 read replicas per RDS instance
- Cross-region read replicas for geographic read distribution
- Read replicas can be promoted to standalone databases in seconds
Amazon Aurora
Aurora uses a shared distributed storage layer. All instances (up to 15 read replicas) share the same storage — replicas are near-instantaneous because they read from the same storage volume, no data copying required.
Google Cloud SQL
Python: Routing Writes and Reads
Checking Replication Lag in Python
Summary
- Replication copies data to multiple servers for high availability and read scaling
- Synchronous: zero data loss, higher latency — use for financial-critical writes
- Asynchronous: low latency, possible data loss — the default for most systems
- Replication lag causes stale reads — measure it, and route reads-after-writes to the primary
- Failover promotes a replica to primary — automate it with Patroni, Orchestrator, or use a managed cloud database
- Multi-primary enables geographic writes but introduces complex conflict resolution
- Sharding solves write/storage scale; replication solves availability and read scale — use both in production
- Cloud databases (RDS, Aurora, Cloud SQL) manage replication and failover for you at the cost of less control