Day 282: Database Replication Topologies and Failover Behavior

LESSON 011 · Consistency and Replication · 30 min · Intermediate

The core idea: replication creates more copies of the same authority domain. It improves availability, read capacity, and recovery options, but it does not remove the need to choose who is authoritative and what happens when that authority changes hands.


Today's "Aha!" Moment

The insight: Replication is not just "copy the database to another machine." It is a continuing protocol about where writes go, how followers catch up, what lag means, and who is allowed to become primary when failure happens.

Why this matters: Many outages happen not because replication is missing, but because teams assume that "having replicas" automatically means safe failover, fresh reads, and zero ambiguity about leadership.

Concrete anchor: A primary fails during a traffic spike. You have replicas, but one is lagging, another is healthy but isolated, and clients still have stale connection targets. The problem is no longer storage. It is authority transfer under uncertainty.

The practical sentence to remember:
Replication gives you copies. Failover is the logic that decides which copy can safely become the source of truth.


Why This Matters

The problem: A single database node is a single authority point. If it fails, you lose write availability and may lose service entirely. Replication exists to reduce that fragility, but it introduces a new question: which copy is current enough and legitimate enough to serve writes after failure?

Without this model:

  • "Having replicas" is assumed to mean safe failover, fresh reads, and zero ambiguity about leadership.
  • A promoted replica can turn out not to contain the state everyone assumed it had.

With this model:

  • Failover is designed as an explicit transfer of authority, with clear promotion rules.
  • Replica lag is treated as an application behavior contract, not just a dashboard metric.

Operational payoff: Better failover design, better expectations around replica lag, and fewer incidents where a promoted replica turns out not to contain the state everyone assumed it had.


Learning Objectives

By the end of this lesson, you should be able to:

  1. Explain the main replication topologies in terms of authority, lag, and failover behavior.
  2. Describe what failover actually requires beyond just having a replica available.
  3. Reason about read freshness, write availability, and split-brain risk under different designs.

Core Concepts Explained

Concept 1: Replication Topologies Are Really Authority Topologies

Concrete example / mini-scenario: A transactional database has one primary in Madrid and two followers in Frankfurt and Paris. Reads may go to followers, but all writes still flow through the primary.

Intuition: The shape of replication matters because it determines where the authoritative write stream begins and how other nodes consume it.

Common topology families:

  1. Primary-replica

    • One node accepts writes
    • Replicas follow the primary's change stream
    • Operationally simple, very common
  2. Multi-primary / multi-leader

    • More than one node can accept writes
    • Requires conflict handling or scoped ownership
    • Useful in some geo or partitioned scenarios, but much harder
  3. Leaderless / quorum-style

    • Authority is spread across replicas
    • Reads and writes rely on quorum overlap
    • Often shifts more reconciliation complexity to the client or datastore logic
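For the leaderless family, the "quorum overlap" idea can be stated directly as arithmetic. As a toy illustration (not any particular database's API): with N replicas, writes acknowledged by W of them, and reads consulting R of them, every read is guaranteed to touch at least one up-to-date replica whenever R + W > N.

```python
# Toy illustration of quorum overlap in a leaderless design.
# N replicas; a write must be acknowledged by W of them, a read
# must consult R of them. If R + W > N, every read quorum overlaps
# every write quorum, so a read sees at least one up-to-date copy.

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if any read quorum must intersect any write quorum."""
    return r + w > n

# N = 3 with W = 2, R = 2: overlap guaranteed.
assert quorums_overlap(3, 2, 2)

# N = 3 with W = 1, R = 1: a read may hit only replicas that
# never saw the write, so staleness is possible.
assert not quorums_overlap(3, 1, 1)
```

Note that overlap guarantees a fresh copy is *consulted*, not that reconciliation is free: the reader still has to recognize and resolve which version is newest.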

Why this lesson focuses on failover: In practice, many systems start with primary-replica because it gives the clearest operational story. The hardest production questions then become: how fresh are the replicas, and how do we promote one safely?

Connection to later sharding: Replication adds copies of the same data domain. Sharding later divides the authority domain itself across multiple shards.


Concept 2: Replica Lag Changes What Reads Actually Mean

Concrete example / mini-scenario: A user updates their profile, then immediately loads a page that reads from a follower. The old profile appears for a few seconds.

Intuition: Replication is often asynchronous or partially asynchronous, so a follower is not "another primary." It is a copy that is catching up.

What lag changes:

  • A read from a follower may return state the primary has already replaced.
  • A follower promoted during failover may be missing recently acknowledged writes.
  • "Read scaling" quietly becomes "read scaling with staleness" unless routing accounts for lag.

Important distinctions:

  • Asynchronous replication acknowledges writes before followers confirm them; synchronous and semi-synchronous modes wait for some confirmation, trading latency for a smaller loss window.
  • Lag measured in log position (bytes behind) and lag measured in time (seconds behind) answer different questions; both matter.

Operational reality: Replica lag is not just a metric. It is an application behavior contract.

If reads route to lagging replicas, users may observe stale state. If failover promotes a lagging replica, recent acknowledged data may disappear unless durability rules prevented that.

Mental model:
Replication lag is the distance between "what the primary has decided" and "what this replica can currently prove."
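That mental model can be made concrete with a read-after-write routing sketch. The names below are illustrative, not a real driver API: each write records the primary's log position (LSN), and a follower is safe for that session's subsequent reads only once it has applied at least that position.

```python
# Minimal sketch of read-after-write routing (illustrative names,
# not a real driver API). A session tracks the log position (LSN)
# of its last write; a follower is eligible to serve that session's
# reads only once it has applied at least that position.

from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    applied_lsn: int  # last log position this replica has applied

def choose_reader(session_lsn: int, primary: Replica,
                  followers: list[Replica]) -> Replica:
    """Prefer a follower that has caught up to the session's last
    write; otherwise fall back to the primary for freshness."""
    for f in followers:
        if f.applied_lsn >= session_lsn:
            return f
    return primary

primary = Replica("primary", applied_lsn=121)
followers = [Replica("frankfurt", 118), Replica("paris", 120)]

# Session wrote at LSN 119: frankfurt (118) is too stale, paris qualifies.
assert choose_reader(119, primary, followers).name == "paris"

# Session wrote at LSN 121: no follower has it yet; read from the primary.
assert choose_reader(121, primary, followers).name == "primary"
```

This is one of several read-your-writes techniques; the design choice is where the session's "last write position" is tracked (client, proxy, or server).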


Concept 3: Failover Is a Controlled Transfer of Authority

Concrete example / mini-scenario: The primary becomes unreachable. Monitoring fires. An orchestration system chooses a follower to promote and reroutes clients.

Intuition: Failover is not just a restart action. It is a sequence of safety decisions:

  1. Is the primary truly down or just partitioned?
  2. Which replica is most up to date?
  3. How do we prevent the old primary from continuing to accept writes if it comes back?
  4. How do clients discover and trust the new primary?

This is where incidents usually happen:

  • A partitioned-but-alive primary keeps accepting writes while a replica is promoted (split-brain).
  • A lagging replica is promoted, and recently acknowledged writes disappear.
  • Clients hold stale connection targets and keep writing to the wrong node, or to nothing.

Critical failover ingredients:

  • Failure detection that distinguishes "down" from "slow or partitioned" as well as possible.
  • Promotion eligibility based on replication position, not just a passing health check.
  • Fencing: a mechanism that revokes the old primary's authority before the new one accepts writes.
  • Client rediscovery: a trusted path for applications to find the new primary.

Trade-off:

  • Fail over quickly, and you risk promoting on a false alarm and inviting split-brain.
  • Wait longer to be sure, and you extend the write outage.

This is why failover timing is never "just tune it lower." It is a policy about how much uncertainty you are willing to tolerate before moving authority.
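The safety decisions above can be sketched as orchestration logic. This is a hedged illustration, not a real tool: it picks the most caught-up *reachable* replica and advances a fencing epoch so that writes stamped with the old epoch can be rejected if the old primary reappears.

```python
# Illustrative promotion logic (not a real orchestration tool).
# Safety comes from two things: choosing the most caught-up
# eligible replica, and bumping a fencing epoch so the old
# primary's writes are rejected if it comes back.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    applied_lsn: int
    reachable: bool

def promote(replicas: list[Node], current_epoch: int) -> tuple[Node, int]:
    """Pick the most up-to-date reachable replica and advance the epoch.

    Storage/routing must reject any write stamped with an older epoch,
    e.g. from the old primary after a partition heals."""
    eligible = [r for r in replicas if r.reachable]
    if not eligible:
        raise RuntimeError("no eligible replica; halt rather than guess")
    new_primary = max(eligible, key=lambda r: r.applied_lsn)
    return new_primary, current_epoch + 1

replicas = [
    Node("frankfurt", applied_lsn=118, reachable=True),
    Node("paris", applied_lsn=120, reachable=True),
    Node("isolated", applied_lsn=121, reachable=False),  # freshest, but cut off
]

new_primary, epoch = promote(replicas, current_epoch=7)
assert new_primary.name == "paris" and epoch == 8
```

Note the uncomfortable case baked into the example: the freshest replica is unreachable, so promotion necessarily accepts some write loss. Whether that is acceptable is exactly the durability policy question raised above.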


Troubleshooting

Issue: Reads from replicas sometimes show old data.

Why it happens: Replica lag is part of the design. The replica has not yet applied the latest committed changes from the primary.

Clarification / Fix: Decide which paths require fresh reads and route them accordingly, or use read-after-write techniques where necessary.

Issue: Failover succeeds technically, but recent writes disappear.

Why it happens: The promoted replica was behind the failed primary, and the system's durability policy allowed that gap to exist.

Clarification / Fix: Revisit synchronous/semi-synchronous policy, promotion eligibility, and the application's tolerance for write loss windows.

Issue: Failover sometimes creates two writable primaries.

Why it happens: Failure detection and authority revocation are not strong enough under partition or delayed network conditions.

Clarification / Fix: Add fencing, stronger leadership control, or clearer orchestration rules. The goal is not just promotion, but safe promotion.

Issue: Replica lag spikes during heavy load.

Why it happens: Followers may be bottlenecked by apply speed, I/O, network bandwidth, or replay serialization.

Clarification / Fix: Measure the replication pipeline end to end instead of blaming the primary only. The bottleneck may be in apply, not ship.
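Measuring the pipeline end to end means separating at least two components of lag. The metric names below are hypothetical, but the decomposition itself is standard: "ship lag" (how far behind the follower is in *receiving* the change stream) versus "apply lag" (how far behind it is in *replaying* what it has already received).

```python
# Sketch of separating "ship lag" from "apply lag" (metric names
# are hypothetical). The primary may have shipped nearly everything,
# yet the follower is still behind because its apply/replay stage
# is the bottleneck.

def replication_lag_breakdown(primary_lsn: int, received_lsn: int,
                              applied_lsn: int) -> dict:
    return {
        "ship_lag": primary_lsn - received_lsn,   # network / sender bottleneck
        "apply_lag": received_lsn - applied_lsn,  # replay / I/O bottleneck
    }

# Follower has received almost everything but applied far less:
# the bottleneck is apply, not ship, so tuning the primary won't help.
lags = replication_lag_breakdown(primary_lsn=1000, received_lsn=998,
                                 applied_lsn=750)
assert lags["ship_lag"] == 2
assert lags["apply_lag"] == 248
```

If ship lag dominates, look at the network and the sender; if apply lag dominates, look at follower I/O and replay serialization.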


Advanced Connections

Connection 1: Replication <-> WAL / Durable Logs

The parallel: Many replication systems stream from the same durable change log that crash recovery depends on. Local recovery and remote replication often start from the same ordered source of truth.

Why this matters: This is why WAL and replication are so close conceptually: first record changes durably, then let other nodes catch up.

Connection 2: Replication <-> Sharding

The contrast: Replication copies the same authority domain. Sharding splits the authority domain across nodes. You usually replicate each shard later, but that is a different design decision.

Why this matters: Teams often reach for sharding when replication was enough, or expect replication alone to solve problems that really require partitioning data ownership.



Key Insights

  1. Replication topology is really about write authority and how other copies follow it.
  2. Replica lag changes the meaning of reads and failovers, not just dashboard metrics.
  3. Failover is a safety protocol, not merely a restart or promotion button.

PREVIOUS: Optimistic Concurrency and Timestamp Ordering
NEXT: MVCC Internals: Snapshot Reads and Write Visibility