Day 473: Multi-Leader Replication and Conflict Classes

Lesson 044 · Consistency and Replication · 30 min · Advanced

The core idea: Multi-leader replication is only safe when every write type is assigned a conflict class, because the replication layer can detect concurrent writes but it cannot invent the right business rule for merging them.

Today's "Aha!" Moment

In 043.md, Harbor Point kept reservation shard 184 behind one leader in Maryland. That made the answer to "who assigns write order?" simple: md-db-4 did. Now Harbor Point wants travel agencies in Lisbon to confirm bookings locally even when the Atlantic link is slow, so it adds lis-db-2 as a second writable leader for the same reservation data. Latency improves for European agents, but the old safety argument disappears the moment both regions can accept writes before they have seen each other's latest state.

The non-obvious insight is that multi-leader replication is not one mechanism. It is several different mechanisms hidden under one topology. "Append a staff note to reservation R-8821" is a very different problem from "assign cabin C14 to one customer" or "capture the payment for reservation R-8821." The first can often be merged safely. The second represents a uniqueness invariant. The third escapes the database entirely because a credit card charge cannot be "un-applied" by replaying a row winner. If all three operations share one generic policy such as last-write-wins, the system may look healthy while it is silently discarding intent or duplicating side effects.
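To make that concrete, here is a minimal sketch of what last-write-wins does to two compatible note appends; the record shape and timestamps are hypothetical:

# Hypothetical record shape: last-write-wins keeps one whole value by timestamp.
def last_write_wins(local, remote):
    return local if local["ts"] >= remote["ts"] else remote

md_write = {"ts": 1001, "notes": ["customer prefers deck-side cabin"]}
lis_write = {"ts": 1002, "notes": ["customer requested late check-in"]}

print(last_write_wins(md_write, lis_write)["notes"])
# ['customer requested late check-in'] -- the Maryland note is silently
# dropped even though both appends could have coexisted.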

That is why conflict classes matter more than the slogan "active-active." A conflict class says what kind of invariant a write threatens, how concurrency is detected, and what the system is allowed to do when two leaders race. Once Harbor Point names those classes explicitly, it can keep low-latency multi-region writes for the operations that are mergeable and reserve stricter coordination for the ones that would otherwise double-book cabins or charge customers twice.

Why This Matters

Teams reach for multi-leader replication because the pressure is real: regional write latency drops, one region can keep taking writes during an inter-region outage, and local operations stop depending on a faraway leader election. Those benefits are valuable, especially for a booking product that serves agencies on both sides of the Atlantic. The trade-off is that write conflicts stop being an occasional anomaly and become part of the steady-state design.

For Harbor Point, the dangerous failure is not "the databases disagree for a few seconds." The dangerous failure is that different kinds of disagreement have different blast radii. Losing a staff note is annoying. Letting two leaders independently reserve cabin C14 for different travelers is revenue loss and customer trust damage. Capturing the same card twice is a financial incident. Multi-leader replication becomes production-ready only when the replication topology is paired with an explicit map of which write classes may merge, which must be rejected, and which must stay single-writer despite the multi-region deployment.

That classification changes architecture decisions immediately. The team no longer asks, "Should this database be active-active?" It asks, "Which write paths are commutative enough for active-active, which need conflict detection with retries, and which still require one home region or a synchronous coordination step?" That is a much more operational question, and it leads to a system that exposes its correctness boundary instead of hoping replication eventually smooths everything over.

Learning Objectives

By the end of this session, you will be able to:

  1. Define conflict classes in terms of invariants - Distinguish mergeable updates from uniqueness conflicts and external side-effect conflicts.
  2. Trace how multi-leader systems detect concurrency - Explain how two leaders discover that they accepted incompatible writes before replication converged.
  3. Choose where multi-leader belongs and where it does not - Use conflict classes to decide when active-active replication is worth the operational trade-off.

Core Concepts Explained

Concept 1: Conflict classes are about business invariants, not just row differences

Harbor Point's Lisbon leader and Maryland leader both store reservation records, but not every field in those records has the same semantics. If a Lisbon agent appends "customer requested late check-in" while Maryland appends "customer prefers deck-side cabin," both updates can coexist. If Lisbon marks cabin C14 reserved for Bruno while Maryland marks the same cabin reserved for Alicia, the rows may look superficially similar, but the invariant is completely different: only one active reservation may own that cabin at a time.

That distinction is why mature multi-leader designs classify writes before they worry about timestamps or replica lag. A useful classification for Harbor Point looks like this:

| Conflict class | Harbor Point example | What makes it safe or unsafe | Typical response |
|---|---|---|---|
| Commutative or append-only | Add a reservation note, append an audit event, increment loyalty points with an operation ID | Operations can be replayed in any order if duplicates are suppressed | Merge automatically |
| Same-record overwrite | Update guest phone number, switch reservation status from pending to confirmed | Two leaders may both write a new value for one field | Use version checks, deterministic winner rules, or send to repair workflow |
| Cross-record constraint | Reserve cabin C14, assign a unique voucher code, enforce one active booking per cabin-night | The invariant spans more than one field or row | Avoid unconstrained active-active; use one home writer, certification, or synchronous coordination |
| External side effect | Capture payment, send supplier confirmation, issue refund | The effect already happened outside the replica set | Require idempotency keys, outbox processing, and compensating actions |

The production consequence is that "conflict resolution" is not one setting. Conflict class tells you whether a conflict may be merged, whether it must be rejected, or whether it should never have been allowed to race across leaders in the first place. If Harbor Point tries to make cabin allocation behave like note appends, it is not simplifying the design. It is lying about the invariant.
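One way to make that classification operational is to force every write path through an explicit registry before any replication policy is applied. A minimal sketch, with hypothetical operation names drawn from the table above:

# Hypothetical registry: every write path must declare its conflict class.
CONFLICT_CLASS = {
    "append_note": "commutative",
    "append_audit_event": "commutative",
    "update_guest_phone": "overwrite",
    "confirm_reservation": "overwrite",
    "reserve_cabin": "constraint",
    "capture_payment": "external_side_effect",
}

def classify(operation):
    # Fail closed: an unregistered write path is treated as a constraint,
    # the strictest class, until someone classifies it deliberately.
    return CONFLICT_CLASS.get(operation, "constraint")

assert classify("reserve_cabin") == "constraint"
assert classify("new_unreviewed_write") == "constraint"

Failing closed is the design choice that matters here: a brand-new write path gets coordination by default instead of silently inheriting merge behavior it never earned.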

Concept 2: Replication can detect concurrency, but detection is not the same as safe resolution

Once Harbor Point has two writable leaders, each leader accepts a local transaction, stamps it with some concurrency metadata, and ships it to the other region. The metadata varies by system: some use origin timestamps, some use vector clocks or Lamport-like versions, and some use write-set certification to check whether concurrent transactions touched the same keys. The core question is always the same: did one write happen after observing the other, or did both leaders act independently?
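The happens-before check itself is mechanical. A minimal vector-clock comparison, assuming each write carries a map of per-leader counters:

def compare(vc_a, vc_b):
    # Concurrent means each side has at least one counter the other lacks.
    leaders = set(vc_a) | set(vc_b)
    a_behind = any(vc_a.get(l, 0) < vc_b.get(l, 0) for l in leaders)
    b_behind = any(vc_b.get(l, 0) < vc_a.get(l, 0) for l in leaders)
    return "concurrent" if (a_behind and b_behind) else "ancestral"

# Each leader accepted a write without seeing the other's:
print(compare({"md": 881, "lis": 416}, {"md": 880, "lis": 417}))  # concurrent
# Lisbon's write already observed Maryland's md:881:
print(compare({"md": 881, "lis": 416}, {"md": 881, "lis": 417}))  # ancestral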

Suppose the system sees this timeline:

09:01:03 md-db-4  accepts reserve(C14, Alicia)   version = md:881
09:01:05 lis-db-2 accepts reserve(C14, Bruno)    version = lis:417
09:01:07 link heals; both updates replicate
09:01:07 each leader sees the remote write as concurrent, not ancestral

At that point the replication layer has done something important but limited. It has proven the conflict is real. It has not proven what "correct" means. If Harbor Point applies last-write-wins by wall clock, one customer silently loses the cabin. If it keeps both rows and hopes downstream reconciliation will sort it out, availability looks good right until check-in time. If it uses write-set certification and rejects whichever transaction loses certification, the user in one region gets a retry instead of silent corruption. Those outcomes are not database trivia. They are product semantics expressed as replication policy.

A simplified handler makes the separation clear:

class RetryOrReject(Exception):
    """Surface a constraint conflict to the caller as a retry or rejection."""

def reconcile(local_txn, remote_txn, conflict_class):
    if conflict_class == "commutative":
        # Safe to replay in either order once duplicates are suppressed.
        return merge_operations(local_txn, remote_txn)
    if conflict_class == "overwrite":
        # Deterministic winner, with the losing value preserved for repair.
        return choose_winner_and_record_loser(local_txn, remote_txn)
    if conflict_class == "constraint":
        # Uniqueness invariants must not be merged after the fact.
        raise RetryOrReject("requires single-owner or certified commit")
    if conflict_class == "external_side_effect":
        # The external effect already happened; only compensation remains.
        return run_compensation_workflow(local_txn, remote_txn)

The trade-off is visible here. The more classes Harbor Point allows into true multi-leader mode, the lower its regional write latency can be during normal operation. The more it relies on certification, retries, or manual repair for those classes, the more it shifts cost onto application design, operator tooling, and user-facing retry paths. Multi-leader systems succeed when that cost is intentional instead of accidental.
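One concrete piece of that application-design cost is idempotency for the external_side_effect class: the charge must be identifiable so a replicated replay cannot repeat it. A minimal sketch, where charge_card stands in for a hypothetical payment gateway call:

# In-memory stand-in for what must be a durable idempotency table.
_processed = {}

def capture_payment(reservation_id, amount, idempotency_key):
    # Replays from either leader find the key and return the original receipt.
    if idempotency_key in _processed:
        return _processed[idempotency_key]
    receipt = charge_card(reservation_id, amount)  # hypothetical gateway call
    _processed[idempotency_key] = receipt
    return receipt

The key has to be minted once, when the reservation commits, not per attempt; otherwise each regional retry invents a fresh key and the deduplication never fires.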

Concept 3: Production designs narrow active-active scope instead of making everything writable everywhere

After classifying its writes, Harbor Point does not actually keep one uniform policy for the whole reservation domain. It chooses a different write envelope for each conflict class:

  1. Commutative writes such as notes and audit events stay fully active-active - Both regions accept them and merges happen automatically.
  2. Same-record overwrites stay active-active with version checks - Deterministic winner rules apply, and losing values are recorded for repair.
  3. Cross-record constraints such as cabin allocation keep a single home writer - The uniqueness invariant never races across leaders; the remote region forwards or retries.
  4. External side effects such as payment capture run through one idempotent outbox - Replication replay can never repeat the charge.

This is the practical lesson most teams miss. Multi-leader replication is rarely a binary property of a whole product. It is a selective capability granted only to the operations whose conflict class has a safe merge or retry story. Systems that try to make every row active-active often rediscover, painfully, that some invariants are coordination problems no matter how elegant the replication layer looks.
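Expressed as configuration, those envelopes might look like the sketch below; the region names and policy fields are hypothetical:

# Hypothetical write envelopes keyed by conflict class.
WRITE_ENVELOPES = {
    "commutative": {"writable_regions": {"md", "lis"}, "on_conflict": "merge"},
    "overwrite": {"writable_regions": {"md", "lis"}, "on_conflict": "version_check"},
    "constraint": {"writable_regions": {"md"}, "on_conflict": "reject"},
    "external_side_effect": {"writable_regions": {"md"}, "on_conflict": "compensate"},
}

def route_write(conflict_class, origin_region):
    envelope = WRITE_ENVELOPES[conflict_class]
    if origin_region in envelope["writable_regions"]:
        return "commit_locally"
    return "forward_to_home_region"

print(route_write("commutative", "lis"))  # commit_locally
print(route_write("constraint", "lis"))   # forward_to_home_region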

That selective approach also prepares the next design question. Once you accept that not every operation wants a single leader but not every operation can tolerate unconstrained concurrency, the next step is to reason about how read and write quorums bound visibility and staleness. That is exactly where 045.md picks up.

Advanced Connections

Connection 1: 043.md removed many of these problems by centralizing order in one leader

Leader-based replication avoided most reservation conflicts because one process assigned the shard's write order. Multi-leader replication deliberately gives up that single sequencer, so conflict classes become the replacement tool for expressing which concurrent writes are acceptable and which are not.

Connection 2: 045.md approaches concurrent writes through quorum overlap instead of multiple leaders

The next lesson studies leaderless replication and quorum math. That model changes the coordination mechanism, but it does not remove the need to understand conflict classes; it simply shifts the question from "which leader accepted the write?" to "which replicas overlap enough to make a read or write meaningful?"

Connection 3: CRDT-style data types solve only the mergeable end of the problem space

Counters, sets, and append-only logs can often be designed so concurrent updates converge automatically. Cabin exclusivity and payment capture are different because they encode one-winner or external-side-effect semantics, which CRDT-style convergence alone does not make safe.
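For intuition about the mergeable end, here is a minimal grow-only counter sketch: concurrent increments converge under merge in any order, which is exactly the property a one-winner invariant like cabin exclusivity cannot have:

def merge_gcounter(a, b):
    # Per-leader maximum: merging is commutative, associative, idempotent.
    return {leader: max(a.get(leader, 0), b.get(leader, 0))
            for leader in set(a) | set(b)}

md, lis = {"md": 3}, {"lis": 5}  # concurrent loyalty-point increments
assert merge_gcounter(md, lis) == merge_gcounter(lis, md)
print(sum(merge_gcounter(md, lis).values()))  # 8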

Key Insights

  1. Conflict classes describe invariants, not just data shapes - Two writes can touch similar rows while demanding completely different safety rules.
  2. Concurrency detection is necessary but insufficient - A database can prove two leaders raced without knowing whether to merge, reject, or compensate.
  3. Selective active-active beats universal active-active - Multi-leader replication is strongest when only mergeable or retry-tolerant write paths use it.