LESSON
Day 473: Multi-Leader Replication and Conflict Classes
The core idea: Multi-leader replication is only safe when every write type is assigned a conflict class, because the replication layer can detect concurrent writes but it cannot invent the right business rule for merging them.
Today's "Aha!" Moment
In 043.md, Harbor Point kept reservation shard 184 behind one leader in Maryland. That made the answer to "who assigns write order?" simple: md-db-4 did. Now Harbor Point wants travel agencies in Lisbon to confirm bookings locally even when the Atlantic link is slow, so it adds lis-db-2 as a second writable leader for the same reservation data. Latency improves for European agents, but the old safety argument disappears the moment both regions can accept writes before they have seen each other's latest state.
The non-obvious insight is that multi-leader replication is not one mechanism. It is several different mechanisms hidden under one topology. "Append a staff note to reservation R-8821" is a very different problem from "assign cabin C14 to one customer" or "capture the payment for reservation R-8821." The first can often be merged safely. The second represents a uniqueness invariant. The third escapes the database entirely because a credit card charge cannot be "un-applied" by replaying a row winner. If all three operations share one generic policy such as last-write-wins, the system may look healthy while it is silently discarding intent or duplicating side effects.
That is why conflict classes matter more than the slogan "active-active." A conflict class says what kind of invariant a write threatens, how concurrency is detected, and what the system is allowed to do when two leaders race. Once Harbor Point names those classes explicitly, it can keep low-latency multi-region writes for the operations that are mergeable and reserve stricter coordination for the ones that would otherwise double-book cabins or charge customers twice.
Why This Matters
Teams reach for multi-leader replication because the pressure is real: regional write latency drops, one region can keep taking writes during an inter-region outage, and local operations stop depending on a faraway leader election. Those benefits are valuable, especially for a booking product that serves agencies on both sides of the Atlantic. The trade-off is that write conflicts stop being an occasional anomaly and become part of the steady-state design.
For Harbor Point, the dangerous failure is not "the databases disagree for a few seconds." The dangerous failure is that different kinds of disagreement have different blast radii. Losing a staff note is annoying. Letting two leaders independently reserve cabin C14 for different travelers is revenue loss and customer trust damage. Capturing the same card twice is a financial incident. Multi-leader replication becomes production-ready only when the replication topology is paired with an explicit map of which write classes may merge, which must be rejected, and which must stay single-writer despite the multi-region deployment.
That classification changes architecture decisions immediately. The team no longer asks, "Should this database be active-active?" It asks, "Which write paths are commutative enough for active-active, which need conflict detection with retries, and which still require one home region or a synchronous coordination step?" That is a much more operational question, and it leads to a system that exposes its correctness boundary instead of hoping replication eventually smooths everything over.
Learning Objectives
By the end of this session, you will be able to:
- Define conflict classes in terms of invariants - Distinguish mergeable updates from uniqueness conflicts and external side-effect conflicts.
- Trace how multi-leader systems detect concurrency - Explain how two leaders discover that they accepted incompatible writes before replication converged.
- Choose where multi-leader belongs and where it does not - Use conflict classes to decide when active-active replication is worth the operational trade-off.
Core Concepts Explained
Concept 1: Conflict classes are about business invariants, not just row differences
Harbor Point's Lisbon leader and Maryland leader both store reservation records, but not every field in those records has the same semantics. If a Lisbon agent appends "customer requested late check-in" while Maryland appends "customer prefers deck-side cabin," both updates can coexist. If Lisbon marks cabin C14 reserved for Bruno while Maryland marks the same cabin reserved for Alicia, the rows may look superficially similar, but the invariant is completely different: only one active reservation may own that cabin at a time.
That distinction is why mature multi-leader designs classify writes before they worry about timestamps or replica lag. A useful classification for Harbor Point looks like this:
| Conflict class | Harbor Point example | What makes it safe or unsafe | Typical response |
|---|---|---|---|
| Commutative or append-only | Add a reservation note, append an audit event, increment loyalty points with an operation ID | Operations can be replayed in any order if duplicates are suppressed | Merge automatically |
| Same-record overwrite | Update guest phone number, switch reservation status from pending to confirmed | Two leaders may both write a new value for one field | Use version checks, deterministic winner rules, or send to a repair workflow |
| Cross-record constraint | Reserve cabin C14, assign a unique voucher code, enforce one active booking per cabin-night | The invariant spans more than one field or row | Avoid unconstrained active-active; use one home writer, certification, or synchronous coordination |
| External side effect | Capture payment, send supplier confirmation, issue refund | The effect already happened outside the replica set | Require idempotency keys, outbox processing, and compensating actions |
The production consequence is that "conflict resolution" is not one setting. Conflict class tells you whether a conflict may be merged, whether it must be rejected, or whether it should never have been allowed to race across leaders in the first place. If Harbor Point tries to make cabin allocation behave like note appends, it is not simplifying the design. It is lying about the invariant.
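The commutative row of that table can be made concrete with a small sketch. The note shape, the `op_id` field, and the timestamps below are illustrative, not Harbor Point's actual schema; the point is only that appends commute once duplicates are suppressed by operation ID.

```python
def merge_note_logs(local_notes, remote_notes):
    """Union two append-only note logs, deduplicating by operation ID.

    Because appends commute, any replay order yields the same result as
    long as each operation ID is applied at most once.
    """
    merged = {}
    for note in local_notes + remote_notes:
        merged.setdefault(note["op_id"], note)  # first copy wins; replicated duplicates dropped
    # Sort by (timestamp, op_id) so both leaders converge on one deterministic order.
    return sorted(merged.values(), key=lambda n: (n["ts"], n["op_id"]))

local = [{"op_id": "md:1", "ts": 1, "text": "late check-in"}]
remote = [{"op_id": "lis:1", "ts": 2, "text": "deck-side cabin"},
          {"op_id": "md:1", "ts": 1, "text": "late check-in"}]  # duplicate arrived via replication

# Merging in either direction produces the same log: the operation commutes.
assert merge_note_logs(local, remote) == merge_note_logs(remote, local)
```

Note that the deduplication key is an operation ID, not a content hash: two agents may legitimately append identical text, and only the operation ID distinguishes a real duplicate from a coincidence.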
Concept 2: Replication can detect concurrency, but detection is not the same as safe resolution
Once Harbor Point has two writable leaders, each leader accepts a local transaction, stamps it with some concurrency metadata, and ships it to the other region. The metadata varies by system: some use origin timestamps, some use vector clocks or Lamport-like versions, and some use write-set certification to check whether concurrent transactions touched the same keys. The core question is always the same: did one write happen after observing the other, or did both leaders act independently?
Suppose the system sees this timeline:
```text
09:01:03  md-db-4   accepts reserve(C14, Alicia)   version = md:881
09:01:05  lis-db-2  accepts reserve(C14, Bruno)    version = lis:417
09:01:07  link heals; both updates replicate
09:01:07  each leader sees the remote write as concurrent, not ancestral
```
At that point the replication layer has done something important but limited. It has proven the conflict is real. It has not proven what "correct" means. If Harbor Point applies last-write-wins by wall clock, one customer silently loses the cabin. If it keeps both rows and hopes downstream reconciliation will sort it out, availability looks good right until check-in time. If it uses write-set certification and rejects whichever transaction loses certification, the user in one region gets a retry instead of silent corruption. Those outcomes are not database trivia. They are product semantics expressed as replication policy.
A simplified handler makes the separation clear:
```python
class RetryOrReject(Exception):
    """Signal that the losing transaction must be retried or rejected."""

def reconcile(local_txn, remote_txn, conflict_class):
    if conflict_class == "commutative":
        # Both writes survive; replay in any order with duplicate suppression.
        return merge_operations(local_txn, remote_txn)
    if conflict_class == "overwrite":
        # Pick a deterministic winner and preserve the loser for repair.
        return choose_winner_and_record_loser(local_txn, remote_txn)
    if conflict_class == "constraint":
        # Uniqueness invariants cannot be merged after the fact.
        raise RetryOrReject("requires single-owner or certified commit")
    if conflict_class == "external_side_effect":
        # The real-world effect already happened; compensate, do not merge.
        return run_compensation_workflow(local_txn, remote_txn)
```
The trade-off is visible here. The more classes Harbor Point allows into true multi-leader mode, the lower its regional write latency can be during normal operation. The more it relies on certification, retries, or manual repair for those classes, the more it shifts cost onto application design, operator tooling, and user-facing retry paths. Multi-leader systems succeed when that cost is intentional instead of accidental.
Concept 3: Production designs narrow active-active scope instead of making everything writable everywhere
After classifying its writes, Harbor Point does not actually keep one uniform policy for the whole reservation domain. It chooses different write envelopes for different conflict classes:
- Reservation notes, audit events, and agency comments stay active-active because they are append-only and easy to deduplicate.
- Customer profile edits remain multi-leader, but only with per-field version checks and a visible repair queue when two agents change the same field concurrently.
- Cabin availability and confirmed reservations move back to a single home region per sailing, because the uniqueness invariant is more important than shaving a few milliseconds off every write.
- Payment capture leaves the database write path entirely and goes through an idempotent outbox-driven workflow so region failover cannot charge twice.
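The payment bullet deserves a sketch, because "idempotent outbox-driven workflow" is easy to say and easy to get wrong. This is a minimal illustration under stated assumptions: the in-memory `processed` dict stands in for a durable idempotency-key table, and `charge_gateway` stands in for a payment provider that also deduplicates on the key. None of these names come from the lesson.

```python
processed = {}  # stands in for a durable (idempotency_key -> charge) table

def capture_payment(reservation_id, amount_cents, charge_gateway):
    """Capture at most once per reservation, even if the outbox event
    is delivered again after a region failover."""
    key = f"capture:{reservation_id}"           # one key per logical capture
    if key in processed:
        return processed[key]                   # redelivery: return prior result, no second charge
    charge = charge_gateway(key, amount_cents)  # gateway is assumed to dedupe on the same key
    processed[key] = charge                     # record before acknowledging the outbox event
    return charge

calls = []
def fake_gateway(key, amount):
    calls.append(key)
    return {"key": key, "amount": amount, "status": "captured"}

first = capture_payment("R-8821", 129900, fake_gateway)
second = capture_payment("R-8821", 129900, fake_gateway)  # duplicate delivery
assert first == second and len(calls) == 1
```

The ordering matters: the result is recorded before the triggering event is acknowledged, so a crash between the gateway call and the record is the remaining window, which is why real systems also pass the key to the gateway.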
This is the practical lesson most teams miss. Multi-leader replication is rarely a binary property of a whole product. It is a selective capability granted only to the operations whose conflict class has a safe merge or retry story. Systems that try to make every row active-active often rediscover, painfully, that some invariants are coordination problems no matter how elegant the replication layer looks.
That selective approach also prepares the next design question. Once you accept that not every operation wants a single leader but not every operation can tolerate unconstrained concurrency, the next step is to reason about how read and write quorums bound visibility and staleness. That is exactly where 045.md picks up.
Troubleshooting
- Issue: Last-write-wins appears to "solve" reservation conflicts, but customers report disappearing bookings.
- Why it happens: The policy resolved storage divergence without preserving the business invariant that one accepted booking must produce one durable customer-visible outcome.
- Clarification / Fix: Treat cabin assignment as a constraint conflict, not an overwrite conflict. Use certification, a home writer, or synchronous coordination instead of timestamp winners.
- Issue: Conflict rates spike during an inter-region link flap, even for operations that should have been safe.
- Why it happens: The system is classifying too broadly, so append-only operations and true invariant conflicts are sharing one expensive reconciliation path.
- Clarification / Fix: Split write classes explicitly. Deduplicate append-only events by operation ID and keep stricter conflict handling only for writes that can violate invariants.
- Issue: Operators can reconcile the database state after recovery, but duplicate charges or duplicate supplier notifications still happened.
- Why it happens: External side effects escaped before conflict resolution completed, so database convergence did not rewind the real-world action.
- Clarification / Fix: Put side effects behind idempotency keys and outbox processing. Reconcile state first, then run compensation or replay logic intentionally.
- Issue: Support sees "please retry" errors far more often after enabling active-active writes.
- Why it happens: The database is correctly rejecting certified conflicts, but the product path has no retry budget or user-facing fallback.
- Clarification / Fix: Measure conflict rate per class, not just cluster-wide. If a path cannot tolerate retries, it may not belong in multi-leader mode.
Advanced Connections
Connection 1: 043.md removed many of these problems by centralizing order in one leader
Leader-based replication avoided most reservation conflicts because one process assigned the shard's write order. Multi-leader replication deliberately gives up that single sequencer, so conflict classes become the replacement tool for expressing which concurrent writes are acceptable and which are not.
Connection 2: 045.md approaches concurrent writes through quorum overlap instead of multiple leaders
The next lesson studies leaderless replication and quorum math. That model changes the coordination mechanism, but it does not remove the need to understand conflict classes; it simply shifts the question from "which leader accepted the write?" to "which replicas overlap enough to make a read or write meaningful?"
Connection 3: CRDT-style data types solve only the mergeable end of the problem space
Counters, sets, and append-only logs can often be designed so concurrent updates converge automatically. Cabin exclusivity and payment capture are different because they encode one-winner or external-side-effect semantics, which CRDT-style convergence alone does not make safe.
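A grow-only counter makes the boundary concrete. Each leader increments only its own slot and merge takes the per-leader maximum, so concurrent increments converge in any delivery order; but `max()` has no notion of "only one winner," which is why this technique covers loyalty points and not cabin exclusivity. This is a textbook G-Counter sketch, not tied to any specific CRDT library.

```python
def increment(counter, leader):
    """Return a new counter state with this leader's slot advanced."""
    c = dict(counter)
    c[leader] = c.get(leader, 0) + 1
    return c

def merge(a, b):
    """Per-slot maximum: commutative, associative, and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def value(counter):
    return sum(counter.values())

md = increment(increment({}, "md"))  if False else increment(increment({}, "md"), "md")
md = increment(increment({}, "md"), "md")  # two increments accepted in Maryland
lis = increment({}, "lis")                 # one increment accepted in Lisbon

assert merge(md, lis) == merge(lis, md)    # order of replication does not matter
assert value(merge(md, lis)) == 3          # no increment lost, none double-counted
```

Deletion, decrement, and "exactly one owner" all require richer structures or coordination, which is the lesson's point: CRDTs cover the commutative conflict class, not the constraint or side-effect classes.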
Resources
- [BOOK] Designing Data-Intensive Applications
- Focus: Read the multi-leader replication chapter with attention to why write conflicts are application-specific rather than purely storage-specific.
- [PAPER] Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System
- Focus: Notice how Bayou separates conflict detection from application-specific merge procedures and dependency checks.
- [DOC] Apache CouchDB Documentation: Conflicts
- Focus: See how an eventually replicated store exposes document conflicts explicitly instead of pretending a generic merge rule fits every workload.
- [DOC] MySQL 8.0 Reference Manual: Group Replication Conflict Detection and Resolution
- Focus: Map write-set certification to the idea that some concurrent transactions should be rejected rather than merged silently.
Key Insights
- Conflict classes describe invariants, not just data shapes - Two writes can touch similar rows while demanding completely different safety rules.
- Concurrency detection is necessary but insufficient - A database can prove two leaders raced without knowing whether to merge, reject, or compensate.
- Selective active-active beats universal active-active - Multi-leader replication is strongest when only mergeable or retry-tolerant write paths use it.