Replication Models: Primary-Backup, Multi-Leader, and Leaderless
The core idea: Replication models are coordination choices because each one decides where writes become authoritative and where the system pays to reconcile disagreement.
Core Insight
Imagine a product with three kinds of data: account balances, shopping carts, and user preference toggles. Copying each kind of data to several machines is easy to describe. The hard question is different: where is a write allowed to become real?
For account balances, a single strongly controlled write path may be worth the latency because conflicting updates are expensive. For shopping carts, local writes from multiple devices or regions might be acceptable if the merge behavior is clear. For preferences, a leaderless or eventually repaired model may be enough if "last update wins" or another simple rule is acceptable.
Primary-backup, multi-leader, and leaderless replication are three answers to that authority question. They are not just deployment diagrams. They move coordination cost to different moments: before the write, after the write, or across read/write quorums and repair paths.
The trade-off is the whole point. Primary-backup buys clarity by centralizing authority. Multi-leader buys locality by accepting reconciliation. Leaderless buys availability and flexibility by pushing authority into quorums, version metadata, and repair semantics.
Authority, Not Just Copies
A system can have many physical copies while still having one logical write authority. Another system can have many write authorities and therefore more conflict risk. A third system can avoid a permanent leader and instead decide whether enough replicas have participated.
The useful comparison is:
Model           Where writes become authoritative
--------------  ---------------------------------------------
Primary-backup  one primary path
Multi-leader    several regional or shard-local leaders
Leaderless      replica quorums plus version/repair semantics
This framing helps avoid a common mistake: choosing a replication model by how scalable it sounds rather than by how the data behaves under conflict.
A balance transfer, a collaborative document edit, and a preference toggle do not have the same conflict cost. A replication model that is excellent for one may be dangerous or needlessly expensive for another.
Primary-Backup: Pay Early for a Clear Writer
Primary-backup centralizes normal write authority. Clients send writes to the primary; backups learn from it.
client -> primary -> backups
That model is attractive when the system needs a simple answer to "what order did writes happen in?" The primary becomes the normal sequencer for a shard, log, or piece of metadata.
It often fits:
- consensus-backed logs
- metadata stores
- strongly ordered control paths
- data where conflicting writes are costly
The benefit is a cleaner correctness story. The system can make most write conflicts disappear by not allowing several normal writers for the same authority domain.
The cost is concentration:
- the primary can become a bottleneck
- write locality is constrained
- failover behavior matters a lot
- backup lag affects read freshness and recovery
Primary-backup is therefore not "simple and free." It pays coordination cost before or during the write so the rest of the system has less reconciliation to do later.
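The "pay early" idea can be made concrete with a minimal sketch. It assumes synchronous replication and no failures, and the `Primary` and `Backup` classes are hypothetical names used only for illustration, not any real system's API. The point is that only the primary assigns sequence numbers, so backups never have to decide between competing orders.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup that applies entries strictly in sequence order."""
    log: list = field(default_factory=list)

    def apply(self, seq: int, op: str) -> None:
        # Reject gaps or reordering: the primary's sequence is the only order.
        if seq != len(self.log):
            raise ValueError(f"expected seq {len(self.log)}, got {seq}")
        self.log.append(op)


@dataclass
class Primary:
    """Hypothetical primary: the single sequencer for one authority domain."""
    backups: list
    log: list = field(default_factory=list)

    def write(self, op: str) -> int:
        seq = len(self.log)        # the primary alone assigns the order
        self.log.append(op)
        for backup in self.backups:
            backup.apply(seq, op)  # synchronous replication, for simplicity
        return seq


backups = [Backup(), Backup()]
primary = Primary(backups)
primary.write("set x=1")
primary.write("set x=2")
```

Every write waits on the loop over `backups`, which is where the coordination cost shows up in this sketch. A real system would also need failover and lag handling, which are deliberately left out here.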
Multi-Leader: Buy Local Writes, Own Conflicts
Multi-leader replication lets several leaders accept writes, often in different regions or sites:
region A leader <----async----> region B leader
region B leader <----async----> region C leader
This can be the right shape when local write latency and regional autonomy matter. A user in Europe can write to a European leader while a user in Asia writes to an Asian leader, even if the inter-region link is slow or temporarily impaired.
The price is that several writes can be legitimate at the same time. That means global order is no longer simple. Conflicts are not protocol accidents; they are part of the design surface.
Multi-leader works best when at least one of these is true:
- conflicts are rare
- conflicts are easy to merge
- users can resolve conflicts explicitly
- the product can tolerate temporary divergence
- the data model has natural commutative operations
The practical question is not "can we run more leaders?" It is:
Can this product survive several valid writers before reconciliation finishes?
If the answer is no, multi-leader can look fast in the happy path and become expensive during every edge case.
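One case where the answer can be yes is data with naturally commutative operations. As a hedged illustration, the sketch below uses a grow-only counter (a G-Counter style structure): each leader increments only its own slot, and merging takes the per-leader maximum. The function names and the leader labels `"eu"` and `"asia"` are assumptions made for this example.

```python
def increment(counter: dict, leader: str, amount: int = 1) -> dict:
    """Return a new counter with this leader's own slot increased."""
    updated = dict(counter)
    updated[leader] = updated.get(leader, 0) + amount
    return updated


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: safe to apply in any order, any number of times."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(counter: dict) -> int:
    """The logical count is the sum of all leaders' slots."""
    return sum(counter.values())


# Two leaders accept writes independently before exchanging state.
eu = increment(increment({}, "eu"), "eu")
asia = increment({}, "asia")

merged_at_eu = merge(eu, asia)
merged_at_asia = merge(asia, eu)
```

Both leaders end up with the same merged state regardless of which side merges first, so temporary divergence never needs a human or a timestamp to resolve. Most data is not this convenient, which is why the conflict question has to be asked per data type.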
Leaderless: Move Coordination into Quorums and Repair
Leaderless replication removes the fixed primary role. Clients can write to a replica set, and the system uses quorum choices, version metadata, read repair, anti-entropy, and merge policy to converge.
client -> replicas
write quorum
read quorum
versioning / repair / merge
This model can improve availability and locality because no single always-current leader has to be reachable for every operation. But leaderless does not mean "no coordination." It means coordination has moved.
Instead of asking one leader for the authoritative write path, the system asks:
Did enough replicas participate?
Can reads observe enough evidence?
Can divergent versions be detected and repaired?
Does the application know how to merge conflicting versions?
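The first two questions are often answered with a counting rule. The sketch below assumes strict quorums over a fixed set of N replicas, with W write acknowledgements and R read responses; the function names are illustrative. It checks the rule `W + R > N` and, for small N, confirms by brute force that the rule matches "every read quorum shares a replica with every write quorum."

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Counting rule: read and write quorums must intersect when w + r > n."""
    return w + r > n


def every_read_meets_every_write(n: int, w: int, r: int) -> bool:
    """Brute-force check over all possible write and read quorums."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(quorums_overlap(3, 2, 2))  # N=3, W=2, R=2
print(quorums_overlap(3, 1, 1))  # N=3, W=1, R=1
```

Overlap only guarantees that a read contacts at least one replica that acknowledged the write. Picking the right version among the responses still depends on version metadata, and weaker configurations such as sloppy quorums do not give this guarantee.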
The hidden cost is real:
- quorum tuning becomes part of correctness and latency
- stale reads become possible under weaker settings
- version vectors or similar metadata may be needed
- read repair and anti-entropy become operationally important
- application merge logic can become the hardest part
Leaderless systems are strongest when availability and partition tolerance matter, and when the data model can absorb reconciliation without surprising users.
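Detecting divergence is the part that version vectors address. The following is a minimal sketch, assuming each version carries a dictionary from replica id to a per-replica counter; the `compare` function and the replica ids `r1` and `r2` are hypothetical. It distinguishes a version that supersedes another from two versions that are genuinely concurrent and need a merge.

```python
def compare(a: dict, b: dict) -> str:
    """Classify two version vectors as equal, before, after, or concurrent."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has, and more
    if b_le_a:
        return "after"       # a has seen everything b has, and more
    return "concurrent"      # neither dominates: a real conflict to merge


older = {"r1": 1}
newer = {"r1": 2}
left = {"r1": 2, "r2": 0}
right = {"r1": 1, "r2": 1}

print(compare(older, newer))
print(compare(left, right))
```

Only the "concurrent" result requires application merge logic. The other outcomes can be repaired mechanically by keeping the dominating version.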
Worked Example: Three Data Types
Take three fields in the same product.
account_balance
shopping_cart
theme_preference
For account_balance, conflicting writes can lose money or create double-spend behavior. A primary-backup or consensus-backed model may be worth the coordination cost.
For shopping_cart, two devices may add different items while disconnected. A multi-leader or leaderless model can work if the merge rule is "union added items, preserve removals carefully."
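One way to read "preserve removals carefully" is to tag every add with a unique id and record removals as tombstones for the tags a device has actually seen. The sketch below is an observed-remove style set under that assumption; the `Cart` class, its fields, and the device names are hypothetical, and tombstone cleanup is ignored.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Cart:
    adds: set = field(default_factory=set)     # {(item, unique_tag)}
    removed: set = field(default_factory=set)  # tags that have been removed

    def add(self, item: str) -> None:
        self.adds.add((item, uuid.uuid4().hex))

    def remove(self, item: str) -> None:
        # Tombstone only the add-tags this replica has observed.
        self.removed |= {tag for (i, tag) in self.adds if i == item}

    def items(self) -> set:
        return {i for (i, tag) in self.adds if tag not in self.removed}


def merge(a: Cart, b: Cart) -> Cart:
    """Union both sides; a removal hides only the adds it observed."""
    return Cart(a.adds | b.adds, a.removed | b.removed)


shared = Cart()
shared.add("book")
phone = Cart(set(shared.adds), set(shared.removed))
laptop = Cart(set(shared.adds), set(shared.removed))

phone.remove("book")  # offline on the phone
phone.add("pen")
laptop.add("mug")     # offline on the laptop

merged = merge(phone, laptop)
print(sorted(merged.items()))  # ['mug', 'pen']
```

In this design a concurrent re-add of "book" on the laptop would survive the phone's removal, because the new add carries a tag the phone never saw. Whether that "add wins" behavior is right is a product decision, not a property of the replication model.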
For theme_preference, the system may accept a much weaker rule, such as the most recent timestamped preference wins. A leaderless or eventually repaired path may be acceptable because the cost of conflict is low.
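A "most recent timestamp wins" rule can be sketched in a few lines. This example assumes each write is stamped with a wall-clock timestamp and a replica id used only to break ties deterministically; `Stamped` and `lww_merge` are illustrative names.

```python
from typing import NamedTuple


class Stamped(NamedTuple):
    timestamp: float   # wall-clock time of the write (assumed, may be skewed)
    replica_id: str    # tie-breaker so all replicas pick the same winner
    value: str


def lww_merge(a: Stamped, b: Stamped) -> Stamped:
    """Keep the write with the larger (timestamp, replica_id) pair."""
    return max(a, b, key=lambda s: (s.timestamp, s.replica_id))


dark = Stamped(10.0, "replica-a", "dark")
light = Stamped(12.0, "replica-b", "light")
winner = lww_merge(dark, light)
print(winner.value)  # light
```

The rule is cheap because it silently discards the losing write. With skewed clocks, the discarded write may be the one the user actually made last, which is tolerable for a theme toggle and not for a balance.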
The same replication model across all three may be either too strict or too loose. The right design follows the data's conflict semantics.
Comparison and Misreadings
A compact way to remember the trade-off:
Model           Main gain                 Main cost
--------------  ------------------------  --------------------------------
Primary-backup  simple authority/order    bottleneck and failover pressure
Multi-leader    local writes/autonomy     conflict handling later
Leaderless      availability/flexibility  quorum, repair, and merge logic
"Multi-leader is just faster primary-backup" is wrong. It creates several legitimate write authorities, which means reconciliation becomes a core responsibility.
"Leaderless removes coordination" is also wrong. It removes one coordination shape and replaces it with quorum math, version tracking, and repair.
"Primary-backup is always safest" is too broad. It may be safest for a particular invariant, but it can be too expensive for workloads that need regional locality or high availability under partitions.
Connections
The previous lesson showed ZAB as a strong leader-based design for ordered coordination state. Primary-backup lives near that world: one normal authority makes ordering easier.
The next lesson on distributed logs sharpens the same question. Logs give ordered histories, but the scope of that order depends on partitioning, authority, and the replication model underneath.
Resources
- [BOOK] Designing Data-Intensive Applications
  - Focus: Read the replication chapters for a clear comparison of leader-based, multi-leader, and leaderless designs.
- [PAPER] Dynamo: Amazon's Highly Available Key-value Store
  - Focus: Study leaderless-style ideas such as quorums, versioning, sloppy quorum, hinted handoff, and repair.
- [PAPER] In Search of an Understandable Consensus Algorithm
  - Focus: Use Raft as a contrast point for a strongly leader-centric replicated log.
Key Takeaways
- Replication models differ mainly by where write authority lives: one primary, several leaders, or quorum-backed replica sets.
- There is no free model; each one moves cost among write latency, locality, conflict handling, failover, and repair.
- Workload semantics decide the right choice: the more expensive a conflict is, the more deliberately the system should constrain where writes become authoritative.