Chain Replication and Ordered Failover

The core idea: chain replication makes a replicated object behave like one ordered pipeline. Writes enter at the head and become committed at the tail, and failover must preserve the committed prefix before clients can trust the new chain.

Core Insight

Harbor Point has one reservation bucket that is too important to reconcile casually: the last available allocation for a thinly traded municipal bond. Two desks can race to reserve it, and a stale or double-accepted answer would create a real business break. The team could use quorums and repair, but for this object they want a simpler promise: if a write is acknowledged, every later committed read should reflect one agreed order.

Chain replication makes that promise by turning the replica set into a fixed pipeline. A reservation update enters at the head, travels through each replica in order, and is not complete until the tail has applied it. Ordinary reads go to the tail because the tail is the node that has seen the full committed prefix. The chain may have in-flight writes, but the committed history has one clear end.

The tempting misconception is that chain replication is just leader-follower replication with more followers. The head does look leader-like, but the defining property is the ordered path and the special meaning of the tail. A completed write is not "the head accepted it." A completed write is "the update reached the tail in the configured chain."

That clarity has a cost. Writes pay the latency of the chain, the tail can become a read bottleneck, and node failures require careful reconfiguration. Chain replication trades some availability flexibility for a read and write contract that is easy to explain and hard to accidentally weaken.

The Ordered Pipeline

Suppose Harbor Point stores the allocation bucket on three replicas:

chain epoch 42

[HEAD A]  ->  [B]  ->  [TAIL C]

A trader submits reserve(bucket=MUNI-CA-17, qty=1). The request does not go to whichever replica is nearby. It enters at the head:

client
  |
  v
[HEAD A] --reserve#881--> [B] --reserve#881--> [TAIL C]
                                                    |
                                                    v
                                             commit / ack

Each replica applies updates in the same sequence before forwarding them. If reserve#881 is followed by release#882, every replica sees that order as the updates move down the chain. The tail applies reserve#881 last, and that is the point where the write becomes part of the committed prefix.

Update order at each replica:

A: reserve#881, release#882, ...
B: reserve#881, release#882, ...
C: reserve#881, release#882, ...  <- committed prefix visible here
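
The write path above can be sketched in a few lines of Python. This is a minimal single-process model, not a networked implementation: the `Replica` and `Chain` classes and the acknowledgement string are hypothetical names chosen for illustration, and forwarding is modeled as a simple loop.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)  # updates applied, in order

    def apply(self, update: str) -> None:
        self.log.append(update)


class Chain:
    """Fixed pipeline: replicas[0] is the head, replicas[-1] is the tail."""

    def __init__(self, epoch: int, names: list):
        self.epoch = epoch
        self.replicas = [Replica(n) for n in names]

    def write(self, update: str) -> str:
        # Each replica applies the update before it is forwarded downstream.
        for replica in self.replicas:
            replica.apply(update)
        # The acknowledgement is produced only after the tail has applied it.
        return f"ack {update} at tail {self.replicas[-1].name}"


chain = Chain(epoch=42, names=["A", "B", "C"])
chain.write("reserve#881")
chain.write("release#882")
```

After both calls, every replica's log holds the two updates in the same order, and the acknowledgement names C, the tail, rather than the head that first accepted the request.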

This is why chain replication is attractive for objects where the business wants one simple story. The chain removes ordinary read-time ambiguity. Harbor Point does not need to ask whether a read quorum intersects the latest write quorum for this bucket. It asks whether the read came from the tail of the current chain.

The mechanism also reveals the trade-off. The head can contain writes that the tail has not committed yet. The middle replica's state can sit anywhere between the head's and the tail's. Those states are normal, but they are not all safe read points for the committed view. The design gets its clean semantics by giving each position in the chain a different meaning.

Reads From the Tail

After reserve#881 reaches the tail, a read from C sees the committed reservation. A read from A might see more than the committed prefix because the head may have accepted a later update that has not reached the tail. A read from B has the same problem in milder form. It is somewhere inside the pipeline, not at the point where the committed prefix is defined.

Time T:

A has applied: reserve#881, release#882
B has applied: reserve#881
C has applied: reserve#881

committed prefix: reserve#881
tail read: safe committed view
head read: may expose in-flight work

That rule is the contrast with the previous lesson. In read-your-writes routing, Harbor Point checked whether a replica had replayed a session token. In chain replication, the architecture names the safe read point up front. If the client reads the tail of the current chain after its write has been acknowledged, it reads a node that contains the full committed prefix.
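
The snapshot at time T can be expressed as data to make the rule concrete. The sketch below assumes each replica's applied updates are available as a list. The helper names `committed_prefix` and `in_flight` are hypothetical.

```python
# Applied updates per replica at time T, taken from the example above.
applied = {
    "A": ["reserve#881", "release#882"],  # head
    "B": ["reserve#881"],                 # middle
    "C": ["reserve#881"],                 # tail
}
chain_order = ["A", "B", "C"]


def committed_prefix(applied, chain_order):
    # The committed prefix is, by definition, whatever the tail has applied.
    return list(applied[chain_order[-1]])


def in_flight(applied, chain_order, node):
    committed = committed_prefix(applied, chain_order)
    log = applied[node]
    # Chain invariant: every replica's log extends the tail's log.
    if log[:len(committed)] != committed:
        raise ValueError(f"{node} does not extend the committed prefix")
    # Anything beyond the committed prefix is still in flight.
    return log[len(committed):]
```

Here `in_flight(applied, chain_order, "A")` returns the one update the head has accepted that the tail has not yet seen, which is exactly what a head read could expose too early.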

This does not make the design free. Tail reads can concentrate load, so high-read systems may need batching, object placement across many chains, or extensions such as CRAQ that allow non-tail reads when they can prove the object version is clean. The base idea stays the same: a faster read path must preserve the committed-prefix contract instead of treating all replicas as interchangeable.
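
As a rough illustration of the CRAQ idea, the sketch below models one non-tail replica that keeps several versions of a single object. It is a simplification under stated assumptions: the class and method names are hypothetical, version numbers are plain integers, and the query to the tail is passed in as a callable rather than sent over a network.

```python
class CraqNode:
    """One non-tail replica's view of a single object (hypothetical names)."""

    def __init__(self, initial_value):
        self.versions = {0: initial_value}  # version number -> value
        self.clean_version = 0              # newest version known to be committed

    def receive_write(self, version, value):
        # A write passing down the chain is stored as a new, still-dirty version.
        self.versions[version] = value

    def receive_ack(self, version):
        # The tail's acknowledgement marks the version clean; older ones can go.
        self.clean_version = max(self.clean_version, version)
        self.versions = {v: val for v, val in self.versions.items()
                         if v >= self.clean_version}

    def read(self, ask_tail_committed_version):
        latest = max(self.versions)
        if latest == self.clean_version:
            return self.versions[latest]          # clean: answer locally
        committed = ask_tail_committed_version()  # dirty: ask which version committed
        return self.versions[committed]
```

The design choice to notice is that the replica never guesses. It answers locally only when its newest version is clean. Otherwise it asks the tail which version is committed, so the tail still defines the committed prefix even though it no longer serves every read.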

Ordered Failover

The hard part of chain replication is not the happy path. It is changing the chain without losing the meaning of "committed."

If the middle node fails, the system might move from:

epoch 42: A -> B -> C

to:

epoch 43: A -> C

That change must be coordinated. The control plane has to stop clients from using the old epoch, determine which updates reached which replicas, preserve the committed prefix at the old tail, and resume only when the new chain has a safe ordering. Otherwise, Harbor Point could accidentally acknowledge an update in one topology and read from a new topology that has not preserved it.
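
One way to stop clients from using the old epoch is for every replica to reject requests that name a different epoch. The sketch below shows that fence in isolation. `ReplicaEndpoint`, `StaleEpoch`, and the returned string are hypothetical, and a real control plane would also have to distribute the new configuration reliably.

```python
class StaleEpoch(Exception):
    """Raised when a request names an epoch the replica no longer serves."""


class ReplicaEndpoint:
    def __init__(self, name: str, epoch: int, role: str):
        self.name = name
        self.epoch = epoch
        self.role = role  # "head", "middle", or "tail"

    def install(self, epoch: int, role: str) -> None:
        # Configurations only move forward; an old epoch is never reinstalled.
        if epoch <= self.epoch:
            raise ValueError(f"epoch {epoch} is not newer than {self.epoch}")
        self.epoch = epoch
        self.role = role

    def handle(self, request_epoch: int, op: str) -> str:
        # Fence: refuse anything addressed to a different chain configuration.
        if request_epoch != self.epoch:
            raise StaleEpoch(f"request epoch {request_epoch}, current {self.epoch}")
        return f"{self.name} served {op} in epoch {self.epoch}"
```

A client that still believes in epoch 42 gets a `StaleEpoch` error from a replica that has installed epoch 43, and must fetch the new chain before retrying.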

Different failures stress different parts of the invariant:

Failed position   Main risk
---------------   ------------------------------------------------
head              clients may be sending writes to a dead entry point
middle            in-flight updates may be split across the pipeline
tail              the committed read point has disappeared

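A head failure is often the easiest to reason about: promote the next node as the new head after ensuring clients stop using the old one. Updates that only the old head had seen were never acknowledged, so dropping them breaks no promise. A tail failure is more delicate because the tail is where completion was observed. The new tail must be known to contain the committed prefix before reads resume; when every replica applies before forwarding, the old tail's predecessor already holds at least that prefix. A middle failure requires reconnecting the predecessor and successor without skipping or duplicating in-flight updates.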
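
The middle-failure case can be sketched as a small repair function. It assumes the control plane can see each surviving replica's applied log as a list, which is a simplification, and the name `remove_middle` is hypothetical. The function splices the failed node out and has the predecessor resend, in order, exactly the updates the successor lacks.

```python
def remove_middle(chain, logs, failed):
    """Return (new_chain, resent) after splicing a failed middle node out."""
    i = chain.index(failed)
    if i == 0 or i == len(chain) - 1:
        raise ValueError("this sketch handles middle failures only")
    pred, succ = chain[i - 1], chain[i + 1]
    pred_log, succ_log = logs[pred], logs[succ]
    # Chain invariant: a successor's log is a prefix of its predecessor's log.
    if pred_log[:len(succ_log)] != succ_log:
        raise ValueError("successor log is not a prefix of predecessor log")
    # Resend only what the successor is missing: nothing skipped, nothing duplicated.
    resent = pred_log[len(succ_log):]
    succ_log.extend(resent)
    return chain[:i] + chain[i + 1:], resent


logs = {
    "A": ["reserve#881", "release#882", "reserve#883"],
    "B": ["reserve#881", "release#882"],
    "C": ["reserve#881"],
}
new_chain, resent = remove_middle(["A", "B", "C"], logs, failed="B")
```

In this run, B fails while holding an update that C never received. The repair resends `release#882` and `reserve#883` from A to C, so the new chain A -> C resumes with the same order the old chain would have produced.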

The phrase "ordered failover" is doing real work here. Failover is not merely replacing capacity. It is installing a new chain epoch that preserves the prefix relation clients rely on.

Operational Trade-offs

Chain replication buys a strong and local mental model: writes have one path, reads have one safe point, and committed state is a prefix at the tail. That makes it attractive for per-object stores, metadata services, inventory buckets, and other places where a single ordered history matters more than accepting writes on any reachable node.

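It also concentrates operational pressure: every write pays the latency of the full chain, the tail carries the committed read load, and any node failure requires a coordinated reconfiguration.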

The comparison with quorum systems is not that one is universally stronger. They package coordination differently. Quorums rely on intersecting read and write sets plus repair. Chain replication relies on one ordered propagation path plus a distinguished committed read point. Harbor Point should choose the shape that matches the object: repair-friendly preferences and counters may not need a chain, while a scarce allocation bucket benefits from the tail-defined prefix.

The next topic, sharding, changes the unit of authority again. Chain replication says how one replicated object or partition can maintain an ordered history. Sharding asks how many such authority domains should exist and how requests find the right one.

Failure Modes

Reading from an internal replica for lower latency. A middle replica may be close to the client and mostly up to date, but it is not the committed read point in basic chain replication. Reading there can expose in-flight writes that the tail has not committed yet, or return state from a replica that a newer epoch has already dropped from the chain, unless the system has an explicit protocol proving the version is safe.

Acknowledging at the head. If Harbor Point tells the client a reservation is complete when the head accepts it, the client has a stronger belief than the chain has earned. A crash before the update reaches the tail can leave the system unable to honor that acknowledgement.

Treating failover as simple rerouting. Moving clients from A -> B -> C to A -> C without an epoch and catch-up protocol can break the ordering guarantee. The new chain must preserve committed history, not merely have live nodes.

Forgetting that tail load is part of the design. The tail is special, so read scaling needs a deliberate plan: partition objects across chains, add safe read extensions, or accept the tail as the bottleneck for the objects that need this guarantee.
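
Partitioning objects across chains can be as simple as rotating the node order per object, so that different objects have different tails. The sketch below is one hypothetical placement scheme, not a recommendation. The `NODES` list and the `chain_for` function are illustrative names, and the rotation is derived from a SHA-256 hash of the object key so that every caller computes the same chain.

```python
import hashlib

NODES = ["A", "B", "C"]


def chain_for(object_key: str, nodes=NODES) -> list:
    """Return the head-to-tail node order for one object."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable rotation offset.
    shift = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[shift:] + nodes[:shift]


def tail_for(object_key: str, nodes=NODES) -> str:
    # Committed reads for this object go to the last node in its chain.
    return chain_for(object_key, nodes)[-1]
```

Each object still has exactly one tail, so the committed-prefix contract is unchanged for any single bucket. Only the aggregate read load is spread across nodes.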

Key Takeaways

  1. Chain replication makes a committed write mean that the update reached the tail through one ordered path.
  2. The tail is the safe read point because it defines the full committed prefix of the chain.
  3. Failover must install a new chain epoch that preserves committed history before clients rely on it.
  4. The design trades flexible availability and arbitrary replica reads for a simpler, stronger per-object replication contract.