Consensus, Quorums, and Coordination

LESSON

Distributed Systems Foundations

005 25 min beginner

Consensus, Quorums, and Coordination

Core Insight

Imagine a feature flag service that controls checkout. The flag decides whether customers use the old payment flow or a new one. Five machines store the flag. Normally one leader accepts changes and the others follow. Then the network splits: machines A, B, and C can talk to each other, while D and E are isolated on the other side.

If both sides keep accepting official changes, checkout can wake up with two histories. One side says version 8 enabled the new payment flow. The other side says version 8 disabled it. Both groups may have logs, timestamps, and confident operators. The problem is not missing data; it is conflicting authority.

Consensus is the family of mechanisms used when several machines must make one official decision even though machines can fail and messages can be delayed. The decision might be which node is leader, which configuration version is active, who owns a lock, or which entry comes next in a shared log.

A quorum is enough participants to make a decision count. In a five-node group, a common quorum size is three. The important property is not the word "majority" by itself. The important property is overlap: any two quorums of three share at least one node. That overlap carries evidence from one official decision to the next and helps prevent two isolated groups from inventing separate histories.

Consensus is not a speed trick. It is a safety mechanism. It buys one official answer by spending latency and, during some failures, availability. Use it where disagreement is more expensive than waiting.

The Decision That Must Not Fork

Use the feature flag service as the worked example. It protects one small but important fact:

checkout_payment_flow = old

Five nodes replicate the configuration:

nodes: A B C D E
quorum size: 3
current version: 7
current leader: A

An operator now enables the new flow. The system should not make the change official merely because one node heard the request. One node can crash. One node can be stale. One node can be isolated. The change becomes official only when enough nodes accept the same ordered decision.

proposed version 8:
  checkout_payment_flow = new

A appends version 8 locally
A asks B C D E to accept version 8
B accepts
C accepts

quorum reached: A B C
version 8 is now committed

Nodes D and E may learn version 8 later. They do not need to be instantly up to date for the decision to be safe. The safety comes from the committed decision having passed through a group that overlaps with any future group that can also commit.

The mental model is: a decision becomes official when it has enough shared evidence that a later official decision cannot be completely ignorant of it.

Why Quorum Overlap Matters

With five nodes and quorum size three, any two successful decision groups overlap.

quorum 1: A B C
quorum 2: C D E

overlap: C

That shared node matters because it can carry evidence of earlier accepted decisions. A later quorum cannot be made entirely of participants that never saw, accepted, or had a chance to learn about the earlier official history.

Now compare a bad threshold:

five nodes
quorum size: 2

group A B accepts:
  version 8 = new

group D E accepts:
  version 8 = old

overlap: none

This threshold is more available because smaller groups can proceed. It is also unsafe for a single-history promise because two disjoint groups can make incompatible official decisions. That failure is called split brain: more than one participant or group acts as if it has official authority.

Quorum overlap is the first layer. A real consensus protocol also needs rules for leadership, terms, ordering, retries, and recovery. Overlap says successful decision groups intersect. The protocol says what nodes do with that intersection so old leaders, stale messages, and reordered replies do not corrupt the history.

Leaders, Terms, And Logs

Consensus protocols such as Raft usually make one node the leader for a period of time. The leader receives proposed changes, assigns them positions in a log, and asks followers to accept those entries. A log is an ordered sequence of decisions. A term or epoch is a numbered leadership period.

For the feature flag, the log might look like this:

index 6, term 3: checkout_payment_flow = old
index 7, term 3: fraud_check_level = strict
index 8, term 4: checkout_payment_flow = new

The index says where the decision sits in the ordered history. The term says which leadership period proposed it. Those numbers help nodes answer questions such as:

The exact details vary by protocol, but the design goal is stable: once an entry is committed, a later leader must not erase it and replace it with a different official entry at the same position.

Here is a simplified path:

1. A becomes leader for term 4.
2. Operator asks A to enable the new payment flow.
3. A appends log entry index 8, term 4.
4. A sends the entry to followers.
5. B and C accept index 8, term 4.
6. A sees quorum A B C and marks index 8 committed.
7. A tells clients version 8 is official.

The client does not need to understand every message. The important promise is that the service will not later allow another group to make a different official index 8 without preserving the committed history.

It also helps to separate committed from applied. A committed entry is official because a quorum accepted it under the protocol's rules. An applied entry is one a node has already used to update its local state machine. A follower may know that version 8 is committed but apply it a moment later than the leader. That lag is usually acceptable as long as the follower does not claim a conflicting official history.

This distinction explains why consensus-backed systems often expose both a committed index and an applied index. The committed index tells you how far the official log has advanced. The applied index tells you how far this node's local behavior has caught up. When those numbers drift apart, the system may still be safe, but clients reading from lagging followers can see stale behavior unless the read path has its own rule.

What Happens During A Partition

Now split the network:

side 1: A B C
side 2: D E
quorum size: 3

Side 1 can still form a quorum. If A remains leader and can reach B and C, it can continue committing guarded decisions. Side 2 has two live machines, but two is not enough. They may serve stale reads if the product allows that, but they must not accept official writes that require consensus.

This can feel harsh. Machines D and E are not dead. They may have CPU, memory, disk, and local logs. But they lack enough shared evidence to safely create new official history. Refusing writes is the point. The system is choosing "no new official answer here" over "two official answers."

A different partition shows the availability cost:

side 1: A B
side 2: C D
isolated: E

No side can form a quorum of three. The protected decision path stops. That does not mean the whole product must stop. The feature flag service may continue serving the last known committed value, but it should not publish a new official version until enough participants can coordinate again.

That is the core trade-off: consensus preserves safety by sometimes refusing progress.

Coordination Boundaries

The hardest design question is not "Can we use consensus?" It is "Which decisions must not fork?"

Some actions do not deserve consensus. Refreshing a cache entry twice may waste work but not break the product. Approximate analytics counters can often merge later. A social feed can tolerate stale ranking for a while. Making these actions wait for quorums can add latency and fragility without protecting a meaningful invariant.

Other decisions are different:

For those decisions, two official answers can create damage that repair cannot easily hide. If two leaders both accept writes, the system may produce incompatible histories. If two configuration versions are both official, operators cannot explain which behavior users should have seen.

A useful design sentence is:

For this decision, disagreement would cost more than waiting,
so the system requires a quorum before the decision becomes official.

If that sentence is not true, consensus may be the wrong tool.

Failure Modes And Limits

The representative failure is split brain. Two groups both believe they can lead, accept writes, or publish configuration. When communication returns, the system does not merely have a missing update. It has incompatible histories that may have produced user-visible effects.

Another failure is stale leadership. A node that was leader before a partition may keep acting as if it still has authority. Terms, leases, heartbeats, and quorum checks are ways protocols stop old authority from quietly continuing after the group has moved on.

A third failure is coordinating too much. If every low-value action waits for a quorum, the system can become slower and less available than the product requires. Consensus should protect the small set of decisions where disagreement is unacceptable, not every read, cache refresh, or approximate statistic.

Consensus also does not make time disappear. It orders decisions inside the protected log, but it still uses messages, timeouts, leader elections, and failure detection. The next lesson on time and causality explains why clocks alone cannot prove what a node has seen.

Operational Signals

A consensus-backed system should expose signals that show both safety and progress pressure:

These signals matter because consensus failures can look like ordinary slowness from a client. A write that hangs may mean the leader cannot reach a quorum. A spike in elections may mean unstable networking. A follower lagging behind may not break safety immediately, but it reduces the margin for future failures.

The operational question is: can we tell whether the system is refusing progress to protect safety, or merely stuck because the implementation is unhealthy?

Practice Prompt

Pick one system decision: leader election, feature flag publishing, lock ownership, shopping cart updates, cache refresh, analytics counters, or configuration rollout. Fill in:

decision:
what would fork if two answers existed:
participants:
quorum or owner rule:
what becomes official only after quorum:
what happens without enough participants:
why waiting is or is not worth it:
signal that would show the coordination path is unhealthy:

If you cannot name what would fork, the decision may not need consensus. If you can name a costly fork, the next question is what evidence makes the decision official.

Resources

Key Takeaways

PREVIOUS Fault Tolerance, Retries, and Idempotency NEXT Time, Clocks, and Causality