Day 214: Raft Log Replication and Commit Semantics

In Raft, replicating an entry and committing an entry are not the same event. The whole safety story depends on that gap being handled carefully.


Today's "Aha!" Moment

Once Raft makes the leader explicit, the next instinct is simple: the leader receives a command, sends it to followers, and once enough followers have it, the command is done.

That intuition is close, but dangerously incomplete.

The subtlety is that "an entry exists on several machines" does not automatically mean "this entry is now part of the durable committed history the cluster may safely apply." There is a gap between an entry being replicated (physically stored on one or more servers) and an entry being committed (guaranteed by the protocol to survive every future leadership change).

That gap is the aha for this lesson.

Raft needs it because leadership can change. A leader may replicate an entry to some followers and then disappear before the cluster has enough evidence that this entry is safe to treat as committed. A later leader must not accidentally build a different history that makes an earlier unsafe assumption visible to users.

So Raft log replication is not just "copy bytes to followers." It is:

  1. a protocol for proving that leader and follower share a common log prefix,
  2. a repair mechanism for divergent follower suffixes, and
  3. a quorum-based rule for deciding when an entry may be treated as committed.

That is what makes Raft a consensus protocol rather than just leader-led shipping of log records.

Why This Matters

Suppose a leader appends entry 42 and successfully sends it to one follower, but crashes before reaching a majority. The entry exists in multiple places. Is it safe for any node to apply it to the state machine?

If the answer were "yes, because replication happened," the cluster could expose state that later disappears when a new leader overwrites or bypasses that entry. That would break the whole illusion of a single authoritative history.

This is why Raft's commit semantics matter so much:

  • Only committed entries may be applied to the state machine.
  • Only committed entries may be acknowledged to clients as durable.
  • Uncommitted entries, however widely replicated, may still be replaced after a leadership change.

In production, many confusing bugs or misunderstandings live exactly here:

  • treating "the write reached a follower" as "the write is durable,"
  • applying entries before the commit index has caught up, and
  • misreading follower log repair (truncation of a divergent suffix) as data loss.

If you get commit semantics right, later topics like membership change, snapshots, and production debugging become much easier.

Learning Objectives

By the end of this session, you will be able to:

  1. Differentiate replication from commitment - Explain why an entry can be present on followers before it is safe to treat as committed.
  2. Trace the Raft replication path - Understand how AppendEntries, log matching, and follower rollback/repair maintain a coherent log.
  3. Reason about commit safety - Describe why the leader tracks replication progress and why commit semantics are tied to majority and term rules.

Core Concepts Explained

Concept 1: The Leader Replicates Entries, but Followers Must Prove Log Continuity Before Accepting Them

Concrete example / mini-scenario: The leader wants to append entry [(term 8, index 15, cmd=X)] to a follower. That follower may already have a different suffix from an earlier leader.

This is why Raft's replication RPC carries more than the new entry itself. The leader also sends:

  • the index of the entry that immediately precedes the new entries, and
  • the term of that preceding entry.

That previous-entry check is the key to safe repair.

If the follower agrees that:

  • it has an entry at that previous index, and
  • that entry's term matches the term the leader sent,

then the leader and follower share a common prefix, and the new entries can be appended after it.

If the follower disagrees, it rejects the append and the leader backs up, trying an earlier prefix until it finds the point where their logs match.
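That back-up search can be simulated in a few lines. Below is a minimal sketch, assuming each log is represented simply as a list of entry terms and using the hypothetical helper name find_match_point; a real leader drives this search through rejected AppendEntries RPCs, and implementations often use conflict hints to back up faster than one entry at a time:

```python
def find_match_point(leader_log, follower_log):
    """Return the highest index at which leader and follower logs agree.

    leader_log / follower_log: lists of entry terms, where position i
    holds the term of the entry at log index i + 1.
    """
    next_index = len(leader_log) + 1      # optimistic first guess
    while next_index > 1:
        prev = next_index - 1             # the prefix the leader asks about
        if prev <= len(follower_log) and follower_log[prev - 1] == leader_log[prev - 1]:
            return prev                   # index and term agree: shared prefix
        next_index -= 1                   # simulated rejection: back up one
    return 0                              # only the empty prefix is shared

# A leader log and a follower with a stale suffix, by entry terms.
print(find_match_point([1, 1, 1, 2, 2, 2], [1, 1, 1, 3, 3]))  # -> 3
```

Once the match point is found, everything after it on the follower is divergent and safe to replace.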

ASCII intuition:

leader:   [1][2][3][4][5][6]
follower: [1][2][3][X][Y]

find matching prefix first
then replace divergent suffix

This gives us the log matching property intuition: if two logs contain an entry with the same index and the same term, then the two logs are identical in all entries up through that index.

That property is not free. It is maintained by this constant "prove the prefix, then append" discipline.

So Raft replication is really:

1. identify shared prefix
2. repair divergence if needed
3. append new entries after the known-good prefix

That is much more than blind shipping.
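The three steps above can be sketched from the follower's side. This is a deliberate simplification, assuming entries are (term, command) tuples and using the hypothetical function name append_entries; real followers truncate only from the first actually conflicting entry rather than unconditionally at the prefix boundary:

```python
def append_entries(log, prev_index, prev_term, entries):
    """Follower-side sketch: accept new entries only after proving the prefix.

    log: list of (term, command) tuples, position i = log index i + 1.
    prev_index / prev_term: the entry the new entries must directly follow.
    Returns (success, possibly-repaired log).
    """
    # 1. identify shared prefix: the follower must hold the previous entry.
    if prev_index > len(log):
        return False, log                          # hole in the log: reject
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False, log                          # conflicting term: reject
    # 2. repair divergence: drop any suffix past the proven prefix.
    # 3. append the new entries after the known-good prefix.
    return True, log[:prev_index] + list(entries)

# Follower with a stale suffix after index 3, as in the diagram above.
follower = [(1, "a"), (1, "b"), (1, "c"), (3, "X"), (3, "Y")]
ok, repaired = append_entries(follower, 3, 1, [(4, "d"), (5, "e")])
# ok is True; the stale suffix X, Y is replaced by the leader's entries.
```

Note that a rejection leaves the follower's log untouched; repair only happens once the prefix has been proven.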

Concept 2: Replicated Does Not Mean Committed

Concrete example / mini-scenario: The leader writes a new entry to itself and one follower in a five-node cluster. Three other followers have not acknowledged it yet.

At this moment, the entry is clearly replicated somewhere. But it is not yet committed, because the protocol does not yet have enough evidence that this entry belongs to the history a future leader will be forced to preserve.

The simplest intuition is: replicated is a statement about where bytes currently live; committed is a statement about which history the cluster is now bound to keep.

Raft leaders therefore track follower progress, commonly via ideas like matchIndex and nextIndex:

  • matchIndex[i] - the highest log index known to be replicated on server i.
  • nextIndex[i] - the index of the next entry the leader will try to send to server i.

The leader uses this information to ask: "Is there an index, from my current term, that a majority of servers (including me) is known to store?"

Only when the answer becomes strong enough does the leader advance the commit index.

ASCII sketch:

leader log:   40 41 42 43
matchIndex:
  leader   -> 43
  follower1-> 43
  follower2-> 42
  follower3-> 39
  follower4-> 42

majority has index 42
=> 42 may be eligible for commit
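The tally above translates into a commit rule almost directly. A minimal sketch, assuming match_index records the highest index known replicated on each server (leader included) and log_terms holds the term of each entry; the current-term check reflects Raft's rule that a leader only directly commits entries from its own term:

```python
def advance_commit_index(match_index, log_terms, current_term, commit_index):
    """Return the new commit index the leader may safely advance to."""
    acks = sorted(match_index.values(), reverse=True)
    # After a descending sort, the value at position len//2 is an index
    # stored on a strict majority of the cluster.
    majority_index = acks[len(acks) // 2]
    # Walk down toward the old commit index, looking for a current-term entry.
    for idx in range(majority_index, commit_index, -1):
        if log_terms[idx - 1] == current_term:
            return idx
    return commit_index                     # no safe advance yet

# The matchIndex table from the sketch above (5-node cluster).
match = {"leader": 43, "f1": 43, "f2": 42, "f3": 39, "f4": 42}
terms = [8] * 43                            # entries 1..43, all from term 8
print(advance_commit_index(match, terms, 8, 41))  # -> 42
```

Notice that if the leader's current term were 9, the same tally would not advance the commit index: index 42 is majority-replicated but belongs to an older term.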

This is the heart of the lesson. The system can know that an entry exists before it knows that the cluster may now treat it as committed truth.

That distinction is what prevents temporarily replicated entries from becoming prematurely visible external state.

Concept 3: Commit Semantics Protect the State Machine from Unstable History

Concrete example / mini-scenario: A leader from term 8 replicates an entry widely but then fails. A leader from term 9 is elected. Which entries can now safely be treated as committed and applied?

This is where Raft's commit rules become subtle and important.

Leaders do not just wave entries into permanence because those entries appear in many places. They need to respect the conditions that guarantee a future leader cannot safely replace the history beneath them.

The practical mental model is:

apply to state machine only after commit
commit only after the protocol has enough evidence that this log prefix is now authoritative

This is why Raft separates:

  • the commit index - the highest log index the protocol has established as safe, and
  • the last-applied index - the highest log index this server's state machine has actually executed.

Those are different pieces of state: the commit index advances on consensus evidence, while the last-applied index trails behind it, moving forward only as committed entries are executed locally, in order.

That ordering is what keeps externally visible behavior aligned with consensus safety.
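That ordering can be sketched as a tiny apply loop. The state names commit_index and last_applied mirror the commitIndex and lastApplied variables of the Raft paper; the log contents and command strings here are purely illustrative:

```python
class RaftNode:
    """Minimal sketch of the commit/apply split on one server."""

    def __init__(self):
        self.log = []              # replicated entries (commands)
        self.commit_index = 0      # highest index known committed
        self.last_applied = 0      # highest index executed locally
        self.applied = []          # externally visible, applied history

    def apply_committed(self):
        # Apply strictly in order, and never beyond commit_index.
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            self.applied.append(self.log[self.last_applied - 1])

node = RaftNode()
node.log = ["x=1", "x=2", "x=3"]   # three replicated entries
node.commit_index = 2              # only the first two are committed
node.apply_committed()
# node.applied is ["x=1", "x=2"]; entry 3 stays invisible until committed.
```

The loop never consults the log's length, only commit_index: an entry that merely exists in the log has no path into externally visible state.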

It also explains why client semantics are subtle. If a leader accepts a command but crashes before commitment, the command may need retry or may appear not to have taken effect even though it existed temporarily in some logs.

So the deeper point is: the state machine is where consensus becomes externally visible, and commit is the gate that keeps unstable, still-replaceable history from crossing that boundary.

That is what "commit semantics" actually means.

Troubleshooting

Issue: "If an entry is on multiple nodes, why isn't it already committed?"

Why it happens / is confusing: Physical distribution feels like proof of permanence.

Clarification / Fix: Because leadership may still change and some replicated entries may not yet be forced into the future authoritative history. Commitment is a protocol guarantee, not just a storage count.

Issue: "Why do followers sometimes delete entries they already had?"

Why it happens / is confusing: Deleting a replicated entry can look like data loss.

Clarification / Fix: Those entries were part of a divergent, not-yet-authoritative suffix. Followers repair their logs to match the leader's confirmed prefix so the cluster can recover one consistent history.

Issue: "Why not apply entries as soon as the leader writes them locally?"

Why it happens / is confusing: It seems faster and simpler.

Clarification / Fix: Because the leader alone is not enough to define committed truth. Applying too early would expose state that could later be rolled back by leadership change.

Advanced Connections

Connection 1: Raft Log Replication <-> Strong Leadership

The parallel: The previous lesson made the leader explicit. This lesson shows the operational consequence: the leader becomes the single normal writer of new entries and the coordinator of follower repair and commit advancement.

Real-world case: Many real Raft incidents are actually replication-health incidents: lagging followers, stuck commit index, or repeated prefix mismatch and catch-up.

Connection 2: Raft Log Replication <-> Membership Changes

The parallel: Once we understand what commit means, we can safely talk about changing the voting set itself. Membership change is hard precisely because the meaning of "majority" is changing while the log continues to advance.

Real-world case: Joint consensus can be seen as a careful extension of commit semantics to a period where two configurations must overlap safely.


Key Insights

  1. Replication and commitment are different protocol states - An entry can exist on several nodes before it is safe to apply.
  2. Prefix matching is the core repair mechanism - Leaders do not blindly append; they prove shared history, then repair divergence and extend the log.
  3. Commit semantics protect the state machine - Only committed entries become externally visible authoritative history.

Knowledge Check (Test Questions)

  1. Why does Raft include previous-index and previous-term information in append requests?

    • A) To prove that leader and follower share a common prefix before extending the log.
    • B) To compress the packet.
    • C) To avoid using quorums.
  2. What is the clearest difference between a replicated entry and a committed entry?

    • A) Replicated means it was written on disk somewhere; committed means the protocol now treats it as part of authoritative history.
    • B) There is no real difference.
    • C) Committed means every node has already applied it.
  3. Why should the state machine wait for commit before applying an entry?

    • A) Because leaders are always slow.
    • B) Because an uncommitted replicated suffix may still be replaced after leadership changes.
    • C) Because followers cannot store entries before commit.

Answers

1. A: The previous-entry metadata is how the leader verifies shared prefix and safely repairs divergent follower logs.

2. A: Replication is a storage/distribution fact; commitment is a consensus fact about authoritative history.

3. B: Applying too early would expose state derived from a suffix that the cluster may not yet be forced to keep.



← Back to Learning