Day 214: Raft Log Replication and Commit Semantics
In Raft, replicating an entry and committing an entry are not the same event. The whole safety story depends on that gap being handled carefully.
Today's "Aha!" Moment
Once Raft makes the leader explicit, the next instinct is simple: the leader receives a command, sends it to followers, and once enough followers have it, the command is done.
That intuition is close, but dangerously incomplete.
The subtlety is that "an entry exists on several machines" does not automatically mean "this entry is now part of the durable committed history the cluster may safely apply." There is a gap between:
- replicated: some set of nodes has the entry
- committed: the protocol can now treat the entry as part of the authoritative log
That gap is the aha for this lesson.
Raft needs it because leadership can change. A leader may replicate an entry to some followers and then disappear before the cluster has enough evidence that this entry is safe to treat as committed. A later leader must not accidentally build a different history that makes an earlier unsafe assumption visible to users.
So Raft log replication is not just "copy bytes to followers." It is:
- append entries under the current leader
- check log consistency on followers
- track replication progress carefully
- only advance commit when the protocol's safety conditions are satisfied
That is what makes Raft a consensus protocol rather than just leader-led shipping of log records.
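As a concrete anchor for the pieces of state this pipeline manipulates, here is a minimal Python sketch. The field names are illustrative, not tied to any particular implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    term: int   # term of the leader that created this entry
    cmd: str    # opaque command for the state machine

@dataclass
class NodeState:
    log: list = field(default_factory=list)  # entries stored locally
    commit_index: int = 0   # highest index the protocol treats as committed
    last_applied: int = 0   # highest index applied to the state machine
```

Note that an entry can sit in `log` well before `commit_index` reaches it; the rest of the lesson is about when that gap is allowed to close.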
Why This Matters
Suppose a leader appends entry 42 and successfully sends it to one follower, but crashes before reaching a majority. The entry exists in multiple places. Is it safe for any node to apply it to the state machine?
If the answer were "yes, because replication happened," the cluster could expose state that later disappears when a new leader overwrites or bypasses that entry. That would break the whole illusion of a single authoritative history.
This is why Raft's commit semantics matter so much:
- they separate "seen by some replicas" from "chosen by the protocol"
- they define when it is safe to apply a command to the state machine
- they explain a huge amount of real operational behavior, including lag, catch-up, and failover outcomes
In production, many confusing bugs or misunderstandings live exactly here:
- assuming append success means client-visible durability
- assuming follower logs must always match perfectly in real time
- assuming a leader can commit any old replicated entry simply because it appears on several nodes
If the learner gets commit semantics right, later topics like membership change, snapshots, and production debugging become much easier.
Learning Objectives
By the end of this session, you will be able to:
- Differentiate replication from commitment - Explain why an entry can be present on followers before it is safe to treat as committed.
- Trace the Raft replication path - Understand how AppendEntries, log matching, and follower rollback/repair maintain a coherent log.
- Reason about commit safety - Describe why the leader tracks replication progress and why commit semantics are tied to majority and term rules.
Core Concepts Explained
Concept 1: The Leader Replicates Entries, but Followers Must Prove Log Continuity Before Accepting Them
Concrete example / mini-scenario: The leader wants to append entry [(term 8, index 15, cmd=X)] to a follower. That follower may already have a different suffix from an earlier leader.
This is why Raft's replication RPC carries more than the new entry itself. The leader also sends:
- the index immediately before the new entries
- the term at that previous index
That previous-entry check is the key to safe repair.
If the follower agrees that:
- "yes, I do have that previous index and term"
then the leader and follower share a common prefix, and the new entries can be appended after it.
If the follower disagrees, it rejects the append and the leader backs up, trying an earlier prefix until it finds the point where their logs match.
ASCII intuition:
leader: [1][2][3][4][5][6]
follower: [1][2][3][X][Y]
find matching prefix first
then replace divergent suffix
This gives us the log matching property intuition:
- if two logs contain an entry with the same index and term, they share the same prefix up to that point
That property is not free. It is maintained by this constant "prove the prefix, then append" discipline.
So Raft replication is really:
1. identify shared prefix
2. repair divergence if needed
3. append new entries after the known-good prefix
That is much more than blind shipping.
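The three steps above can be sketched in Python. Everything here is illustrative, assuming tuple-based logs and a simple linear back-off; real implementations use an RPC and often send conflict hints to skip back faster:

```python
def append_entries(follower_log, prev_index, prev_term, entries):
    """Follower-side consistency check (a minimal sketch, not a full RPC).

    Logs are lists of (term, cmd) tuples with a sentinel at index 0 so
    that positions line up with Raft's 1-based indices.
    """
    # "Prove the prefix": reject unless this follower stores the leader's
    # previous entry with the same term.
    if prev_index >= len(follower_log) or follower_log[prev_index][0] != prev_term:
        return False
    # Shared prefix confirmed: drop any divergent suffix, then append.
    del follower_log[prev_index + 1:]
    follower_log.extend(entries)
    return True

# Leader-side repair loop: back up next_index until the follower accepts,
# mirroring the ASCII picture above (follower diverged at index 4).
leader_log = [(0, None), (1, "a"), (1, "b"), (2, "c"), (4, "d"), (4, "e"), (5, "f")]
follower   = [(0, None), (1, "a"), (1, "b"), (2, "c"), (3, "X"), (3, "Y")]
next_index = len(leader_log)  # optimistic: start just past the leader's log
while not append_entries(follower, next_index - 1,
                         leader_log[next_index - 1][0],
                         leader_log[next_index:]):
    next_index -= 1
assert follower == leader_log  # divergent suffix repaired, entries appended
```

The rejections are doing real work here: each failed call is the follower refusing to extend a history it cannot prove it shares.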
Concept 2: Replicated Does Not Mean Committed
Concrete example / mini-scenario: The leader writes a new entry to itself and one follower in a five-node cluster. Three other followers have not acknowledged it yet.
At this moment, the entry is clearly replicated somewhere. But it is not yet committed, because the protocol does not yet have enough evidence that this entry belongs to the history a future leader will be forced to preserve.
The simplest intuition is:
- replication is about placement
- commit is about protocol authority
Raft leaders therefore track follower progress, commonly via ideas like matchIndex and nextIndex:
- nextIndex: where the leader should next try to append on that follower
- matchIndex: the highest log index that follower is known to store
The leader uses this information to ask:
- do a majority of replicas now store this entry?
Only when a majority of replicas store the entry, and the entry carries the leader's current term, does the leader advance the commit index.
ASCII sketch:
leader log: 40 41 42 43
matchIndex:
leader -> 43
follower1-> 43
follower2-> 42
follower3-> 39
follower4-> 42
majority has index 42
=> 42 may be eligible for commit
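That commit check can be sketched as follows. The function names and the dict-based log are assumptions for illustration; the term guard is Raft's rule (paper section 5.4.2) that a leader may only count replicas for entries from its own current term:

```python
def advance_commit_index(match_index, commit_index, log, current_term):
    """Leader-side commit advancement (a sketch; names are illustrative).

    match_index maps each node (the leader counts itself) to the highest
    log index known stored there; log maps index -> (term, cmd).
    """
    # The highest index stored on a majority is the majority-th largest
    # match index.
    ranked = sorted(match_index.values(), reverse=True)
    majority = len(ranked) // 2 + 1
    candidate = ranked[majority - 1]
    # Only count replicas for entries from the leader's *current* term;
    # older entries commit indirectly once a current-term entry after
    # them commits.
    if candidate > commit_index and log[candidate][0] == current_term:
        return candidate
    return commit_index

# The five-node picture above: a majority (3 of 5) stores index 42.
log = {39: (7, "w"), 40: (8, "w"), 41: (8, "w"), 42: (8, "w"), 43: (8, "w")}
match = {"leader": 43, "f1": 43, "f2": 42, "f3": 39, "f4": 42}
assert advance_commit_index(match, 41, log, current_term=8) == 42
# A term-9 leader could not commit index 42 this way, even though a
# majority stores it: the entry's term (8) is not the current term.
assert advance_commit_index(match, 41, log, current_term=9) == 41
```

The second assertion previews Concept 3: majority storage alone is not enough evidence once leadership has changed.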
This is the heart of the lesson. The system can know that an entry exists before it knows that the cluster may now treat it as committed truth.
That distinction is what prevents a temporarily replicated entry from becoming prematurely visible external state.
Concept 3: Commit Semantics Protect the State Machine from Unstable History
Concrete example / mini-scenario: A leader from term 8 replicates an entry widely but then fails. A leader from term 9 is elected. Which entries can now safely be treated as committed and applied?
This is where Raft's commit rules become subtle and important.
Leaders do not just wave entries into permanence because those entries appear in many places. They need to respect the conditions that guarantee a future leader cannot safely replace the history beneath them.
The practical mental model is:
apply to state machine only after commit
commit only after the protocol has enough evidence that this log prefix is now authoritative
This is why Raft separates:
- last log index stored
- commit index
- last applied
Those are different pieces of state:
- an entry may be present in the log
- later become committed
- only then be applied to the deterministic state machine
That ordering is what keeps externally visible behavior aligned with consensus safety.
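The stored -> committed -> applied ordering can be made concrete with a small sketch. The names and the list-based "state machine" are illustrative assumptions:

```python
def apply_committed(log, commit_index, last_applied, state):
    """Apply-side discipline (a minimal sketch): the state machine only
    ever sees entries up to commit_index, never the raw log tail."""
    while last_applied < commit_index:
        last_applied += 1
        term, cmd = log[last_applied]   # committed entries are stable
        state.append(cmd)               # stand-in for a deterministic apply
    return last_applied

# Three entries stored, but only the first two are committed:
log = {1: (8, "set x=1"), 2: (8, "set y=2"), 3: (8, "set z=3")}
state = []
last_applied = apply_committed(log, commit_index=2, last_applied=0, state=state)
assert state == ["set x=1", "set y=2"]  # index 3 invisible until committed
```

Entry 3 may well survive and commit later; the point is that nothing external may depend on it yet.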
It also explains why client semantics are subtle. If a leader accepts a command but crashes before commitment, the command may need retry or may appear not to have taken effect even though it existed temporarily in some logs.
So the deeper point is:
- Raft is not only replicating logs
- it is managing when a log prefix becomes safe enough to expose as real system state
That is what "commit semantics" actually means.
Troubleshooting
Issue: "If an entry is on multiple nodes, why isn't it already committed?"
Why it happens / is confusing: Physical distribution feels like proof of permanence.
Clarification / Fix: Because leadership may still change and some replicated entries may not yet be forced into the future authoritative history. Commitment is a protocol guarantee, not just a storage count.
Issue: "Why do followers sometimes delete entries they already had?"
Why it happens / is confusing: Deleting a replicated entry can look like data loss.
Clarification / Fix: Those entries were part of a divergent, not-yet-authoritative suffix. Followers repair their logs to match the leader's confirmed prefix so the cluster can recover one consistent history.
Issue: "Why not apply entries as soon as the leader writes them locally?"
Why it happens / is confusing: It seems faster and simpler.
Clarification / Fix: Because the leader alone is not enough to define committed truth. Applying too early would expose state that could later be rolled back by leadership change.
Advanced Connections
Connection 1: Raft Log Replication <-> Strong Leadership
The parallel: The previous lesson made the leader explicit. This lesson shows the operational consequence: the leader becomes the single normal writer of new entries and the coordinator of follower repair and commit advancement.
Real-world case: Many real Raft incidents are actually replication-health incidents: lagging followers, stuck commit index, or repeated prefix mismatch and catch-up.
Connection 2: Raft Log Replication <-> Membership Changes
The parallel: Once we understand what commit means, we can safely talk about changing the voting set itself. Membership change is hard precisely because the meaning of "majority" is changing while the log continues to advance.
Real-world case: Joint consensus can be seen as a careful extension of commit semantics to a period where two configurations must overlap safely.
Resources
Optional Deepening Resources
- [PAPER] In Search of an Understandable Consensus Algorithm (Raft)
- Link: https://raft.github.io/raft.pdf
- Focus: Re-read the log replication and safety sections now that you are explicitly watching the distinction between append, commit, and apply.
- [DOC] The Raft Consensus Algorithm
- Link: https://raft.github.io/
- Focus: Useful for diagrams and supplementary references that make the replication flow more concrete.
- [ARTICLE] The Secret Lives of Data: Raft
- Link: https://thesecretlivesofdata.com/raft/
- Focus: Helpful visualization for seeing leader append, follower catch-up, and commit progression.
Key Insights
- Replication and commitment are different protocol states - An entry can exist on several nodes before it is safe to apply.
- Prefix matching is the core repair mechanism - Leaders do not blindly append; they prove shared history, then repair divergence and extend the log.
- Commit semantics protect the state machine - Only committed entries become externally visible authoritative history.
Knowledge Check (Test Questions)
1. Why does Raft include previous-index and previous-term information in append requests?
- A) To prove that leader and follower share a common prefix before extending the log.
- B) To compress the packet.
- C) To avoid using quorums.
2. What is the clearest difference between a replicated entry and a committed entry?
- A) Replicated means it was written on disk somewhere; committed means the protocol now treats it as part of authoritative history.
- B) There is no real difference.
- C) Committed means every node has already applied it.
3. Why should the state machine wait for commit before applying an entry?
- A) Because leaders are always slow.
- B) Because an uncommitted replicated suffix may still be replaced after leadership changes.
- C) Because followers cannot store entries before commit.
Answers
1. A: The previous-entry metadata is how the leader verifies shared prefix and safely repairs divergent follower logs.
2. A: Replication is a storage/distribution fact; commitment is a consensus fact about authoritative history.
3. B: Applying too early would expose state derived from a suffix that the cluster may not yet be forced to keep.