State Machine Replication and Deterministic Apply
The core idea: State machine replication makes a cluster behave like one reliable service by having every replica apply the same committed command sequence deterministically. The trade-off is that these strong service semantics hold only if the system keeps strict boundaries around nondeterminism, commits, and side effects.
Core Insight
Imagine three replicas storing cluster configuration. A client asks to add node n4. Consensus can decide that the command belongs at log index 57, but that decision is not the complete service. Each replica must apply command 57 to its local state machine and end with the same configuration.
That is the shift behind state machine replication. Consensus chooses an ordered command history. The state machine turns that history into current service state. If every replica starts from the same state and applies the same committed commands in the same order, the replicas can behave like one fault-tolerant machine.
The common mistake is to treat the replicated log as the whole service. The log is evidence and ordering. The state machine is where ordered decisions become meaning: a new member appears, a lease is granted, a key is updated, or a deployment generation changes.
The model buys precision at the cost of discipline. It is wonderfully clear, but only if command application is deterministic, clients are acknowledged at the right boundary, and external side effects are kept out of naive replay.
From Chosen Commands to Shared State
A replicated state machine has a simple contract:
same initial state
+ same committed command sequence
+ deterministic apply
= same resulting state
For a metadata store, the command might be set /deployments/payments replicas=4. For a lock service, the command might be grant lease L to client C with fencing token 88. For a membership service, the command might be add node n4 as voter.
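The contract above can be checked with a small sketch. It assumes the three example commands are modeled as hypothetical Python dataclasses (`SetKey`, `GrantLease`, `AddMember`) and that `apply` reads only the current state and the command. Three replicas that start from the same members and apply the same committed list end in equal states.

```python
from dataclasses import dataclass, field


# Hypothetical command types modeled on the three examples in the text.
@dataclass(frozen=True)
class SetKey:
    key: str
    value: str


@dataclass(frozen=True)
class GrantLease:
    lease: str
    client: str
    fencing_token: int


@dataclass(frozen=True)
class AddMember:
    node: str


@dataclass
class State:
    kv: dict = field(default_factory=dict)
    leases: dict = field(default_factory=dict)
    members: list = field(default_factory=list)


def apply(state: State, command) -> None:
    """Deterministic transition: reads only `state` and `command`."""
    if isinstance(command, SetKey):
        state.kv[command.key] = command.value
    elif isinstance(command, GrantLease):
        state.leases[command.lease] = (command.client, command.fencing_token)
    elif isinstance(command, AddMember):
        if command.node not in state.members:
            state.members.append(command.node)
    else:
        raise ValueError(f"unknown command: {command!r}")


def run(initial_members, committed_log) -> State:
    """Same initial state + same committed sequence + deterministic apply."""
    state = State(members=list(initial_members))
    for command in committed_log:
        apply(state, command)
    return state


committed_log = [
    SetKey("/deployments/payments", "replicas=4"),
    GrantLease("L", "C", 88),
    AddMember("n4"),
]
replicas = [run(["n1", "n2", "n3"], committed_log) for _ in range(3)]
```

Because `apply` never consults anything outside its arguments, the equality of the three replicas follows from the contract rather than from luck.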
Consensus chooses where each command belongs in the log. The state machine applies committed commands in log order.
proposed entry -> replicated entry -> committed entry -> applied command
Each arrow is a stronger claim:
- proposed means some node heard the request
- replicated means some replicas stored it
- committed means the protocol's durability rule is satisfied
- applied means the state machine has advanced through that command
The state machine should advance from the committed prefix, not from every proposal a leader happens to see. Mixing those boundaries is how systems expose state that later disappears after leader change, acknowledge writes that were not actually durable, or let different replicas answer from different histories.
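The boundary between "replicated" and "applied" can be sketched with two counters, in the spirit of a Raft-style commit index and last-applied index. This is a simplified, hypothetical `Replica` class (0-based Python lists, string keys), not any particular library's API. The second entry is stored but never applied before it commits, so when a leader change overwrites it, no client ever saw the value that disappeared.

```python
class Replica:
    """Hypothetical replica: its log may hold uncommitted entries, but the
    state machine only advances through the committed prefix."""

    def __init__(self):
        self.log = []          # stored entries, committed or not
        self.commit_index = 0  # log[:commit_index] is committed
        self.last_applied = 0  # log[:last_applied] has been applied
        self.kv = {}

    def append(self, entry):
        # "Replicated": stored locally, but a leader change may still drop it.
        self.log.append(entry)

    def advance_commit(self, index):
        # Commit only moves forward and never past what is stored locally.
        self.commit_index = max(self.commit_index, min(index, len(self.log)))

    def truncate_uncommitted(self, keep):
        # A new leader may overwrite an uncommitted suffix, never a committed one.
        if keep < self.commit_index:
            raise ValueError("refusing to remove committed entries")
        del self.log[keep:]

    def apply_committed(self):
        while self.last_applied < self.commit_index:
            key, value = self.log[self.last_applied]
            self.kv[key] = value
            self.last_applied += 1


r = Replica()
r.append(("a", 1))
r.append(("b", 2))
r.advance_commit(1)
r.apply_committed()
after_first_commit = dict(r.kv)  # ("b", 2) is stored but not applied

r.truncate_uncommitted(1)        # leader change discards the uncommitted ("b", 2)
r.append(("b", 3))
r.advance_commit(2)
r.apply_committed()
```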
Worked Example: Adding a Cluster Member
Suppose the current configuration is:
members = [n1, n2, n3]
A client submits:
add_member(n4)
The leader appends the command at log index 57. At this moment, the request has a position, but the service should not yet assume the configuration changed. If the leader crashes before enough replicas accept the entry, a new leader may not preserve it.
After the commit rule is satisfied, each replica applies the command:
state before 57: members = [n1, n2, n3]
command 57: add_member(n4)
state after 57: members = [n1, n2, n3, n4]
Every correct replica applies the same command at index 57 and reaches the same state. A recovering replica can replay from a snapshot plus the committed tail and arrive at the same configuration.
That is the service-level promise. The cluster is not merely storing matching log files; it is using the log to make state transitions converge.
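The recovery claim can be illustrated with a minimal sketch. It assumes membership is a plain list, that `apply` is a pure function returning the new list, and that indexes 55 and 56 hold hypothetical earlier commands (only index 57 comes from the example). A replica that replays everything and a replica that loads a snapshot taken after index 56 and then replays the tail reach the same configuration.

```python
def apply(members, command):
    """Pure membership transition: returns the new member list."""
    op, node = command
    if op == "add_member" and node not in members:
        return members + [node]
    if op == "remove_member" and node in members:
        return [m for m in members if m != node]
    return members  # a no-op is also deterministic: every replica skips it


def replay(members, log, start, end):
    """Apply committed entries start..end (inclusive) in log order."""
    for index in range(start, end + 1):
        members = apply(members, log[index])
    return members


# Hypothetical committed entries; only index 57 comes from the worked example.
committed = {
    55: ("add_member", "n2"),
    56: ("add_member", "n3"),
    57: ("add_member", "n4"),
}
initial = ["n1"]

# A replica that never failed replays the whole range.
full = replay(initial, committed, 55, 57)

# A recovering replica loads a snapshot taken after index 56, then the tail.
snapshot = {"last_index": 56, "members": replay(initial, committed, 55, 56)}
recovered = replay(snapshot["members"], committed, snapshot["last_index"] + 1, 57)
```

The snapshot is only valid because it records which index it covers; replaying the tail from `last_index + 1` applies each committed command exactly once.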
Determinism Is a Correctness Requirement
State machine replication depends on deterministic apply. If every replica receives the same committed commands but one replica reads local wall-clock time, another samples randomness, and a third calls an external service during apply, they may diverge even though consensus did its job.
The safe pattern is to put nondeterministic choices into the command before consensus decides it, or to make them explicit state that every replica applies identically.
Bad shape:
apply("create session") calls local clock on each replica
Better shape:
command = create_session(id=S, expires_at=T)
apply(command) stores the supplied T
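The better shape can be sketched in Python, assuming a hypothetical `CreateSession` command and a proposer function that runs once before the command is submitted to consensus. The wall-clock read and the random session ID happen in `build_create_session`; `apply_create_session` only copies what the command carries, so every replica stores the same `expires_at`.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateSession:
    session_id: str
    expires_at: float  # absolute deadline T, fixed before consensus


def build_create_session(ttl_seconds: float) -> CreateSession:
    # Runs once on the proposing node, outside apply: the clock read and the
    # random ID are sampled here and frozen into the command.
    return CreateSession(
        session_id=str(uuid.uuid4()),
        expires_at=time.time() + ttl_seconds,
    )


def apply_create_session(sessions: dict, command: CreateSession) -> None:
    # Deterministic: stores the supplied T and never reads the local clock.
    sessions[command.session_id] = command.expires_at


command = build_create_session(ttl_seconds=30.0)
replicas = [{} for _ in range(3)]
for sessions in replicas:
    apply_create_session(sessions, command)
```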
The same rule applies to validation. If a command is valid only when it reads local unreplicated state, replicas may disagree. Validation should either happen before consensus in a way that is rechecked safely during apply, or depend only on replicated state and command contents.
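Validation that depends only on replicated state can be sketched with a hypothetical lease-grant command. The sketch assumes a rejected command still occupies its log index and that "rejected" is simply the deterministic result of applying it. Because the precondition reads only the replicated lease table, every replica accepts and rejects the same entries.

```python
def new_state():
    return {"leases": {}}  # lease name -> (holder, fencing token)


def apply_grant(state, lease, client, token):
    # The precondition reads only replicated state and the command, so every
    # replica reaches the same verdict for the same entry.
    holder = state["leases"].get(lease)
    if holder is not None and holder[0] != client:
        return ("rejected", holder[0])  # the rejection itself is deterministic
    state["leases"][lease] = (client, token)
    return ("granted", token)


commands = [("L", "C", 88), ("L", "D", 89)]
replicas = [new_state() for _ in range(3)]
results = [[apply_grant(state, *cmd) for cmd in commands] for state in replicas]
# Every replica: [("granted", 88), ("rejected", "C")]
```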
Determinism does not mean every replica must execute at the same speed. A follower can lag, replay, or recover from snapshot. It means that once it applies the same committed prefix, the resulting state is the same.
Side Effects Belong Behind a Boundary
External side effects are the easiest way to break replay.
Bad shape:
apply("send invoice") sends an email
recovery replays the log
apply("send invoice") sends the email again
Better shape:
apply("invoice_ready", id=I) records durable state
worker observes invoice_ready(I)
worker sends email with idempotency key I
worker records completion
The replicated state machine records the decision. A separate actor performs external work using idempotency keys, fencing tokens, or completion records. That boundary lets recovery replay internal state without blindly repeating irreversible effects.
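The invoice flow above can be sketched as follows. `InvoiceStateMachine`, `FakeMailer`, and `worker_pass` are hypothetical names; the mailer stands in for an external provider that deduplicates on an idempotency key, and `propose` is simplified to append-and-apply instead of waiting for commit. Apply records decisions only, so replaying the log on a fresh replica sends nothing, and the worker sees the recorded completion.

```python
class InvoiceStateMachine:
    """Apply only records decisions; it never sends anything."""

    def __init__(self):
        self.ready = set()      # invoices the cluster decided to send
        self.completed = set()  # invoices the worker reported as sent

    def apply(self, command):
        op, invoice_id = command
        if op == "invoice_ready":
            self.ready.add(invoice_id)
        elif op == "invoice_sent":
            self.completed.add(invoice_id)


class FakeMailer:
    """Stand-in for a provider that deduplicates on an idempotency key."""

    def __init__(self):
        self.calls = 0
        self.delivered = {}

    def send(self, idempotency_key, body):
        self.calls += 1
        self.delivered.setdefault(idempotency_key, body)  # repeats not re-delivered


def worker_pass(state_machine, mailer, propose):
    # The worker acts on replicated state, outside the apply path.
    for invoice_id in sorted(state_machine.ready - state_machine.completed):
        mailer.send(idempotency_key=invoice_id, body=f"invoice {invoice_id}")
        propose(("invoice_sent", invoice_id))  # completion goes through the log


log = []
sm = InvoiceStateMachine()
mailer = FakeMailer()


def propose(command):
    # Simplification: a real system would apply only after the entry commits.
    log.append(command)
    sm.apply(command)


propose(("invoice_ready", "I1"))
worker_pass(sm, mailer, propose)

# Recovery: a fresh replica replays the log. Apply rebuilds state only, so
# replay sends nothing and the worker finds no pending invoices.
recovered = InvoiceStateMachine()
for command in log:
    recovered.apply(command)
worker_pass(recovered, mailer, propose)
```

If the worker crashes after sending but before its completion record commits, the next pass sends again with the same key `I1`, and the provider's deduplication keeps the email from going out twice.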
Failure Modes and Design Response
Three failure modes show up repeatedly.
First, replicas apply different results from the same command because apply reads nondeterministic local inputs. The response is to move time, randomness, IDs, and external observations into the command or into replicated state.
Second, clients see success before the command is durably committed. The response is to acknowledge only after the service's commit rule is satisfied and the client-visible result is stable.
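Acknowledging at the commit boundary can be sketched with a hypothetical `Leader` that parks each client's reply callback under the entry's log index. It replies "ok" only after the entry commits and is applied. If it loses leadership first, it reports the outcome as unknown rather than as success, because a later leader may or may not preserve the entry.

```python
class Leader:
    """Hypothetical leader that replies only after commit and apply."""

    def __init__(self):
        self.log = []          # (key, value) entries; index = position + 1
        self.last_applied = 0
        self.kv = {}
        self.pending = {}      # log index -> reply callback

    def submit(self, key, value, reply):
        self.log.append((key, value))
        index = len(self.log)
        self.pending[index] = reply  # appended locally: NOT acknowledged yet
        return index

    def on_commit(self, commit_index):
        # Called once the commit rule holds for entries up to commit_index.
        commit_index = min(commit_index, len(self.log))
        while self.last_applied < commit_index:
            self.last_applied += 1
            key, value = self.log[self.last_applied - 1]
            self.kv[key] = value
            reply = self.pending.pop(self.last_applied, None)
            if reply is not None:
                reply(("ok", self.last_applied))

    def step_down(self):
        # Outcome of uncommitted entries is unknown; clients must retry,
        # ideally with a request ID so a retry is not applied twice.
        for index, reply in sorted(self.pending.items()):
            reply(("unknown", index))
        self.pending.clear()


replies = []
leader = Leader()
leader.submit("x", 1, replies.append)
leader.submit("y", 2, replies.append)
before_commit = list(replies)  # nothing acknowledged on local receipt
leader.on_commit(1)
leader.step_down()
```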
Third, recovery replays commands that trigger external side effects again. The response is to make apply update internal state only, then let external actors use request IDs, revisions, fencing tokens, and idempotency keys.
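Fencing tokens are one of the protections named above. In a minimal sketch, a hypothetical external store remembers the highest token it has seen and refuses writes that carry an older one. A holder whose lease was superseded cannot overwrite newer work even if it acts late.

```python
class FencedStore:
    """Hypothetical external resource that rejects writes with stale tokens."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        # Tokens come from the replicated lock service (e.g. 88 in the lease
        # example); a holder whose lease was superseded carries a lower token.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
ok_old = store.write(88, "config", "v1")  # holder of token 88 writes
ok_new = store.write(89, "config", "v2")  # a newer lease holder writes
stale = store.write(88, "config", "v1b")  # the old holder wakes up late
```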
State machine replication therefore gives a clean mental model, not a free implementation. It asks engineers to keep the ordering, apply, and side-effect boundaries explicit.
Operational Checklist
For a real replicated service, review these questions:
- Does every replica apply only the committed prefix?
- Is command application deterministic from replicated state and command contents?
- Are client acknowledgements tied to commit evidence rather than local receipt?
- Can a recovering replica rebuild from snapshot plus committed log tail?
- Are external side effects outside the apply path and protected against duplicate execution?
- Do reads state clearly whether they observe committed-applied state or some weaker local state?
If the answer to any of these is unclear, the service may have a consensus algorithm but not a safe replicated state machine.
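For the last checklist question, one way to make a read state what it observes is to return the applied index along with the value and let the caller require a minimum index, such as the index at which its own write was acknowledged. The sketch below uses a hypothetical `ReadableReplica`. It gives read-your-writes relative to an index the client holds, not linearizable reads, which typically need additional coordination not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadResult:
    value: object
    applied_index: int  # committed-applied prefix this value reflects


class ReadableReplica:
    def __init__(self):
        self.kv = {}
        self.last_applied = 0

    def apply(self, index, key, value):
        # Called only for committed entries, strictly in log order.
        if index != self.last_applied + 1:
            raise ValueError("entries must be applied in log order")
        self.kv[key] = value
        self.last_applied = index

    def read(self, key, min_index=0):
        # Refuse rather than silently answer from an older prefix than the
        # caller requires.
        if self.last_applied < min_index:
            raise LookupError(
                f"applied through {self.last_applied}, caller needs {min_index}"
            )
        return ReadResult(self.kv.get(key), self.last_applied)


replica = ReadableReplica()
replica.apply(1, "x", "a")
replica.apply(2, "x", "b")
result = replica.read("x", min_index=2)
```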
Connections
The previous capstone drew the boundary around a consensus-backed control plane. This lesson explains what happens inside that boundary: the agreed log becomes service state only through deterministic application.
The next lesson on quorum intersection, ballots, and commit evidence explains why a command is safe to preserve across future leaders. State machine replication depends on that evidence because only committed history should drive the service state.
Resources
- [PAPER] In Search of an Understandable Consensus Algorithm
- Focus: Read the replicated log and state machine framing, especially how committed entries are applied.
- [PAPER] Viewstamped Replication Revisited
- Focus: Compare another replicated state machine design and its recovery model.
- [PAPER] Paxos Made Simple
- Focus: Connect chosen values with the need to preserve a consistent sequence.
- [BOOK] Designing Data-Intensive Applications
- Focus: Use the replication and consensus chapters for the service-level mental model.
Key Takeaways
- Consensus chooses an ordered command history; the state machine turns that history into service state.
- Replicas converge only if they apply the same committed commands deterministically.
- External side effects should sit behind explicit idempotent boundaries, not inside naive replay.