Final Capstone: Consensus-Backed Control Plane Architecture

The core idea: A consensus-backed control plane is ready when its designers can defend exactly which state needs one authoritative history, and can justify the trade-off between strong coordination guarantees and the latency, operational discipline, recovery complexity, and failure testing those guarantees require.

Core Insight

Imagine you are designing the control plane for a regional compute platform. It stores desired deployments, elects controllers, publishes service endpoints, recovers from member failure, and exposes watches to agents across the fleet. Some state needs one authoritative story. Some state is important but should stay out of the consensus path.

The design task is to defend that boundary. Consensus is valuable when disagreement about authority would make the platform unsafe. It is harmful when teams push high-volume, low-authority data through it because "important" was confused with "must be serialized."

The misconception is that a consensus-backed design is correct once it names Raft, Paxos, etcd, or ZooKeeper. The actual design must specify the committed state, the API semantics, the read and lease guarantees, the recovery path, and the evidence that proves the system survives failure.

Strong coordination buys clarity of authority, but it spends latency, operational discipline, and a tighter failure envelope. A good architecture makes that exchange visible enough to review.

Scenario and Constraints

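The platform needs the capabilities named in the scenario above: storing desired deployments, electing controllers, publishing service endpoints, recovering from member failure, and exposing watches to agents across the fleet.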

The system has three zones. The first target is a three-member consensus cluster, one member per zone. Writes must commit through quorum. Controllers use watches from known revisions. Lease grants include fencing tokens that downstream resources check.
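To make the sizing concrete, here is a minimal sketch of the majority arithmetic behind "writes must commit through quorum". The function names are illustrative and do not belong to any particular consensus library.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with the given member count."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be unavailable while a majority remains."""
    return members - quorum_size(members)


def can_commit(members: int, reachable: int) -> bool:
    """A write can commit only if a majority (leader included) is reachable."""
    return reachable >= quorum_size(members)
```

For the three-member, one-per-zone layout, the quorum is two members. The cluster keeps committing after losing one zone and stops committing after losing two.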

This architecture is intentionally narrow. It uses consensus for control decisions, not for every event the platform observes.

Authority Boundary

Start with the boundary:

inside consensus:
  desired deployment specs
  scheduler lease ownership
  rollout phase and gates
  service endpoint authority
  membership metadata
  monotonically increasing revisions

outside consensus:
  metrics
  logs
  traces
  large artifacts
  image layers
  derived caches
  high-volume status samples

The inside set is small because it answers authority questions. Who owns the scheduler shard? Which deployment spec is current? Which rollout gate is open? Which member set can decide history?

The outside set can be large, useful, and operationally critical without needing consensus serialization. Logs, metrics, and artifacts need durability and retrieval, but they usually do not need one global order that every controller must agree on before acting.

The review question is:

Would two different answers make the platform unsafe?

If yes, the state may belong in consensus. If no, a cheaper replicated store, event pipeline, object store, or cache is probably the better fit.
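One way to make the review question auditable is to record the answer for each kind of state next to where the design currently places it, then flag mismatches. The sketch below is a hypothetical review aid, not part of the platform. The `StateEntry` fields and the wording of the findings are assumptions made for illustration, and the answer to the review question still comes from a human.

```python
from typing import List, NamedTuple


class StateEntry(NamedTuple):
    name: str
    unsafe_if_two_answers: bool  # the review question, answered by a reviewer
    in_consensus: bool           # where the current design places this state


def boundary_findings(entries: List[StateEntry]) -> List[str]:
    """Flag state whose placement disagrees with the review question."""
    findings = []
    for entry in entries:
        if entry.in_consensus and not entry.unsafe_if_two_answers:
            findings.append(f"{entry.name}: in consensus without an authority reason")
        elif entry.unsafe_if_two_answers and not entry.in_consensus:
            findings.append(f"{entry.name}: safety-relevant but outside consensus")
    return findings
```

For example, an entry for scheduler lease ownership marked unsafe-if-divergent and placed in consensus produces no finding, while metrics placed in consensus would be flagged for review.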

API Walkthrough

The control plane exposes a small API surface:

put-if-revision(key, expected_revision, value)
grant-lease(role, ttl) -> token
watch(prefix, from_revision)
snapshot-status()
member-status()

Desired state changes use conditional writes so operators and automation do not overwrite stale state silently. Scheduler leadership uses leases with fencing tokens. Watchers resume from revisions and rebuild local caches after disconnect. Snapshot, compaction, and membership status are first-class because recovery time matters.

The API should force clients to handle stale assumptions. A failed put-if-revision means the client must re-read and recompute. A lost lease means the controller must stop acting until it obtains a fresh token. A watch that falls behind compaction must return a clear resync signal instead of silently skipping history.
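The conditional-write and watch semantics described above can be sketched with a single-process, in-memory stand-in. This is not a consensus implementation: there is no replication or quorum here, only the client-visible contract. The class and exception names are hypothetical. The sketch also assumes that `expected_revision` means the key's last-modified revision (0 for an absent key) and that `watch` returns events with revision greater than or equal to `from_revision`; the lesson does not pin down either detail.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str, str]  # (revision, key, value)


class RevisionConflict(Exception):
    """The caller's expected revision is stale; re-read and recompute."""


class ResyncRequired(Exception):
    """Requested history was compacted; rebuild the cache from a fresh read."""


class ControlStore:
    def __init__(self) -> None:
        self.revision = 0   # last committed revision
        self.compacted = 0  # events at or below this revision are gone
        self._current: Dict[str, Tuple[int, str]] = {}
        self._log: List[Event] = []

    def get(self, key: str) -> Tuple[int, Optional[str]]:
        """Return (mod_revision, value), or (0, None) if the key is absent."""
        return self._current.get(key, (0, None))

    def put_if_revision(self, key: str, expected_revision: int, value: str) -> int:
        mod_revision, _ = self.get(key)
        if mod_revision != expected_revision:
            raise RevisionConflict(
                f"{key}: expected {expected_revision}, found {mod_revision}")
        self.revision += 1
        self._current[key] = (self.revision, value)
        self._log.append((self.revision, key, value))
        return self.revision

    def compact(self, revision: int) -> None:
        """Discard event history at or below the given revision."""
        self.compacted = max(self.compacted, revision)
        self._log = [e for e in self._log if e[0] > self.compacted]

    def watch(self, prefix: str, from_revision: int) -> List[Event]:
        # Fail loudly instead of silently skipping compacted history.
        if from_revision <= self.compacted:
            raise ResyncRequired(
                f"revision {from_revision} compacted at {self.compacted}")
        return [e for e in self._log
                if e[0] >= from_revision and e[1].startswith(prefix)]
```

A client that catches `RevisionConflict` should call `get` again and recompute its change. A watcher that catches `ResyncRequired` should re-read current state and resume watching from the revision after that read.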

Worked Flow: Deployment Update

operator submits desired deployment v12
client reads current revision 481
client writes put-if-revision(/deploy/app, 481, v12)
consensus commits revision 482
watchers receive revision 482
active scheduler reconciles desired state to actual resources
downstream writes include scheduler fencing token

This flow ties each action to evidence. The desired state update is guarded by a revision. Watchers know which committed revision they are processing. The scheduler acts only while it has current lease authority. Downstream resources can reject stale scheduler writes.
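The lease and fencing steps of this flow can be sketched as follows. The names are hypothetical, and the explicit `now` argument is an addition to the `grant-lease(role, ttl)` signature so the example stays deterministic. The sketch assumes tokens increase with every grant and that a downstream resource remembers the highest token it has accepted.

```python
from typing import Dict, List, Optional, Tuple


class StaleToken(Exception):
    """A write carried a fencing token older than one already accepted."""


class LeaseService:
    def __init__(self) -> None:
        self._next_token = 0
        self._leases: Dict[str, Tuple[int, float]] = {}  # role -> (token, expires_at)

    def grant_lease(self, role: str, ttl: float, now: float) -> Optional[int]:
        """Return a new fencing token, or None while an unexpired lease exists."""
        held = self._leases.get(role)
        if held is not None and held[1] > now:
            return None
        self._next_token += 1
        self._leases[role] = (self._next_token, now + ttl)
        return self._next_token


class FencedResource:
    """Downstream resource that rejects writes from superseded lease holders."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.applied: List[str] = []

    def write(self, token: int, change: str) -> None:
        if token < self.highest_token:
            raise StaleToken(f"token {token} is older than {self.highest_token}")
        self.highest_token = token
        self.applied.append(change)
```

Suppose scheduler A is granted token 1, pauses past its TTL, and scheduler B is then granted token 2 and writes. When A resumes and writes with token 1, the resource raises `StaleToken`. The check lives in the resource, so safety does not depend on A noticing that its lease expired.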

The same flow also shows what should not be in consensus. Per-pod logs, trace spans, image blobs, and high-frequency health samples may be related to the deployment, but they do not define the authoritative desired state. They should not share the critical write path.

Failure Review

A credible design names the failures it expects and the evidence that preserves safety.

Partition between zones:

Slow disk on the leader:

Watcher falls behind compaction:

Controller pauses after receiving a lease:

Permanent quorum loss:

The design is not trying to hide these trade-offs. It is trying to make them reviewable.

Invariants and Tests

The architecture is not ready until its guarantees are testable.

Core invariants include:

Failure tests should exercise partitions, process pauses, slow disks, compaction gaps, client retries, leader changes, member replacement, snapshot restore, and forced recovery drills. Jepsen-style history checking is useful because it tests the claim the architecture makes, not just the happy-path implementation.
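As a small illustration of history checking, the sketch below scans recorded histories for violations of two invariants suggested by the earlier sections: committed revisions strictly increase, and a downstream resource never accepts a fencing token lower than one it has already accepted. These two invariants and the function names are assumptions chosen for the example, not an exhaustive list, and a real checker would work on histories captured during fault injection.

```python
from typing import List, Tuple


def revision_violations(committed: List[int]) -> List[Tuple[int, int]]:
    """Return adjacent (earlier, later) pairs that are not strictly increasing."""
    return [(a, b) for a, b in zip(committed, committed[1:]) if b <= a]


def fencing_violations(accepted_tokens: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, token, highest_seen) for each accepted stale token."""
    violations = []
    highest = 0
    for index, token in enumerate(accepted_tokens):
        if token < highest:
            violations.append((index, token, highest))
        highest = max(highest, token)
    return violations
```

An empty result from both functions means the recorded history is consistent with these two invariants. It does not prove the absence of other bugs.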

Crash-Fault or Byzantine?

For this regional compute platform, crash-fault consensus is probably the right default if all consensus members live inside one administrative trust boundary. The dominant risks are slow disks, partitions, bad placement, stale controllers, operator mistakes, and recovery ambiguity.

Byzantine consensus becomes relevant if the control plane spans organizations, untrusted operators, public validators, or adversarial infrastructure. Then the design must add stable identities, key management, authenticated votes, quorum certificates, and a stronger threat model.

The capstone decision is not "which protocol is more advanced?" It is "which fault model matches the trust boundary we actually have?"
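The fault model also changes cluster sizing. The sketch below uses the standard lower bounds for tolerating f faulty members: 2f + 1 for crash-fault consensus and 3f + 1 for Byzantine consensus. The function name is illustrative, and specific protocols may add requirements of their own.

```python
def min_members(faults: int, byzantine: bool) -> int:
    """Minimum cluster size needed to tolerate the given number of faulty members."""
    if faults < 0:
        raise ValueError("faults must be non-negative")
    return 3 * faults + 1 if byzantine else 2 * faults + 1
```

Tolerating one faulty member takes three members under the crash-fault model, which matches the one-member-per-zone target here, and four under the Byzantine model.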

Readiness Check

Before this control plane is ready, the team should be able to answer:

  1. Which state is authoritative and why?
  2. Which operations require linearizable reads?
  3. Which leases need fencing tokens?
  4. What metrics warn that consensus latency is leaving the safe envelope?
  5. How does a watcher resume after disconnect or compaction?
  6. What is the exact recovery path after quorum loss?
  7. Which invariants will be tested under partitions, pauses, retries, and slow disks?
  8. What state is deliberately outside consensus, and what stores it instead?
  9. What is the trust boundary: crash-fault only or Byzantine?

If any answer is vague, the design is not finished.
