Consensus Systems in Production: etcd, Consul, and ZooKeeper

Lesson 014, Consensus and Coordination (30 min, intermediate)

The core idea: etcd, Consul, and ZooKeeper turn expensive consensus into practical control-plane primitives, and the main trade-off is paying quorum cost for small authoritative metadata instead of treating them like general-purpose databases.

Core Insight

A platform team needs one place to record who owns a leader lease, which services are healthy, which configuration version is active, and which controllers should react to a change. That state is small compared with user data, but it is high value: if two nodes disagree about it, the platform can end up in split brain.

This is the job of production coordination systems. They are not attractive because they are convenient key-value stores. They are attractive because they give engineers an API for small pieces of state that must be ordered, watched, leased, or tied to session membership under failure.

etcd, Consul, and ZooKeeper all sit in that family, but they do not invite the same design. etcd feels like a replicated control-plane key-value store with watches, leases, and compare-and-swap style transactions. Consul combines coordinated server-side state with service discovery, health checks, sessions, and operational datacenter workflows. ZooKeeper offers a hierarchical coordination namespace with sessions, watches, ephemeral nodes, and a long history in distributed infrastructure.

The mistake is to choose one because "it stores data." The better question is: what coordination shape does this platform need, and is the state important enough to pay consensus cost?

The Control-Plane Workload

Imagine a scheduler managing a fleet of workers. It needs to know which scheduler instance is leader, which workers are alive, which jobs are assigned, and which configuration revision is active. Those decisions affect many services, so the system needs one authoritative story.

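That does not mean the coordination store should carry every request, metric, or business event. It should hold metadata whose disagreement is dangerous, such as the current leader, worker liveness, job assignments, and the active configuration revision.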

The workload is usually read-heavy, watch-heavy, and small-object oriented. Writes are meaningful and relatively scarce. If a team routes hot application traffic or large payloads through the same store, it turns a carefully protected control plane into a bottleneck.

What These Systems Package

All three systems provide a strongly coordinated core, but the surface primitives matter as much as the consensus protocol beneath them.

System    | Natural Mental Model                    | Coordination Style
etcd      | Replicated control-plane KV             | Watches, leases, revisions, transactions
Consul    | Discovery and health-aware coordination | Service catalog, health checks, KV, sessions
ZooKeeper | Coordination tree                       | Znodes, watches, sessions, ephemeral nodes

etcd is strongly associated with Kubernetes-style control loops. Controllers watch a keyspace, observe revisions, compare current state with desired state, and write updates through a Raft-backed API. This fits systems that need a compact, strongly consistent metadata store for controllers.
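A minimal sketch of that control-loop shape, assuming a toy in-memory store rather than a real etcd client. `RevisionedStore` and `reconcile` are hypothetical names; the only etcd behavior borrowed is that every modification bumps a single store-wide revision.

```python
from dataclasses import dataclass, field


@dataclass
class RevisionedStore:
    """Toy stand-in for a revisioned KV store (hypothetical, not an etcd client)."""
    revision: int = 0
    data: dict = field(default_factory=dict)  # key -> (value, mod_revision)

    def put(self, key, value):
        # Every successful write bumps the store-wide revision.
        self.revision += 1
        self.data[key] = (value, self.revision)
        return self.revision

    def get(self, key):
        value, _mod_revision = self.data.get(key, (None, 0))
        return value


def reconcile(store, desired_key, actual_key, apply_change):
    """One pass of a control loop: compare desired with observed state and converge."""
    desired = store.get(desired_key)
    actual = store.get(actual_key)
    if desired is None or desired == actual:
        return False  # already converged, nothing to do
    apply_change(desired)           # hypothetical side effect, e.g. restart workers
    store.put(actual_key, desired)  # record the state the controller converged to
    return True
```

A real controller would run this pass whenever a watch reports a new revision, and again periodically, so a missed notification only delays convergence instead of losing it.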

Consul is often a natural fit when service discovery and health are central to the workload. Its coordination story includes KV and sessions, but its operational value often comes from combining service registration, health checks, DNS/API discovery, and datacenter-aware workflows.
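The session idea can be sketched in memory, modeled loosely on Consul's default "release" behavior: a KV lock is held by a session, and when the session is invalidated (for example because a linked health check fails) its locks are released. `ToyConsul` and its methods are hypothetical; the real system exposes this through its HTTP KV and session endpoints.

```python
class ToyConsul:
    """Hypothetical in-memory model of session-bound KV locks; not Consul's API."""

    def __init__(self):
        self.sessions = {}  # session id -> still valid?
        self.locks = {}     # key -> holding session id

    def create_session(self, session_id):
        self.sessions[session_id] = True

    def acquire(self, key, session_id):
        # Succeeds only for a valid session and only if no other session holds the key.
        if not self.sessions.get(session_id):
            return False
        holder = self.locks.get(key)
        if holder is None or holder == session_id:
            self.locks[key] = session_id
            return True
        return False

    def fail_health_check(self, session_id):
        # Invalidate the session and release every lock it held.
        self.sessions[session_id] = False
        for key in [k for k, s in self.locks.items() if s == session_id]:
            del self.locks[key]
```

Tying the lock to health is what makes this useful for discovery-centric platforms: an instance that stops passing checks loses its claim without any explicit unlock call.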

ZooKeeper uses a tree of znodes and session-oriented primitives. Ephemeral nodes disappear when a session ends, making them useful for membership, presence, and leader-election patterns. Watches let clients react to changes, but they must be understood carefully rather than treated as a magical event stream: a classic ZooKeeper watch is a one-time trigger, so a client has to re-read state and re-register after each notification.
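The classic leader-election recipe can be sketched with a toy tree: each candidate creates an ephemeral sequential node under a parent, and the owner of the lowest sequence number leads. `ToyZooKeeper` and `current_leader` are hypothetical; only the zero-padded sequence suffix and the "ephemeral nodes vanish with their session" rule mirror ZooKeeper.

```python
class ToyZooKeeper:
    """Hypothetical in-memory model of ephemeral sequential znodes; not a client library."""

    def __init__(self):
        self.seq = 0     # one counter for the whole toy tree; ZooKeeper keeps one per parent
        self.nodes = {}  # path -> owning session

    def create_ephemeral_sequential(self, prefix, session):
        # Append a zero-padded, monotonically increasing sequence number.
        path = f"{prefix}{self.seq:010d}"
        self.seq += 1
        self.nodes[path] = session
        return path

    def children(self, parent):
        return sorted(p for p in self.nodes if p.startswith(parent + "/"))

    def close_session(self, session):
        # Ephemeral nodes are removed when the session that created them ends.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}


def current_leader(zk, parent):
    """The session owning the lowest sequence number under `parent` is the leader."""
    kids = zk.children(parent)
    return zk.nodes[kids[0]] if kids else None
```

The full recipe also has each candidate watch only its immediate predecessor, so a leader's departure wakes one client instead of the whole group.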

Primitives Shape System Design

The coordination API changes how applications express ownership and change.

controller / service / client
          |
          v
 [coordinated metadata system]
   |          |          |
 watches    leases     sessions
 revisions  health     ephemeral nodes

A lease says, "this ownership claim is valid only while renewal succeeds." That is useful for leader election and lock-like behavior, but it is not a timeless global mutex. A paused process can believe it still owns something after another node has legitimately taken over, so downstream systems may still need fencing tokens or revision checks.
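A minimal sketch of why fencing matters, assuming a hypothetical `LeaseManager` that hands out a larger fencing token with every grant and a hypothetical `FencedStorage` downstream that remembers the highest token it has accepted. Time is passed in explicitly so the paused-owner scenario is deterministic.

```python
class LeaseManager:
    """Hypothetical lease service that issues an increasing fencing token per grant."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0
        self.token = 0

    def acquire(self, client, now):
        # Grant only if the lease is free or has expired; each grant gets a new token.
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.ttl
            self.token += 1
            return self.token
        return None

    def renew(self, client, now):
        # Ownership stays valid only while renewals arrive before expiry.
        if self.holder == client and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False


class FencedStorage:
    """Downstream resource that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, value, token):
        if token < self.highest_token:
            return False  # a newer owner has already written; refuse the stale one
        self.highest_token = token
        self.value = value
        return True
```

If client A pauses past its expiry and client B acquires token 2 and writes, A's late write with token 1 is refused by the storage even though A still believes it holds the lease.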

A watch says, "tell me when this coordinated state changes." That is useful for controllers and service discovery, but watches are not a substitute for durable event processing. Clients must handle reconnects, missed ranges, compaction, and resync from current state.
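The resync rule can be sketched with a toy store that keeps bounded history. `WatchableStore`, `Watcher`, and `CompactedError` are hypothetical names, loosely modeled on how a revisioned store refuses to replay history it has compacted away; deletes are not modeled.

```python
class CompactedError(Exception):
    """Raised when a watcher asks for history that has been compacted away."""


class WatchableStore:
    """Hypothetical revisioned store with a bounded change history."""

    def __init__(self):
        self.revision = 0
        self.state = {}
        self.history = []  # (revision, key, value)
        self.compacted_through = 0

    def put(self, key, value):
        self.revision += 1
        self.state[key] = value
        self.history.append((self.revision, key, value))

    def compact(self, revision):
        # Drop history at or below `revision`; current state is kept.
        self.history = [e for e in self.history if e[0] > revision]
        self.compacted_through = max(self.compacted_through, revision)

    def events_since(self, revision):
        if revision < self.compacted_through:
            raise CompactedError(revision)
        return [e for e in self.history if e[0] > revision]


class Watcher:
    """Client-side cache that replays events and falls back to a full resync."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.seen_revision = 0
        self.resyncs = 0

    def catch_up(self):
        try:
            for rev, key, value in self.store.events_since(self.seen_revision):
                self.cache[key] = value
                self.seen_revision = rev
        except CompactedError:
            # The missed range is gone: rebuild from current state, then resume.
            self.cache = dict(self.store.state)
            self.seen_revision = self.store.revision
            self.resyncs += 1
```

The important property is that a watcher never assumes it saw every event; when the gap cannot be replayed, it rebuilds from a snapshot and continues from that snapshot's revision.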

A compare-and-swap transaction says, "write this only if the state still matches what I observed." That is powerful for safe updates, but it assumes the state being guarded is small and worth coordinating.
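A minimal sketch of a compare-and-swap guarded by a per-key modification revision, with a read-modify-write retry loop on top. `CasStore` and `increment_with_retry` are hypothetical; here a missing key reports revision 0 so that "create if absent" is just a compare against 0.

```python
import threading


class CasStore:
    """Hypothetical KV store with compare-and-swap on each key's modification revision."""

    def __init__(self):
        self._lock = threading.Lock()
        self._revision = 0
        self._data = {}  # key -> (value, mod_revision)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def compare_and_swap(self, key, expected_mod_revision, new_value):
        """Write only if the key has not changed since the caller read it."""
        with self._lock:
            _, current_mod = self._data.get(key, (None, 0))
            if current_mod != expected_mod_revision:
                return False
            self._revision += 1
            self._data[key] = (new_value, self._revision)
            return True


def increment_with_retry(store, key, max_attempts=5):
    """Read-modify-write: re-read and retry when a concurrent writer wins."""
    for _ in range(max_attempts):
        value, mod = store.get(key)
        if store.compare_and_swap(key, mod, (value or 0) + 1):
            return True
    return False
```

A write based on a stale read fails instead of silently overwriting, which is exactly the property worth paying consensus for when the guarded value is small and critical.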

Worked Example: A Control Plane Choice

Suppose a platform team has to choose a coordination system. The answer depends on which design sits at the center of the platform.

If the central design is a Kubernetes-like control plane, etcd is the natural mental model: controllers read revisions, watch changes, and write small metadata updates through a replicated key-value API.

If the central design is a service catalog with health checks, discovery, and operational integration across services, Consul may fit better. The service registry and health model are part of the product shape, not add-ons.

If the platform already relies on tree-structured coordination, ephemeral membership nodes, and session semantics, ZooKeeper may be the clearer fit, especially in ecosystems that already speak its patterns.

The choice is not a universal ranking. It is a fit between primitives and workload.

The Production Trade-Off

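Consensus-backed coordination buys one authoritative answer for critical metadata. The price is real: every write waits for a majority of replicas to acknowledge it, a cluster that loses its majority stops accepting writes, and throughput and total data size are bounded by what a small replicated group can sustain.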
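The quorum cost is simple arithmetic. Raft-based systems such as etcd and Consul, and ZAB-based ZooKeeper, commit a write once a majority of voting members has it; this sketch computes the majority size and how many members can fail while writes still commit.

```python
def quorum_size(n):
    """Smallest majority of an n-member voting group."""
    return n // 2 + 1


def tolerated_failures(n):
    """Members that can fail while a majority can still commit writes."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

For 3, 4, 5, and 7 members this gives quorums of 2, 3, 3, and 4, tolerating 1, 1, 2, and 3 failures. An even-sized group pays for a larger quorum without gaining fault tolerance, which is why odd sizes are the norm.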

That trade-off is worth paying for a leader lease, cluster membership, or control-plane configuration that must not split brain. It is usually not worth paying for user profiles, telemetry, large documents, queue payloads, or high-volume business events.

The rule of thumb is blunt: if the data is large, high-churn, user-facing, or merely convenient to store, it probably does not belong in the consensus store. If disagreement about it can break the control plane, it may.
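The rule of thumb can be written down as a review checklist. This is a hypothetical sketch: `Candidate`, `belongs_in_consensus_store`, and both thresholds are illustrative assumptions, not limits taken from etcd, Consul, or ZooKeeper, and should be tuned to the actual cluster and its documented limits.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of state someone wants to put in the consensus store."""
    name: str
    size_bytes: int
    writes_per_second: float
    user_facing: bool
    split_brain_risk: bool  # would disagreement about it break the control plane?


# Illustrative thresholds only (assumptions, not product limits).
MAX_VALUE_BYTES = 64 * 1024
MAX_WRITES_PER_SECOND = 10.0


def belongs_in_consensus_store(c):
    if c.size_bytes > MAX_VALUE_BYTES or c.writes_per_second > MAX_WRITES_PER_SECOND:
        return False  # large or high-churn
    if c.user_facing:
        return False  # user data belongs in an application database
    return c.split_brain_risk  # "merely convenient" is not enough
```

A leader lease passes every check; telemetry fails on churn, a user profile fails on being user-facing, and a small convenience cache fails because disagreement about it would not break anything.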

Choosing Among Them

Need                                            | Likely Fit | Reason
Kubernetes-style controller metadata            | etcd       | Revisioned KV, watches, leases, and Raft-backed control-plane state
Integrated service discovery and health catalog | Consul     | Service registration, health checks, discovery APIs, KV, and sessions
Session and ephemeral-node coordination tree    | ZooKeeper  | Znodes, watches, sessions, ephemeral presence, and classic coordination recipes

This table is not a replacement for operational evaluation. It is a way to start from the coordination shape rather than from branding. The real decision should also include operator familiarity, ecosystem integration, client library behavior, backup/restore procedures, and failure-mode testing.

Common Misreadings

These systems are not general-purpose application databases. They expose storage APIs, but they are optimized for small, critical metadata that benefits from strong coordination.

A distributed lock is not automatically safe. Most lock-like APIs are lease or session based, so long pauses, timeouts, and delayed clients still require fencing, ownership checks, or idempotent downstream operations.

Watches are not durable queues. They are coordination notifications that need resync logic, especially after reconnects, compaction, or long client pauses.

Connections

The previous lesson on exactly-once, idempotency, and deduplication matters here because control-plane primitives often create the identities and ownership boundaries that make retries safe. A lease or compare-and-swap can coordinate a step, but the side effect still needs a safe boundary.

The next lesson on Jepsen-style verification follows naturally. Systems that hold leader elections, leases, watches, and critical metadata need evidence that their observable behavior still satisfies the contract under partitions, pauses, failover, and client retries.
