CRDTs and Coordination Avoidance: State-Based CRDTs and Merge Semantics

LESSON

CRDTs and Coordination Avoidance

003 30 min intermediate

CRDTs and Coordination Avoidance: State-Based CRDTs and Merge Semantics

Core Insight

CartService now has a mergeable internal state shape: replicas can order states by information and combine them with a join. The next engineering question is how replicas actually use that shape while links are slow, messages repeat, and one region may be offline for minutes.

A state-based CRDT answers with a simple contract: each replica stores an internal state payload, local updates move that payload forward, and replicas occasionally send their payloads to each other.

When a replica receives another payload, it merges the two with the join operation from the semilattice. The formal name for this style is convergent replicated data type, often abbreviated as CvRDT.

The non-obvious insight is that state-based replication deliberately makes message delivery boring. A replica can receive the same state twice, receive an older state after a newer one, or learn a fact through a third replica instead of directly from the origin.

If merge is idempotent, commutative, associative, and inflationary, those delivery accidents do not change the final meaning. The system can merge again and keep moving toward convergence.

The trade-off is robustness versus payload cost. Sending state is forgiving because the receiver does not need an exactly-once operation log. It can also be expensive because the state may include replica components, tombstones, version vectors, or other metadata that grows with the workload.

The State-Based Contract

A state-based CRDT has four pieces:

payload:
  the internal state stored at each replica

update:
  a local function that changes the payload monotonically

merge:
  a join function that combines two payloads

query:
  a read function that derives the user-visible value from the payload

For a grow-only set:

payload:
  set of elements

update add(x):
  payload = payload union {x}

merge(other):
  payload = payload union other.payload

query:
  return payload

That tiny design already has the important behavior. Local adds are fast. Merge can run whenever replicas communicate. Duplicate state delivery is harmless because union is idempotent.

Anti-Entropy Flow

State-based CRDTs often replicate through anti-entropy. Anti-entropy is background repair: replicas periodically exchange what they know until differences disappear.

time 1:
  A = {book}
  B = {charger}
  C = {}

time 2:
  A sends state to C
  C = merge({}, {book}) = {book}

time 3:
  B sends state to C
  C = merge({book}, {charger}) = {book, charger}

time 4:
  C sends state to A
  A = merge({book}, {book, charger}) = {book, charger}

The system does not require a single broadcast order. It only needs enough successful exchanges over time. A replica can learn a fact directly from the origin or indirectly from another replica that already merged it.

This is why state-based CRDTs fit unreliable networks well. If a message is dropped, another round can send the state again. If a message is duplicated, merge absorbs it. If messages cross in flight, the join rules keep the result stable.

Worked Example: G-Counter

A grow-only counter cannot safely store one integer. Two replicas incrementing from zero would both reach 1, and a naive max merge would lose one increment.

A state-based G-Counter stores one component per replica.

payload at A:
  {A: 0, B: 0, C: 0}

local increment at A:
  {A: 1, B: 0, C: 0}

local increment at B:
  {A: 0, B: 1, C: 0}

merge by componentwise max:
  {A: max(1,0), B: max(0,1), C: max(0,0)}
  = {A: 1, B: 1, C: 0}

query:
  sum components = 2

The payload is not the value users see. The payload is the merge-safe evidence. The query turns that evidence into the visible count.

This is why the internal shape looks more complicated than a single integer. The extra structure is what prevents one increment from hiding another.

Componentwise max is the join. It is commutative, associative, and idempotent:

merge(A_payload, B_payload) = merge(B_payload, A_payload)
merge(merge(A, B), C) = merge(A, merge(B, C))
merge(A, A) = A

Those laws are what let anti-entropy run in a messy distributed environment.

Merge Semantics

State-based merge should be:

The inflationary property is easy to overlook. A merge that sometimes discards local facts can still be deterministic, but it breaks convergence reasoning. If replicas can move backward, a delayed old payload can erase newer information.

For example, this is unsafe for a grow-only set:

merge(other):
  payload = other.payload

If A has {book, charger} and receives an old payload {book}, assignment loses charger. A correct merge uses union:

merge(other):
  payload = payload union other.payload

The receiver moves upward in the information order.

State Size and Delta Pressure

The simplest state-based design sends the whole payload. That can be acceptable for small sets, small counters, or low-frequency state. It becomes expensive when the payload carries large values or long-lived metadata.

Common sources of size pressure:

This pressure does not invalidate the model. It explains why later designs introduce delta-state synchronization, compaction, causal context management, and anti-entropy policies. The state-based contract stays the same, but the engineering around payload exchange gets more careful.

State-Based Versus Operation-Based

State-based CRDTs send mergeable payloads. Operation-based CRDTs send operations that must be delivered under stronger assumptions, often involving causal delivery or reliable dissemination.

state-based:
  send "here is my current payload"
  receiver joins payloads
  duplicate delivery is naturally safe

operation-based:
  send "apply this update operation"
  receiver applies operation
  delivery rules matter more

The state-based approach is often easier to make robust across unreliable networks. The operation-based approach can be more bandwidth-efficient because it sends only updates, but it has a sharper delivery contract. The next lesson focuses on that operation path.

Common Failure Modes

Practice

Design a state-based CRDT for a page view counter replicated across three regions.

Specify:

Then explain why sending only total_views = 10 is not enough internal state for safe merging.

Connections

Resources

Key Takeaways

PREVIOUS CRDTs and Coordination Avoidance: Semilattices, Joins, and Monotonic State NEXT CRDTs and Coordination Avoidance: Operation-Based CRDTs and Causal Delivery