CRDTs and Coordination Avoidance: Semilattices, Joins, and Monotonic State

LESSON

CRDTs and Coordination Avoidance

002 30 min intermediate

CRDTs and Coordination Avoidance: Semilattices, Joins, and Monotonic State

Core Insight

CartService can avoid coordination only if its replicas have a safe way to combine partial knowledge. Replica A might know that the cart contains {book, charger}. Replica B might know that the same cart contains {headphones}. If the merge rule is set union, neither replica has to win. Both can move to {book, charger, headphones}.

That small move hides the core algebra behind many CRDTs. The useful intuition is simple: each replica has partial knowledge, and merge should keep all the knowledge from both sides.

The formal language comes after that intuition. The state must have an order that means "knows at least as much as." The merge operation must produce a state that is at least as informative as both inputs. In CRDT language, that merge is often called a join, and the shape that supports it is a join-semilattice.

The name sounds heavier than the idea. A join-semilattice is a state space where any two states have a safe combined state above both of them.

The non-obvious point is that a CRDT merge is not just a conflict handler. It is a monotonic growth rule. A replica is allowed to learn more, but it should not move backward in the information order. If local updates and merges both move state forward, then duplicate messages, reordered messages, and delayed messages become much less dangerous.

The trade-off is expressiveness versus coordination freedom. Monotonic state is easy to merge and replay safely, but not every product rule is monotonic. "This item has ever been added" fits naturally. "This exact quantity is now five after arbitrary add and remove operations" needs more careful structure.

Ordering State by Information

A normal numeric order says 3 <= 5 because five is larger. A CRDT order usually means something different: one state is below another because it contains less information.

For a grow-only set:

{book} <= {book, charger}

The right side knows everything the left side knows, plus one more fact. For a maximum counter that tracks the highest sequence number seen:

seen_seq = 7 <= seen_seq = 11

The right side has observed at least as much progress.

This order does not have to compare every pair. Two replicas can hold different partial states:

replica A: {book}
replica B: {charger}

Neither is more informative than the other. The merge operation must create a state that includes both pieces of information:

join({book}, {charger}) = {book, charger}

That result is an upper bound: both inputs are below it in the information order.

What a Join Must Guarantee

A join operation should have three practical properties. These are not abstract decoration; each one corresponds to something that happens in a real network.

commutative:
  join(a, b) = join(b, a)

associative:
  join(join(a, b), c) = join(a, join(b, c))

idempotent:
  join(a, a) = a

These rules map directly to distributed failure behavior.

Commutativity handles message reordering. If A hears from B before C, or from C before B, the final merged state should not depend on that arrival order.

Associativity handles batching and topology. Replicas can merge pairwise, through anti-entropy gossip, or through regional aggregation without changing the meaning of the result.

Idempotence handles duplicate delivery. If the same state arrives twice, merging it twice should not double count it.

The join-semilattice gives the system a safe default response to messy replication: merge again.

Monotonic Local Updates

A state-based CRDT also needs local updates to move upward in the same order.

For a grow-only set:

add(book):
  state = state union {book}

The update never removes information. It creates a state greater than or equal to the old state.

For a maximum register that stores the highest observed value:

observe(11):
  state = max(state, 11)

Seeing 11 after 7 moves forward. Seeing 7 after 11 leaves state unchanged.

This monotonic behavior is what lets replicas accept local work and reconcile later. A replica can update from local input, another replica can update from different local input, and the join can preserve both contributions.

start:
  A = {}
  B = {}

local updates:
  A = {book}
  B = {charger}

merge:
  A = join({book}, {charger}) = {book, charger}
  B = join({charger}, {book}) = {book, charger}

No leader decided the order. The merge rule made order irrelevant for this data type.

Where The Model Breaks

The same structure does not automatically handle every update.

Suppose CartService tracks a visible item count as a plain integer:

A starts at 0 and applies add -> 1
B starts at 0 and applies add -> 1

If the merge rule is max, the result is 1, but two adds happened. If the merge rule is addition, duplicate delivery can count the same add more than once. The state is missing information about which replica contributed which increment.

A CRDT counter fixes this by changing the state shape. Instead of one integer, it can store one component per replica:

A state: {A: 1, B: 0}
B state: {A: 0, B: 1}

join by componentwise max:
  {A: max(1,0), B: max(0,1)}
  = {A: 1, B: 1}

read value:
  A + B = 2

The merge is still monotonic, commutative, associative, and idempotent. The trick is that the internal state contains enough information to make the read value correct.

This is a common CRDT pattern: design the state for safe merge, then derive the user-visible value from that state.

Read Values Versus Merge State

The merge state and the displayed value are not always the same thing. This is one of the most important CRDT habits to build.

internal counter state:
  {A: 3, B: 5, C: 2}

displayed value:
  10

The internal state carries causal or replica-specific information. The displayed value is a projection. This distinction matters because users care about the projection, but convergence depends on the internal state.

For a set, internal and displayed state may look similar. For counters, registers with conflict metadata, observed-remove sets, and nested maps, the internal state is usually richer than the visible value.

The operational cost is metadata. More internal structure can preserve more concurrent intent, but it consumes storage, bandwidth, and compaction effort. A design that is mathematically mergeable can still be too expensive if the metadata grows without a cleanup plan.

Design Checklist

When designing a CRDT state shape, ask:

If you cannot answer those questions, the type may still be a useful replicated data structure, but it is not yet a clear CRDT design.

Worked Example

Use a grow-only shopping wishlist.

state:
  set of item ids

order:
  subset

join:
  union

local add:
  state = state union {item_id}

read:
  sorted list of item ids

Check the properties:

union(a, b) = union(b, a)
union(union(a, b), c) = union(a, union(b, c))
union(a, a) = a

Now state can be sent many times, in any order, through any replica topology, and the final value still converges if every update eventually reaches the group.

The design is intentionally limited. It supports adds but not removes. That is not a bug in the model; it is the boundary of this simple semilattice. Later CRDTs add deletion by carrying more information, such as tombstones or observed-remove metadata.

Common Failure Modes

Practice

Design a monotonic state shape for a "highest badge level earned" field.

Answer these questions:

Then repeat the exercise for "current shipping address." Notice how much harder it is to define a safe join without a conflict policy or causal metadata.

Connections

Resources

Key Takeaways

PREVIOUS CRDTs and Coordination Avoidance: Coordination Boundaries and Convergence Goals NEXT CRDTs and Coordination Avoidance: State-Based CRDTs and Merge Semantics