CRDTs and Coordination Avoidance: Semilattices, Joins, and Monotonic State
LESSON
CRDTs and Coordination Avoidance: Semilattices, Joins, and Monotonic State
Core Insight
CartService can avoid coordination only if its replicas have a safe way to combine partial knowledge. Replica A might know that the cart contains {book, charger}. Replica B might know that the same cart contains {headphones}. If the merge rule is set union, neither replica has to win. Both can move to {book, charger, headphones}.
That small move hides the core algebra behind many CRDTs. The useful intuition is simple: each replica has partial knowledge, and merge should keep all the knowledge from both sides.
The formal language comes after that intuition. The state must have an order that means "knows at least as much as." The merge operation must produce a state that is at least as informative as both inputs. In CRDT language, that merge is often called a join, and the shape that supports it is a join-semilattice.
The name sounds heavier than the idea. A join-semilattice is a state space where any two states have a safe combined state above both of them.
The non-obvious point is that a CRDT merge is not just a conflict handler. It is a monotonic growth rule. A replica is allowed to learn more, but it should not move backward in the information order. If local updates and merges both move state forward, then duplicate messages, reordered messages, and delayed messages become much less dangerous.
The trade-off is expressiveness versus coordination freedom. Monotonic state is easy to merge and replay safely, but not every product rule is monotonic. "This item has ever been added" fits naturally. "This exact quantity is now five after arbitrary add and remove operations" needs more careful structure.
Ordering State by Information
A normal numeric order says 3 <= 5 because five is larger. A CRDT order usually means something different: one state is below another because it contains less information.
For a grow-only set:
{book} <= {book, charger}
The right side knows everything the left side knows, plus one more fact. For a maximum counter that tracks the highest sequence number seen:
seen_seq = 7 <= seen_seq = 11
The right side has observed at least as much progress.
This order does not have to compare every pair. Two replicas can hold different partial states:
replica A: {book}
replica B: {charger}
Neither is more informative than the other. The merge operation must create a state that includes both pieces of information:
join({book}, {charger}) = {book, charger}
That result is an upper bound: both inputs are below it in the information order.
What a Join Must Guarantee
A join operation should have three practical properties. These are not abstract decoration; each one corresponds to something that happens in a real network.
commutative:
join(a, b) = join(b, a)
associative:
join(join(a, b), c) = join(a, join(b, c))
idempotent:
join(a, a) = a
These rules map directly to distributed failure behavior.
Commutativity handles message reordering. If A hears from B before C, or from C before B, the final merged state should not depend on that arrival order.
Associativity handles batching and topology. Replicas can merge pairwise, through anti-entropy gossip, or through regional aggregation without changing the meaning of the result.
Idempotence handles duplicate delivery. If the same state arrives twice, merging it twice should not double count it.
The join-semilattice gives the system a safe default response to messy replication: merge again.
Monotonic Local Updates
A state-based CRDT also needs local updates to move upward in the same order.
For a grow-only set:
add(book):
state = state union {book}
The update never removes information. It creates a state greater than or equal to the old state.
For a maximum register that stores the highest observed value:
observe(11):
state = max(state, 11)
Seeing 11 after 7 moves forward. Seeing 7 after 11 leaves state unchanged.
This monotonic behavior is what lets replicas accept local work and reconcile later. A replica can update from local input, another replica can update from different local input, and the join can preserve both contributions.
start:
A = {}
B = {}
local updates:
A = {book}
B = {charger}
merge:
A = join({book}, {charger}) = {book, charger}
B = join({charger}, {book}) = {book, charger}
No leader decided the order. The merge rule made order irrelevant for this data type.
Where The Model Breaks
The same structure does not automatically handle every update.
Suppose CartService tracks a visible item count as a plain integer:
A starts at 0 and applies add -> 1
B starts at 0 and applies add -> 1
If the merge rule is max, the result is 1, but two adds happened. If the merge rule is addition, duplicate delivery can count the same add more than once. The state is missing information about which replica contributed which increment.
A CRDT counter fixes this by changing the state shape. Instead of one integer, it can store one component per replica:
A state: {A: 1, B: 0}
B state: {A: 0, B: 1}
join by componentwise max:
{A: max(1,0), B: max(0,1)}
= {A: 1, B: 1}
read value:
A + B = 2
The merge is still monotonic, commutative, associative, and idempotent. The trick is that the internal state contains enough information to make the read value correct.
This is a common CRDT pattern: design the state for safe merge, then derive the user-visible value from that state.
Read Values Versus Merge State
The merge state and the displayed value are not always the same thing. This is one of the most important CRDT habits to build.
internal counter state:
{A: 3, B: 5, C: 2}
displayed value:
10
The internal state carries causal or replica-specific information. The displayed value is a projection. This distinction matters because users care about the projection, but convergence depends on the internal state.
For a set, internal and displayed state may look similar. For counters, registers with conflict metadata, observed-remove sets, and nested maps, the internal state is usually richer than the visible value.
The operational cost is metadata. More internal structure can preserve more concurrent intent, but it consumes storage, bandwidth, and compaction effort. A design that is mathematically mergeable can still be too expensive if the metadata grows without a cleanup plan.
Design Checklist
When designing a CRDT state shape, ask:
- What is the information order?
- What does "state A knows at least as much as state B" mean?
- What is the join operation?
- Is the join commutative, associative, and idempotent?
- Do local updates only move state upward?
- What user-visible value is derived from the merge state?
- What metadata grows, and when can it be compacted?
- Which domain invariants are preserved by this monotonic growth?
If you cannot answer those questions, the type may still be a useful replicated data structure, but it is not yet a clear CRDT design.
Worked Example
Use a grow-only shopping wishlist.
state:
set of item ids
order:
subset
join:
union
local add:
state = state union {item_id}
read:
sorted list of item ids
Check the properties:
union(a, b) = union(b, a)
union(union(a, b), c) = union(a, union(b, c))
union(a, a) = a
Now state can be sent many times, in any order, through any replica topology, and the final value still converges if every update eventually reaches the group.
The design is intentionally limited. It supports adds but not removes. That is not a bug in the model; it is the boundary of this simple semilattice. Later CRDTs add deletion by carrying more information, such as tombstones or observed-remove metadata.
Common Failure Modes
- Ordering by the wrong thing: Wall-clock time can produce a total order, but it may not represent information or user intent.
- Using a non-idempotent merge: Summing repeated states can double count duplicate delivery.
- Hiding missing metadata: A simple integer cannot distinguish two increments from one duplicated increment.
- Confusing internal state with the read value: The compact value shown to users may not contain enough information to merge safely.
- Ignoring compaction: Version vectors, dots, and tombstones need lifecycle rules once replicas churn.
Practice
Design a monotonic state shape for a "highest badge level earned" field.
Answer these questions:
- What is the information order?
- What is the join operation?
- Is duplicate delivery harmless?
- What value does the user see?
- Which product rule would break this simple design?
Then repeat the exercise for "current shipping address." Notice how much harder it is to define a safe join without a conflict policy or causal metadata.
Connections
001.mdexplains why the application promise decides whether coordination can be avoided.003.mdapplies this algebra to state-based CRDTs, where whole replica states are exchanged and joined.010.mdreturns to monotonicity from the perspective of invariant confluence and the CALM principle.
Resources
- [PAPER] A comprehensive study of Convergent and Commutative Replicated Data Types
- Focus: Read the definitions of convergent replicated data types and the role of semilattices.
- [PAPER] Eventually Consistent
- Focus: Use it for the broader context around eventual consistency and convergence under replication.
- [PAPER] The Bloom language and the CALM principle
- Focus: Connect monotonic reasoning in data types to monotonic reasoning in distributed programs.
Key Takeaways
- A join-semilattice gives replicas a merge operation that tolerates reordering, batching, and duplicate delivery.
- CRDT state is ordered by information: local updates and merges should move state forward, not backward.
- The internal merge state may need more metadata than the user-visible value.
- The main trade-off is merge freedom versus metadata and expressiveness; richer semantics usually require richer state.