CRDTs for Gossip Convergence
LESSON
CRDTs for Gossip Convergence
The core idea: CRDTs move convergence into the data type itself, trading metadata and modeling constraints for merge rules that tolerate gossip reordering, duplication, and concurrent updates.
Core Insight
Suppose replicas R1 and R2 both accept updates to a shared counter while a network partition is active. R1 increments the counter once. R2 also increments it once. When the partition heals, anti-entropy can find that the replicas differ, and gossip can move the missing state around. Neither mechanism decides what the correct combined value should be.
If the system uses a naive last-write-wins rule, one increment may disappear. That is not a transport failure. Both updates existed. The loss happened because the merge rule did not preserve the meaning of concurrent work.
CRDTs, conflict-free replicated data types, attack that problem at the data-structure level. They are designed so replicas can accept local updates, exchange state or operations asynchronously, and still converge when the same information is delivered in different orders or more than once.
The trade-off is not free convergence. CRDTs make some replicated behaviors easy by restricting the shape of the data and carrying enough metadata to merge safely. They work best when the application's meaning can be expressed as a monotonic, merge-safe structure.
Delivery Is Not Convergence
Gossip answers one question:
How do updates move between replicas?
Anti-entropy with Merkle trees answers another:
Where do replicas differ?
CRDTs answer a third:
When different states meet, what merge rule makes the final result converge?
Those questions are easy to blur. If every update is eventually delivered, it can feel like replicas should naturally converge. But delivery only ensures that replicas eventually see similar information. The system still needs a rule for combining that information without depending on arrival order.
For example:
R1 sees: counter increment by R1
R2 sees: counter increment by R2
later:
R1 receives R2's state
R2 receives R1's state
If merge means "keep the value with the latest timestamp," one increment can be lost. If merge means "combine independent replica contributions," both increments can survive.
That is the CRDT move: design the state so merge preserves the intended information.
Merge Algebra
For state-based CRDTs, convergence usually depends on merge behaving like a join operation. Three properties matter in practical terms:
- commutative: merge order does not matter
- associative: grouping does not matter
- idempotent: seeing the same state twice does not keep changing the result
Those properties are exactly what gossip needs. Gossip may deliver A before B, B before A, or the same state multiple times. A merge function with those properties can absorb that mess and still converge.
A grow-only counter is the smallest useful example. Instead of storing one integer, it stores one component per replica:
class GCounter:
def __init__(self, counts=None):
self.counts = counts or {}
def increment(self, replica_id, n=1):
self.counts[replica_id] = self.counts.get(replica_id, 0) + n
def merge(self, other):
merged = {}
for replica_id in set(self.counts) | set(other.counts):
merged[replica_id] = max(
self.counts.get(replica_id, 0),
other.counts.get(replica_id, 0),
)
return GCounter(merged)
def value(self):
return sum(self.counts.values())
If R1 has {R1: 1} and R2 has {R2: 1}, merge produces {R1: 1, R2: 1}. The user-visible value is 2.
The key detail is that merge uses pointwise maximum, not addition of whole observed states. That makes duplicate delivery safe:
merge({R1: 1}, {R1: 1}) -> {R1: 1}
not {R1: 2}
The metadata is doing real work. It remembers enough per-replica structure to avoid losing concurrent increments or double-counting duplicate delivery.
Worked Example
Imagine three replicas gossip a grow-only counter.
R1: {R1: 1, R2: 0, R3: 0}
R2: {R1: 0, R2: 1, R3: 0}
R3: {R1: 0, R2: 0, R3: 0}
R1 and R2 both incremented during a partition. When gossip resumes, R3 may receive R2 first:
R3 merge R2 -> {R1: 0, R2: 1, R3: 0}
Then it receives R1:
R3 merge R1 -> {R1: 1, R2: 1, R3: 0}
If the messages arrive in the opposite order, the final state is the same:
R3 merge R1 -> {R1: 1, R2: 0, R3: 0}
R3 merge R2 -> {R1: 1, R2: 1, R3: 0}
If R3 receives R1 twice, the second merge has no further effect. That is idempotence in action.
This is why CRDTs fit gossip so naturally. The transport can be redundant and unordered because the data type is designed to absorb redundancy and reordering.
Data Model Constraints
CRDTs are powerful when the domain can tolerate merge-safe semantics:
- distributed counters
- sets with well-defined add/remove behavior
- presence maps
- collaborative text structures
- replicated metadata where concurrent additions are acceptable
They are much harder when the domain needs strict global invariants:
- "never sell more seats than exist"
- "account balance must never go below zero"
- "only one primary owner may exist"
- "two users cannot reserve the same unique name"
Those constraints often need coordination, escrow, leases, consensus, or application-specific conflict handling. A CRDT can sometimes help with part of the state, but it does not remove the invariant problem by itself.
The main trade-off is availability and convergence versus invariant strength. CRDTs let replicas accept updates locally and merge later, but they can only guarantee semantics that the data type actually encodes.
Common Failure Modes
Assuming delivery implies convergence
Gossip can move all updates and still lose meaning if the merge rule is wrong. Convergence is a property of both dissemination and merge semantics.
Using last-write-wins for semantic conflicts
Last-write-wins is simple, but it discards concurrent intent. That may be acceptable for some fields and disastrous for counters, carts, or collaborative edits.
Ignoring metadata cost
CRDTs often carry per-replica, causal, or tombstone metadata. That metadata is the price of safe merge and must be bounded or compacted in production systems.
Applying CRDTs where invariants dominate
If the main requirement is a strict global constraint, a CRDT alone may be the wrong tool. Coordination may be necessary for the part of the system that enforces the invariant.
Connections
Merkle trees localize divergence. CRDTs define what safe convergence means once divergent state is exchanged.
Vector clocks, next, expose causal relationships. They help systems distinguish "this version includes that history" from "these versions are concurrent," which is often useful alongside CRDTs and conflict handling.
Membership and gossip still matter. CRDTs do not move information by themselves; they rely on dissemination and anti-entropy loops to keep replicas exchanging state.
Resources
- [PAPER] A comprehensive study of Convergent and Commutative Replicated Data Types
- Focus: Foundational paper for CRDT definitions and convergence properties.
- [SITE] CRDTs Illustrated
- Focus: Visual intuition for common CRDT families and merge behavior.
- [PAPER] A Conflict-Free Replicated JSON Datatype
- Focus: Concrete example of applying CRDT ideas to nested application data.
Key Takeaways
- Gossip moves updates, but CRDT merge semantics decide whether replicas converge without losing meaning.
- Associative, commutative, and idempotent merge behavior lets replicas tolerate reordering and duplicate delivery.
- CRDTs often pay for safe convergence with metadata and data-model constraints.
- They are best for domains whose concurrent updates can be represented by merge-safe semantics; strict invariants may still require coordination.