CRDTs for Gossip Convergence

LESSON

Gossip, Membership, and Epidemic Systems

011 30 min intermediate

CRDTs for Gossip Convergence

The core idea: CRDTs move convergence into the data type itself, trading metadata and modeling constraints for merge rules that tolerate gossip reordering, duplication, and concurrent updates.

Core Insight

Suppose replicas R1 and R2 both accept updates to a shared counter while a network partition is active. R1 increments the counter once. R2 also increments it once. When the partition heals, anti-entropy can find that the replicas differ, and gossip can move the missing state around. Neither mechanism decides what the correct combined value should be.

If the system uses a naive last-write-wins rule, one increment may disappear. That is not a transport failure. Both updates existed. The loss happened because the merge rule did not preserve the meaning of concurrent work.

CRDTs, conflict-free replicated data types, attack that problem at the data-structure level. They are designed so replicas can accept local updates, exchange state or operations asynchronously, and still converge when the same information is delivered in different orders or more than once.

The trade-off is not free convergence. CRDTs make some replicated behaviors easy by restricting the shape of the data and carrying enough metadata to merge safely. They work best when the application's meaning can be expressed as a monotonic, merge-safe structure.

Delivery Is Not Convergence

Gossip answers one question:

How do updates move between replicas?

Anti-entropy with Merkle trees answers another:

Where do replicas differ?

CRDTs answer a third:

When different states meet, what merge rule makes the final result converge?

Those questions are easy to blur. If every update is eventually delivered, it can feel like replicas should naturally converge. But delivery only ensures that replicas eventually see similar information. The system still needs a rule for combining that information without depending on arrival order.

For example:

R1 sees: counter increment by R1
R2 sees: counter increment by R2

later:
R1 receives R2's state
R2 receives R1's state

If merge means "keep the value with the latest timestamp," one increment can be lost. If merge means "combine independent replica contributions," both increments can survive.

That is the CRDT move: design the state so merge preserves the intended information.

Merge Algebra

For state-based CRDTs, convergence usually depends on merge behaving like a join operation. Three properties matter in practical terms:

Those properties are exactly what gossip needs. Gossip may deliver A before B, B before A, or the same state multiple times. A merge function with those properties can absorb that mess and still converge.

A grow-only counter is the smallest useful example. Instead of storing one integer, it stores one component per replica:

class GCounter:
    def __init__(self, counts=None):
        self.counts = counts or {}

    def increment(self, replica_id, n=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + n

    def merge(self, other):
        merged = {}
        for replica_id in set(self.counts) | set(other.counts):
            merged[replica_id] = max(
                self.counts.get(replica_id, 0),
                other.counts.get(replica_id, 0),
            )
        return GCounter(merged)

    def value(self):
        return sum(self.counts.values())

If R1 has {R1: 1} and R2 has {R2: 1}, merge produces {R1: 1, R2: 1}. The user-visible value is 2.

The key detail is that merge uses pointwise maximum, not addition of whole observed states. That makes duplicate delivery safe:

merge({R1: 1}, {R1: 1}) -> {R1: 1}
not {R1: 2}

The metadata is doing real work. It remembers enough per-replica structure to avoid losing concurrent increments or double-counting duplicate delivery.

Worked Example

Imagine three replicas gossip a grow-only counter.

R1: {R1: 1, R2: 0, R3: 0}
R2: {R1: 0, R2: 1, R3: 0}
R3: {R1: 0, R2: 0, R3: 0}

R1 and R2 both incremented during a partition. When gossip resumes, R3 may receive R2 first:

R3 merge R2 -> {R1: 0, R2: 1, R3: 0}

Then it receives R1:

R3 merge R1 -> {R1: 1, R2: 1, R3: 0}

If the messages arrive in the opposite order, the final state is the same:

R3 merge R1 -> {R1: 1, R2: 0, R3: 0}
R3 merge R2 -> {R1: 1, R2: 1, R3: 0}

If R3 receives R1 twice, the second merge has no further effect. That is idempotence in action.

This is why CRDTs fit gossip so naturally. The transport can be redundant and unordered because the data type is designed to absorb redundancy and reordering.

Data Model Constraints

CRDTs are powerful when the domain can tolerate merge-safe semantics:

They are much harder when the domain needs strict global invariants:

Those constraints often need coordination, escrow, leases, consensus, or application-specific conflict handling. A CRDT can sometimes help with part of the state, but it does not remove the invariant problem by itself.

The main trade-off is availability and convergence versus invariant strength. CRDTs let replicas accept updates locally and merge later, but they can only guarantee semantics that the data type actually encodes.

Common Failure Modes

Assuming delivery implies convergence

Gossip can move all updates and still lose meaning if the merge rule is wrong. Convergence is a property of both dissemination and merge semantics.

Using last-write-wins for semantic conflicts

Last-write-wins is simple, but it discards concurrent intent. That may be acceptable for some fields and disastrous for counters, carts, or collaborative edits.

Ignoring metadata cost

CRDTs often carry per-replica, causal, or tombstone metadata. That metadata is the price of safe merge and must be bounded or compacted in production systems.

Applying CRDTs where invariants dominate

If the main requirement is a strict global constraint, a CRDT alone may be the wrong tool. Coordination may be necessary for the part of the system that enforces the invariant.

Connections

Merkle trees localize divergence. CRDTs define what safe convergence means once divergent state is exchanged.

Vector clocks, next, expose causal relationships. They help systems distinguish "this version includes that history" from "these versions are concurrent," which is often useful alongside CRDTs and conflict handling.

Membership and gossip still matter. CRDTs do not move information by themselves; they rely on dissemination and anti-entropy loops to keep replicas exchanging state.

Resources

Key Takeaways

  1. Gossip moves updates, but CRDT merge semantics decide whether replicas converge without losing meaning.
  2. Associative, commutative, and idempotent merge behavior lets replicas tolerate reordering and duplicate delivery.
  3. CRDTs often pay for safe convergence with metadata and data-model constraints.
  4. They are best for domains whose concurrent updates can be represented by merge-safe semantics; strict invariants may still require coordination.
PREVIOUS Anti-entropy with Merkle Trees NEXT Vector Clocks - Causal Ordering in Gossip