CRDTs and Coordination Avoidance: Causal Context, Version Vectors, and Dots

LESSON

CRDTs and Coordination Avoidance

006 30 min intermediate

CRDTs and Coordination Avoidance: Causal Context, Version Vectors, and Dots

Core Insight

Imagine a multi-region note app where two people can edit the same project checklist while offline. Region A sees Ana add buy batteries. Region B, still disconnected, sees Bruno add book venue. Later, Ana removes buy batteries after seeing her own add but before seeing Bruno's update. When the replicas reconnect, the system must answer a small but crucial question: which updates did each action know about when it happened?

CRDTs need that question because "arrived later" is not the same as "happened later." A remove should usually affect the adds it observed, not unrelated concurrent adds. A register write may replace the values the writer read, while preserving concurrent values the writer could not have known about. A tombstone may be safe to compact only after every relevant replica has seen the removed dot.

Causal context is the metadata that summarizes what a replica, operation, delta, or value already knows.

Two small tools carry most of the lesson. A dot is a unique name for one update, commonly shaped like (replica_id, counter). A version vector is a compact summary of many known dots, commonly storing the highest contiguous counter known for each replica.

Together, dots and version vectors let the system distinguish prior, concurrent, duplicate, and obsolete facts without asking a central coordinator.

The non-obvious insight is that causal context is not a timestamp replacement. It is evidence about observation. It tells a merge function whether one fact includes another fact in its past.

The trade-off is precision versus metadata cost. More context gives cleaner conflict behavior and safer compaction, but every dot, vector entry, and retention rule becomes part of the replicated object's operational weight.

Naming Updates With Dots

A CRDT update needs a stable identity. Without one, a replica cannot reliably tell whether an incoming add is new, duplicated, removed, or already included in a larger state transfer.

A dot gives one update a name:

dot = (replica_id, local_counter)

examples:
  A:17
  B:4
  eu-west:982

The local counter increases at the actor that creates dots. The pair is unique as long as the actor identity is stable and the counter is not reused. The dot names a single causal event, not a wall-clock instant.

For an observed-remove set, an add might create one dot:

add("buy batteries") at replica A:
  create dot A:17
  store "buy batteries" -> {A:17}

A remove should not mean "delete every add with this text forever." It means "remove the adds I have observed for this element." If replica A has observed only A:17, the remove can name or cover that dot. If replica B concurrently creates B:4 for the same element, A's remove should not erase B's concurrent add unless the data type is intentionally remove-wins.

That is why dots are so useful. They let the data type talk about individual effects:

add dot:
  "buy batteries" has add A:17

remove context:
  remove the observed add A:17

concurrent add:
  B:4 is not covered by the remove context

This is not only for sets. A register write can have a dot. A map field update can have a dot. A delta can include dots. A tombstone can remember dots until they are safe to compact.

Causal Histories And Version Vectors

The simplest causal context is a set of all dots a replica knows:

causal history at replica C:
  {A:1, A:2, A:3, B:1, B:2, C:1}

Set inclusion gives the causal relation. In plain terms: if one context contains all the dots from another context, it knows at least that much history.

If update x has context {A:1, A:2} and update y has context {A:1, A:2, B:1}, then y knows at least everything x knew. If neither context includes the other, the updates may be concurrent.

Keeping every dot forever is usually too expensive. Version vectors compact the common case where each actor produces a contiguous sequence of dots:

full dot set:
  {A:1, A:2, A:3, B:1, B:2, C:1}

version vector:
  A -> 3
  B -> 2
  C -> 1

The vector means "I know all dots from A up to 3, all dots from B up to 2, and all dots from C up to 1." Merging two vectors is a pointwise maximum:

V1 = {A:3, B:1}
V2 = {A:2, B:4, C:1}

join(V1, V2) = {A:3, B:4, C:1}

Comparison is also direct:

V1 <= V2 when every entry in V1 is <= the matching entry in V2

If V1 <= V2, then V2 includes the causal past represented by V1. If neither vector is less than or equal to the other, the represented histories are concurrent.

This is why a version vector is not a total order. It does not force unrelated updates into a fake sequence. It preserves the fact that two writes can be concurrent. For CRDTs, that is a feature, because the merge policy needs to see concurrency instead of hiding it behind whichever message arrived last.

Dotted Version Vectors

Plain version vectors summarize what is known, but a new value often needs to separate "my causal past" from "the new event I just created." A dotted version vector does that by carrying both:

context:
  {A:9, B:4}

new dot:
  A:10

dotted version vector:
  context {A:9, B:4} + dot A:10

The context says what the write had already observed. The dot says which new event this write contributes. That distinction matters for conflict handling.

Suppose a replicated profile field stores multiple concurrent values until application logic resolves them:

initial context at A and B:
  {A:9, B:4}

Ana writes "Paris" at A:
  value = "Paris"
  context = {A:9, B:4}
  dot = A:10

Bruno writes "Lisbon" at B concurrently:
  value = "Lisbon"
  context = {A:9, B:4}
  dot = B:5

When the replicas merge, neither write's context includes the other's dot. The system should treat the values as concurrent siblings:

"Paris" with dot A:10
"Lisbon" with dot B:5

Now a client reads both values and chooses "Lisbon":

resolution write at A:
  context = {A:10, B:5}
  dot = A:11
  value = "Lisbon"

The new write's context includes both prior dots, so it can supersede both siblings. The merge did not need wall-clock time or a central sequencer. It needed evidence that the resolving write had observed the earlier versions.

Worked Example: Observed-Remove Checklist

Return to the shared checklist. The object stores add dots per item and a causal context that records removed or already-known dots.

state:
  entries:
    item -> add dots still visible

  context:
    version vector of dots known by this replica

Replica A adds an item:

A context before:
  {A:16, B:3}

add("buy batteries"):
  dot = A:17
  entries["buy batteries"].add(A:17)
  context = {A:17, B:3}

Replica B concurrently adds the same visible text:

B context before:
  {A:16, B:3}

add("buy batteries"):
  dot = B:4
  entries["buy batteries"].add(B:4)
  context = {A:16, B:4}

Replica A removes the item after seeing only A:17:

remove("buy batteries"):
  observed = {A:17}
  entries["buy batteries"].remove(A:17)
  context covers A:17

When A and B merge:

A's remove covers:
  A:17

B's concurrent add:
  B:4

visible result:
  "buy batteries" remains because B:4 was not observed by the remove

That behavior is add-wins for concurrent add/remove. A remove-wins set would store and interpret context differently, but it still needs the same basic question: did this remove observe the add it claims to remove?

This example also shows why causal context is not optional decoration. Without dots, the system might collapse both adds into one item string and accidentally erase B's concurrent work. Without a context summary, the system might retain tombstones forever because it cannot prove what is safe to forget.

Compaction And Metadata Trade-offs

Causal metadata gives CRDTs precision, but it is not free.

The most common trade-off is precision versus size. A full set of dots is exact but can grow with the number of updates. A version vector is compact but assumes each actor's history can be summarized as contiguous counters. A dotted version vector preserves the identity of a new event while compacting the past, but it still needs stable actors and careful pruning.

Actor choice matters. If every client device gets its own vector entry, metadata can grow with the number of clients. If servers assign dots on behalf of clients, vectors may stay smaller, but the system must route writes through those actors and preserve their counters durably.

Pruning is also subtle. Removing old vector entries or tombstones too early can make a replica forget that it already saw a removed update. Forgetting can create false conflicts, duplicate effects, or resurrection of deleted data. Keeping everything forever avoids those errors but turns metadata into the dominant storage cost.

The design question is therefore practical:

Which actors create dots?
How long do dots and tombstones remain visible?
What evidence proves a dot is causally stable?
When can a compact vector replace explicit dot sets?
What fallback repairs a replica with old or pruned context?

These are operational choices, not just data structure details.

Failure Modes

Practice

Design causal metadata for a replicated "favorite color" field.

Specify:

Then decide whether the field should keep concurrent siblings, choose last-writer-wins, or use a domain-specific merge rule. The metadata can identify concurrency, but the application still chooses what concurrency means.

Connections

Resources

Key Takeaways

PREVIOUS CRDTs and Coordination Avoidance: Delta-State Synchronization and Anti-Entropy NEXT CRDTs and Coordination Avoidance: Counters, Registers, and Last-Writer-Wins Trade-Offs