CRDTs and Coordination Avoidance: Causal Context, Version Vectors, and Dots

LESSON

006 30 min intermediate

CRDTs and Coordination Avoidance: Causal Context, Version Vectors, and Dots

Core Insight

Imagine a multi-region note app where two people can edit the same project checklist while offline. Region A sees Ana add buy batteries. Region B, still disconnected, sees Bruno add book venue. Later, Ana removes buy batteries after seeing her own add but before seeing Bruno's update. When the replicas reconnect, the system must answer a small but crucial question: which updates did each action know about when it happened?

CRDTs need that question because "arrived later" is not the same as "happened later." A remove should usually affect the adds it observed, not unrelated concurrent adds. A register write may replace the values the writer read, while preserving concurrent values the writer could not have known about. A tombstone may be safe to compact only after every relevant replica has seen the removed dot.

Causal context is the metadata that summarizes what a replica, operation, delta, or value already knows.

Two small tools carry most of the lesson. A dot is a unique name for one update, commonly shaped like (replica_id, counter). A version vector is a compact summary of many known dots, commonly storing the highest contiguous counter known for each replica.

Together, dots and version vectors let the system distinguish prior, concurrent, duplicate, and obsolete facts without asking a central coordinator.

The non-obvious insight is that causal context is not a timestamp replacement. It is evidence about observation. It tells a merge function whether one fact includes another fact in its past.

The trade-off is precision versus metadata cost. More context gives cleaner conflict behavior and safer compaction, but every dot, vector entry, and retention rule becomes part of the replicated object's operational weight.

Naming Updates With Dots

A CRDT update needs a stable identity. Without one, a replica cannot reliably tell whether an incoming add is new, duplicated, removed, or already included in a larger state transfer.

A dot gives one update a name:

dot = (replica_id, local_counter)

examples:
  A:17
  B:4
  eu-west:982

The local counter increases at the actor that creates dots. The pair is unique as long as the actor identity is stable and the counter is not reused. The dot names a single causal event, not a wall-clock instant.

For an observed-remove set, an add might create one dot:

add("buy batteries") at replica A:
  create dot A:17
  store "buy batteries" -> {A:17}

A remove should not mean "delete every add with this text forever." It means "remove the adds I have observed for this element." If replica A has observed only A:17, the remove can name or cover that dot. If replica B concurrently creates B:4 for the same element, A's remove should not erase B's concurrent add unless the data type is intentionally remove-wins.

That is why dots are so useful. They let the data type talk about individual effects:

add dot:
  "buy batteries" has add A:17

remove context:
  remove the observed add A:17

concurrent add:
  B:4 is not covered by the remove context

This is not only for sets. A register write can have a dot. A map field update can have a dot. A delta can include dots. A tombstone can remember dots until they are safe to compact.

Causal Histories And Version Vectors

The simplest causal context is a set of all dots a replica knows:

causal history at replica C:
  {A:1, A:2, A:3, B:1, B:2, C:1}

Set inclusion gives the causal relation. In plain terms: if one context contains all the dots from another context, it knows at least that much history.

If update x has context {A:1, A:2} and update y has context {A:1, A:2, B:1}, then y knows at least everything x knew. If neither context includes the other, the updates may be concurrent.

Keeping every dot forever is usually too expensive. Version vectors compact the common case where each actor produces a contiguous sequence of dots:

full dot set:
  {A:1, A:2, A:3, B:1, B:2, C:1}

version vector:
  A -> 3
  B -> 2
  C -> 1

The vector means "I know all dots from A up to 3, all dots from B up to 2, and all dots from C up to 1." Merging two vectors is a pointwise maximum:

V1 = {A:3, B:1}
V2 = {A:2, B:4, C:1}

join(V1, V2) = {A:3, B:4, C:1}

Comparison is also direct:

V1 <= V2 when every entry in V1 is <= the matching entry in V2

If V1 <= V2, then V2 includes the causal past represented by V1. If neither vector is less than or equal to the other, the represented histories are concurrent.

This is why a version vector is not a total order. It does not force unrelated updates into a fake sequence. It preserves the fact that two writes can be concurrent. For CRDTs, that is a feature, because the merge policy needs to see concurrency instead of hiding it behind whichever message arrived last.

Dotted Version Vectors

Plain version vectors summarize what is known, but a new value often needs to separate "my causal past" from "the new event I just created." A dotted version vector does that by carrying both:

context:
  {A:9, B:4}

new dot:
  A:10

dotted version vector:
  context {A:9, B:4} + dot A:10

The context says what the write had already observed. The dot says which new event this write contributes. That distinction matters for conflict handling.

Suppose a replicated profile field stores multiple concurrent values until application logic resolves them:

initial context at A and B:
  {A:9, B:4}

Ana writes "Paris" at A:
  value = "Paris"
  context = {A:9, B:4}
  dot = A:10

Bruno writes "Lisbon" at B concurrently:
  value = "Lisbon"
  context = {A:9, B:4}
  dot = B:5

When the replicas merge, neither write's context includes the other's dot. The system should treat the values as concurrent siblings:

"Paris" with dot A:10
"Lisbon" with dot B:5

Now a client reads both values and chooses "Lisbon":

resolution write at A:
  context = {A:10, B:5}
  dot = A:11
  value = "Lisbon"

The new write's context includes both prior dots, so it can supersede both siblings. The merge did not need wall-clock time or a central sequencer. It needed evidence that the resolving write had observed the earlier versions.

Worked Example: Observed-Remove Checklist

Return to the shared checklist. The object stores add dots per item and a causal context that records removed or already-known dots.

state:
  entries:
    item -> add dots still visible

  context:
    version vector of dots known by this replica

Replica A adds an item:

A context before:
  {A:16, B:3}

add("buy batteries"):
  dot = A:17
  entries["buy batteries"].add(A:17)
  context = {A:17, B:3}

Replica B concurrently adds the same visible text:

B context before:
  {A:16, B:3}

add("buy batteries"):
  dot = B:4
  entries["buy batteries"].add(B:4)
  context = {A:16, B:4}

Replica A removes the item after seeing only A:17:

remove("buy batteries"):
  observed = {A:17}
  entries["buy batteries"].remove(A:17)
  context covers A:17

When A and B merge:

A's remove covers:
  A:17

B's concurrent add:
  B:4

visible result:
  "buy batteries" remains because B:4 was not observed by the remove

That behavior is add-wins for concurrent add/remove. A remove-wins set would store and interpret context differently, but it still needs the same basic question: did this remove observe the add it claims to remove?

This example also shows why causal context is not optional decoration. Without dots, the system might collapse both adds into one item string and accidentally erase B's concurrent work. Without a context summary, the system might retain tombstones forever because it cannot prove what is safe to forget.

Compaction And Metadata Trade-offs

Causal metadata gives CRDTs precision, but it is not free.

The most common trade-off is precision versus size. A full set of dots is exact but can grow with the number of updates. A version vector is compact but assumes each actor's history can be summarized as contiguous counters. A dotted version vector preserves the identity of a new event while compacting the past, but it still needs stable actors and careful pruning.

Actor choice matters. If every client device gets its own vector entry, metadata can grow with the number of clients. If servers assign dots on behalf of clients, vectors may stay smaller, but the system must route writes through those actors and preserve their counters durably.

Pruning is also subtle. Removing old vector entries or tombstones too early can make a replica forget that it already saw a removed update. Forgetting can create false conflicts, duplicate effects, or resurrection of deleted data. Keeping everything forever avoids those errors but turns metadata into the dominant storage cost.

The design question is therefore practical:

Which actors create dots?
How long do dots and tombstones remain visible?
What evidence proves a dot is causally stable?
When can a compact vector replace explicit dot sets?
What fallback repairs a replica with old or pruned context?

These are operational choices, not just data structure details.

Failure Modes

Using wall-clock time as causal evidence: A later timestamp does not prove that a write observed an earlier write.
Reusing actor counters: If a restarted replica reuses A:17, duplicate detection and remove semantics break.
Collapsing concurrent writes too early: A register or map may discard a value that no writer actually observed and replaced.
Pruning tombstones without stability evidence: Removed dots can reappear when an old add arrives through anti-entropy.
Letting client identities explode metadata: One vector entry per transient client can make small CRDT values expensive.
Treating causal context as global truth: Context says what a replica or operation has observed, not what every replica in the system has already applied.

Practice

Design causal metadata for a replicated "favorite color" field.

Specify:

which actors create dots
what context is attached to a write
how two concurrent writes are represented
how a later resolving write proves it has observed both prior writes
when old sibling dots can be compacted
what happens if a replica receives a write whose dot is older than its current context

Then decide whether the field should keep concurrent siblings, choose last-writer-wins, or use a domain-specific merge rule. The metadata can identify concurrency, but the application still chooses what concurrency means.

Connections

004.md used dots and causal delivery to explain why remove operations need observation evidence.
005.md showed that delta-state repair may carry dots, tombstones, and causal summaries inside joinable fragments.
007.md applies this vocabulary to counters, registers, and last-writer-wins designs, where the conflict policy becomes visible.

Resources

[ARTICLE] Why Logical Clocks are Easy
- Focus: Use causal histories as the simple model behind vector clocks, version vectors, and dots.
[PAPER] Dotted Version Vectors: Logical Clocks for Optimistic Replication
- Focus: Study how dotted version vectors preserve precise causality while avoiding some metadata scaling problems.
[PAPER] A comprehensive study of Convergent and Commutative Replicated Data Types
- Focus: Connect causal metadata back to CRDT merge behavior and operation delivery assumptions.
[PAPER] Delta State Replicated Data Types
- Focus: Look for how causal CRDT kernels use compact context and dots inside delta-state designs.

Key Takeaways

A dot names one update; causal context summarizes which updates were already observed.
Version vectors compact contiguous causal histories by keeping the highest known counter per actor.
Dotted version vectors separate a value's causal past from the new event that value contributes.
The main trade-off is precision versus metadata cost: richer context improves merge and compaction decisions, but it must be retained, pruned, and repaired carefully.

← Back to CRDTs and Coordination Avoidance

← Back to Distributed Systems

← Back to Learning Hub