CRDTs and Coordination Avoidance: Coordination Boundaries and Convergence Goals

LESSON

001 30 min intermediate

Core Insight

CartService lets a customer add items from a phone on a train, remove one from a laptop at home, and keep browsing while one region is temporarily disconnected. The product team wants the cart to stay responsive everywhere. The database team wants replicas to converge without asking a global leader to approve every edit. Those goals are compatible for some facts and dangerous for others.

The core idea behind CRDTs is not "conflicts do not matter." The idea is that some state can be designed so concurrent updates have a deterministic merge result. If every replica applies compatible updates and later exchanges enough information, they can converge on the same value without coordinating each write at the moment it happens.

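The boundary is the hard part. Adding an item to a wishlist is often safe to merge. Selling the last unit of inventory to exactly one buyer is a different problem: if two replicas each confirm that sale locally, a later merge cannot take one of the confirmations back.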

In this track, an invariant is a promise that must stay true. "A cart eventually contains all added items" is a different promise from "inventory never goes below zero." Coordination avoidance works when the promise can tolerate concurrent local decisions and still converge to a valid result. When the promise requires one globally exclusive answer, the system needs coordination somewhere.

The main trade-off is latency and availability versus invariant strength. Avoiding coordination gives faster local writes and better partition tolerance, but only for operations whose conflicts can be merged without violating the business rule.

What Coordination Avoidance Means

Coordination is the waiting step before a write is accepted: a replica asks some other authority "is this operation allowed to commit now?" and blocks until it gets an answer. A leader round trip, a quorum write, a distributed lock, and a serializable transaction are all coordination mechanisms in this sense.

Coordination avoidance means moving some decisions out of that synchronous agreement path. A local replica can accept an operation, store enough causal or merge metadata, and later reconcile with other replicas.

coordinated write:
  client -> replica A -> leader/quorum -> commit -> response

coordination-avoiding write:
  client -> replica A -> local durable update -> response
                      later: exchange state or operations with B and C
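
The two paths above can be contrasted in a minimal Python sketch. The names `coordinated_write`, `local_write`, `Unavailable`, and the `outbox` list are hypothetical and exist only for illustration; the quorum check is reduced to a count of reachable replicas, and durability is assumed rather than implemented.

```python
class Unavailable(Exception):
    """Raised when a coordinated write cannot reach a majority."""


def coordinated_write(item, cart, reachable, total):
    # Coordinated path: wait for a majority of replicas before committing.
    quorum = total // 2 + 1
    if reachable < quorum:
        raise Unavailable(f"need {quorum} replicas, reached {reachable}")
    cart.add(item)
    return "committed"


def local_write(item, local_cart, outbox):
    # Coordination-avoiding path: update local state and respond at once.
    # A real system would make this update durable before responding.
    local_cart.add(item)
    outbox.append(item)  # remembered for a later exchange with peers
    return "accepted"


# Replica A is partitioned and can reach only itself out of 3 replicas.
cart_a, outbox_a = set(), []
try:
    coordinated_result = coordinated_write("book", set(), reachable=1, total=3)
except Unavailable:
    coordinated_result = "rejected"

local_result = local_write("book", cart_a, outbox_a)
print(coordinated_result, local_result)  # rejected accepted
```

The local path stays available during the partition, but only because it defers the question of how this write combines with writes accepted elsewhere.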

Avoiding coordination does not remove distributed reasoning. It changes the question.

Instead of asking "who decides first?", the system asks "if two replicas make valid local decisions at the same time, can those decisions be merged later without breaking the promise?"

For CartService and systems like it, the answer depends on the fact being modeled:

adding a note to a collaborative document can often be merged
incrementing a like counter can usually be merged
adding an item to a wishlist can usually be merged
reserving the last seat on a flight usually needs coordination or preallocated rights
enforcing that each username belongs to at most one account usually needs coordination or a partitioned allocation scheme

This is why CRDT design starts with the application promise, not with a data structure catalog.

Convergence Goals

A CRDT should make a clear convergence promise:

If replicas receive the same set of updates, possibly in different orders,
and communication eventually resumes, they converge to equivalent state.

That promise has several parts.

First, replicas may see updates in different orders. The merge logic cannot depend on one universal arrival order unless that order is itself coordinated.

Second, replicas may be temporarily incomplete. A replica can answer from local state, but its answer may lag concurrent work elsewhere.

Third, convergence is not the same as correctness. Two replicas can converge to a value that violates the application invariant if the data type was poorly chosen. A cart that converges to "item count = -1" is convergent but wrong.

Fourth, convergence needs a delivery assumption. If an update is permanently lost and no replica retains the state needed to reconstruct it, the CRDT cannot magically recover it. CRDTs reduce coordination; they do not remove durability, anti-entropy, or operational obligations.
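
The first and third points can be seen together in a small sketch. It assumes a state-based PN-Counter (per-replica increment and decrement totals merged by element-wise maximum) used, unwisely, as an inventory count. The `PNCounter` class below is a simplified illustration, not a production implementation.

```python
class PNCounter:
    """Minimal state-based PN-Counter: per-replica increment and decrement totals."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = {}
        self.decs = {}

    def increment(self, n=1):
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n=1):
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)


# One unit in stock, known to both replicas.
a, b = PNCounter("A"), PNCounter("B")
a.increment(1)
b.merge(a)

# During a partition each replica checks locally, sees stock 1, and sells it.
if a.value() > 0:
    a.decrement()
if b.value() > 0:
    b.decrement()

a.merge(b)
b.merge(a)
b.merge(a)  # a duplicate delivery changes nothing
print(a.value(), b.value())  # -1 -1
```

Both replicas agree, and both local checks were valid when they ran, yet the agreed value breaks "inventory never goes below zero". The merge function did its job; the data type was the wrong fit for the promise.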

The Coordination Boundary

Use a coordination boundary to separate facts that can be merged from facts that require agreement.

can be local and mergeable:
  user draft text
  cart additions
  reaction counters
  "seen" markers
  cached preference updates

needs stronger control:
  unique username
  each order number assigned exactly once
  non-negative inventory
  mutually exclusive calendar slot
  one winning payment capture

The boundary can move. Sometimes a supposedly coordinated invariant can be redesigned. For inventory, each region might receive a bounded allocation of rights, so local decrements are safe until that local budget is exhausted. That design avoids coordination on most purchases but uses coordination to refill or rebalance rights.
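
A rough sketch of the rights-allocation idea follows. `RegionalStock` and `transfer_rights` are hypothetical names, and the transfer is modeled as a single function call; in a real deployment that step is where coordination between regions would actually happen.

```python
class RegionalStock:
    """Hypothetical per-region budget of purchase rights for one item."""

    def __init__(self, region, rights):
        self.region = region
        self.rights = rights  # units this region may sell without asking

    def try_sell(self, qty=1):
        # Local decision: safe because no other region can spend these rights.
        if qty > self.rights:
            return False  # budget exhausted: caller must coordinate a refill
        self.rights -= qty
        return True


def transfer_rights(source, target, qty):
    # The coordinated step: both regions must agree on the transfer.
    # Modeled here as one call; a real system needs an atomic protocol.
    if qty > source.rights:
        raise ValueError("source does not hold enough rights")
    source.rights -= qty
    target.rights += qty


# Total inventory of 10 units split across two regions.
eu, us = RegionalStock("eu", 7), RegionalStock("us", 3)

us_sales = [us.try_sell() for _ in range(4)]
print(us_sales)  # [True, True, True, False]

transfer_rights(eu, us, 2)
print(us.try_sell(), eu.rights + us.rights)  # True 6
```

The sum of rights across regions never exceeds the remaining stock, so no sequence of local sales can drive inventory negative. The cost is that a region can refuse a sale while another region still holds unused rights.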

The lesson is not that coordination is bad. The lesson is that coordination is expensive enough to deserve a precise reason. CRDTs are useful when they let the system keep the fast path local while preserving the promises the user actually cares about.

Worked Example

Suppose CartService stores a cart as a set of item identifiers.

Replica A sees:

add item: book
add item: charger

Replica B, temporarily offline from A, sees:

add item: headphones

When the replicas reconnect, a mergeable cart can combine those additions:

{book, charger} merge {headphones}
= {book, charger, headphones}

No replica needed to ask permission before accepting an add. The concurrent work is compatible.
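
The same merge can be written as a set union. This sketch assumes an add-only cart (a grow-only set) and checks the two properties that make the result independent of merge order and of repeated delivery; the function name `merge` is illustrative.

```python
def merge(left: frozenset, right: frozenset) -> frozenset:
    # Set union: the join for an add-only cart.
    return left | right


replica_a = frozenset({"book", "charger"})
replica_b = frozenset({"headphones"})

merged = merge(replica_a, replica_b)

# Commutative: the direction of the exchange does not matter.
assert merge(replica_a, replica_b) == merge(replica_b, replica_a)
# Idempotent: receiving the same state twice changes nothing.
assert merge(merged, replica_b) == merged

print(sorted(merged))  # ['book', 'charger', 'headphones']
```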

Now change the promise:

The user can apply only one coupon to the cart.

Replica A applies coupon SAVE10. Replica B applies coupon FREESHIP. A naive merge that keeps both coupons converges, but violates the rule. A naive merge that picks one coupon by timestamp may surprise the user and can depend on clock behavior. The application needs a policy:

coordinate before accepting a coupon
allow multiple coupons and change the business rule
model coupon choice as a deterministic register with clear conflict semantics
allocate coupon authority to a home region or owner device

That decision is the coordination boundary in action. The data structure cannot be chosen independently from the promise.
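
As one illustration of the "deterministic register" option, the sketch below orders writes by a logical counter and breaks ties by replica identifier instead of wall-clock time. `CouponRegister` and its fields are assumptions made for this example, not part of any CartService design described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouponRegister:
    """Hypothetical single-coupon register ordered by (counter, replica_id)."""
    coupon: str
    counter: int      # logical counter, not wall-clock time
    replica_id: str   # deterministic tie-break for equal counters

    def apply(self, coupon, replica_id):
        return CouponRegister(coupon, self.counter + 1, replica_id)

    def merge(self, other):
        # Keep whichever write has the larger (counter, replica_id) pair.
        mine = (self.counter, self.replica_id)
        theirs = (other.counter, other.replica_id)
        return self if mine >= theirs else other


empty = CouponRegister(coupon="", counter=0, replica_id="")

# Concurrent writes during a partition: both start from the same empty state.
at_a = empty.apply("SAVE10", "A")
at_b = empty.apply("FREESHIP", "B")

merged_ab = at_a.merge(at_b)
merged_ba = at_b.merge(at_a)
print(merged_ab.coupon, merged_ba.coupon)  # FREESHIP FREESHIP
```

The register converges and holds exactly one coupon, so the rule is preserved. It also silently discards SAVE10 for a reason the user never sees (replica B sorts after replica A). Whether that is acceptable is a product decision, not a property of the data structure.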

Design Questions

Before choosing a CRDT, ask:

Which operations should remain available during a partition?
What exact value must replicas converge to?
Which concurrent operations are compatible?
Which invariants must hold at all times, not only after convergence?
What metadata is needed to distinguish duplicate, stale, or concurrent updates?
What state can grow forever if compaction is not designed?
What user-visible behavior is acceptable before convergence finishes?

These questions prevent a common failure: using a CRDT to merge bytes while leaving the product semantics undefined.

Common Failure Modes

Treating convergence as correctness: Replicas agree, but agree on a state that violates a domain invariant.
Hiding policy in timestamps: Last-writer-wins resolves conflicts, but the winner may reflect clock skew rather than user intent.
Making every field mergeable: Some fields are independent, while others need atomicity or coordination with related fields.
Ignoring metadata cost: Causal context, tombstones, dots, or version vectors can grow unless the system has compaction rules.
Forgetting deletion semantics: Removing a value safely under concurrent adds is harder than it first appears.
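
To make the deletion problem concrete, here is a minimal sketch of an observed-remove set, one common way to give concurrent add and remove a defined outcome (the add wins). The `ORSet` class is a simplified illustration: it tags each add with a random identifier and never compacts its tombstones.

```python
import uuid


class ORSet:
    """Minimal observed-remove set: each add carries a unique tag."""

    def __init__(self):
        self.adds = set()      # (item, tag) pairs ever added
        self.removed = set()   # (item, tag) pairs removed (tombstones)

    def add(self, item):
        self.adds.add((item, uuid.uuid4().hex))

    def remove(self, item):
        # Remove only the tags this replica has observed so far.
        self.removed |= {pair for pair in self.adds if pair[0] == item}

    def merge(self, other):
        self.adds |= other.adds
        self.removed |= other.removed

    def items(self):
        return {item for item, _tag in self.adds - self.removed}


a, b = ORSet(), ORSet()
a.add("charger")
b.merge(a)

# Concurrently: A removes the charger, B adds it again.
a.remove("charger")
b.add("charger")

a.merge(b)
b.merge(a)
print(a.items(), b.items())  # {'charger'} {'charger'}
```

The remove deletes only the add it had seen, so B's concurrent add survives on both replicas. The `removed` set also shows the metadata cost named above: without a compaction rule it only grows.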

Practice

Take a multi-region profile service with these fields:

display_name
bio
follower_count
blocked_users
primary_email
account_balance

Classify each field as:

mergeable without coordination
mergeable with a specific conflict policy
coordination required
possible with preallocated rights or ownership

For each answer, name the user promise that drives the classification. Do not classify by data type alone. A counter can be safe for likes and unsafe for money.

Connections

distributed-systems-foundations provides the failure and partition vocabulary behind the coordination boundary.
consistency-and-replication explains why replica-visible order and user-visible guarantees are different surfaces.
002.md turns this boundary into the algebraic shape CRDTs rely on: monotonic state, joins, and semilattices.

Resources

[PAPER] A comprehensive study of Convergent and Commutative Replicated Data Types
- Focus: Use it for the formal convergence model and the distinction between state-based and operation-based CRDTs.
[PAPER] Coordination Avoidance in Database Systems
- Focus: Read for the connection between invariants, coordination, and application correctness.
[PAPER] The Bloom language and the CALM principle
- Focus: Use the CALM framing to understand why monotonic programs can avoid coordination.

Key Takeaways

CRDTs are useful when concurrent local updates can be merged into a deterministic result that preserves the application promise.
Coordination avoidance is a boundary decision: some facts can be local and mergeable, while others need agreement, ownership, or rights allocation.
Convergence is necessary but not sufficient; the merged value must also respect the domain invariant.
The first design step is to name the user-visible promise before choosing counters, sets, registers, or maps.
