CRDTs and Coordination Avoidance: Coordination Boundaries and Convergence Goals

001 30 min intermediate

Core Insight

CartService lets a customer add items from a phone on a train, remove one from a laptop at home, and keep browsing while one region is temporarily disconnected. The product team wants the cart to stay responsive everywhere. The database team wants replicas to converge without asking a global leader to approve every edit. Those goals are compatible for some facts and dangerous for others.

The core idea behind CRDTs is not "conflicts do not matter." The idea is that some state can be designed so concurrent updates have a deterministic merge result. If every replica applies compatible updates and later exchanges enough information, they can converge on the same value without coordinating each write at the moment it happens.

The boundary is the hard part. Adding an item to a wishlist is often safe to merge. Selling the last unit of inventory to exactly one buyer is not the same problem.

In this track, an invariant is a promise that must stay true. "A cart eventually contains all added items" is a different promise from "inventory never goes below zero." Coordination avoidance works when the promise can tolerate concurrent local decisions and still converge to a valid result. When the promise requires one globally exclusive answer, the system needs coordination somewhere.

The main trade-off is latency and availability versus invariant strength. Avoiding coordination gives faster local writes and better partition tolerance, but only for operations whose conflicts can be merged without violating the business rule.

What Coordination Avoidance Means

Coordination is the waiting step before a write is accepted. A replica asks some other authority, such as a leader, quorum, lock service, or transaction system, "is this operation allowed to commit now?"

A quorum write, a leader round trip, a distributed lock, and a serializable transaction can all be coordination mechanisms.

Coordination avoidance means moving some decisions out of that synchronous agreement path. A local replica can accept an operation, store enough causal or merge metadata, and later reconcile with other replicas.

coordinated write:
  client -> replica A -> leader/quorum -> commit -> response

coordination-avoiding write:
  client -> replica A -> local durable update -> response
                      later: exchange state or operations with B and C
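A minimal Python sketch of the coordination-avoiding path. It assumes an add-only cart and uses a hypothetical `Replica` class, which is not an API from any library. Each replica appends an operation to its own log and responds immediately. A later anti-entropy round ships operations between A, B, and C. Operation ids make redelivery harmless, which matters because a replica cannot always tell whether a peer already has an update.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    """One cart operation, tagged with a unique id so redelivery is harmless."""
    op_id: str  # hypothetical id scheme: "<replica>:<sequence>"
    item: str


@dataclass
class Replica:
    """Hypothetical add-only cart replica that keeps an operation log."""
    name: str
    seq: int = 0
    log: dict = field(default_factory=dict)  # op_id -> Op

    def add_item(self, item: str) -> Op:
        # Fast path: record locally and respond; no leader or quorum involved.
        # The dict write stands in for a local durable update.
        self.seq += 1
        op = Op(f"{self.name}:{self.seq}", item)
        self.log[op.op_id] = op
        return op

    def receive(self, ops) -> None:
        # Run later: apply peers' operations, skipping ids already known.
        for op in ops:
            self.log.setdefault(op.op_id, op)

    def cart(self) -> set:
        return {op.item for op in self.log.values()}


a, b, c = Replica("A"), Replica("B"), Replica("C")
a.add_item("book")
b.add_item("headphones")
print(sorted(c.cart()))  # C answers from local state and lags: []

# One anti-entropy round: each replica sends its whole log to the others.
for src in (a, b, c):
    for dst in (a, b, c):
        if dst is not src:
            dst.receive(list(src.log.values()))

print(sorted(a.cart()), a.cart() == b.cart() == c.cart())
```

Before the exchange, C's answer is valid but stale. After one full round, all three replicas report `['book', 'headphones']`.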

Avoiding coordination does not remove distributed reasoning. It changes the question.

Instead of asking "who decides first?", the system asks "if two replicas make valid local decisions at the same time, can those decisions be merged later without breaking the promise?"

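For CartService, the answer depends on the fact being modeled. Items added concurrently on two replicas can simply be combined. A promise such as "inventory never goes below zero" cannot be kept by independent local decisions alone.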

This is why CRDT design starts with the application promise, not with a data structure catalog.

Convergence Goals

A CRDT should make a clear convergence promise:

If replicas receive the same set of updates, possibly in different orders,
and communication eventually resumes, they converge to equivalent state.
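For a simple type, the convergence promise can be checked mechanically. The sketch below assumes a grow-only counter stored as a plain dict from replica name to count. This is an illustrative representation, not a library type. Merge takes the per-replica maximum, which is commutative, associative, and idempotent. As a result, every delivery order of the same three states yields one final state, and a duplicate delivery changes nothing.

```python
from itertools import permutations


def merge(x: dict, y: dict) -> dict:
    """Grow-only counter merge: keep the per-replica maximum."""
    return {k: max(x.get(k, 0), y.get(k, 0)) for k in x.keys() | y.keys()}


def value(state: dict) -> int:
    return sum(state.values())


# States that replicas might ship to each other (illustrative numbers).
s1 = {"A": 3}
s2 = {"B": 2}
s3 = {"A": 1, "C": 4}  # carries an older view of A's slot

# The three laws that make arrival order irrelevant.
assert merge(s1, s2) == merge(s2, s1)                          # commutative
assert merge(merge(s1, s2), s3) == merge(s1, merge(s2, s3))    # associative
assert merge(s1, s1) == s1                                     # idempotent

# Deliver the same updates in every possible order.
results = set()
for order in permutations([s1, s2, s3]):
    state = {}
    for update in order:
        state = merge(state, update)
    results.add(tuple(sorted(state.items())))

print(len(results), value(state))  # 1 9
```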

That promise has several parts.

First, replicas may see updates in different orders. The merge logic cannot depend on one universal arrival order unless that order is itself coordinated.

Second, replicas may be temporarily incomplete. A replica can answer from local state, but its answer may lag concurrent work elsewhere.

Third, convergence is not the same as correctness. Two replicas can converge to a value that violates the application invariant if the data type was poorly chosen. A cart that converges to "item count = -1" is convergent but wrong.
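Convergence without correctness is easy to reproduce. This minimal sketch assumes a PN-counter, which keeps separate grow-only tallies of increments and decrements per replica, used for the quantity of one cart line. Both replicas start from quantity 1. Each checks locally that the quantity is positive, and each removes the item during a partition. Every local decision was valid and the merge is deterministic, yet both replicas converge to -1.

```python
from dataclasses import dataclass, field


def join(x: dict, y: dict) -> dict:
    """Per-replica maximum: the grow-only counter merge."""
    return {k: max(x.get(k, 0), y.get(k, 0)) for k in x.keys() | y.keys()}


@dataclass
class PNCounter:
    inc: dict = field(default_factory=dict)  # replica -> total increments
    dec: dict = field(default_factory=dict)  # replica -> total decrements

    def add(self, replica: str, n: int = 1) -> None:
        self.inc[replica] = self.inc.get(replica, 0) + n

    def remove(self, replica: str, n: int = 1) -> None:
        self.dec[replica] = self.dec.get(replica, 0) + n

    def merge(self, other: "PNCounter") -> "PNCounter":
        return PNCounter(join(self.inc, other.inc), join(self.dec, other.dec))

    def value(self) -> int:
        return sum(self.inc.values()) - sum(self.dec.values())


# Quantity 1 of one cart line, known to both replicas.
base = PNCounter()
base.add("A", 1)
a = PNCounter(dict(base.inc), dict(base.dec))
b = PNCounter(dict(base.inc), dict(base.dec))

# During a partition, each replica sees quantity 1 and removes it locally.
if a.value() > 0:
    a.remove("A")
if b.value() > 0:
    b.remove("B")

merged_ab, merged_ba = a.merge(b), b.merge(a)
print(merged_ab.value(), merged_ba.value())  # -1 -1: convergent, but wrong
```

The counter has no bug. It is simply the wrong type for a promise like "quantity never goes below zero", and that promise needs either a different design or coordination.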

Fourth, convergence needs a delivery assumption. If an update is permanently lost and no replica retains the state needed to reconstruct it, the CRDT cannot magically recover it. CRDTs reduce coordination; they do not remove durability, anti-entropy, or operational obligations.

The Coordination Boundary

Use a coordination boundary to separate facts that can be merged from facts that require agreement.

can be local and mergeable:
  user draft text
  cart additions
  reaction counters
  "seen" markers
  cached preference updates

needs stronger control:
  unique username
  exactly one order number assignment
  non-negative inventory
  mutually exclusive calendar slot
  one winning payment capture

The boundary can move. Sometimes a supposedly coordinated invariant can be redesigned. For inventory, each region might receive a bounded allocation of rights, so local decrements are safe until that local budget is exhausted. That design avoids coordination on most purchases but uses coordination to refill or rebalance rights.
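The bounded-allocation idea can be sketched in a few lines. `RegionBudget` and `rebalance` are hypothetical names, and the numbers are made up. `rebalance` is an ordinary function standing in for whatever coordinated step, such as a transaction or leader round trip, a real system would use. The protected invariant is that units sold plus unspent rights across regions always equal the starting stock.

```python
class RegionBudget:
    """Hypothetical per-region share of inventory rights."""

    def __init__(self, region: str, rights: int):
        self.region = region
        self.rights = rights  # units this region may sell without asking anyone

    def try_sell(self, qty: int = 1) -> bool:
        # Fast path: a purely local check against the local budget.
        if qty <= self.rights:
            self.rights -= qty
            return True
        return False  # budget exhausted: caller must take the coordinated path


def rebalance(donor: RegionBudget, receiver: RegionBudget, qty: int) -> None:
    # Slow path. A real system would need agreement here; this single
    # local call only stands in for that coordinated step.
    if qty > donor.rights:
        raise ValueError("donor cannot give away rights it does not hold")
    donor.rights -= qty
    receiver.rights += qty


# 10 units of stock split across two regions (illustrative numbers).
eu, us = RegionBudget("eu", 6), RegionBudget("us", 4)
sold_us = sum(us.try_sell() for _ in range(5))  # the fifth sale is refused locally
rebalance(eu, us, 2)
sold_us += sum(us.try_sell() for _ in range(2))
print(sold_us, eu.rights, us.rights)  # 6 4 0: sold plus rights is still 10
```

Most sales touch only local state. Coordination appears only when a region runs out and rights must move.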

The lesson is not that coordination is bad. The lesson is that coordination is expensive enough to deserve a precise reason. CRDTs are useful when they let the system keep the fast path local while preserving the promises the user actually cares about.

Worked Example

Suppose CartService stores a cart as a set of item identifiers.

Replica A sees:

add item: book
add item: charger

Replica B, temporarily offline from A, sees:

add item: headphones

When the replicas reconnect, a mergeable cart can combine those additions:

{book, charger} merge {headphones}
= {book, charger, headphones}

No replica needed to ask permission before accepting an add. The concurrent work is compatible.
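The worked example maps directly onto a grow-only set whose merge is set union. This minimal Python sketch assumes carts that only ever receive additions. `replica_a`, `replica_b`, and `merge` are illustrative names.

```python
def merge(x: set, y: set) -> set:
    """Union is the join for a grow-only set."""
    return x | y


replica_a = set()
replica_b = set()

replica_a.add("book")        # accepted locally on A
replica_a.add("charger")
replica_b.add("headphones")  # accepted locally on B while offline from A

after_a = merge(replica_a, replica_b)
after_b = merge(replica_b, replica_a)
print(sorted(after_a), after_a == after_b)  # ['book', 'charger', 'headphones'] True
```

Union ignores order and duplicates, so A and B reach the same cart regardless of which side merges first. This set cannot express removals. Handling a concurrent remove, like the laptop removal in the opening scenario, needs a richer type such as an observed-remove set.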

Now change the promise:

The user can apply only one coupon to the cart.

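Replica A applies coupon SAVE10. Replica B applies coupon FREESHIP. A naive merge that keeps both coupons converges but violates the rule. A naive merge that picks one coupon by timestamp may surprise the user, and its outcome can depend on clock behavior. The application needs an explicit policy for this conflict: either coordinate coupon application, or accept a merge rule whose outcome the product is willing to show the user.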

That decision is the coordination boundary in action. The data structure cannot be chosen independently from the promise.
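The two naive merges are easy to write, which is part of the danger. This sketch assumes a hypothetical `CouponWrite` record that carries a replica-local wall-clock reading. Keeping both codes converges but breaks the one-coupon rule. Last-writer-wins keeps one coupon but lets clock readings pick the winner. Here B's clock is assumed to lag, so its reading (990) is lower than A's (1000) even if the FREESHIP click happened later in real time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouponWrite:
    code: str
    wall_clock_ms: int  # replica-local clock reading, may be skewed
    replica: str


a_write = CouponWrite("SAVE10", wall_clock_ms=1_000, replica="A")
b_write = CouponWrite("FREESHIP", wall_clock_ms=990, replica="B")  # B's clock lags

# Naive merge 1: keep every coupon. Converges, but breaks "only one coupon".
union = {a_write.code} | {b_write.code}


# Naive merge 2: last writer wins by wall clock, ties broken by replica name.
def lww(x: CouponWrite, y: CouponWrite) -> CouponWrite:
    return max(x, y, key=lambda w: (w.wall_clock_ms, w.replica))


print(sorted(union), lww(a_write, b_write).code, lww(b_write, a_write).code)
# ['FREESHIP', 'SAVE10'] SAVE10 SAVE10
```

Both merges are deterministic, and neither one expresses the business rule. Choosing what happens here is a product decision, not a data structure detail.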

Design Questions

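Before choosing a CRDT, ask which promise the data must keep, and whether concurrent local decisions can be merged later without breaking that promise.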

These questions prevent a common failure: using a CRDT to merge bytes while leaving the product semantics undefined.

Common Failure Modes

Practice

Take a multi-region profile service with these fields:

display_name
bio
follower_count
blocked_users
primary_email
account_balance

Classify each field as local and mergeable, or as needing stronger control.

For each answer, name the user promise that drives the classification. Do not classify by data type alone. A counter can be safe for likes and unsafe for money.

Connections

Resources

Key Takeaways
