CRDTs and Coordination Avoidance: Collaborative Editing, Sequences, and Rich Text
LESSON
CRDTs and Coordination Avoidance: Collaborative Editing, Sequences, and Rich Text
Core Insight
Imagine two people editing the same paragraph while one laptop is offline. Ana places the cursor between c and a in cat and types h. Bruno, on another replica, places the cursor between a and t and types r. Both edits feel local and immediate.
If the document is represented as a plain array of characters, both edits may say "insert at index 1" or "insert at index 2" depending on what each user had already seen. After the replicas exchange updates, the system cannot safely replay those indexes against a document that has changed underneath them. Indexes describe a position in one replica's momentary view, not a durable fact that other replicas can merge later.
Collaborative text CRDTs solve this by making order itself mergeable. Characters, embedded objects, or formatting markers get stable identifiers. An insertion says "place this new element after that known element" or "place it at this generated position identifier." A deletion usually marks an element as removed rather than pretending it never existed. Replicas may receive updates in different orders, but they eventually compute the same visible sequence.
The trade-off is that convergence is not the same as perfect user intention. The CRDT can guarantee that replicas agree without a central lock. It still needs careful ordering rules, tombstone or compaction strategy, and rich text semantics that make formatting behave like formatting rather than like accidental character data.
Why Index-Based Editing Breaks
Start with a tiny document:
visible text:
cat
positions:
0 c
1 a
2 t
Ana sees cat and inserts h after c:
Ana operation:
insert "h" at index 1
Ana sees:
chat
Bruno sees the original cat and inserts r after a:
Bruno operation:
insert "r" at index 2
Bruno sees:
cart
Now Ana receives Bruno's operation. Should index 2 mean after a, as Bruno intended, or should it mean the third slot in Ana's current text chat?
replay by raw index:
chat
insert r at index 2
chart
maybe acceptable here, but accidental
Try a deletion and the problem becomes sharper:
Ana:
delete character at index 1 # deletes a
Bruno:
insert "r" at index 2 # after a, before t
After Ana's deletion, index 2 no longer names the same place. The operation was written in coordinates that expired.
A collaborative editor needs operations that name stable context:
insert "r" after element id A
delete element id A
The visible index can be computed for the user's screen. The replicated operation should refer to identifiers that survive lag, concurrency, and retry.
Sequence CRDTs
A sequence CRDT is a replicated data type for ordered elements. The elements may be characters, paragraphs, list items, image embeds, or formatting boundaries. The core requirement is simple to state and hard to engineer well:
if replicas receive the same set of insert/delete operations,
they must render the same sequence,
even if they received those operations in different orders.
One common family, often called linked-list or RGA-style sequence CRDTs, gives each inserted element a unique identifier:
element:
id: Madrid:42
value: "h"
after: C:1
visible: true
The document is not "array index 0, 1, 2." It is a set of elements plus order constraints:
HEAD
C:1 -> "c"
A:1 -> "a"
T:1 -> "t"
Ana inserts h after C:1:
H:7 -> "h", after C:1
Bruno inserts r after A:1:
R:3 -> "r", after A:1
When all operations arrive, both replicas can render:
HEAD
C:1 -> "c"
H:7 -> "h"
A:1 -> "a"
R:3 -> "r"
T:1 -> "t"
visible:
chart
What if two users insert at the same place concurrently?
Ana:
X:9 -> "x", after A:1
Bruno:
Y:4 -> "y", after A:1
The CRDT needs a deterministic tie-breaker. It may order by actor ID, Lamport timestamp, generated position identifier, or a more careful rule that preserves causal insertion order. The exact choice affects how natural the result feels.
both replicas eventually choose:
A:1, X:9, Y:4
or:
A:1, Y:4, X:9
but not one order on Madrid and another on Dublin
Another family, used by Logoot/LSEQ-like designs, gives each element a position identifier drawn from a dense ordered space:
before:
c at [10]
a at [20]
t at [30]
insert h between c and a:
h at [15]
If two replicas need a position between the same neighbors, the identifier includes enough extra digits and replica identity to stay unique and ordered.
Both families turn "where is the cursor?" into "which stable element or position does this operation reference?" That shift is the small win: local editing can stay fast while merge behavior becomes a data structure problem instead of a global-lock problem.
Deletion, Tombstones, and Compaction
Deletion in a text CRDT is usually not "remove the element from history." It is closer to:
element A:1:
value: "a"
visible: false
That hidden element may still be needed because later operations refer to it:
Bruno operation:
insert "r" after A:1
If Ana deletes A:1 before receiving Bruno's insert, the tombstone lets her place r in the intended neighborhood once the insert arrives.
history:
C:1 -> "c"
A:1 -> "a" (deleted)
R:3 -> "r"
T:1 -> "t"
visible:
crt
This is the same theme as earlier lessons on causal context and compaction. Metadata that looks useless to the visible document may still be structural evidence for future merges. Deleting it safely requires a stability signal:
safe to compact tombstone A:1 only when:
every replica that can send old operations has seen the delete
or the system has an epoch/snapshot boundary
or old clients must resync from a canonical snapshot
The trade-off is operational. Tombstones make merges easier and preserve references, but they grow with editing history. Compaction reduces cost, but it needs membership assumptions, causal stability, or a repair path for clients that return from long offline periods.
Rich Text Is More Than Characters
Plain text already has difficult ordering cases. Rich text adds another layer: formatting has structure.
Suppose Ana selects cat and turns it bold:
Ana:
bold range from C:1 to T:1
At the same time, Bruno inserts s after a:
Bruno:
insert "s" after A:1
Should the inserted s be bold?
possible renderings:
cats where s is bold
cats where s is not bold
Neither answer is universally correct. It depends on the user's intention and the editor's model. If Bruno typed inside a bold word, users often expect the new character to inherit bold. If Bruno inserted a comment marker or pasted unformatted text, inheritance may be wrong.
That means rich text cannot be treated as "characters plus CSS after the fact." The formatting model needs merge semantics too.
One practical approach represents formatting as annotations anchored to stable positions:
annotation:
type: bold
start: before C:1
end: after T:1
behavior_at_edges:
include_insertions_inside
exclude_insertions_after_end
Another approach represents rich text as tree-like or JSON-like replicated state:
paragraph:
children:
- text run "ca"
- text run "t" with bold=true
- mention embed user:42
Now the editor must merge both sequence edits and attribute edits. Concurrent formatting changes need rules:
Ana:
set bold=true on word
Bruno:
set bold=false on overlapping range
merge policy:
keep both as spans and normalize
or use multi-value conflict state
or apply a deterministic attribute resolver
This is where coordination avoidance becomes a product decision. A CRDT can make replicas converge, but the application must decide what a meaningful document is. For code, legal contracts, access-control policies, or structured documents, "both edits converged" may be less important than "the resulting document still respects the domain."
Worked Example: Offline Comment Edit
Consider a collaborative issue tracker with rich text comments. A comment body is edited offline on a phone and in a browser.
Initial text:
Ship the cache fix
The underlying sequence uses element IDs:
S:1 h:1 i:1 p:1 SPACE:1 t:1 h:2 e:1 ...
Phone edit:
insert "safe " after "the "
Browser edit:
bold "cache"
The phone does not need to ask the browser for permission. It creates local sequence operations:
P:77 -> "s", after SPACE:2
P:78 -> "a", after P:77
P:79 -> "f", after P:78
P:80 -> "e", after P:79
P:81 -> " ", after P:80
The browser creates a formatting annotation:
B:31:
bold=true
range:
start before c:1
end after e:2
After synchronization, both replicas can render:
Ship the safe cache fix
---- *****
The sequence merge placed the phone's words. The annotation merge kept the browser's formatting attached to stable document positions, not to fragile indexes.
Now change the browser edit:
delete "cache"
When the phone's insertion and browser's deletion merge, the editor may render:
Ship the safe fix
The deleted elements remain as tombstones until compaction is safe. If the bold annotation was attached to the deleted range, the editor must decide whether to drop it, keep it hidden for undo/history, or transform it into a normalized no-op. That is not just storage detail; it affects undo, comments, selections, and user trust.
Failure Modes
- Replaying raw indexes: Indexes are screen coordinates, not stable replicated identifiers.
- Forgetting concurrent inserts at the same place: A sequence CRDT needs deterministic ordering for same-neighbor inserts.
- Treating convergence as intention preservation: Replicas can agree on a result that still surprises users.
- Deleting structural evidence too early: Tombstone cleanup can break late operations that reference deleted elements.
- Formatting by offsets only: A bold range stored as "characters 5 through 10" drifts under concurrent edits.
- Ignoring edge behavior: Rich text annotations need rules for insertions at the start, inside, and end of a range.
- Using one resolver for every document type: A chat comment, source code file, spreadsheet formula, and legal clause may need different conflict boundaries.
- Letting metadata growth hide in the UI budget: Long-lived documents need compaction, snapshots, or archival strategy.
Practice
Take this text:
red fox
Represent it as sequence elements:
R:1 e:1 d:1 SPACE:1 f:1 o:1 x:1
Two offline edits happen:
Ana:
insert "quick " after SPACE:1
Bruno:
bold "fox"
Sketch:
1. The sequence operations Ana emits.
2. The annotation Bruno emits.
3. The final visible text after both merge.
4. Whether "quick" should inherit any formatting, and why.
5. Which tombstones or anchors must remain if Bruno deletes "fox" instead.
Then answer the coordination question: which part of this design can remain coordination-free, and which part would need an authority if the document were a contract whose clause numbers must be unique and legally stable?
Connections
006.mdintroduced dots and causal context; sequence operations use that kind of identity to name edits reliably.015.mdcovered tombstone collection; collaborative text makes the cost of retaining structural evidence very visible.016.mdseparated atomic visibility from stronger transactions; an editor uses similar grouping when one paste operation inserts many elements and annotations.018.mdmoves the same ideas to offline-first clients and edge replicas, where users expect local edits to work before the network catches up.
Resources
- [PAPER] A comprehensive study of Convergent and Commutative Replicated Data Types
- Focus: Review the sequence CRDT families and the role of identifiers, causality, and tombstones.
- [PAPER] Logoot: A Scalable Optimistic Replication Algorithm for Collaborative Editing on P2P Networks
- Focus: Study position identifiers as an alternative to raw array indexes.
- [PAPER] A Conflict-Free Replicated JSON Datatype
- Focus: Connect sequence editing to nested JSON-like collaborative data.
- [PAPER] Peritext: A CRDT for Rich-Text Collaboration
- Focus: Pay attention to formatting ranges, insertion behavior at boundaries, and rich text intent.
- [DOC] Y.Text shared type
- Focus: Compare the lesson's model with a production-oriented CRDT text API.
Key Takeaways
- Collaborative text CRDTs replace fragile indexes with stable element or position identifiers.
- Deletions often leave tombstones because future operations may still need the deleted element as an anchor.
- Rich text needs merge semantics for annotations, ranges, embeds, and edge behavior; it is not just plain text plus styling.
- Coordination-free editing gives local responsiveness, but domain-specific correctness still needs explicit boundaries and sometimes authority.