Day 235: Lock-Free Data Structures - Progress Without a Mutex
Locks preserve invariants by forcing one thread at a time through a critical section. Lock-free structures ask a harder question: can we preserve correctness with atomic state transitions and retries, so that one stalled thread does not stop everyone else?
Today's "Aha!" Moment
After learning locks, it is easy to think concurrency has a standard recipe:
- find the shared state
- put a mutex around it
- keep the critical section small
That is often the right first answer.
But locks have a cost:
- contending threads serialize
- a slow or preempted lock holder can stall others
- lock ordering becomes part of the program's correctness story
Lock-free data structures appear when we want a different kind of progress guarantee.
The aha is:
- lock-free does not mean "no coordination"
- it means coordination through atomic primitives and retry instead of through exclusive ownership of a mutex
For example, instead of saying:
- "I hold the lock, so nobody else may touch this pointer"
we say:
- "I will read the pointer, compute the new desired state, and use an atomic compare-and-swap to install it only if nobody changed it first"
If another thread wins the race, we do not block behind it. We retry with the new state.
That changes the failure mode completely:
- less lock blocking
- more retries and subtle correctness conditions
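To make that concrete, here is a minimal C11 sketch (an illustration, not code from any particular library; the function name is made up): a shared counter updated with exactly that read, compute, compare-and-swap, retry loop.

#include <stdatomic.h>

static atomic_int shared = 0;

/* The pattern from above: read the current value, compute the desired
 * new value, and install it only if nobody changed `shared` in between. */
void add_with_cas(int delta) {
    int old = atomic_load(&shared);
    /* On failure, compare_exchange_weak reloads the current value into
     * `old`, so each retry starts from fresh state instead of blocking. */
    while (!atomic_compare_exchange_weak(&shared, &old, old + delta)) {
        /* lost the race; the loop retries with the updated `old` */
    }
}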
Why This Matters
Imagine many worker threads pushing jobs onto a shared stack or queue.
With a coarse lock, correctness is straightforward:
- one thread enters
- updates the head or tail
- unlocks
But under heavy contention, that one lock becomes a choke point.
Now consider a lock-free stack.
Each push can try to:
- read the current head pointer
- make the new node point to that old head
- atomically replace head with the new node, but only if head still equals the old value
If another thread changed the head first, the compare-and-swap fails and we retry.
No one waits for a lock holder. No thread can freeze the entire structure just by getting descheduled while "holding" something.
That sounds strictly better, but it is not.
Now the design has to answer harder questions:
- what if the pointer changes away and then back again? (ABA)
- when is it safe to reclaim removed nodes?
- how expensive are repeated CAS failures under contention?
This is why lock-free structures matter:
- they solve one class of coordination bottlenecks
- by introducing a sharper class of correctness and memory-lifetime problems
Learning Objectives
By the end of this session, you will be able to:
- Explain why lock-free structures exist - Describe when mutex contention or a blocked lock holder makes a retry-based design attractive.
- Trace the core mechanism - Show how compare-and-swap style updates preserve correctness without a traditional lock.
- Evaluate the real trade-off - Connect lock-freedom to retry costs, ABA, memory reclamation, and the coming need to reason about memory ordering.
Core Concepts Explained
Concept 1: Lock-Free Means Global Progress Without Requiring a Thread to Hold a Mutex
The word "lock-free" is easy to misread.
It does not mean:
- no atomic operations
- no contention
- no retries
- no waiting of any kind
What it usually means is:
- the system as a whole can still make progress even if an individual thread is delayed or loses races
That is a different guarantee from a lock-based design, where one unlucky thread can delay everyone by holding a lock too long.
A simple contrast:
Lock-based:
- one thread owns the critical section
- others wait
Lock-free:
- many threads may race
- losers retry
- some thread still succeeds
This is why lock-free structures are especially attractive under:
- high contention
- unpredictable scheduling
- runtimes where a paused thread holding a lock is painful
But notice what changed:
- we traded blocked waiting for optimistic racing
That is the heart of the design.
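A minimal C11 sketch of that trade, with hypothetical function names: the same counter maintained two ways. If the mutex holder is descheduled inside the critical section, every other caller blocks behind it; in the atomic version, a descheduled thread delays only itself.

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter_locked = 0;
static atomic_long counter_lockfree = 0;

/* Lock-based: correctness through exclusive ownership. A thread
 * preempted between lock and unlock stalls every other caller. */
void increment_locked(void) {
    pthread_mutex_lock(&lock);
    counter_locked++;
    pthread_mutex_unlock(&lock);
}

/* Lock-free: correctness through one atomic read-modify-write.
 * A preempted thread holds nothing, so others keep making progress. */
void increment_lockfree(void) {
    atomic_fetch_add(&counter_lockfree, 1);
}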
Concept 2: The Core Primitive Is Usually Compare-and-Swap Plus Retry
Take a lock-free stack push.
Suppose head points to the current top node.
Thread 1 wants to push node N.
It does something conceptually like:
old_head = head
N.next = old_head
CAS(head, old_head, N)
Where CAS means:
- compare head with old_head
- if they are still equal, replace head with N
- otherwise fail without changing head
ASCII sketch:
initial:
head -> A -> B
Thread 1 prepares:
N -> A -> B
CAS succeeds:
head -> N -> A -> B
If another thread changed head first, the CAS fails:
Thread 1 saw head = A
Thread 2 changed head to X
Thread 1 CAS(head, A, N) -> fail
Thread 1 retries with head = X
The structure remains correct because no thread installs an update based on stale assumptions without rechecking the shared state atomically.
That is the central pattern of lock-free design:
- optimistic read
- compute candidate update
- atomic conditional install
- retry if the world changed underneath you
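Here is that pattern as a minimal C11 sketch (node layout and names are assumptions for illustration, and safe memory reclamation is deliberately ignored until Concept 3):

#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

static _Atomic(node_t *) head = NULL;

/* Push: optimistic read, link the new node, atomic conditional install.
 * Error handling for malloc is omitted to keep the sketch short. */
void push(int value) {
    node_t *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load(&head);
    /* On failure, compare_exchange_weak writes the current head into
     * n->next, so the retry automatically uses the fresh snapshot. */
    while (!atomic_compare_exchange_weak(&head, &n->next, n)) {
        /* another thread moved head first; retry */
    }
}

Note how the failed CAS doubles as the re-read: the loop never blocks, it just relinks the new node against whatever head has become.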
Concept 3: Removing Locks Also Removes Some Simplicity, Especially Around ABA and Memory Reclamation
This is where lock-free programming becomes much harder than it first appears.
Two classic problems:
- ABA problem: a pointer changes from A to B and back to A; a CAS may succeed even though the world changed in an important way
- memory reclamation: if one thread removes a node, when is it safe to free it, given that another thread may still hold an old pointer snapshot?
Those problems do not disappear just because the top-level algorithm looks elegant.
Example intuition:
Thread 1 reads head = A
Thread 2 pops A, frees or reuses its memory, and later head again holds the address A
Thread 1 CAS succeeds based on an assumption that is no longer logically valid
That is why practical lock-free designs often need extra machinery:
- tagged pointers (see the sketch after this list)
- hazard pointers
- epoch-based reclamation
- reference-count style protection
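As one taste of that machinery, here is a tagged-pointer pop sketched in C11, reusing node_t from the push sketch above (a hypothetical illustration; whether the double-width compare-and-swap is actually lock-free depends on the platform). The tag is bumped on every update, so a head that went A -> B -> A no longer compares equal:

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    node_t *ptr;    /* current top of stack */
    uintptr_t tag;  /* incremented on every successful update */
} tagged_head_t;

static _Atomic tagged_head_t thead;

/* Pop that defends against ABA: even if ptr returns to an old address,
 * the tag differs, so the CAS fails and we re-read.
 * NOTE: this still dereferences old.ptr, so it assumes nodes are never
 * freed while a pop may be in flight; tags fix ABA, not reclamation. */
node_t *pop_tagged(void) {
    tagged_head_t old = atomic_load(&thead);
    tagged_head_t new_head;
    do {
        if (old.ptr == NULL)
            return NULL;
        new_head.ptr = old.ptr->next;
        new_head.tag = old.tag + 1;
    } while (!atomic_compare_exchange_weak(&thead, &old, new_head));
    return old.ptr;
}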
And there is one more subtle cost:
- lock-free algorithms depend heavily on atomic semantics and memory ordering
That is the bridge to the next lesson.
A lock hid some of the ordering complexity behind a stronger primitive. Lock-free code exposes more of that complexity directly.
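As a preview, here is a variant of the earlier push sketch (same assumed node_t and head) with the ordering spelled out instead of left at the sequentially consistent default:

#include <stdatomic.h>
#include <stdlib.h>

/* Success uses release so a thread that later acquires `head` also sees
 * n->value and n->next; failure is relaxed because we only retry.
 * The next lesson explains why these choices are safe. */
void push_explicit(int value) {
    node_t *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load_explicit(&head, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &head, &n->next, n,
               memory_order_release, memory_order_relaxed)) {
        /* retry with the refreshed snapshot in n->next */
    }
}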
Troubleshooting
Issue: "Lock-free means threads never wait."
Why it happens / is confusing: The name suggests all waiting disappears.
Clarification / Fix: Threads may still spin, retry, and lose races repeatedly. The key property is not zero waiting but the absence of a single mutex holder who can block global progress.
Issue: "If CAS succeeds, the algorithm is automatically correct."
Why it happens / is confusing: Atomic success sounds like proof that the whole invariant is safe.
Clarification / Fix: CAS only protects the specific compare-and-install step. Correctness still depends on the surrounding invariant, on how nodes are linked, and on safe memory reclamation.
Issue: "Lock-free is always faster than locking."
Why it happens / is confusing: Avoiding a mutex sounds like pure win.
Clarification / Fix: Under low contention, simple locks may be faster and easier to maintain. Lock-free algorithms shine only when their progress properties or contention profile match the workload.
Advanced Connections
Connection 1: Lock-Free Data Structures <-> Locks & Synchronization
The parallel: Both protect shared invariants. Locks do it by exclusive access; lock-free structures do it by atomic conditional updates and retries.
Connection 2: Lock-Free Data Structures <-> Memory Models & Ordering
The parallel: Lock-free algorithms depend directly on the guarantees of atomic operations and memory ordering, which is why the next lesson becomes essential rather than optional.
Resources
- [PAPER] Wait-Free Synchronization
- [PAPER] Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
- [DOC] Fast Concurrent Queue Algorithms
- [BOOK] Operating Systems: Three Easy Pieces
Key Insights
- Lock-free does not remove coordination, it changes its form - Atomic compare-and-swap plus retry replaces mutex ownership as the core coordination mechanism.
- The main win is progress under contention or delays - A stalled thread does not hold a mutex that blocks everyone else.
- The main cost is subtle correctness work - ABA, reclamation, and ordering semantics make lock-free designs much harder to implement and verify.
Knowledge Check
1. What is the main idea behind a lock-free update?
- A) Disable the scheduler while the update runs
- B) Use an atomic conditional update such as CAS and retry if another thread changed the state first
- C) Remove all shared memory from the process
2. Why can lock-free structures still perform poorly under contention?
- A) Because retries and failed CAS operations can waste work
- B) Because lock-free code cannot use atomic instructions
- C) Because lock-free algorithms force all threads into one critical section
3. Why is memory reclamation a central difficulty in lock-free structures?
- A) Because the compiler removes all freed nodes automatically
- B) Because another thread may still hold a stale pointer to a node that looks removable
- C) Because lock-free algorithms never delete nodes
Answers
1. B: A lock-free operation typically reads shared state optimistically, attempts an atomic conditional install, and retries if the state changed first.
2. A: Lock-free avoids mutex blocking, but heavy contention can still create lots of wasted retry work.
3. B: A removed node may still be visible to another thread that observed older state, so freeing it safely is a major part of real lock-free design.