Day 235: Lock-Free Data Structures - Progress Without a Mutex
Locks preserve invariants by forcing one thread at a time through a critical section. Lock-free structures ask a harder question: can we preserve correctness with atomic state transitions and retries, so that one stalled thread does not stop everyone else?
Today's "Aha!" Moment
After learning locks, it is easy to think concurrency has a standard recipe:
- find the shared state
- put a mutex around it
- keep the critical section small
That is often the right first answer.
But locks have a cost:
- contending threads serialize
- a slow or preempted lock holder can stall others
- lock ordering becomes part of the program's correctness story
Lock-free data structures appear when we want a different kind of progress guarantee.
The aha is:
- lock-free does not mean "no coordination"
- it means coordination through atomic primitives and retry instead of through exclusive ownership of a mutex
For example, instead of saying:
- "I hold the lock, so nobody else may touch this pointer"
we say:
- "I will read the pointer, compute the new desired state, and use an atomic compare-and-swap to install it only if nobody changed it first"
If another thread wins the race, we do not block behind it. We retry with the new state.
That changes the failure mode completely:
- less lock blocking
- more retries and subtle correctness conditions
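To make that concrete, here is a minimal C11 sketch (an illustration, not code from any particular library; the function name is made up): a shared counter updated with exactly that read, compute, compare-and-swap, retry loop.

#include <stdatomic.h>

static atomic_int shared = 0;

/* The pattern from above: read the current value, compute the desired
 * new value, and install it only if nobody changed `shared` in between. */
void add_with_cas(int delta) {
    int old = atomic_load(&shared);
    /* On failure, compare_exchange_weak reloads the current value into
     * `old`, so each retry starts from fresh state instead of blocking. */
    while (!atomic_compare_exchange_weak(&shared, &old, old + delta)) {
        /* lost the race; the loop retries with the updated `old` */
    }
}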
Why This Matters
Imagine many worker threads pushing jobs onto a shared stack or queue.
With a coarse lock, correctness is straightforward:
- one thread enters
- updates the head or tail
- unlocks
But under heavy contention, that one lock becomes a choke point.
Now consider a lock-free stack.
Each push can try to:
- read the current head pointer
- make the new node point to that old head
- atomically replace head with the new node, but only if head still equals the old value
If another thread changed the head first, the compare-and-swap fails and we retry.
No one waits for a lock holder. No thread can freeze the entire structure just by getting descheduled while "holding" something.
That sounds strictly better, but it is not.
Now the design has to answer harder questions:
- what if the pointer changes away and then back again? (ABA)
- when is it safe to reclaim removed nodes?
- how expensive are repeated CAS failures under contention?
This is why lock-free structures matter:
- they solve one class of coordination bottlenecks
- by introducing a sharper class of correctness and memory-lifetime problems
Learning Objectives
By the end of this session, you will be able to:
- Explain why lock-free structures exist - Describe when mutex contention or a blocked lock holder makes a retry-based design attractive.
- Trace the core mechanism - Show how compare-and-swap style updates preserve correctness without a traditional lock.
- Evaluate the real trade-off - Connect lock-freedom to retry costs, ABA, memory reclamation, and the coming need to reason about memory ordering.
Core Concepts Explained
Concept 1: Lock-Free Means Global Progress Without Requiring a Thread to Hold a Mutex
The word "lock-free" is easy to misread.
It does not mean:
- no atomic operations
- no contention
- no retries
- no waiting of any kind
What it usually means is:
- the system as a whole can still make progress even if an individual thread is delayed or loses races
That is a different guarantee from a lock-based design, where one unlucky thread can delay everyone by holding a lock too long.
A simple contrast:
Lock-based:
- one thread owns the critical section
- others wait
Lock-free:
- many threads may race
- losers retry
- some thread still succeeds
This is why lock-free structures are especially attractive under:
- high contention
- unpredictable scheduling
- runtimes where a paused thread holding a lock is painful
But notice what changed:
- we traded blocked waiting for optimistic racing
That is the heart of the design.
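A minimal C11 sketch of that trade, with hypothetical function names: the same counter maintained two ways. If the mutex holder is descheduled inside the critical section, every other caller blocks behind it; in the atomic version, a descheduled thread delays only itself.

#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter_locked = 0;
static atomic_long counter_lockfree = 0;

/* Lock-based: correctness through exclusive ownership. A thread
 * preempted between lock and unlock stalls every other caller. */
void increment_locked(void) {
    pthread_mutex_lock(&lock);
    counter_locked++;
    pthread_mutex_unlock(&lock);
}

/* Lock-free: correctness through one atomic read-modify-write.
 * A preempted thread holds nothing, so others keep making progress. */
void increment_lockfree(void) {
    atomic_fetch_add(&counter_lockfree, 1);
}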
Concept 2: The Core Primitive Is Usually Compare-and-Swap Plus Retry
Take a lock-free stack push.
Suppose head points to the current top node.
Thread 1 wants to push node N.
It does something conceptually like:
old_head = head
N.next = old_head
CAS(head, old_head, N)
Where CAS means:
- compare head with old_head
- if they are still equal, replace head with N
- otherwise fail without changing head
ASCII sketch:
initial:
head -> A -> B
Thread 1 prepares:
N -> A -> B
CAS succeeds:
head -> N -> A -> B
If another thread changed head first, the CAS fails:
Thread 1 saw head = A
Thread 2 changed head to X
Thread 1 CAS(head, A, N) -> fail
Thread 1 retries with head = X
The structure remains correct because no thread installs an update based on stale assumptions without rechecking the shared state atomically.
That is the central pattern of lock-free design:
- optimistic read
- compute candidate update
- atomic conditional install
- retry if the world changed underneath you
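Here is that pattern as a minimal C11 sketch (node layout and names are assumptions for illustration, and safe memory reclamation is deliberately ignored until Concept 3):

#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

static _Atomic(node_t *) head = NULL;

/* Push: optimistic read, link the new node, atomic conditional install.
 * Error handling for malloc is omitted to keep the sketch short. */
void push(int value) {
    node_t *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load(&head);
    /* On failure, compare_exchange_weak writes the current head into
     * n->next, so the retry automatically uses the fresh snapshot. */
    while (!atomic_compare_exchange_weak(&head, &n->next, n)) {
        /* another thread moved head first; retry */
    }
}

Note how the failed CAS doubles as the re-read: the loop never blocks, it just relinks the new node against whatever head has become.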
Concept 3: Removing Locks Also Removes Some Simplicity, Especially Around ABA and Memory Reclamation
This is where lock-free programming becomes much harder than it first appears.
Two classic problems:
- ABA problem: a pointer changes from A to B and back to A; a CAS may succeed even though the world changed in an important way
- memory reclamation: if one thread removes a node, when is it safe to free it, given that another thread may still hold an old pointer snapshot?
Those problems do not disappear just because the top-level algorithm looks elegant.
Example intuition:
Thread 1 reads head = A
Thread 2 pops A, frees or reuses its memory, and later head again holds the address A
Thread 1 CAS succeeds based on an assumption that is no longer logically valid
That is why practical lock-free designs often need extra machinery:
- tagged pointers (see the sketch after this list)
- hazard pointers
- epoch-based reclamation
- reference-count style protection
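As one taste of that machinery, here is a tagged-pointer pop sketched in C11, reusing node_t from the push sketch above (a hypothetical illustration; whether the double-width compare-and-swap is actually lock-free depends on the platform). The tag is bumped on every update, so a head that went A -> B -> A no longer compares equal:

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    node_t *ptr;    /* current top of stack */
    uintptr_t tag;  /* incremented on every successful update */
} tagged_head_t;

static _Atomic tagged_head_t thead;

/* Pop that defends against ABA: even if ptr returns to an old address,
 * the tag differs, so the CAS fails and we re-read.
 * NOTE: this still dereferences old.ptr, so it assumes nodes are never
 * freed while a pop may be in flight; tags fix ABA, not reclamation. */
node_t *pop_tagged(void) {
    tagged_head_t old = atomic_load(&thead);
    tagged_head_t new_head;
    do {
        if (old.ptr == NULL)
            return NULL;
        new_head.ptr = old.ptr->next;
        new_head.tag = old.tag + 1;
    } while (!atomic_compare_exchange_weak(&thead, &old, new_head));
    return old.ptr;
}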
And there is one more subtle cost:
- lock-free algorithms depend heavily on atomic semantics and memory ordering
That is the bridge to the next lesson.
A lock hid some of the ordering complexity behind a stronger primitive. Lock-free code exposes more of that complexity directly.
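As a preview, here is a variant of the earlier push sketch (same assumed node_t and head) with the ordering spelled out instead of left at the sequentially consistent default:

#include <stdatomic.h>
#include <stdlib.h>

/* Success uses release so a thread that later acquires `head` also sees
 * n->value and n->next; failure is relaxed because we only retry.
 * The next lesson explains why these choices are safe. */
void push_explicit(int value) {
    node_t *n = malloc(sizeof *n);
    n->value = value;
    n->next = atomic_load_explicit(&head, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &head, &n->next, n,
               memory_order_release, memory_order_relaxed)) {
        /* retry with the refreshed snapshot in n->next */
    }
}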
Troubleshooting
Issue: "Lock-free means threads never wait."
Why it happens / is confusing: The name suggests all waiting disappears.
Clarification / Fix: Threads may still spin, retry, and lose races repeatedly. The key property is not zero waiting but the absence of a single mutex holder who can block global progress.
Issue: "If CAS succeeds, the algorithm is automatically correct."
Why it happens / is confusing: Atomic success sounds like proof that the whole invariant is safe.
Clarification / Fix: CAS only protects the specific compare-and-install step. Correctness still depends on the surrounding invariant, on how nodes are linked, and on safe memory reclamation.
Issue: "Lock-free is always faster than locking."
Why it happens / is confusing: Avoiding a mutex sounds like pure win.
Clarification / Fix: Under low contention, simple locks may be faster and easier to maintain. Lock-free algorithms shine only when their progress properties or contention profile match the workload.
Advanced Connections
Connection 1: Lock-Free Data Structures <-> Locks & Synchronization
The parallel: Both protect shared invariants. Locks do it by exclusive access; lock-free structures do it by atomic conditional updates and retries.
Connection 2: Lock-Free Data Structures <-> Memory Models & Ordering
The parallel: Lock-free algorithms depend directly on the guarantees of atomic operations and memory ordering, which is why the next lesson becomes essential rather than optional.
Resources
- [PAPER] Wait-Free Synchronization
- [PAPER] Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
- [DOC] Fast Concurrent Queue Algorithms
- [BOOK] Operating Systems: Three Easy Pieces
Key Insights
- Lock-free does not remove coordination, it changes its form - Atomic compare-and-swap plus retry replaces mutex ownership as the core coordination mechanism.
- The main win is progress under contention or delays - A stalled thread does not hold a mutex that blocks everyone else.
- The main cost is subtle correctness work - ABA, reclamation, and ordering semantics make lock-free designs much harder to implement and verify.
Knowledge Check
1. What is the main idea behind a lock-free update?
- A) Disable the scheduler while the update runs
- B) Use an atomic conditional update such as CAS and retry if another thread changed the state first
- C) Remove all shared memory from the process
2. Why can lock-free structures still perform poorly under contention?
- A) Because retries and failed CAS operations can waste work
- B) Because lock-free code cannot use atomic instructions
- C) Because lock-free algorithms force all threads into one critical section
3. Why is memory reclamation a central difficulty in lock-free structures?
- A) Because the compiler removes all freed nodes automatically
- B) Because another thread may still hold a stale pointer to a node that looks removable
- C) Because lock-free algorithms never delete nodes
Answers
1. B: A lock-free operation typically reads shared state optimistically, attempts an atomic conditional install, and retries if the state changed first.
2. A: Lock-free avoids mutex blocking, but heavy contention can still create lots of wasted retry work.
3. B: A removed node may still be visible to another thread that observed older state, so freeing it safely is a major part of real lock-free design.