Day 236: Memory Models & Ordering - When Program Order Is Not Enough

Lock-free code depends on atomic operations, but atomics alone are not the whole story. We also need rules about what writes become visible to other threads, and in what order. Memory models are those rules.


Today's "Aha!" Moment

Programmers naturally read concurrent code as if source order were the only order that matters.

For one thread, we write:

data = 42
ready = true

and we instinctively assume another thread that observes ready == true must therefore also see data == 42.

That assumption is exactly what memory models force us to examine.

The aha is:

Compilers reorder instructions. CPUs reorder memory operations. Caches delay when one core's writes become visible to another.

So concurrent correctness needs a contract that says which observations are allowed and which are forbidden.

That contract is the memory model.

Once we see that, acquire, release, relaxed, fences, and happens-before stop looking like arbitrary jargon. They become tools for ruling out specific bad observations.

Why This Matters

Take a producer-consumer handoff:

Thread 1:
  data = 42
  ready = true

Thread 2:
  if ready:
      print(data)

The business intent is obvious: if ready is true, then data has already been set to 42.

But without the right ordering guarantees, Thread 2 may observe ready == true and yet read a stale value of data, such as 0.

That sounds absurd if we think only in source order. It is perfectly possible once compiler transformations, CPU pipelines, store buffers, and caches enter the picture.

This matters because these bugs are timing-dependent and hardware-dependent, and often invisible in tests: code that relies on accidental ordering can pass for years on one architecture and fail intermittently on another.

So memory ordering is not a niche topic for compiler engineers. It is the hidden layer that explains why some concurrent programs are correct only by accident.

Learning Objectives

By the end of this session, you will be able to:

  1. Explain why memory models exist - Describe why source order alone is not enough to reason about what one thread observes from another.
  2. Differentiate the main ordering ideas - Understand the practical role of relaxed operations, acquire-release synchronization, and stronger sequentially consistent reasoning.
  3. Evaluate the trade-off - Connect stronger ordering to easier reasoning and weaker ordering to better optimization freedom but greater correctness risk.

Core Concepts Explained

Concept 1: A Memory Model Defines Allowed Cross-Thread Observations

In sequential single-thread reasoning, the program order usually feels like reality.

In concurrent execution, that intuition is too strong.

Why?

Because the system is trying to optimize: compilers reorder and eliminate memory accesses, CPUs execute loads and stores out of order through pipelines and store buffers, and caches propagate one core's writes to other cores with delays.

So the question becomes: which of those reorderings and visibility delays is another thread allowed to observe?

That is what a memory model answers.

It does not say "how the CPU must internally work." It says: given this program, these cross-thread observations are legal, and those are forbidden.

This is why memory models are really about reasoning boundaries.

They define when one thread's operations become ordered with respect to another thread's operations, and when they do not.

Without that contract, concurrent programs would be impossible to reason about portably.

Concept 2: Acquire and Release Express Publication and Observation

Return to the producer-consumer example.

We want:

Thread 1:
  data = 42
  publish ready = true

Thread 2:
  observe ready == true
  then safely read data

Acquire-release is the classic way to express that intent.

A simplified mental model: a release store says "everything I wrote before this is now published," and an acquire load that sees that store says "everything published before it is now visible to me."

ASCII sketch:

Thread 1                         Thread 2
--------                         --------
data = 42                        if load_acquire(ready):
store_release(ready, true)           read data  -> must see 42

This creates a happens-before relationship: the write data = 42 happens before the release store, which synchronizes with the acquire load, which happens before the read of data.

That is the practical purpose of acquire-release: one thread publishes a set of writes, and another thread that observes the publication is guaranteed to see them.

It is weaker than "everything in one total global order," but much stronger than relaxed atomics.

Concept 3: Stronger Ordering Makes Reasoning Easier; Weaker Ordering Gives More Optimization Freedom

We can think of common orderings as a spectrum:

weaker ----------------------------------------------> stronger
relaxed -> acquire/release -> seq_cst

Relaxed operations give atomicity for that variable, but little ordering with surrounding operations.

Acquire/release is great for publication patterns, handoff, and many lock-free structures.

Sequentially consistent (seq_cst) is stronger and often easier to reason about because it more closely matches the intuition of one shared interleaving.

The trade-off is: stronger ordering costs performance (extra fences, fewer compiler and hardware optimizations), while weaker ordering costs reasoning effort and carries greater correctness risk.

This is why memory ordering is both a performance topic and a correctness topic.

And this is also why lock-free programming is hard: the ordering annotations are part of the algorithm's correctness argument, and a missing happens-before edge may only surface under rare timing on particular hardware.

So when choosing ordering strength, the right question is not: "what is the weakest ordering that happens to work on my machine?"

It is: "what is the weakest ordering for which I can still state the happens-before argument that makes the code correct?"

Troubleshooting

Issue: "The code is in the right order, so another thread must see it in that order."

Why it happens / is confusing: Source order is the easiest model for the human brain.

Clarification / Fix: Source order inside one thread is not automatically visibility order across threads. Correct cross-thread publication needs explicit synchronization semantics.

Issue: "Atomic means fully synchronized."

Why it happens / is confusing: The word sounds stronger than it is.

Clarification / Fix: Atomicity only guarantees indivisible access to that variable. It does not automatically guarantee the surrounding memory operations are observed in the intended order.

Issue: "We should just use the weakest ordering for speed."

Why it happens / is confusing: Weaker sounds cheaper.

Clarification / Fix: The cost of a subtle concurrency bug is usually far greater than the micro-optimization. Start from the clearest correct ordering and weaken only when you can defend the happens-before story.

Advanced Connections

Connection 1: Memory Models & Ordering <-> Lock-Free Data Structures

The parallel: Lock-free algorithms are where memory ordering stops being optional background theory and becomes part of the algorithm itself.

Connection 2: Memory Models & Ordering <-> Locks & Synchronization

The parallel: Locks are easier to use partly because they package ordering guarantees for you. Memory-model reasoning becomes more exposed when you leave that shelter.

Key Insights

  1. Memory models define what cross-thread observations are legal - They exist because compiler and hardware optimizations break naive "source order equals visible order" reasoning.
  2. Acquire-release is the basic publication pattern - It lets one thread publish data and another observe that publication with the intended visibility guarantees.
  3. Ordering strength is a trade-off between reasoning simplicity and optimization freedom - Stronger orderings are easier to reason about; weaker ones demand sharper proofs.

Knowledge Check

  1. What problem does a memory model primarily solve?

    • A) It decides how much RAM a process can allocate
    • B) It defines which cross-thread visibility and ordering outcomes are allowed
    • C) It replaces the scheduler
  2. What is the practical role of a release store followed by an acquire load that observes it?

    • A) It creates a publication/observation relationship that orders surrounding memory effects
    • B) It disables compiler optimization entirely
    • C) It makes all future operations globally sequential forever
  3. Why can atomic operations still be insufficient if the chosen ordering is too weak?

    • A) Because atomicity alone does not guarantee the surrounding reads and writes are seen in the intended order
    • B) Because atomics only work on single-core machines
    • C) Because atomics remove values from caches

Answers

1. B: A memory model defines what one thread is allowed to observe from another in the presence of compiler and hardware reordering.

2. A: Acquire-release is the standard way to express safe publication and observation across threads.

3. A: The variable access may be atomic while the overall concurrent reasoning is still wrong if visibility ordering is not strong enough.


