Day 236: Memory Models & Ordering - When Program Order Is Not Enough
Lock-free code depends on atomic operations, but atomics alone are not the whole story. We also need rules about what writes become visible to other threads, and in what order. Memory models are those rules.
Today's "Aha!" Moment
Programmers naturally read concurrent code as if source order were the only order that matters.
For one thread, we write:
```
data = 42
ready = true
```
and we instinctively assume another thread that observes ready == true must therefore also see data == 42.
That assumption is exactly what memory models force us to examine.
The aha is:
- program order is not automatically the same as visibility order to other threads
Compilers reorder instructions. CPUs reorder memory operations. Caches delay when one core's writes become visible to another.
So concurrent correctness needs a contract that says which observations are allowed and which are forbidden.
That contract is the memory model.
Once we see that, acquire, release, relaxed, fences, and happens-before stop looking like arbitrary jargon. They become tools for ruling out specific bad observations.
Why This Matters
Take a producer-consumer handoff:
```
Thread 1:
    data = 42
    ready = true

Thread 2:
    if ready:
        print(data)
```
The business intent is obvious:
- once `ready` becomes true, `data` should already be published
But without the right ordering guarantees, Thread 2 may observe:
- `ready == true` while still seeing the old value of `data`
That sounds absurd if we think only in source order. It is perfectly possible once compiler transformations, CPU pipelines, store buffers, and caches enter the picture.
This matters because:
- lock-free algorithms depend on atomic ordering for correctness
- even lock-based designs rely on the language and hardware giving locks their publication semantics
- bugs caused by ordering often appear only under load, on different hardware, or after compiler optimization
So memory ordering is not a niche topic for compiler engineers. It is the hidden layer that explains why some concurrent programs are correct only by accident.
Learning Objectives
By the end of this session, you will be able to:
- Explain why memory models exist - Describe why source order alone is not enough to reason about what one thread observes from another.
- Differentiate the main ordering ideas - Understand the practical role of relaxed operations, acquire-release synchronization, and stronger sequentially consistent reasoning.
- Evaluate the trade-off - Connect stronger ordering to easier reasoning and weaker ordering to better optimization freedom but greater correctness risk.
Core Concepts Explained
Concept 1: A Memory Model Defines Allowed Cross-Thread Observations
In sequential single-thread reasoning, the program order usually feels like reality.
In concurrent execution, that intuition is too strong.
Why?
Because the system is trying to optimize:
- compilers reorder independent instructions
- CPUs execute out of order
- stores may sit in buffers before becoming visible elsewhere
- cores may temporarily observe writes at different times
So the question becomes:
- what is another thread allowed to see?
That is what a memory model answers.
It does not say "how the CPU must internally work." It says:
- which observable outcomes are legal for a correct compiler + hardware + runtime combination
This is why memory models are really about reasoning boundaries.
They define when one thread's operations become ordered with respect to another thread's operations, and when they do not.
Without that contract, concurrent programs would be impossible to reason about portably.
Concept 2: Acquire and Release Express Publication and Observation
Return to the producer-consumer example.
We want:
```
Thread 1:
    data = 42
    publish: ready = true

Thread 2:
    observe ready == true
    then safely read data
```
Acquire-release is the classic way to express that intent.
A simplified mental model:
- a release store says: all earlier writes in this thread must become visible before this publication point
- an acquire load says: after I observe that publication, later reads in this thread must not float before it
ASCII sketch:
```
Thread 1                       Thread 2
--------                       --------
data = 42                      if load_acquire(ready):
store_release(ready, true)         read data  -> must see 42
```
This creates a happens-before relationship:
- if Thread 2's acquire load observes the value written by Thread 1's release store, then the prior write to `data` is guaranteed to be visible to Thread 2
That is the practical purpose of acquire-release:
- publish data safely
- observe published data safely
It is weaker than "everything in one total global order," but much stronger than relaxed atomics.
Concept 3: Stronger Ordering Makes Reasoning Easier; Weaker Ordering Gives More Optimization Freedom
We can think of common orderings as a spectrum:
```
weaker ----------------------------------------------> stronger
relaxed  ->  acquire/release  ->  seq_cst
```
Relaxed operations give atomicity for that variable, but little ordering with surrounding operations.
Acquire/release is great for publication patterns, handoff, and many lock-free structures.
Sequentially consistent (seq_cst) is stronger and often easier to reason about because it more closely matches the intuition of one shared interleaving.
The trade-off is:
- stronger ordering reduces the set of weird outcomes you must reason about
- weaker ordering gives compilers and hardware more room to optimize
This is why memory ordering is both a performance topic and a correctness topic.
And this is also why lock-free programming is hard:
- the algorithm may look structurally correct
- but the wrong memory order can still make the algorithm invalid
So when choosing ordering strength, the right question is not:
- what is the weakest thing the compiler accepts?
It is:
- what is the weakest ordering that still preserves the intended happens-before relationships?
Troubleshooting
Issue: "The code is in the right order, so another thread must see it in that order."
Why it happens / is confusing: Source order is the easiest model for the human brain.
Clarification / Fix: Source order inside one thread is not automatically visibility order across threads. Correct cross-thread publication needs explicit synchronization semantics.
Issue: "Atomic means fully synchronized."
Why it happens / is confusing: The word sounds stronger than it is.
Clarification / Fix: Atomicity only guarantees indivisible access to that variable. It does not automatically guarantee the surrounding memory operations are observed in the intended order.
Issue: "We should just use the weakest ordering for speed."
Why it happens / is confusing: Weaker sounds cheaper.
Clarification / Fix: The cost of a subtle concurrency bug is usually far greater than the micro-optimization. Start from the clearest correct ordering and weaken only when you can defend the happens-before story.
Advanced Connections
Connection 1: Memory Models & Ordering <-> Lock-Free Data Structures
The parallel: Lock-free algorithms are where memory ordering stops being optional background theory and becomes part of the algorithm itself.
Connection 2: Memory Models & Ordering <-> Locks & Synchronization
The parallel: Locks are easier to use partly because they package ordering guarantees for you. Memory-model reasoning becomes more exposed when you leave that shelter.
Resources
- [DOC] Rust Nomicon: Atomics
- [DOC] Rust `Ordering` enum
- [DOC] Linux kernel memory barriers documentation
- [BOOK] Operating Systems: Three Easy Pieces
Key Insights
- Memory models define what cross-thread observations are legal - They exist because compiler and hardware optimizations break naive "source order equals visible order" reasoning.
- Acquire-release is the basic publication pattern - It lets one thread publish data and another observe that publication with the intended visibility guarantees.
- Ordering strength is a trade-off between reasoning simplicity and optimization freedom - Stronger orderings are easier to reason about; weaker ones demand sharper proofs.
Knowledge Check
1. What problem does a memory model primarily solve?
   - A) It decides how much RAM a process can allocate
   - B) It defines which cross-thread visibility and ordering outcomes are allowed
   - C) It replaces the scheduler
2. What is the practical role of a release store followed by an acquire load that observes it?
   - A) It creates a publication/observation relationship that orders surrounding memory effects
   - B) It disables compiler optimization entirely
   - C) It makes all future operations globally sequential forever
3. Why can atomic operations still be insufficient if the chosen ordering is too weak?
   - A) Because atomicity alone does not guarantee the surrounding reads and writes are seen in the intended order
   - B) Because atomics only work on single-core machines
   - C) Because atomics remove values from caches
Answers
1. B: A memory model defines what one thread is allowed to observe from another in the presence of compiler and hardware reordering.
2. A: Acquire-release is the standard way to express safe publication and observation across threads.
3. A: The variable access may be atomic while the overall concurrent reasoning is still wrong if visibility ordering is not strong enough.