Day 236: Memory Models & Ordering - When Program Order Is Not Enough
Lock-free code depends on atomic operations, but atomics alone are not the whole story. We also need rules about what writes become visible to other threads, and in what order. Memory models are those rules.
Today's "Aha!" Moment
Programmers naturally read concurrent code as if source order were the only order that matters.
For one thread, we write:
```
data = 42
ready = true
```
and we instinctively assume another thread that observes ready == true must therefore also see data == 42.
That assumption is exactly what memory models force us to examine.
The aha is:
- program order is not automatically the same as visibility order to other threads
Compilers reorder instructions. CPUs reorder memory operations. Caches delay when one core's writes become visible to another.
So concurrent correctness needs a contract that says which observations are allowed and which are forbidden.
That contract is the memory model.
Once we see that, acquire, release, relaxed, fences, and happens-before stop looking like arbitrary jargon. They become tools for ruling out specific bad observations.
Why This Matters
Take a producer-consumer handoff:
```
Thread 1:
    data = 42
    ready = true

Thread 2:
    if ready:
        print(data)
```
The business intent is obvious:
- once `ready` becomes true, `data` should already be published
But without the right ordering guarantees, Thread 2 may observe:
- `ready == true` while still seeing the old value of `data`
That sounds absurd if we think only in source order. It is perfectly possible once compiler transformations, CPU pipelines, store buffers, and caches enter the picture.
This matters because:
- lock-free algorithms depend on atomic ordering for correctness
- even lock-based designs rely on the language and hardware giving locks their publication semantics
- bugs caused by ordering often appear only under load, on different hardware, or after compiler optimization
So memory ordering is not a niche topic for compiler engineers. It is the hidden layer that explains why some concurrent programs are correct only by accident.
Learning Objectives
By the end of this session, you will be able to:
- Explain why memory models exist - Describe why source order alone is not enough to reason about what one thread observes from another.
- Differentiate the main ordering ideas - Understand the practical role of relaxed operations, acquire-release synchronization, and stronger sequentially consistent reasoning.
- Evaluate the trade-off - Connect stronger ordering to easier reasoning and weaker ordering to better optimization freedom but greater correctness risk.
Core Concepts Explained
Concept 1: A Memory Model Defines Allowed Cross-Thread Observations
In sequential single-thread reasoning, the program order usually feels like reality.
In concurrent execution, that intuition is too strong.
Why?
Because the system is trying to optimize:
- compilers reorder independent instructions
- CPUs execute out of order
- stores may sit in buffers before becoming visible elsewhere
- cores may temporarily observe writes at different times
So the question becomes:
- what is another thread allowed to see?
That is what a memory model answers.
It does not say "how the CPU must internally work." It says:
- which observable outcomes are legal for a correct compiler + hardware + runtime combination
This is why memory models are really about reasoning boundaries.
They define when one thread's operations become ordered with respect to another thread's operations, and when they do not.
Without that contract, concurrent programs would be impossible to reason about portably.
Concept 2: Acquire and Release Express Publication and Observation
Return to the producer-consumer example.
We want:
```
Thread 1:
    data = 42
    publish: ready = true

Thread 2:
    observe ready == true
    then safely read data
```
Acquire-release is the classic way to express that intent.
A simplified mental model:
- a release store says: all earlier writes in this thread must become visible before this publication point
- an acquire load says: after I observe that publication, later reads in this thread must not float before it
ASCII sketch:
```
Thread 1                       Thread 2
--------                       --------
data = 42                      if load_acquire(ready):
store_release(ready, true)         read data  -> must see 42
```
This creates a happens-before relationship:
- if Thread 2's acquire load observes the value written by Thread 1's release store, then the prior write to `data` is guaranteed to be visible to Thread 2
That is the practical purpose of acquire-release:
- publish data safely
- observe published data safely
It is weaker than "everything in one total global order," but much stronger than relaxed atomics.
Concept 3: Stronger Ordering Makes Reasoning Easier; Weaker Ordering Gives More Optimization Freedom
We can think of common orderings as a spectrum:
```
weaker ----------------------------------------------> stronger
relaxed  ->  acquire/release  ->  seq_cst
```
Relaxed operations give atomicity for that variable, but little ordering with surrounding operations.
Acquire/release is great for publication patterns, handoff, and many lock-free structures.
Sequentially consistent (seq_cst) is stronger and often easier to reason about because it more closely matches the intuition of one shared interleaving.
The trade-off is:
- stronger ordering reduces the set of weird outcomes you must reason about
- weaker ordering gives compilers and hardware more room to optimize
This is why memory ordering is both a performance topic and a correctness topic.
And this is also why lock-free programming is hard:
- the algorithm may look structurally correct
- but the wrong memory order can still make the algorithm invalid
So when choosing ordering strength, the right question is not:
- what is the weakest thing the compiler accepts?
It is:
- what is the weakest ordering that still preserves the intended happens-before relationships?
Troubleshooting
Issue: "The code is in the right order, so another thread must see it in that order."
Why it happens / is confusing: Source order is the easiest model for the human brain.
Clarification / Fix: Source order inside one thread is not automatically visibility order across threads. Correct cross-thread publication needs explicit synchronization semantics.
Issue: "Atomic means fully synchronized."
Why it happens / is confusing: The word sounds stronger than it is.
Clarification / Fix: Atomicity only guarantees indivisible access to that variable. It does not automatically guarantee the surrounding memory operations are observed in the intended order.
Issue: "We should just use the weakest ordering for speed."
Why it happens / is confusing: Weaker sounds cheaper.
Clarification / Fix: The cost of a subtle concurrency bug is usually far greater than the micro-optimization. Start from the clearest correct ordering and weaken only when you can defend the happens-before story.
Advanced Connections
Connection 1: Memory Models & Ordering <-> Lock-Free Data Structures
The parallel: Lock-free algorithms are where memory ordering stops being optional background theory and becomes part of the algorithm itself.
Connection 2: Memory Models & Ordering <-> Locks & Synchronization
The parallel: Locks are easier to use partly because they package ordering guarantees for you. Memory-model reasoning becomes more exposed when you leave that shelter.
Resources
- [DOC] Rust Nomicon: Atomics
- [DOC] Rust `Ordering` enum
- [DOC] Linux kernel memory barriers documentation
- [BOOK] Operating Systems: Three Easy Pieces
Key Insights
- Memory models define what cross-thread observations are legal - They exist because compiler and hardware optimizations break naive "source order equals visible order" reasoning.
- Acquire-release is the basic publication pattern - It lets one thread publish data and another observe that publication with the intended visibility guarantees.
- Ordering strength is a trade-off between reasoning simplicity and optimization freedom - Stronger orderings are easier to reason about; weaker ones demand sharper proofs.
Knowledge Check
1. What problem does a memory model primarily solve?
   - A) It decides how much RAM a process can allocate
   - B) It defines which cross-thread visibility and ordering outcomes are allowed
   - C) It replaces the scheduler
2. What is the practical role of a release store followed by an acquire load that observes it?
   - A) It creates a publication/observation relationship that orders surrounding memory effects
   - B) It disables compiler optimization entirely
   - C) It makes all future operations globally sequential forever
3. Why can atomic operations still be insufficient if the chosen ordering is too weak?
   - A) Because atomicity alone does not guarantee the surrounding reads and writes are seen in the intended order
   - B) Because atomics only work on single-core machines
   - C) Because atomics remove values from caches
Answers
1. B: A memory model defines what one thread is allowed to observe from another in the presence of compiler and hardware reordering.
2. A: Acquire-release is the standard way to express safe publication and observation across threads.
3. A: The variable access may be atomic while the overall concurrent reasoning is still wrong if visibility ordering is not strong enough.