LESSON
Day 241: Cache Fundamentals - CPU to CDN
A cache is a copy kept somewhere closer, faster, or cheaper to reach than the authoritative source, and every cache is a trade-off between speed and freshness.
Today's "Aha!" Moment
The insight: CPU caches, page cache, Redis, and CDNs look different on the surface, but they all exist for the same reason: the fastest storage is always too small, and the largest storage is always too far away or too slow.
Why this matters: Teams often treat caches as isolated tricks: L1 cache belongs to hardware, Redis belongs to backend, CDN belongs to frontend. That misses the reusable design pattern underneath. Once you see caching as one idea repeated across layers, system design becomes much easier to reason about.
The universal pattern: keep a smaller, faster copy near the work -> pay a miss penalty when it is absent or stale -> accept complexity around eviction and invalidation.
Concrete anchor: Imagine a product page request. The CPU cache speeds the code path itself, the OS page cache speeds file and block access, Redis may speed repeated database reads, and a CDN may serve the whole response or image without touching origin at all. Different layers, same idea.
How to recognize when this applies:
- A slow path is repeatedly asked for the same data or nearby data.
- The expensive source is farther away in latency, bandwidth, or cost.
- Most requests concentrate on a hot subset of a much larger dataset.
Common misconceptions:
- [INCORRECT] "A cache is just a performance bonus layer you add later."
- [INCORRECT] "A cache hit is always good, so more caching is always better."
- [CORRECT] The truth: A cache is a design decision about locality, authority, freshness, and failure modes.
Real-world examples:
- CPU hierarchy: L1/L2/L3 exist because DRAM is far slower than the core.
- Web delivery: CDNs exist because origin round trips across continents are too expensive for every request.
Why This Matters
The problem: Modern systems spend much of their time waiting on data, not merely computing. If every access goes to the deepest, farthest, or most authoritative layer, latency rises, throughput falls, and cost climbs.
Before:
- Every read pays the full cost of the authoritative source.
- Latency is dominated by distance, serialization, disk, or remote dependencies.
- Hot keys and hot objects repeatedly overload the same deep layer.
After:
- Common reads stay close to the consumer.
- Expensive systems are protected from repeated identical work.
- Performance becomes dominated by hit rate, miss penalty, and invalidation discipline rather than raw source latency alone.
Real-world impact: Good caching changes not just speed, but system shape. It reduces origin load, absorbs bursts, lowers infrastructure cost, and can make an otherwise impossible latency target achievable.
Learning Objectives
By the end of this session, you will be able to:
- Explain why caches exist at every layer - Connect locality and miss penalty from CPU to CDN.
- Describe the core mechanics of a cache - Reason about keys, hits, misses, eviction, and freshness.
- Evaluate cache trade-offs - Decide when a cache helps, when it hurts, and what new failure modes it introduces.
Core Concepts Explained
Concept 1: Caches Exist Because Access Cost Is Uneven
The most important fact about real systems is that access is not uniform. Reading a register is not like reading DRAM. Reading DRAM is not like reading SSD. Reading SSD is not like calling a remote service in another region.
That unevenness creates hierarchies:
- small and very fast storage close to computation
- larger and slower storage deeper in the system
- authoritative sources that are too expensive to hit every time
Caching is the generic response to that structure.
The intuition is simple:
- if a small hot subset is accessed far more often than the cold rest
- and if storing that subset near the consumer is cheaper than paying the deep-access cost every time
- then a cache is worth considering
This is why the same idea appears repeatedly:
- CPU caches keep recently used memory lines near the core.
- The OS page cache keeps disk-backed pages in RAM.
- Application caches keep decoded or computed values near the service.
- Redis keeps hot objects in memory instead of re-deriving them from deeper stores.
- CDNs keep content near users instead of pulling every object from origin.
The central trade-off appears immediately:
- you gain speed by serving a copy instead of the source
- but you accept that the copy may be missing, stale, or expensive to maintain
That is why a cache is never just storage. It is also policy.
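To make the pattern concrete, here is a minimal cache-aside sketch in Python. The names cache and fetch_from_source are illustrative assumptions standing in for any fast nearby copy and any authoritative source:
cache = {}  # small, fast copy kept near the work
def fetch_from_source(key):
    # Stand-in for the expensive, authoritative read (DB, origin, disk, ...).
    return "value-for-" + key
def get(key):
    if key in cache:                    # hit: serve the nearby copy
        return cache[key]
    value = fetch_from_source(key)      # miss: pay the deep-access penalty
    cache[key] = value                  # keep the copy for future requests
    return value
The sketch also shows where the policy questions enter: nothing here says how long the copy stays valid or what happens when the cache fills up.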
Concept 2: Every Cache Is Defined by Four Questions
Once you strip away implementation details, nearly every cache can be understood by answering four questions.
1. What is the authoritative source?
2. What key identifies cached entries?
3. When do entries leave?
4. How does the cache learn the source changed?
These questions are more important than the brand of cache.
1. What is authoritative?
The cache is a copy, not the truth. The source of truth might be DRAM behind the CPU cache, disk behind the page cache, a database behind Redis, or origin behind a CDN.
If you are vague about authority, invalidation becomes impossible to reason about.
2. What is the key?
The cache has to decide what counts as "the same thing." In a CPU cache that is a memory address mapped to a cache line. In Redis it may be an application key. In a CDN it may be URL plus headers plus policy.
Bad keys are a silent failure mode:
- too broad -> wrong data reused
- too narrow -> poor hit rate
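As an illustration of key design at the HTTP layer, here is a hedged Python sketch; the vary_headers choice is a hypothetical policy, not any real CDN's rules:
def cache_key(url, headers, vary_headers=("accept-encoding",)):
    # Too broad: key on URL alone and users may receive each other's variants.
    # Too narrow: key on every header and almost nothing is ever reused.
    varying = tuple((h, headers.get(h, "")) for h in sorted(vary_headers))
    return (url, varying)
key_a = cache_key("/product/42", {"accept-encoding": "gzip"})
key_b = cache_key("/product/42", {"accept-encoding": "br"})
assert key_a != key_b  # same URL, but different cacheable variants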
3. When does an entry leave?
Caches are smaller than the universe of possible data, so something must leave. That means an eviction policy, which the next lesson explores in depth.
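As a preview, here is a minimal least-recently-used sketch using Python's OrderedDict; the capacity of 3 is arbitrary and purely illustrative:
from collections import OrderedDict
class LRUCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.entries = OrderedDict()
    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)         # mark as recently used
        return self.entries[key]
    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry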
4. How does freshness work?
This is the real pain point. A cache becomes dangerous when the copy outlives its validity. TTLs, validators, write-through, write-behind, purges, and revalidation are all answers to this one question.
This is why cache design should feel repetitive in a good way. No matter the layer, we keep coming back to authority, identity, capacity, and freshness.
Concept 3: Hit Rate Alone Is Not the Whole Story
Teams often celebrate hit rate as if it were the single cache metric that matters. It is important, but by itself it is incomplete.
The useful mental model is:
effective_latency ~= hit_rate * hit_cost + miss_rate * miss_cost
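Plugging in hypothetical numbers shows how much the miss penalty can dominate:
# Back-of-the-envelope check with hypothetical numbers.
hit_rate, hit_cost_ms, miss_cost_ms = 0.95, 1.0, 200.0
effective_ms = hit_rate * hit_cost_ms + (1 - hit_rate) * miss_cost_ms
print(round(effective_ms, 2))  # 10.95 ms: roughly eleven times the hit cost despite a 95% hit rate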
This immediately shows why a cache with a "good" hit rate can still disappoint:
- misses may be catastrophically expensive
- refills may stampede the source
- staleness may be unacceptable
- cache maintenance may cost too much CPU or memory
A high hit rate on the wrong thing can even be misleading. For example:
- a CDN can have a great hit rate while still serving stale content because of a broken purge strategy
- an application cache can reduce DB load but increase correctness bugs if authorization-sensitive data is keyed badly
- a CPU cache can have decent average behavior while a workload still suffers from pathological misses in its hot loop
So the deeper lesson is:
- a cache is successful only when it improves the whole system's bottleneck
- not merely when it accumulates many hits
This is also where failure modes enter:
- cold starts
- thundering herds on miss (see the sketch below)
- stale reads
- cache pollution from large cold scans
- memory cost for entries that do not actually reduce expensive work
That is why caching is one of the most reused patterns in systems engineering and also one of the easiest to misuse. It sits exactly at the boundary between performance optimization and correctness risk.
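One of the failure modes above, the thundering herd, is worth a concrete sketch: when many callers miss the same key at once, they can all stampede the source. A minimal single-flight-style guard, assuming a threaded Python service and illustrative names, looks like this:
import threading
cache = {}
locks = {}
locks_guard = threading.Lock()
def get(key, fetch_from_source):
    value = cache.get(key)
    if value is not None:
        return value                  # hit: no source traffic
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                        # only one refill per key at a time
        value = cache.get(key)        # re-check: another thread may have filled it
        if value is None:
            value = fetch_from_source(key)
            cache[key] = value
    return value
The per-key lock means one caller pays the miss penalty while the rest wait and reuse the result, instead of every caller hitting the source simultaneously.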
Troubleshooting
Issue: "We added a cache, but latency did not improve much."
Why it happens / is confusing: The team measured hit rate but not miss penalty, refill cost, or whether the actual bottleneck moved elsewhere.
Clarification / Fix: Measure the full path: hit cost, miss cost, refill amplification, and source protection. A cache only helps if it meaningfully changes the dominant cost.
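A hedged illustration of measuring both paths rather than only hit rate; the structure is a placeholder, not a specific metrics library:
import time
hit_times, miss_times = [], []  # record hit and miss latencies separately
def timed_get(key, cache, fetch_from_source):
    start = time.perf_counter()
    if key in cache:
        value = cache[key]
        hit_times.append(time.perf_counter() - start)
    else:
        value = fetch_from_source(key)
        cache[key] = value
        miss_times.append(time.perf_counter() - start)
    return value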
Issue: "If the cache serves stale data, the answer is to remove caching."
Why it happens / is confusing: Freshness failures feel like proof that the cache was a mistake.
Clarification / Fix: The real problem is usually an unclear authority or invalidation model. Fix the freshness contract before deciding the entire cache pattern is wrong.
Issue: "All caches are basically the same."
Why it happens / is confusing: The same vocabulary appears at many layers.
Clarification / Fix: The pattern is shared, but the constraints differ. CPU caches optimize nanoseconds and hardware coherence. Redis optimizes service-level latency and load shedding. CDNs optimize distance and global distribution.
Advanced Connections
Connection 1: Cache Fundamentals <-> Coherence and Invalidation
The parallel: Once multiple copies exist, the hard problem stops being "How do we copy fast?" and becomes "How do we keep copies acceptably fresh?" That leads directly to MESI, invalidation strategies, and write policies later in the month.
Real-world case: The same conceptual tension appears in CPU cache coherence and CDN purge workflows: many fast copies are useful until they disagree with reality.
Connection 2: Cache Fundamentals <-> Performance Profiling
The parallel: Cache behavior is one of the main reasons average latency and tail latency diverge. Profiling later in the month becomes more meaningful once you can distinguish a hit path from a miss path.
Real-world case: A system may look compute-bound on hits but dependency-bound on misses, so the performance story is incomplete until both paths are visible.
Resources
Optional Deepening Resources
- [DOCS] MDN Web Docs: HTTP Caching
- Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
- Focus: Use it as the clearest practical reference for cache freshness, validators, and revalidation semantics at the HTTP layer.
- [DOCS] Redis documentation: key eviction
- Link: https://redis.io/docs/latest/develop/reference/eviction/
- Focus: Read it to connect generic cache capacity pressure with concrete in-memory eviction policies used in production.
- [DOCS] Cloudflare Learning Center: What is a CDN?
- Link: https://www.cloudflare.com/learning/cdn/what-is-a-cdn/
- Focus: Treat it as a practical example of caching at global network scale rather than as a frontend-only concept.
- [PAPER] What Every Programmer Should Know About Memory
- Link: https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
- Focus: Use the early sections to deepen your intuition for locality and why CPU-side caching exists in the first place.
- [DOCS] Amazon CloudFront Developer Guide: How CloudFront works
- Link: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Introduction.html
- Focus: Compare CDN caching to lower-level caches so the same pattern stays visible across layers.
Key Insights
- Caching is one pattern repeated across layers - From CPU to CDN, the core idea is always to keep a faster copy near the work and pay a miss penalty when that copy is absent or stale.
- A cache is defined by policy, not just storage - Authority, key design, eviction, and freshness are what make a cache useful or dangerous.
- Hit rate is necessary but not sufficient - The real question is whether the cache improves the dominant system cost without introducing unacceptable correctness or operational risk.