Day 039: Caching Across Storage Layers
A cache is not just fast memory. It is a decision about which requests stay near the work and which ones are forced to pay a deeper, slower path.
Today's "Aha!" Moment
Engineers often talk about caches as if they were optional accelerators that sit on the side of the real system. That picture misses the important part. Once a cache exists, the system no longer has one read path. It has at least two: the cheap hit path and the expensive miss path. Performance, cost, and even failure behavior start depending on how often requests stay in the fast path and what happens when they do not.
Consider the learning platform serving lesson pages. A request for a thumbnail might hit the browser cache, then a CDN edge, then an application cache, then object storage. A request for lesson metadata might hit in-process memory, then Redis, then the database. These are not separate tricks. They are layers in one hierarchy, each designed to avoid a different kind of cost: CPU work, local disk access, cross-network lookup, deep storage access, or long geographic distance.
That is the key shift: caching is really about shaping where load lands. A good cache keeps common requests away from deeper, more expensive layers. A bad cache, or a good cache under the wrong workload, can do the opposite by causing miss storms, stale reads, and refill cascades that hammer the authority underneath. This is why hit rate alone is never the whole story.
So the right question is not "should we add a cache?" It is "which expensive path are we trying to avoid, how reusable is the data, who owns freshness, and what happens when many callers miss at once?" Once you ask those questions, cache design becomes much more concrete.
Why This Matters
The problem: Teams often add caches reactively to fix latency or cost, but without a clear model of hit paths, miss paths, refill behavior, and freshness requirements.
Before:
- Each cache layer is treated as an isolated optimization.
- Hit rate is watched without understanding miss cost or downstream load transfer.
- Staleness and invalidation are considered later, after the performance gain is already relied upon.
After:
- Cache layers are treated as one coordinated hierarchy.
- Designers reason explicitly about what each layer saves and what happens when it fails to save it.
- Freshness, refill, and eviction are handled as correctness and stability concerns, not just tuning details.
Real-world impact: Better latency, lower origin load, fewer thundering-herd incidents, and clearer decisions about what should be cached locally, shared in Redis, or pushed out to the edge.
Learning Objectives
By the end of this session, you will be able to:
- Explain caching as a hierarchy - Relate page cache, in-process cache, Redis, and CDN layers through the same design logic.
- Reason about hits, misses, and refill paths - Understand how caches move work across the system instead of merely making reads faster.
- Treat freshness as part of cache correctness - Evaluate TTLs, invalidation, and stale-data risk alongside performance gains.
Core Concepts Explained
Concept 1: A Cache Is Valuable Only Because the Miss Path Is Expensive Enough
Suppose the lesson platform keeps thumbnail metadata in memory near the application. If the lookup hits, the request avoids a network trip to Redis or the database. If it misses, the request falls through to a deeper authoritative layer and then may repopulate the cache. That is the whole economic logic of caching: a miss path exists, it is costly enough, and the workload is repetitive enough that avoiding it repeatedly pays for the extra layer.
This is why a cache is not automatically helpful. If the underlying data is rarely reused, extremely volatile, or cheap to fetch, the cache may add complexity without returning enough value. Caches help when three conditions align:
- the miss path is meaningfully slower or more expensive
- data is reused often enough for hits to matter
- the fast layer can hold a useful working set
One concise picture is:
request
  -> check fast layer
     -> hit: finish cheaply
     -> miss: pay expensive path, maybe refill
The trade-off is simple. You gain lower latency and reduced pressure on deeper layers, but you add another component whose capacity, eviction, and correctness must now be managed.
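The economics above can be made concrete with a little arithmetic: the average cost of a read is the hit rate times the hit cost plus the miss rate times the miss cost. A minimal sketch, with made-up latency numbers purely for illustration:

```python
def expected_read_latency(hit_rate, hit_cost_ms, miss_cost_ms):
    """Average latency of a read path with one cache layer in front."""
    return hit_rate * hit_cost_ms + (1 - hit_rate) * miss_cost_ms

# Illustrative numbers: a 1 ms local hit versus a 40 ms trip to the database.
hot = expected_read_latency(0.95, 1.0, 40.0)   # ~2.95 ms on average
cold = expected_read_latency(0.50, 1.0, 40.0)  # ~20.5 ms on average
```

Notice how quickly the average degrades as the hit rate falls: with a 40x gap between hit and miss cost, even a modest drop in hit rate shifts most of the average onto the expensive path.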
Concept 2: Different Cache Layers Solve Different Distances and Sharing Patterns
A local in-process cache, a Redis cache, and a CDN are all "caches," but they solve different problems. They are not interchangeable because they sit at different points in the hierarchy and save different costs.
For example, lesson metadata might use:
- in-process cache to avoid even the network hop to Redis
- Redis to share hot values across many app instances
- the database as the authority
Static thumbnails might use:
- browser cache for repeat views on one client
- CDN edge cache for geographic proximity
- object storage as the durable origin
client/browser -> edge cache -> app cache -> shared cache -> authority
Each layer answers a different question:
- how close can we keep the hottest data to this exact caller?
- how much sharing do we need across processes or machines?
- which layer owns durability and correctness?
That is why "one big cache" rarely replaces a hierarchy. A local cache solves per-process locality. A shared cache solves cross-instance duplication. An edge cache solves geographic distance. They overlap, but they are not substitutes.
The trade-off is coordination versus locality. More layers can eliminate more expensive paths, but they also make the system harder to reason about unless ownership and refill flow are clear.
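The hierarchy idea can be sketched as a chain of read-through layers, each one checking its own fast storage and otherwise delegating to the layer beneath it. This is a toy model, not any particular library's API; the class and key names are illustrative:

```python
class CacheLayer:
    """One layer in a read-through hierarchy: check here, else go deeper."""
    def __init__(self, store, deeper):
        self.store = store      # dict-like fast storage for this layer
        self.deeper = deeper    # next layer down, or the authority itself

    def get(self, key):
        if key in self.store:
            return self.store[key]       # hit: stay in this layer
        value = self.deeper.get(key)     # miss: pay the deeper path
        self.store[key] = value          # refill on the way back up
        return value

class Authority:
    """Stand-in for the database or object storage at the bottom."""
    def __init__(self, data):
        self.data = data
    def get(self, key):
        return self.data[key]

# local -> shared -> authority, mirroring the hierarchy above
authority = Authority({"lesson-1": "Intro to Caching"})
shared = CacheLayer({}, authority)
local = CacheLayer({}, shared)
```

A first `local.get("lesson-1")` walks all the way down and refills both cache layers on the way back; a second one never leaves the local layer. The composition is the point: each layer only knows about the one beneath it.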
Concept 3: Freshness, Invalidation, and Refill Storms Are the Real Hard Parts
Most cache pain appears not on hits but when the cached answer stops being trustworthy or when many callers miss together. If a course title changes in the database but stale values remain in Redis and the CDN, the system becomes fast and wrong. If a popular key expires and thousands of requests fall through to the database at once, the system may become correct and overloaded.
This is why cache correctness has two major concerns:
- freshness: how stale is acceptable, and who decides?
- refill behavior: what happens when misses cluster on the same object?
key expires
-> many callers miss
-> deep store gets hammered
-> refill slows down
-> even more callers wait or retry
That is the classic cache stampede or thundering-herd shape. Good cache design therefore includes more than a TTL. It may require versioned keys, explicit invalidation, request coalescing, background refresh, stale-while-revalidate, or careful negative caching for absent objects.
Consider a straightforward multi-layer read path:

```python
def read_lesson_title(lesson_id, local_cache, redis_cache, database):
    # Layer 1: in-process cache, the cheapest possible hit.
    title = local_cache.get(lesson_id)
    if title is not None:
        return title
    # Layer 2: shared cache; on a hit, backfill the local layer.
    title = redis_cache.get(lesson_id)
    if title is not None:
        local_cache.put(lesson_id, title)
        return title
    # Layer 3: the authority; refill both cache layers on the way out.
    title = database.fetch_title(lesson_id)
    redis_cache.put(lesson_id, title)
    local_cache.put(lesson_id, title)
    return title
```
The code looks harmless, but it hides the real questions: what if database.fetch_title is slow, what if many requests miss simultaneously, how long may a cached title remain stale, and who invalidates it after an update? Those are cache-design questions, not implementation leftovers.
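One way to blunt the stampede, hinted at by "request coalescing" above, is single-flight refill: only one caller per key performs the expensive fetch while concurrent callers wait and then reuse the result. A minimal sketch, assuming a cache object with the same hypothetical `get`/`put` interface as the code above:

```python
import threading

class SingleFlight:
    """Coalesce concurrent misses so each key is refilled by one caller."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, key):
        # One lock per key, created lazily under a global guard.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def read(self, key, cache, fetch):
        value = cache.get(key)
        if value is not None:
            return value                    # fast path: ordinary hit
        with self._lock_for(key):           # only one refill per key at a time
            value = cache.get(key)          # re-check: a peer may have refilled
            if value is not None:
                return value
            value = fetch(key)              # the single expensive call
            cache.put(key, value)
            return value
```

The double-check inside the lock is what turns a clustered miss into one deep read plus many cheap waits. In a multi-instance deployment the same idea needs a shared mechanism (for example a short-lived lock key in the shared cache), but the per-process shape is the one shown here.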
The trade-off is between speed and semantic complexity. Caches can radically improve performance, but only if freshness rules and refill behavior are designed as carefully as the happy-path hit logic.
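One of the invalidation tactics named above, versioned keys, sidesteps explicit deletes entirely: the writer bumps a version number stored with the record, and readers build the cache key from that version, so stale entries are simply never asked for again. A minimal sketch (the key format is an illustrative choice, not a Redis convention):

```python
def versioned_key(lesson_id, version):
    # Readers always derive the key from the record's current version.
    # Entries cached under older versions are never read again and just
    # age out under the eviction policy, with no explicit delete needed.
    return f"lesson:{lesson_id}:v{version}"
```

The cost is one extra lookup to learn the current version (or piggybacking it on data the caller already has), traded against never having to race an invalidation message across every cache layer.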
Troubleshooting
Issue: Hit rate is treated as the main success metric.
Why it happens / is confusing: Hit rate is easy to graph and sounds intuitive, so it becomes the default summary of cache health.
Clarification / Fix: Pair hit rate with miss cost, refill latency, stampede risk, and downstream saturation. A small hit-rate drop on an expensive path can matter more than a large hit-rate drop on a cheap one.
Issue: Staleness is treated as a minor detail after the cache already proves fast.
Why it happens / is confusing: Performance gains are immediate and visible, while stale-data bugs often appear later and only for some users.
Clarification / Fix: Decide up front what may be stale, for how long, and how invalidation or revalidation works. Fast but wrong answers are still failures.
Advanced Connections
Connection 1: Caching ↔ Memory Hierarchies
The parallel: From CPU caches to page cache to Redis and CDN edges, the same idea repeats: keep the working set closer than the full dataset because locality makes it worthwhile.
Real-world case: Systems that already reason well about hot and cold data in memory often design better multi-layer caches in distributed environments too.
Connection 2: Caching ↔ Cost Shaping
The parallel: Caches do not just reduce latency. They decide which expensive layers are exercised and how often, which directly affects compute, storage, and network cost.
Real-world case: A CDN in front of object storage can cut both user latency and origin egress cost precisely because it changes where repeated requests land.
Resources
Optional Deepening Resources
- These resources are optional and are not required for the core 30-minute path.
- [DOC] Redis Key Eviction
- Link: https://redis.io/docs/latest/develop/reference/eviction/
- Focus: See how limited fast storage decides what remains hot under pressure.
- [DOC] Amazon CloudFront Developer Guide
- Link: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Introduction.html
- Focus: Extend cache thinking outward to edge delivery and global request locality.
- [BOOK] Designing Data-Intensive Applications
- Link: https://dataintensive.net/
- Focus: Revisit invalidation, derived views, and the trade-offs around stale versus fresh reads.
Key Insights
- A cache changes the shape of the read path - Once it exists, the system has distinct hit and miss behaviors that matter architecturally.
- Different cache layers save different costs - Local, shared, and edge caches each solve a different distance or sharing problem.
- Freshness and refill behavior are first-class concerns - Invalidating stale data and surviving clustered misses matter as much as raw speed.
Knowledge Check (Test Questions)
1. What makes a cache layer worth adding?
- A) The miss path is expensive enough and the data is reused enough to justify a faster layer.
- B) The system stores any data at all.
- C) The cache product is easy to deploy.
2. Why might a system use both an in-process cache and a shared Redis cache?
- A) Because one saves local per-process cost while the other saves cross-instance duplication and deeper reads.
- B) Because caches only work if they are duplicated blindly.
- C) Because local caches guarantee perfect freshness.
3. Why is invalidation central to cache design?
- A) Because caches are only useful if the system can control when a fast answer is still acceptable to serve.
- B) Because invalidation removes the need for an authoritative source.
- C) Because invalidation guarantees 100% hit rate.
Answers
1. A: A cache pays for itself when it repeatedly avoids a path that is slow, expensive, or overloaded enough to matter.
2. A: Local and shared caches solve different locality levels, so using both can remove different kinds of cost from the read path.
3. A: Cached data must still meet freshness expectations, so invalidation or revalidation rules are part of correctness, not a side detail.