LESSON
Day 242: Cache Eviction Policies - LRU, LFU, ARC
Eviction is where a cache stops being just memory and becomes a prediction about the future.
Today's "Aha!" Moment
The insight: A cache is small by design, so it cannot keep everything. The eviction policy is the cache's bet about what will be useful next. LRU, LFU, and ARC are not just algorithms; they are different beliefs about what past access patterns mean.
Why this matters: Teams often talk about cache hits as if the cache were neutral. It is not. The cache keeps certain objects and throws others away according to a policy. If that policy mismatches the workload, a large cache can still behave badly.
The universal pattern: finite capacity -> some entries must leave -> the policy encodes what "valuable to keep" means.
Concrete anchor: Imagine an API cache serving product pages. If users repeatedly revisit a small popular set, LFU can shine. If the hot set changes rapidly with current navigation, LRU may do better. If the workload mixes both behaviors, a hybrid like ARC becomes attractive.
How to recognize when this applies:
- The cache is full and hit rate matters.
- Different traffic patterns produce very different cache behavior.
- Large scans or sudden popularity shifts make the cache feel "unstable."
Common misconceptions:
- [INCORRECT] "Eviction is just deleting the oldest item."
- [INCORRECT] "A bigger cache solves a bad eviction policy."
- [CORRECT] The truth: Eviction is workload prediction under limited capacity, and the wrong prediction can waste memory and hurt latency.
Real-world examples:
- Web/CDN caching: A one-time content sweep can pollute a recency-based cache if recent does not mean reusable.
- In-memory databases: Stable hot keys reward frequency-aware policies more than purely recent access.
Why This Matters
The problem: Once a cache reaches capacity, every new admission forces a choice. If the system evicts the wrong item, the next request pays the miss penalty even though the cache did have enough memory to help something else.
Before:
- Teams treat eviction as an implementation detail.
- Cache misses are blamed on capacity alone.
- Workload shape is ignored when choosing a policy.
After:
- Teams reason explicitly about recency, frequency, scans, and phase shifts.
- Cache behavior is understood as part of performance engineering, not just storage.
- Policy choice becomes tied to workload shape and miss cost.
Real-world impact: A good eviction policy improves hit rate without increasing memory, protects deeper systems from repeated misses, and keeps the cache useful under changing traffic instead of only under ideal steady-state loads.
Learning Objectives
By the end of this session, you will be able to:
- Explain why eviction exists - Connect finite capacity to the need for workload-informed replacement decisions.
- Compare LRU, LFU, and ARC - Understand what each policy assumes about future reuse.
- Evaluate policy fit in practice - Recognize when recency, frequency, or adaptive hybrids are more appropriate.
Core Concepts Explained
Concept 1: Eviction Is the Cache's Forecasting Layer
The first cache lesson established that a cache is a smaller, faster copy. This lesson adds the missing pressure: the cache is smaller than the working set, so not everything can stay.
That means a cache must answer one uncomfortable question repeatedly:
If I need room for one more item, which existing item is least worth keeping?
That question is harder than it looks because the cache does not know the future. It only knows the past. So every eviction policy turns past access patterns into a forecast:
- recently used probably means useful again soon
- frequently used probably means globally important
- a mix of both may be needed if the workload changes phase
This is why eviction policy matters even when cache size is fixed. Two caches with the same memory budget can produce very different hit rates if they throw away different items.
The trade-off begins here:
- simple policies are cheap to maintain and easy to explain
- but simple policies often fail badly on workloads they were not designed for
So eviction is not merely about deletion. It is about preserving the most future value per byte of scarce fast storage.
Concept 2: LRU and LFU Encode Different Assumptions About Reuse
LRU and LFU are the two classic baselines because they represent two strong but different intuitions.
LRU - Least Recently Used
LRU assumes that recency predicts future reuse. If an item was touched recently, it is more likely to be touched again soon.
This works well when workloads have temporal locality, such as:
- navigational sessions
- hot loops
- working sets that drift but stay clustered in time
Its strengths:
- intuitive
- usually effective for recency-heavy access
- relatively straightforward to reason about
Its failure mode is equally important:
- a large one-time scan can push out genuinely useful items simply because the scan touched many new objects recently
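As a concrete sketch, LRU maps naturally onto Python's OrderedDict, which preserves insertion order and can move entries cheaply. This is a minimal illustration, not a production implementation (no thread safety, no TTLs):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: evict the entry touched longest ago."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # order of entries = recency order

    def get(self, key):
        if key not in self.entries:
            return None  # miss
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now most recently used
cache.put("c", 3)       # evicts "b", the least recently used
print(cache.get("b"))   # None: "b" was evicted
print(cache.get("a"))   # 1: recency protected "a"
```

The scan failure mode follows directly from this structure: every scanned key lands at the most-recently-used end, so a long enough one-pass scan flushes the entire cache.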
LFU - Least Frequently Used
LFU assumes long-term popularity predicts future value. Items accessed many times deserve to stay even if they were not touched in the last few moments.
This works well when the workload has stable hot items, such as:
- repeatedly requested product metadata
- popular configuration values
- skewed key popularity in caches and key-value stores
Its strengths:
- good at protecting truly popular items
- resists some recency noise better than LRU
Its failure mode:
- it can become sticky, keeping yesterday's winners even after the workload changes
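A minimal LFU sketch follows the same shape. This is illustrative only: real implementations use cheaper structures than a linear scan for the victim, and usually break frequency ties by recency rather than arbitrarily:

```python
from collections import Counter

class LFUCache:
    """Minimal LFU sketch: evict the entry with the fewest accesses.
    Ties are broken arbitrarily here; real implementations usually
    break them by recency."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = Counter()

    def get(self, key):
        if key not in self.values:
            return None  # miss
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # evict the least frequently used entry
            victim, _ = min(self.counts.items(), key=lambda kv: kv[1])
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] += 1

cache = LFUCache(capacity=2)
cache.put("hot", 1)
for _ in range(5):
    cache.get("hot")      # "hot" builds a high access count
cache.put("x", 2)
cache.put("y", 3)         # "x" (count 1) is evicted; "hot" survives
print(cache.get("hot"))   # 1: frequency protected "hot"
print(cache.get("x"))     # None
```

The stickiness failure mode is visible here too: once "hot" has a high count, nothing in this sketch ever lowers it, so "hot" survives even if the workload stops requesting it.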
So LRU tends to adapt faster but can be fooled by bursts and scans. LFU tends to remember long-term value but can adapt too slowly when the world changes.
That is the core contrast:
- LRU bets on what was hot recently
- LFU bets on what has been important over time
Neither is universally right. Each wins when its workload assumption is true.
Concept 3: ARC Tries to Adapt When Recency and Frequency Keep Taking Turns
ARC exists because real workloads rarely stay pure. Many systems mix:
- bursts of recent exploration
- stable popular objects
- shifts between one pattern and the other
ARC can be understood as an adaptive compromise. Instead of hard-coding one balance between recency and frequency, it keeps evidence about both and adjusts online.
At a high level:
- one structure tracks items that look valuable because they were seen recently
- another tracks items that look valuable because they have been reused
- "ghost" history tracks what was evicted, so the algorithm can learn whether it is throwing away the wrong kind of object
That last part matters. ARC does not just store live cache entries; it also stores metadata about recently evicted entries so it can infer whether the workload is currently rewarding recency or frequency more.
The intuition is:
If recently-evicted recent items are requested again,
lean more toward recency.
If recently-evicted frequent items are requested again,
lean more toward frequency.
That is why ARC is conceptually powerful. It turns eviction into a learning problem rather than a fixed one-parameter guess.
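The ghost-list feedback loop can be sketched in Python. This is a heavily simplified illustration of the idea, not the full ARC algorithm from the paper, which has more precise rules for sizing, trimming, and splitting the four lists:

```python
from collections import OrderedDict

class SimplifiedARC:
    """Simplified sketch of ARC's core idea: two LRU lists plus ghost
    histories, with a target size p that adapts on ghost hits.
    Not the exact algorithm from the ARC paper."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0               # target size for the recency side (t1)
        self.t1 = OrderedDict()  # live entries seen once recently (recency)
        self.t2 = OrderedDict()  # live entries reused at least once (frequency)
        self.b1 = OrderedDict()  # ghost keys evicted from t1
        self.b2 = OrderedDict()  # ghost keys evicted from t2

    def _replace(self, in_b2):
        # Evict from t1 if it exceeds its target p (or t2 is empty), else from t2.
        if self.t1 and (len(self.t1) > self.p
                        or (in_b2 and len(self.t1) == self.p)
                        or not self.t2):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def access(self, key):
        """Returns True on a cache hit."""
        if key in self.t1:       # hit: promote to the frequency list
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:       # hit: refresh recency within t2
            self.t2.move_to_end(key)
            return True
        if key in self.b1:       # ghost hit: recency was undervalued, grow p
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(in_b2=False)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:       # ghost hit: frequency was undervalued, shrink p
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(in_b2=True)
            del self.b2[key]
            self.t2[key] = None
            return False
        # complete miss: evict if full, then admit into the recency list
        if len(self.t1) + len(self.t2) >= self.c:
            self._replace(in_b2=False)
        self.t1[key] = None
        while len(self.b1) > self.c:   # keep ghost history bounded
            self.b1.popitem(last=False)
        while len(self.b2) > self.c:
            self.b2.popitem(last=False)
        return False
```

A hit in t1 promotes the key to t2; a hit in a ghost list shifts the target p, moving capacity between the recency side and the frequency side without any manual tuning.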
But it is not free.
Costs include:
- more bookkeeping than plain LRU
- more implementation complexity
- more state to maintain and reason about
This gives us the practical lesson:
- adaptive policies can protect a cache from workload shifts
- but they cost more, so they make sense only when that adaptability is worth the added machinery
In practice, many production systems also use approximations rather than textbook-perfect policies. That is a sign of engineering realism: exact policy quality matters, but so does metadata overhead, concurrency cost, and simplicity of maintenance.
Troubleshooting
Issue: "The cache is big, but hit rate still collapses during scans."
Why it happens / is confusing: A recency-heavy policy like LRU may treat the scan as if all scanned objects suddenly mattered.
Clarification / Fix: Check whether the workload includes large one-time sweeps or bursty exploration. A frequency-aware or adaptive policy may fit better, or the scan path may need different admission logic.
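One illustrative form of "different admission logic" (similar in spirit to the doorkeeper idea used by some production caches) is to admit a key into the main cache only on its second recent sighting, so a one-pass scan never displaces resident entries. The class and parameter names below are invented for this sketch:

```python
from collections import OrderedDict

class TinyLRU:
    """Minimal LRU used as the main cache behind the admission filter."""
    def __init__(self, capacity):
        self.capacity, self.entries = capacity, OrderedDict()
    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]
    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

class SecondChanceAdmission:
    """Sketch of scan-resistant admission: admit a key into the main
    cache only after it has been seen twice recently."""

    def __init__(self, cache, recent_window=1024):
        self.cache = cache              # any cache exposing get/put
        self.seen_once = OrderedDict()  # bounded set of first-time keys
        self.window = recent_window

    def get(self, key, load):
        value = self.cache.get(key)
        if value is not None:
            return value
        value = load(key)               # fetch from the backing store
        if key in self.seen_once:       # second sighting: admit
            del self.seen_once[key]
            self.cache.put(key, value)
        else:                           # first sighting: remember only
            self.seen_once[key] = None
            if len(self.seen_once) > self.window:
                self.seen_once.popitem(last=False)
        return value

cache = TinyLRU(capacity=2)
front = SecondChanceAdmission(cache)
front.get("hot", load=lambda k: k.upper())  # first sighting: not admitted
front.get("hot", load=lambda k: k.upper())  # second sighting: admitted
for k in ("s1", "s2", "s3"):                # one-pass scan
    front.get(k, load=lambda k: k.upper())  # each seen once: never admitted
print(cache.get("hot"))                     # "HOT": survived the scan
```

The scan still pays its own misses, but it can no longer flush the resident working set.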
Issue: "Our LFU cache keeps stale popularity and adapts too slowly."
Why it happens / is confusing: Long-term frequency can preserve yesterday's hot items after demand shifts.
Clarification / Fix: Use aging/decay, approximate counters, or a more adaptive policy when phase changes matter. Frequency without forgetting becomes inertia.
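A minimal sketch of frequency with forgetting, assuming a simple "halve all counts every N accesses" decay scheme; the class and parameter names are illustrative, not a standard API:

```python
from collections import Counter

class DecayingLFU:
    """LFU sketch with aging: counts are periodically halved so stale
    popularity decays instead of pinning old entries forever."""

    def __init__(self, capacity, decay_every=100):
        self.capacity = capacity
        self.values = {}
        self.counts = Counter()
        self.accesses = 0
        self.decay_every = decay_every

    def _tick(self):
        self.accesses += 1
        if self.accesses % self.decay_every == 0:
            for key in self.counts:
                self.counts[key] //= 2   # halve every count: forgetting

    def get(self, key):
        self._tick()
        if key not in self.values:
            return None  # miss
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        self._tick()
        if key not in self.values and len(self.values) >= self.capacity:
            victim, _ = min(self.counts.items(), key=lambda kv: kv[1])
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] += 1

cache = DecayingLFU(capacity=2, decay_every=3)
cache.put("a", 1)          # access 1: count("a") = 1
cache.get("a")             # access 2: count("a") = 2
cache.get("a")             # access 3: decay halves 2 -> 1, then hit -> 2
print(cache.counts["a"])   # 2: still tracked, but decay is eroding it
```

With decay, an entry that stops being requested loses its count over time and eventually becomes evictable, which is exactly the forgetting that plain LFU lacks.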
Issue: "ARC is obviously better, so we should always choose it."
Why it happens / is confusing: Adaptive policies sound like universal winners.
Clarification / Fix: ARC is attractive when workloads alternate between recency-heavy and frequency-heavy behavior, but its extra bookkeeping and complexity must still be justified against the miss cost and system constraints.
Advanced Connections
Connection 1: Cache Eviction Policies <-> Allocators and Memory Pressure
The parallel: Both are scarce-resource policies. Allocators decide how scarce fast memory gets carved up; eviction decides which cached entries deserve to stay once that memory is full.
Real-world case: In in-memory systems, poor eviction and poor allocator fit can amplify each other: the cache keeps the wrong entries, while fragmentation makes the memory budget feel even smaller.
Connection 2: Cache Eviction Policies <-> CDN and Application Caches
The parallel: Even at higher layers, the same recency vs frequency tension appears. A CDN facing flash crowds and a local application cache facing hot keys are both solving "what do we keep?" under finite capacity.
Real-world case: A policy that looks fine in a lab can fail in production because real traffic includes scans, bursts, and popularity shifts that stress the wrong heuristic.
Resources
Optional Deepening Resources
- [DOCS] Redis eviction policies
- Link: https://redis.io/docs/latest/develop/reference/eviction/
  - Focus: Use it to connect textbook replacement policies with a production system that implements practical approximations of LRU and LFU.
- [PAPER] ARC: A Self-Tuning, Low Overhead Replacement Cache
- Link: https://www.cs.cmu.edu/~natassa/courses/15-721/papers/arcfast.pdf
- Focus: Read it for the original adaptive idea behind balancing recency and frequency online.
- [DOCS] Caffeine eviction wiki
- Link: https://github.com/ben-manes/caffeine/wiki/Eviction
- Focus: See how a modern production cache talks about eviction in terms of policy quality and operational trade-offs rather than textbook purity alone.
- [PAPER] LRU-K page replacement algorithm
- Link: https://sigmodrecord.org/?smd_process_download=1&download_id=9425
- Focus: Use it to deepen your intuition for why plain recency is sometimes too weak and why richer history can help.
Key Insights
- Eviction is prediction, not cleanup - Once a cache is full, every replacement decision encodes a guess about what future requests will value most.
- LRU and LFU are different bets - LRU trusts recent reuse, LFU trusts durable popularity, and each fails when its workload assumption stops matching reality.
- Adaptive policies exist because workloads shift - ARC is useful precisely because real systems alternate between recency-heavy and frequency-heavy phases instead of staying pure.