Cache Purging Strategies - CDN Cache Invalidation


Caching, Workers, and Performance

023 30 min intermediate

Day 251: Cache Purging Strategies - CDN Cache Invalidation

Purging is not "deleting cache." It is coordinating the moment when thousands of distributed copies stop being safe to serve.


Today's "Aha!" Moment

The insight: In a CDN, a purge is not a local erase operation. It is a control-plane action that must reach many PoPs, match the right cache keys, and avoid turning a freshness fix into an origin traffic disaster.

Why this matters: Teams often treat invalidation as an afterthought. They assume a deploy changes origin content, the edge will somehow "catch up," and a purge simply forces that to happen faster. In practice, purge strategy determines whether stale content disappears cleanly, whether variants are actually invalidated, and whether the origin survives the refill.

The universal pattern: authoritative content changes -> distributed cached copies may still look valid -> control plane spreads an invalidation signal -> edge nodes stop serving or revalidate those copies.
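A minimal sketch of that pattern, assuming an in-memory model where each PoP is an object with a local cache. EdgeNode and broadcast_purge are hypothetical names, not any CDN's API. The point it illustrates is that a purge is only complete once every PoP has applied it; a PoP that never receives the signal keeps serving its old copy.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeNode:
    """A hypothetical PoP holding cached objects keyed by URL."""
    name: str
    reachable: bool = True
    cache: dict[str, str] = field(default_factory=dict)

    def apply_purge(self, keys: set[str]) -> bool:
        # An unreachable PoP never sees the signal, so its copies stay servable.
        if not self.reachable:
            return False
        for key in keys:
            self.cache.pop(key, None)
        return True


def broadcast_purge(nodes: list[EdgeNode], keys: set[str]) -> list[str]:
    """Send one invalidation to every PoP; return the PoPs that did not apply it."""
    return [node.name for node in nodes if not node.apply_purge(keys)]


nodes = [
    EdgeNode("fra", cache={"/home": "v1"}),
    EdgeNode("sin", reachable=False, cache={"/home": "v1"}),
]
pending = broadcast_purge(nodes, {"/home"})
print(pending)  # ['sin']: this PoP can still serve the old /home
```

A real control plane would retry the pending PoPs rather than just report them; the sketch stops at showing why acknowledgment tracking matters.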

Concrete anchor: A homepage is cached in 180 edge locations, with variants by language and device. Product changes the hero banner and wants it live immediately. The real problem is no longer "What is the new HTML?" but "How do we make every unsafe copy stop being served without causing a global miss storm back to origin?"

How to recognize when this applies:

Common misconceptions:

Real-world examples:

  1. Asset deploys: Teams often prefer versioned URLs so new assets avoid broad purge traffic entirely.
  2. Editorial updates: HTML, landing pages, product pages, and API responses often need selective purge, soft purge, or tag-based purge because the name stays stable while the content changes.
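A sketch of the versioned-URL approach from the first example, assuming a build step that controls asset filenames; versioned_asset_path is a hypothetical helper, and a truncated SHA-256 digest is one possible choice of version tag. Because the URL changes whenever the bytes change, the new asset needs no purge: old copies simply stop being requested and age out.

```python
import hashlib
import posixpath


def versioned_asset_path(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: /static/app.js -> /static/app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    root, ext = posixpath.splitext(path)  # splits only the last path component
    return f"{root}.{digest}{ext}"


old = versioned_asset_path("/static/app.js", b"console.log('v1');")
new = versioned_asset_path("/static/app.js", b"console.log('v2');")
print(old != new)  # True: the deploy ships a brand-new URL
```

The HTML that references the new filename still changes under a stable name, which is why it falls into the second category and still needs a purge.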

Why This Matters

The problem: CDN caching makes repeated reads cheap, but freshness corrections are expensive because the copies are geographically distributed and often shared across many users.

Before:

After:

Real-world impact: A good purge strategy shortens incident duration, reduces rollout risk, and keeps the CDN helping even during fast content change. A bad one creates stale leaks, surprising misses, and global traffic spikes at exactly the wrong moment.


Learning Objectives

By the end of this session, you will be able to:

  1. Explain why purging is a distributed systems problem - Connect CDN invalidation to control-plane propagation, cache keys, and refill behavior.
  2. Describe the main purge strategies - Compare full purge, URL purge, tag/key purge, and soft purge vs hard purge.
  3. Evaluate operational trade-offs - Decide how to remove stale content while minimizing origin shock, missed variants, and purge blast radius.

Core Concepts Explained

Concept 1: Purging Is About Scope, Not Just Speed

At a local cache level, invalidation can feel simple: delete the key, and the next read fetches fresh data.

At CDN scale, the question becomes much harder: which of the many cached copies, spread across PoPs and variants, does this purge actually match?

That depends on the effective cache key, not just the URL.

If edge caching varies by request headers, device class, geography, or language, then "purge /home" may or may not cover the actual population of cached objects that users can receive.
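A minimal model of the variant problem, assuming a single PoP and two hypothetical variant dimensions (language and device); none of these names come from a real CDN. A purge that matches one exact key leaves the other variants servable, while a purge scoped to every variant of the path removes them all.

```python
from itertools import product

# Hypothetical variant dimensions an edge rule might add to the cache key.
LANGUAGES = ["en", "de", "fr"]
DEVICES = ["mobile", "desktop"]

CacheKey = tuple[str, str, str]


def cache_key(path: str, language: str, device: str) -> CacheKey:
    """The effective cache key: the URL path plus every dimension the edge varies on."""
    return (path, language, device)


def purge_exact(cache: dict[CacheKey, str], key: CacheKey) -> int:
    """Remove exactly one key; return how many objects were removed."""
    return 1 if cache.pop(key, None) is not None else 0


def purge_all_variants(cache: dict[CacheKey, str], path: str) -> int:
    """Remove every object for this path, whatever its variant dimensions."""
    doomed = [k for k in cache if k[0] == path]
    for k in doomed:
        del cache[k]
    return len(doomed)


# One PoP's cache after traffic has populated every variant of /home.
cache = {cache_key("/home", lang, dev): "old banner" for lang, dev in product(LANGUAGES, DEVICES)}

removed = purge_exact(cache, cache_key("/home", "en", "desktop"))
print(removed, len(cache))  # 1 5: five variants of /home can still be served stale
removed = purge_all_variants(cache, "/home")
print(removed, len(cache))  # 5 0
```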

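This makes purge design mostly a scope problem: deciding exactly which set of cached objects an invalidation must reach, and making sure the purge request can name that set.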

That is why mature CDN setups tend to rely on a small set of explicit invalidation handles: versioned asset URLs, single-URL purge, and tag- or key-based purge.
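One common invalidation handle is tag-based purge (sometimes called surrogate-key purge). The sketch below assumes the origin attaches tags to each response and the edge keeps a reverse index from tag to cache keys; TaggedCache and the tag names are hypothetical. Purging one tag reaches every URL and variant that carries it, without the caller having to enumerate them.

```python
from collections import defaultdict


class TaggedCache:
    """Sketch of tag-based purge: a reverse index maps each tag to the keys carrying it."""

    def __init__(self) -> None:
        self.objects: dict[str, str] = {}
        self.keys_by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def store(self, key: str, body: str, tags: list[str]) -> None:
        self.objects[key] = body
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def purge_tag(self, tag: str) -> set[str]:
        """Invalidate every object labelled with tag, whatever its URL or variant."""
        keys = self.keys_by_tag.pop(tag, set())
        for key in keys:
            # Other tags may still index this key; pop with a default keeps
            # a later purge of those tags harmless.
            self.objects.pop(key, None)
        return keys


cache = TaggedCache()
cache.store("/home?lang=en", "<html>en</html>", ["homepage", "hero-banner"])
cache.store("/home?lang=de", "<html>de</html>", ["homepage", "hero-banner"])
cache.store("/about", "<html>about</html>", ["about"])
purged = cache.purge_tag("hero-banner")
print(sorted(purged))       # ['/home?lang=de', '/home?lang=en']
print(list(cache.objects))  # ['/about']
```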

The important mindset shift is: a purge is only as good as its match against the effective cache-key space.

Concept 2: Hard Purge and Soft Purge Optimize for Different Failure Modes

There are two broad behaviors a CDN can take after invalidation.

Hard purge means the object is no longer treated as servable: the next request at each edge is a miss and must be filled from the origin (or a shield tier).

This is simple and sometimes necessary, but dangerous at scale because many PoPs may refill simultaneously.

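Soft purge means the object is marked stale rather than immediately thrown away: the edge can revalidate it against the origin and, where configured, keep serving it briefly while a fresh copy is fetched.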

Soft purge is valuable because it trades "immediate cold miss everywhere" for "controlled transition toward freshness."

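That helps with refill stability and origin safety, because PoPs do not all turn into cold misses at the same moment.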

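So the real trade-off is: hard purge prioritizes immediate freshness, while soft purge prioritizes refill stability and origin safety at the cost of a brief window of stale content.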
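A minimal single-PoP sketch of the two behaviors, assuming a synchronous origin callback and an edge configured to serve stale content while it refreshes. EdgeCache, hard_purge, and soft_purge are hypothetical names, and the refresh that a real CDN would run in the background happens inline here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: str
    stale: bool = False  # soft purge sets this flag instead of deleting the object


class EdgeCache:
    """One PoP's cache supporting both purge modes (hypothetical model, not a CDN API)."""

    def __init__(self, fetch_from_origin):
        self.objects: dict[str, CachedObject] = {}
        self.fetch_from_origin = fetch_from_origin
        self.origin_fetches = 0

    def hard_purge(self, key: str) -> None:
        # The object is gone: the next request must wait on the origin.
        self.objects.pop(key, None)

    def soft_purge(self, key: str) -> None:
        # The object stays, but is no longer considered fresh.
        if key in self.objects:
            self.objects[key].stale = True

    def get(self, key: str) -> str:
        obj = self.objects.get(key)
        if obj is None:
            return self._refill(key)  # cold miss: the user waits for the origin
        if obj.stale:
            served = obj.body         # this request still gets the stale copy
            self._refill(key)         # inline stand-in for a background refresh
            return served
        return obj.body

    def _refill(self, key: str) -> str:
        self.origin_fetches += 1
        body = self.fetch_from_origin(key)
        self.objects[key] = CachedObject(body)
        return body


origin = {"/home": "new banner"}

soft = EdgeCache(origin.__getitem__)
soft.objects["/home"] = CachedObject("old banner")
soft.soft_purge("/home")
print(soft.get("/home"), soft.get("/home"))  # old banner new banner

hard = EdgeCache(origin.__getitem__)
hard.objects["/home"] = CachedObject("old banner")
hard.hard_purge("/home")
print(hard.get("/home"))  # new banner, but only after a blocking origin fetch
```

Both caches make one origin fetch here; the difference is whether a user request blocks on it, and, across many PoPs, whether those fetches arrive as a synchronized wave.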

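This is also where CDN behavior connects back to HTTP validators and cache semantics: a soft-purged object can be revalidated with a conditional request (If-None-Match against an ETag, or If-Modified-Since against Last-Modified), and if the content has not actually changed, the origin answers 304 Not Modified without resending the body.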
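A sketch of purge and revalidation cooperating, assuming the origin emits a content-derived ETag and the edge revalidates a soft-purged copy with If-None-Match. The helpers are hypothetical and simplify HTTP: they compare a single ETag by exact string match, ignoring ETag lists and weak validators. When a purged object turns out to be unchanged, the revalidation costs a 304 with no body instead of a full transfer.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def make_etag(body: bytes) -> str:
    # One possible strong validator: a quoted hash of the content.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


@dataclass
class Response:
    status: int
    etag: str
    body: Optional[bytes] = None  # empty on 304: the edge reuses its stored copy


def origin_handle(current_body: bytes, if_none_match: Optional[str]) -> Response:
    """Origin side of a conditional GET: 304 if the presented ETag still matches."""
    etag = make_etag(current_body)
    if if_none_match == etag:
        return Response(304, etag)
    return Response(200, etag, current_body)


def revalidate(stored_body: bytes, stored_etag: str, origin_body: bytes) -> tuple[bytes, int]:
    """Edge side: revalidate a stale copy; return (body to serve, body bytes transferred)."""
    resp = origin_handle(origin_body, if_none_match=stored_etag)
    if resp.status == 304:
        return stored_body, 0  # unchanged: keep the cached copy, no body sent
    assert resp.body is not None
    return resp.body, len(resp.body)  # changed: the full new body is transferred


page = b"<html>hero v1</html>"
etag = make_etag(page)
print(revalidate(page, etag, page)[1])                     # 0
print(revalidate(page, etag, b"<html>hero v2</html>")[1])  # 20
```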

The CDN works best when purge and revalidation cooperate. If purge exists alone, the system often replaces stale-risk with origin-risk.

Concept 3: The Hard Part Is What Happens After the Purge

Most teams think about the command itself: purge this URL, purge this tag, or purge everything.

But the bigger design question is what the system looks like one second later.

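After invalidation, several things can go wrong:

  1. Variants that did not match the purge keep serving stale content.
  2. PoPs apply the invalidation at slightly different times, so old and new content coexist briefly.
  3. Many PoPs miss at once and send a refill wave to the origin.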
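The refill wave is easy to underestimate, so a back-of-the-envelope model helps. The sketch assumes a hard purge, that every PoP receives traffic for every purged object, and that each cache (PoP or shield) coalesces its own concurrent misses into one fetch. The 180 PoPs come from the homepage example; the count of 6 variants is hypothetical.

```python
def origin_fetches_after_purge(pops: int, objects: int, shields: int = 0) -> int:
    """Worst-case origin fetches to refill objects that were purged everywhere at once.

    Without a shield tier, each PoP goes to the origin for each object.
    With shields, PoPs fill from a shield, and only each shield's first
    miss per object reaches the origin.
    """
    per_object = shields if shields > 0 else pops
    return per_object * objects


print(origin_fetches_after_purge(180, 6))             # 1080
print(origin_fetches_after_purge(180, 6, shields=1))  # 6
print(origin_fetches_after_purge(180, 6, shields=3))  # 18
```

The multiplication is the whole point: every variant dimension and every unshielded PoP scales the wave that arrives at the origin in the same few seconds.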

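This is why purge is tightly linked to shielding, request coalescing, soft purge, and versioned assets, all of which shrink or smooth the refill that follows an invalidation.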
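Request coalescing (also called request collapsing) keeps a single PoP from multiplying its own refill. A minimal in-process sketch, assuming a thread-per-request edge and a blocking origin call; CoalescingFetcher and slow_origin are hypothetical names. Only the first miss for a key goes to the origin, and concurrent misses for the same key wait for its result.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class _Flight:
    """State for one in-progress origin fetch that other requests can wait on."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class CoalescingFetcher:
    """Collapse concurrent misses for the same key into one origin fetch (sketch)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Flight currently being fetched

    def get(self, key):
        with self._lock:
            flight = self._inflight.get(key)
            is_leader = flight is None
            if is_leader:
                flight = _Flight()
                self._inflight[key] = flight
        if is_leader:
            try:
                flight.value = self._fetch(key)  # the only origin call for this burst
            except Exception as exc:
                flight.error = exc               # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]      # later misses start a new fetch
                flight.done.set()
        else:
            flight.done.wait()                   # follower: wait for the leader's result
        if flight.error is not None:
            raise flight.error
        return flight.value


# Usage: 50 simultaneous misses for the same freshly purged key.
origin_calls = 0
calls_lock = threading.Lock()


def slow_origin(key):
    global origin_calls
    with calls_lock:
        origin_calls += 1
    time.sleep(0.2)  # simulated origin latency
    return f"fresh {key}"


fetcher = CoalescingFetcher(slow_origin)
start_line = threading.Barrier(50)


def user_request(key):
    start_line.wait(timeout=5)  # release all 50 requests at the same moment
    return fetcher.get(key)


with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(user_request, ["/home"] * 50))

print(origin_calls, set(results))  # 1 {'fresh /home'}
```

A shield tier applies the same idea one level up: the shield coalesces misses arriving from many PoPs.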

The previous lesson matters directly here. If an edge function rewrites headers or expands the cache key space, purge becomes more complex because the number of distinct copies grows.

The next lesson also follows naturally: CDN optimization is partly about making these refill and propagation behaviors cheaper and more predictable.

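So the mature mental model is: a purge starts a transition period, and the design goal is to make that transition both correct (every unsafe copy stops being served) and gentle on the origin.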


Troubleshooting

Issue: "We purged the page, but some users still saw stale content."

Why it happens / is confusing: Teams assume one URL maps to one cached object everywhere.

Clarification / Fix: Check the real cache key and variant space. If headers, device class, geo, or language affect the key, some copies may not have matched the purge request.

Issue: "Purge-all fixed freshness, but the origin fell over right after."

Why it happens / is confusing: The purge looks successful from the CDN control plane's perspective.

Clarification / Fix: The problem is the refill wave after invalidation. Prefer versioned assets, narrower purge scope, soft purge, shielding, and request coalescing where available.

Issue: "Edge logic and purge strategy seem unrelated."

Why it happens / is confusing: One looks like request computation and the other like cache administration.

Clarification / Fix: Edge logic often changes cache keys and response variants. That directly changes purge scope, blast radius, and the likelihood of stale leftovers.


Advanced Connections

Connection 1: Cache Purging <-> Edge Functions

The parallel: Edge functions can rewrite requests, add variant dimensions, or alter cacheability. That means they indirectly define how difficult it will be to invalidate content later.

Real-world case: A personalized edge rule that varies cache on language and device can silently multiply the number of objects a later purge must cover.

Connection 2: Cache Purging <-> CDN Optimization Techniques

The parallel: Purging is one of the clearest places where control-plane correctness and data-plane efficiency meet. Optimization is not only about hit rate, but about how safely the system transitions after invalidation.

Real-world case: Two CDNs may both support purge-by-tag, but the operational difference appears in propagation behavior, shielding, refill amplification, and stale handling under load.



Key Insights

  1. Purge scope matters more than the button you press - The real problem is matching the invalidation to the effective cache-key space.
  2. Hard purge and soft purge protect different things - One prioritizes immediate freshness, the other prioritizes refill stability and origin safety.
  3. Invalidation is followed by a transition period - The operational risk often comes after the purge, when the edge and origin renegotiate freshness under real traffic.

Previous: Edge Functions - Compute at the CDN Edge | Next: CDN Optimization Techniques - Performance at Scale
