LESSON

023 30 min intermediate

Day 251: Cache Purging Strategies - CDN Cache Invalidation

Purging is not "deleting cache." It is coordinating the moment when thousands of distributed copies stop being safe to serve.

Today's "Aha!" Moment

The insight: In a CDN, a purge is not a local erase operation. It is a control-plane action that must reach many PoPs, match the right cache keys, and avoid turning a freshness fix into an origin traffic disaster.

Why this matters: Teams often treat invalidation as an afterthought. They assume a deploy changes origin content, the edge will somehow "catch up," and a purge simply forces that to happen faster. In practice, purge strategy determines whether stale content disappears cleanly, whether variants are actually invalidated, and whether the origin survives the refill.

The universal pattern: authoritative content changes -> distributed cached copies may still look valid -> control plane spreads an invalidation signal -> edge nodes stop serving or revalidate those copies.
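The pattern above can be reduced to a toy fan-out. This sketch assumes each PoP is a plain in-memory dictionary and that a purge only counts as finished once every PoP has acknowledged it. `PoP`, `broadcast_purge`, and the `reachable` flag are hypothetical names for illustration, not any CDN's control-plane API.

```python
from dataclasses import dataclass, field


@dataclass
class PoP:
    """One edge location holding its own cached copies (hypothetical model)."""
    name: str
    cache: dict = field(default_factory=dict)
    reachable: bool = True  # False simulates a PoP the purge message never reached

    def apply_purge(self, key: str) -> bool:
        """Drop the key locally and acknowledge; return False if unreachable."""
        if not self.reachable:
            return False
        self.cache.pop(key, None)
        return True


def broadcast_purge(pops: list, key: str) -> list:
    """Send the purge to every PoP; return names of PoPs that did not acknowledge."""
    return [pop.name for pop in pops if not pop.apply_purge(key)]


# Usage: three PoPs cache the old homepage, one of them is unreachable.
pops = [
    PoP("fra", {"/home": "old"}),
    PoP("iad", {"/home": "old"}),
    PoP("syd", {"/home": "old"}, reachable=False),
]
pending = broadcast_purge(pops, "/home")
print(pending)  # ['syd']
```

Any PoP left in the returned list is still serving the old copy, which is one reason purge completion has to be tracked as a distributed operation rather than assumed.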

Concrete anchor: A homepage is cached in 180 edge locations, with variants by language and device. The product team changes the hero banner and wants it live immediately. The real problem is no longer "What is the new HTML?" but "How do we make every unsafe copy stop being served without causing a global miss storm back to origin?"

How to recognize when this applies:

The same response exists in many edge caches at once.
Content changes before TTL expiry.
Cache keys include variants such as device, language, cookie buckets, or edge-added headers.

Common misconceptions:

[INCORRECT] "Purging means the content is physically removed everywhere instantly."
[INCORRECT] "Purge by URL is enough even when the cache key includes headers, cookies, or other variant dimensions."
[CORRECT] The truth: Purging is a distributed invalidation problem whose difficulty depends on variant space, propagation time, and how aggressively the system refills from origin afterward.

Real-world examples:

Asset deploys: Teams often prefer versioned URLs so new assets avoid broad purge traffic entirely.
Editorial updates: HTML, landing pages, product pages, and API responses often need selective purge, soft purge, or tag-based purge because the name stays stable while the content changes.
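Versioned URLs can be as simple as embedding a content hash in the file name. This is a minimal sketch, assuming a SHA-256 digest truncated to 8 hex characters is enough to tell releases apart; `versioned_name` is a hypothetical helper, and build tools usually provide their own hashing step.

```python
import hashlib
import posixpath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js.

    New content produces a new URL, so old cached copies never need purging;
    pages simply stop referencing them.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    root, ext = posixpath.splitext(path)  # ext keeps its leading dot, or is ""
    return f"{root}.{digest}{ext}"


old = versioned_name("static/app.js", b"console.log('v1')")
new = versioned_name("static/app.js", b"console.log('v2')")
print(old != new)  # True: the deploy ships a new URL instead of purging
```

The trade-off is that HTML referencing these assets must itself be updated, so the purge problem moves to the (usually smaller) set of pages that point at them.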

Why This Matters

The problem: CDN caching makes repeated reads cheap, but freshness corrections are expensive because the copies are geographically distributed and often shared across many users.

Before:

Stale content persists until TTL expiry.
Emergency fixes rely on blunt purge-all operations.
Origin gets overwhelmed when too many objects refill at once.

After:

Purge scope matches the real content boundary.
The edge stops serving invalid content with controlled blast radius.
Origin is protected through revalidation, soft purge, shielding, and targeted refill patterns.

Real-world impact: A good purge strategy shortens incident duration, reduces rollout risk, and keeps the CDN helping even during fast content change. A bad one creates stale leaks, surprising misses, and global traffic spikes at exactly the wrong moment.

Learning Objectives

By the end of this session, you will be able to:

Explain why purging is a distributed systems problem - Connect CDN invalidation to control-plane propagation, cache keys, and refill behavior.
Describe the main purge strategies - Compare full purge, URL purge, tag/key purge, and soft purge vs hard purge.
Evaluate operational trade-offs - Decide how to remove stale content while minimizing origin shock, missed variants, and purge blast radius.

Core Concepts Explained

Concept 1: Purging Is About Scope, Not Just Speed

At a local cache level, invalidation can feel simple:

mark entry invalid
next read misses
refill from authority
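On a single node those three steps fit in a few lines. This is a minimal sketch with a hypothetical `fetch_from_origin` callback standing in for the authority; the CDN case is hard precisely because this one dictionary becomes many independent copies across PoPs.

```python
from typing import Callable


class LocalCache:
    """Single-node cache: invalidation is just a local delete."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._entries: dict = {}

    def get(self, key: str) -> str:
        if key not in self._entries:  # miss: refill from the authority
            self._entries[key] = self._fetch(key)
        return self._entries[key]

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)  # mark invalid (here: simply drop it)


origin = {"/home": "v1"}
cache = LocalCache(lambda key: origin[key])
print(cache.get("/home"))  # v1
origin["/home"] = "v2"     # authority changes; the cached copy is now stale
print(cache.get("/home"))  # v1 (still served until invalidated)
cache.invalidate("/home")
print(cache.get("/home"))  # v2
```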

At CDN scale, the question becomes much harder:

which copies should stop being served?

That depends on the effective cache key, not just the URL.

If edge caching varies by:

path
query string
country
device type
language
selected headers
cookie buckets

then "purge /home" may or may not cover the actual population of cached objects that users can receive.
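A small model shows how an exact-key purge can miss variants. It assumes the edge varies on `accept-language` and a hypothetical `x-device-class` header; real CDNs build keys from their own configuration, and `cache_key`, `purge_key`, and `purge_all_variants` are illustrative names only.

```python
# Hypothetical vary dimensions; real CDNs build keys from their own configuration.
VARY_ON = ("accept-language", "x-device-class")


def cache_key(path: str, headers: dict, vary_on: tuple = VARY_ON) -> tuple:
    """Effective key: the path plus the value of every dimension the edge varies on."""
    return (path, *(headers.get(name, "") for name in vary_on))


def purge_key(cache: dict, key: tuple) -> int:
    """Exact-key purge: removes at most one variant."""
    return 1 if cache.pop(key, None) is not None else 0


def purge_all_variants(cache: dict, path: str) -> int:
    """Scope the purge to the path across every variant; return how many were removed."""
    doomed = [key for key in cache if key[0] == path]
    for key in doomed:
        del cache[key]
    return len(doomed)


cache = {}
for lang in ("en", "de"):
    for device in ("desktop", "mobile"):
        headers = {"accept-language": lang, "x-device-class": device}
        cache[cache_key("/home", headers)] = f"old banner ({lang}, {device})"

print(purge_key(cache, cache_key("/home", {})))  # 0: a header-less key matches no variant
print(purge_all_variants(cache, "/home"))        # 4
```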

This makes purge design mostly a scope problem:

too broad -> you invalidate far more than necessary
too narrow -> stale variants remain live

That is why mature CDN setups tend to rely on a small set of explicit invalidation handles:

versioned asset names when possible
URL purge for narrow object updates
tag/surrogate-key purge when many related objects must move together
purge-all only for exceptional cases
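Tag-based purge needs an index from each tag to the cache keys that carry it. The sketch below assumes a single node and hypothetical tag names such as `catalog`; vendors expose this idea through their own mechanisms (Fastly calls them surrogate keys), and none of their APIs are modeled here.

```python
from collections import defaultdict


class TaggedCache:
    """Single-node cache with a tag -> keys index (sketch, not a vendor API)."""

    def __init__(self) -> None:
        self.entries: dict = {}
        self.by_tag = defaultdict(set)  # tag -> set of cache keys carrying it

    def put(self, key: str, body: str, tags: list) -> None:
        self.entries[key] = body
        for tag in tags:
            self.by_tag[tag].add(key)

    def purge_tag(self, tag: str) -> set:
        """Remove every entry carrying the tag; return the keys actually removed."""
        keys = self.by_tag.pop(tag, set())
        # A key may already be gone via another tag, so only count real removals.
        return {key for key in keys if self.entries.pop(key, None) is not None}


cache = TaggedCache()
cache.put("/product/42", "<product 42>", ["product-42", "catalog"])
cache.put("/category/shoes", "<shoe listing>", ["catalog"])
cache.put("/about", "<about page>", ["static"])
print(sorted(cache.purge_tag("catalog")))  # ['/category/shoes', '/product/42']
```

The useful property is that the caller names a content boundary ("everything in the catalog") instead of enumerating URLs and variants.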

The important mindset shift is:

purge does not fix correctness by force
purge fixes correctness only if its target model matches the real cache key space

Concept 2: Hard Purge and Soft Purge Optimize for Different Failure Modes

There are two broad behaviors a CDN can take after invalidation.

Hard purge means the object is no longer treated as servable:

next request becomes a miss
edge must go back to origin

This is simple and sometimes necessary, but dangerous at scale because many PoPs may refill simultaneously.

Soft purge means the object is marked stale rather than immediately thrown away:

edge knows the object should no longer be considered fresh
but can often revalidate, or keep serving the stale copy briefly while a refresh happens

Soft purge is valuable because it trades "immediate cold miss everywhere" for "controlled transition toward freshness."

That helps with:

origin protection
smoother refills
avoiding thundering herds after global invalidation

So the real trade-off is:

hard purge optimizes freshness certainty
soft purge optimizes refill stability
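The two behaviors can be contrasted on one entry. In this sketch the soft-purged entry is refreshed inline right after the stale answer is chosen, purely to keep the example deterministic; a real edge would refresh in the background, and whether it serves stale or revalidates synchronously depends on vendor and configuration. `EdgeCache` and `Entry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: str
    stale: bool = False


class EdgeCache:
    """One PoP's view of hard vs soft purge (hypothetical, simplified)."""

    def __init__(self, origin: dict) -> None:
        self.origin = origin
        self.entries: dict = {}
        self.origin_fetches = 0

    def hard_purge(self, key: str) -> None:
        self.entries.pop(key, None)  # gone: the next request must wait on origin

    def soft_purge(self, key: str) -> None:
        if key in self.entries:
            self.entries[key].stale = True  # kept, but no longer considered fresh

    def _refresh(self, key: str) -> None:
        self.origin_fetches += 1
        self.entries[key] = Entry(self.origin[key])

    def get(self, key: str) -> tuple:
        entry = self.entries.get(key)
        if entry is None:
            self._refresh(key)  # user-facing miss
            return self.entries[key].body, "MISS"
        if entry.stale:
            # Answer with the stale copy; a real edge would refresh in the
            # background, here it happens inline so the example is deterministic.
            self._refresh(key)
            return entry.body, "STALE"
        return entry.body, "HIT"


edge = EdgeCache(origin={"/home": "new banner"})
edge.entries["/home"] = Entry("old banner")
edge.soft_purge("/home")
print(edge.get("/home"))  # ('old banner', 'STALE')
print(edge.get("/home"))  # ('new banner', 'HIT')
edge.hard_purge("/home")
print(edge.get("/home"))  # ('new banner', 'MISS')
```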

This is also where CDN behavior connects back to HTTP validators and cache semantics:

ETag
Last-Modified
conditional requests
stale-while-revalidate style patterns

The CDN works best when purge and revalidation cooperate. If purge exists alone, the system often replaces stale-risk with origin-risk.
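Revalidation is what keeps soft purge cheap when content has not actually changed: the edge sends its stored `ETag` in `If-None-Match`, and the origin answers `304 Not Modified` with no body if it still matches. This is a minimal sketch, assuming a strong ETag derived from a SHA-256 of the body and a single-value `If-None-Match`; real HTTP also allows lists and weak comparison, which this ignores. `make_etag`, `origin_handle`, and `revalidate` are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag from a content hash (one of many valid schemes)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_handle(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes, str]:
    """Answer 304 with no body when the client's validator still matches."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b"", etag
    return 200, body, etag


def revalidate(cached_body: bytes, cached_etag: str, origin_body: bytes) -> Tuple[bytes, int]:
    """Edge side: send a conditional request; return (body to serve, body bytes from origin)."""
    status, body, _ = origin_handle(origin_body, cached_etag)
    if status == 304:
        return cached_body, 0  # stale copy confirmed fresh, nothing re-downloaded
    return body, len(body)


page = b"<h1>Spring sale</h1>"
etag = make_etag(page)
print(revalidate(page, etag, page)[1])                    # 0
print(revalidate(page, etag, b"<h1>New banner</h1>")[1])  # 19
```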

Concept 3: The Hard Part Is What Happens After the Purge

Most teams think about the command:

purge URL
purge tag
purge everything

But the bigger design question is what the system looks like one second later.

After invalidation, several things can go wrong:

many users request the same object at once
many PoPs re-fetch in parallel
origin or shield sees a sudden wave of misses
cache fragmentation reduces refill efficiency
some variants refill correctly while others stay stale because the purge key was incomplete

This is why purge is tightly linked to:

origin shielding
request collapsing / coalescing
revalidation instead of unconditional re-fetch
careful cache-key design
edge logic that avoids accidental variant explosion
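Request collapsing means that when many requests miss on the same key at once, only one goes upstream and the rest wait for its result. Below is a minimal single-process sketch using threads; `SingleFlight` is a hypothetical name borrowed from the common "single-flight" pattern, and real CDNs implement collapsing inside their own proxy layers.

```python
import threading
from typing import Any, Callable


class _Call:
    """State shared by the leader and the followers of one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Any = None


class SingleFlight:
    """Collapse concurrent misses on the same key into one upstream fetch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: dict = {}

    def do(self, key: str, fetch: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = self._inflight[key] = _Call()
        if is_leader:
            try:
                call.value = fetch()  # the only request that reaches origin
            except Exception as exc:
                call.error = exc  # share the failure with followers too
            finally:
                with self._lock:
                    del self._inflight[key]  # later misses start a fresh fetch
                call.done.set()
        else:
            call.done.wait()  # followers block instead of hitting origin
        if call.error is not None:
            raise call.error
        return call.value
```

Collapsing limits the herd per key and per node; it does not reduce the number of distinct keys, which is why it pairs with shielding and careful cache-key design.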

The previous lesson matters directly here. If an edge function rewrites headers or expands the cache key space, purge becomes more complex because the number of distinct copies grows.
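The growth in copies is multiplicative, which a quick calculation makes concrete. Taking the 180 PoPs from the homepage example and assuming, hypothetically, 4 languages and 3 device classes, one URL can exist as up to 2,160 edge copies. The `shielded` case assumes an idealized shield that collapses all PoP refills into one origin fetch per variant.

```python
from math import prod


def cached_copies(pops: int, variants: dict) -> int:
    """Upper bound on distinct edge copies of one URL: PoPs x product of variant counts."""
    return pops * prod(variants.values())


def origin_refills(pops: int, variants: dict, shielded: bool) -> int:
    """Origin fetches after a hard purge if every copy is requested once.

    Without a shield each PoP refills each variant itself; an idealized shield
    collapses those into one fetch per variant.
    """
    per_pop = prod(variants.values())
    return per_pop if shielded else pops * per_pop


dims = {"language": 4, "device": 3}  # hypothetical variant counts
print(cached_copies(180, dims))                     # 2160
print(origin_refills(180, dims, shielded=True))     # 12
print(cached_copies(180, {**dims, "country": 20}))  # 43200
```

If an edge function adds a 20-value country dimension (also hypothetical), the same URL can reach 180 × 240 = 43,200 copies, and every one of them falls inside the scope a later purge must match.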

The next lesson also follows naturally: CDN optimization is partly about making these refill and propagation behaviors cheaper and more predictable.

So the mature mental model is:

purge is not the end of the cache lifecycle
purge is the start of a transition period whose stability matters just as much as the invalidation itself

Troubleshooting

Issue: "We purged the page, but some users still saw stale content."

Why it happens / is confusing: Teams assume one URL maps to one cached object everywhere.

Clarification / Fix: Check the real cache key and variant space. If headers, device class, geo, or language affect the key, some copies may not have matched the purge request.

Issue: "Purge-all fixed freshness, but the origin fell over right after."

Why it happens / is confusing: The purge looks successful from the CDN control plane's perspective.

Clarification / Fix: The problem is the refill wave after invalidation. Prefer versioned assets, narrower purge scope, soft purge, shielding, and request coalescing where available.

Issue: "Edge logic and purge strategy seem unrelated."

Why it happens / is confusing: One looks like request computation and the other like cache administration.

Clarification / Fix: Edge logic often changes cache keys and response variants. That directly changes purge scope, blast radius, and the likelihood of stale leftovers.

Advanced Connections

Connection 1: Cache Purging <-> Edge Functions

The parallel: Edge functions can rewrite requests, add variant dimensions, or alter cacheability. That means they indirectly define how difficult it will be to invalidate content later.

Real-world case: A personalized edge rule that varies the cache key on language and device can silently multiply the number of objects a later purge must cover.

Connection 2: Cache Purging <-> CDN Optimization Techniques

The parallel: Purging is one of the clearest places where control-plane correctness and data-plane efficiency meet. Optimization is not only about hit rate, but about how safely the system transitions after invalidation.

Real-world case: Two CDNs may both support purge-by-tag, but the operational difference appears in propagation behavior, shielding, refill amplification, and stale handling under load.

Resources

Optional Deepening Resources

[DOCS] Amazon CloudFront Developer Guide: Invalidate files to remove content
- Link: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
- Focus: Use it to understand invalidation mechanics, trade-offs with versioned file names, and operational limits in a major CDN.
[DOCS] Cloudflare API: Cache purge
- Link: https://developers.cloudflare.com/api/node/resources/cache/methods/purge/
- Focus: Read it for concrete purge modes such as purge everything and granular purge by URL or custom cache-key context.
[DOCS] Fastly Documentation: Purging
- Link: https://www.fastly.com/documentation/reference/api/purging/
- Focus: Use it to study soft vs hard purge and surrogate-key-based invalidation in a CDN built around edge cache control.
[DOCS] MDN Web Docs: HTTP Caching
- Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
- Focus: Revisit validators and freshness semantics so purge, revalidation, and stale handling fit into one mental model.

Key Insights

Purge scope matters more than the button you press - The real problem is matching the invalidation to the effective cache-key space.
Hard purge and soft purge protect different things - One prioritizes immediate freshness, the other prioritizes refill stability and origin safety.
Invalidation is followed by a transition period - The operational risk often comes after the purge, when the edge and origin renegotiate freshness under real traffic.
