Cache Purge, Surrogate Keys, and Content Invalidation

Lesson 020, 25 min, intermediate

The core idea: cache invalidation works best when it targets the content model behind many URLs, not just individual paths, because the same product, article, or asset can appear in several cached representations at once.

Core Insight

Imagine a shop changes the price of product 42 from 49 euros to 39 euros. The product page is cached at the CDN. So is a category page, a search result page, a homepage promotion, and a JSON response used by the mobile app. The database is correct. The origin renderer is correct. Some users still see the old price because edges around the world have cached representations that were correct when they were stored.

The naive fix is to purge the product URL:

/products/42

That refreshes only one representation. It does not necessarily purge /category/running-shoes, /search?q=running, /api/products/42, localized variants, AMP pages, or cached page fragments that include the same price. Purging every URL manually is brittle because the content relationship lives in the product model, not in one path.

This is the problem surrogate keys solve. A surrogate key, also called a cache tag in some systems, is metadata attached to a cached response that names the content objects behind it. A response can have a URL-based cache key for lookup and also a set of surrogate keys for invalidation. The CDN uses the cache key to answer "does this request match an object?" It uses surrogate keys to answer "which cached objects depend on product 42?"

The trade-off is fast updates versus cache efficiency and control-plane complexity. Long TTLs and broad edge caching reduce origin load. Purge APIs and surrogate keys let you update cached content before TTL expiry. But every purge path becomes a distributed control-plane action: it needs correct tags, rate limits, retries, ordering with origin updates, and evidence that stale content has actually disappeared.

Cache Keys and Surrogate Keys Do Different Jobs

The previous lesson focused on cache keys. A cache key identifies one cached representation for request lookup:

host=www.shop.test
path=/products/42
query=currency=eur
language=es
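A minimal sketch of how those dimensions could be combined into one lookup key, assuming the CDN keys on host, path, a sorted query string, and a language value already derived from the request. The function name `build_cache_key` and the `|`-separated format are hypothetical; real CDNs configure cache keys in their own way.

```python
from urllib.parse import parse_qsl, urlencode


def build_cache_key(host: str, path: str, query: str, language: str) -> str:
    """Combine the request dimensions that select one cached representation (hypothetical format)."""
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    params = sorted(parse_qsl(query, keep_blank_values=True))
    normalized_query = urlencode(params)
    return f"host={host.lower()}|path={path}|query={normalized_query}|language={language}"


key_es = build_cache_key("www.shop.test", "/products/42", "currency=eur", "es")
key_en = build_cache_key("www.shop.test", "/products/42", "currency=eur", "en")
print(key_es)  # host=www.shop.test|path=/products/42|query=currency=eur|language=es
print(key_es == key_en)  # False: language is part of the key
```

Every extra dimension multiplies the number of concrete cache keys a single product can occupy, which is why invalidating by URL alone gets harder as keys become more precise.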

That key is precise. It decides whether a request can reuse a stored response. But it is not enough for invalidation, because product 42 may appear in many representations:

/products/42
/products/42?currency=eur
/category/running-shoes?page=1
/search?q=trail
/api/products/42
/homepage

Surrogate keys add a second index. When the origin returns a response, it can attach tags such as:

Surrogate-Key: product:42 category:running-shoes price-list:eu homepage-promo
Cache-Control: public, max-age=3600, s-maxage=3600
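One way an origin might produce that header is to record every content object the renderer reads. This is a sketch under stated assumptions: `TagCollector` and `render_product_page` are hypothetical helpers, and the header value is space-separated as in the example above. Other CDNs may expect a different header name or separator.

```python
class TagCollector:
    """Records the content objects a renderer touches (hypothetical helper)."""

    def __init__(self) -> None:
        self._tags: list[str] = []

    def add(self, tag: str) -> None:
        # Keep first-seen order and drop duplicates.
        if tag not in self._tags:
            self._tags.append(tag)

    def header(self) -> dict[str, str]:
        # Surrogate keys go into one space-separated header value.
        return {"Surrogate-Key": " ".join(self._tags)}


def render_product_page(product_id: int, category: str, price_list: str, tags: TagCollector) -> str:
    # Hypothetical renderer: every data lookup registers the object it read.
    tags.add(f"product:{product_id}")
    tags.add(f"category:{category}")
    tags.add(f"price-list:{price_list}")
    return f"<h1>Product {product_id}</h1>"


collector = TagCollector()
body = render_product_page(42, "running-shoes", "eu", collector)
headers = {**collector.header(), "Cache-Control": "public, max-age=3600, s-maxage=3600"}
print(headers["Surrogate-Key"])  # product:42 category:running-shoes price-list:eu
```

Collecting tags at data-access time, rather than writing them by hand per template, makes it harder for a page to include an object without tagging it.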

Different CDNs use different header names and APIs, but the model is stable: each cached response can be associated with logical content identifiers. Later, the purge system can say:

purge all cached responses tagged product:42

That invalidates every edge object carrying the product:42 tag, even when those objects have different URLs and cache keys. The content model now has an invalidation handle.
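The two indexes can be modeled with a toy in-memory edge. This sketch uses URLs as cache keys for brevity, and `TagIndex` is a hypothetical class; a real CDN stores full cache keys and applies purges asynchronously across many servers.

```python
from collections import defaultdict
from typing import Optional


class TagIndex:
    """Toy model of one edge: a lookup index plus a surrogate-key index."""

    def __init__(self) -> None:
        self.objects: dict[str, str] = {}                      # cache key -> stored body
        self.tags_by_key: dict[str, set[str]] = {}             # cache key -> its tags
        self.keys_by_tag: dict[str, set[str]] = defaultdict(set)  # tag -> cache keys

    def _unlink(self, cache_key: str) -> None:
        # Remove a cache key from every tag it was filed under.
        for tag in self.tags_by_key.pop(cache_key, set()):
            self.keys_by_tag[tag].discard(cache_key)

    def store(self, cache_key: str, body: str, tags: set[str]) -> None:
        self._unlink(cache_key)  # a refill may carry a different tag set
        self.objects[cache_key] = body
        self.tags_by_key[cache_key] = set(tags)
        for tag in tags:
            self.keys_by_tag[tag].add(cache_key)

    def lookup(self, cache_key: str) -> Optional[str]:
        # Request path: only the cache key decides hit or miss.
        return self.objects.get(cache_key)

    def purge_tag(self, tag: str) -> set[str]:
        # Purge path: the tag finds every dependent object, whatever its URL.
        purged = set(self.keys_by_tag.get(tag, set()))
        for cache_key in purged:
            self.objects.pop(cache_key, None)
            self._unlink(cache_key)
        return purged


edge = TagIndex()
edge.store("/products/42", "price 49", {"product:42", "price-list:eu"})
edge.store("/category/running-shoes", "tile 49", {"category:running-shoes", "product:42"})
edge.store("/assets/product-42.abc123.jpg", "<jpeg bytes>", {"asset:product-42-image"})
purged = edge.purge_tag("product:42")
print(sorted(purged))  # ['/category/running-shoes', '/products/42']
print(edge.lookup("/assets/product-42.abc123.jpg"))  # <jpeg bytes>
```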

Keep one distinction clear: surrogate keys are not the same as cache keys. A cache key is used during a request to find the object. A surrogate key is used during a purge to find groups of objects that should stop being served.

What Happens When Content Changes

Return to the price change. A safe invalidation path has a few inspectable steps:

1. Product database updates product:42 price from 49 to 39.
2. Rendering layer can now produce the new price.
3. Content event says product:42 changed.
4. Invalidation worker maps that event to surrogate keys.
5. Worker calls CDN purge API for product:42 and related keys.
6. CDN marks matching cached objects invalid or stale.
7. Next requests miss or revalidate.
8. Origin returns fresh representations.
9. Edges refill with responses tagged product:42 again.

The ordering matters. If the purge happens before the origin can render the new price, the first post-purge miss may fetch the old price again and repopulate the cache with stale content. This is a common failure: the purge was real, but the refill source was not ready.
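A minimal sketch of an ordering gate, assuming the origin can report which content version it would render for product 42 (read from the same source as the renderer, bypassing the CDN). `purge_when_origin_ready`, `fetch_origin_version`, and `purge` are hypothetical names. If the origin never catches up, the helper refuses to purge and returns False so the caller can retry later or alert.

```python
import time
from typing import Callable


def purge_when_origin_ready(
    fetch_origin_version: Callable[[], int],
    expected_version: int,
    purge: Callable[[list[str]], None],
    keys: list[str],
    attempts: int = 5,
    delay_seconds: float = 0.5,
) -> bool:
    """Purge only after the origin can render the new version (hypothetical helper)."""
    for attempt in range(attempts):
        # Ask the origin, not the CDN, which content version it would render now.
        if fetch_origin_version() >= expected_version:
            purge(keys)
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    # Origin never caught up: purging now could refill edges with stale content.
    return False


# Usage with a fake origin that catches up on the third check.
versions = iter([17, 17, 18])
sent: list[list[str]] = []
ok = purge_when_origin_ready(lambda: next(versions), 18, sent.append, ["product:42", "price-list:eu"], delay_seconds=0)
print(ok, sent)  # True [['product:42', 'price-list:eu']]
```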

The invalidation worker also needs to map content to all affected surfaces. If product 42 appears on category pages, the event may need more than product:42:

product:42
category:running-shoes
search-index:products
homepage-promo
price-list:eu

This does not mean every update should purge the whole site. The point is to express the dependency graph at the right level. A typo fix in a product description may affect the product page and search snippet. A price change may affect product pages, category tiles, cart recommendations, and campaign pages. A new product image may be better handled with a versioned asset URL rather than a purge.
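The dependency graph above can be sketched as a mapping from event type to surrogate keys. Only `product_price_changed` appears in this lesson's worked example; the other event type names and the mapping itself are hypothetical. In a real system this knowledge would more likely come from the tags the renderer emits than from a hand-written table.

```python
def surrogate_keys_for_event(event: dict) -> list[str]:
    """Map a content event to the surrogate keys to purge (hypothetical mapping)."""
    product = f"product:{event['product_id']}"
    kind = event["type"]
    if kind == "product_description_changed":
        # Description text shows up on the product page and in search snippets.
        return [product, "search-index:products"]
    if kind == "product_price_changed":
        # Prices also appear in price-list views and campaign slots.
        return [product, f"price-list:{event['price_list']}", "homepage-promo"]
    if kind == "product_image_changed":
        # The new image gets a new content-hashed URL, so the old asset is not purged;
        # only pages that embed the image URL (tagged product:N) must re-render.
        return [product]
    raise ValueError(f"unknown event type: {kind}")


price_event = {"type": "product_price_changed", "product_id": "42", "price_list": "eu", "new_price": 39}
print(surrogate_keys_for_event(price_event))  # ['product:42', 'price-list:eu', 'homepage-promo']
```

Raising on unknown event types is a deliberate choice in this sketch: silently purging nothing would hide a gap in the dependency model.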

The purge API response is another boundary. Many APIs tell you that the purge request was accepted, not that every edge has already stopped serving the object. The application should not treat "HTTP 200 from purge API" as proof that every user worldwide sees the new price. It is evidence that the CDN control plane accepted work. The user-facing proof is a fresh response at the edge.
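A sketch of checking user-facing freshness instead of trusting the purge API status, assuming a hypothetical `fetch_edge` that can read a URL through a specific edge location and a predicate that recognizes fresh content. Any URL still False after the attempts is the signal to alert on; to cover other regions, run the same check per region.

```python
import time
from typing import Callable


def wait_for_fresh_edges(
    fetch_edge: Callable[[str], str],
    urls: list[str],
    is_fresh: Callable[[str], bool],
    attempts: int = 10,
    delay_seconds: float = 1.0,
) -> dict[str, bool]:
    """Poll edge responses until each URL shows fresh content or attempts run out."""
    status = {url: False for url in urls}
    for attempt in range(attempts):
        for url in urls:
            if not status[url]:
                status[url] = is_fresh(fetch_edge(url))
        if all(status.values()):
            break
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return status


# Fake edge: /products/42 still serves the old price on the first read.
replies = {"/products/42": iter(["price 49", "price 39"]), "/homepage": iter(["promo 39"])}
result = wait_for_fresh_edges(lambda url: next(replies[url]), list(replies), lambda body: "39" in body, delay_seconds=0)
print(result)  # {'/products/42': True, '/homepage': True}
```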

Hard Purge, Soft Purge, and Revalidation

Invalidation can mean several things.

A hard purge removes or invalidates an object so the next request cannot use the old stored response. It usually creates a miss. That miss goes upstream, possibly through an origin shield, and refills the cache with whatever the origin returns.

A soft purge marks an object stale while allowing controlled reuse or revalidation. The next request may trigger validation with the origin using ETag or Last-Modified. If the origin says the representation has not changed, the cache can keep using it. If it changed, the cache stores the new version. Soft purge can reduce origin shock because it does not always force a full refetch.
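The difference can be sketched with a toy cache. `EdgeCache`, `Entry`, and the status labels are our own illustrative names, not any CDN's API, and vendors implement soft purge differently. The `revalidate` callback models a conditional request: None stands for 304 Not Modified, an `Entry` for 200 OK with new bytes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    body: str
    etag: str
    stale: bool = False


class EdgeCache:
    """Toy edge cache contrasting hard and soft purge (illustrative only)."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def hard_purge(self, key: str) -> None:
        # Drop the stored response: the next request is a full miss.
        self.entries.pop(key, None)

    def soft_purge(self, key: str) -> None:
        # Keep body and validator, but require revalidation before reuse.
        if key in self.entries:
            self.entries[key].stale = True

    def get(self, key: str, revalidate: Callable[[str], Optional[Entry]]) -> tuple[str, str]:
        entry = self.entries.get(key)
        if entry is None:
            return "MISS", ""  # a real cache would now fetch from origin
        if not entry.stale:
            return "HIT", entry.body
        fresh = revalidate(entry.etag)
        if fresh is None:  # origin answered 304: reuse stored bytes
            entry.stale = False
            return "REVALIDATED", entry.body
        self.entries[key] = fresh  # origin answered 200: store new version
        return "REFRESHED", fresh.body


cache = EdgeCache()
cache.entries["/products/42"] = Entry("price 49", '"product-42-v18"')
cache.soft_purge("/products/42")
print(cache.get("/products/42", lambda etag: Entry("price 39", '"product-42-v19"')))  # ('REFRESHED', 'price 39')
cache.hard_purge("/products/42")
print(cache.get("/products/42", lambda etag: None))  # ('MISS', '')
```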

HTTP revalidation uses validators:

ETag: "product-42-v18"
Last-Modified: Tue, 16 Jun 2026 10:15:00 GMT

A cache can ask:

If-None-Match: "product-42-v18"

The origin can answer 304 Not Modified if the cached representation is still valid, or 200 OK with new bytes if it changed. This is not a replacement for purging when you need faster-than-TTL updates, but it is an important way to refresh stale objects cheaply.
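On the origin side, answering If-None-Match for a GET can be sketched as below. This is a simplified sketch, not a complete implementation of the conditional request rules: it uses weak comparison (ignoring a `W/` prefix), treats `*` as matching the existing representation, and splits the header on commas, which assumes no entity-tag contains a comma. A real 304 would also carry headers such as ETag and Cache-Control.

```python
from typing import Optional


def _opaque_tag(tag: str) -> str:
    # Weak comparison ignores the W/ prefix.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(if_none_match: Optional[str], current_etag: str, body: bytes) -> tuple[int, bytes]:
    """Answer a GET carrying If-None-Match (simplified origin-side sketch)."""
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return 304, b""  # any current representation matches "*"
        # Naive split: assumes no entity-tag contains a comma.
        candidates = {_opaque_tag(t) for t in if_none_match.split(",")}
        if _opaque_tag(current_etag) in candidates:
            return 304, b""  # cached copy still valid; no body sent
    return 200, body  # new bytes; a real server also sends the new ETag


print(conditional_get('"product-42-v18"', '"product-42-v18"', b"price 49"))  # (304, b'')
print(conditional_get('"product-42-v18"', '"product-42-v19"', b"price 39"))  # (200, b'price 39')
```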

The stale-while-revalidate and stale-if-error directives add controlled tolerance. A cache may serve a stale response briefly while it refreshes in the background, or serve stale content when the origin is failing. That can protect availability, but it must match the content's risk. Stale product images are usually acceptable for a short window. Stale prices, legal copy, security state, or account-specific data may not be.
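A simplified reading of these directives as a decision function, with per-content policies. The policy numbers are illustrative assumptions, not recommendations, and `origin_ok` stands in for whether the upstream fetch succeeded. Prices get no stale window at all, while images tolerate a short stale-while-revalidate window and a long stale-if-error window.

```python
# Illustrative per-content policies in seconds (assumed values).
POLICY = {
    "image": {"max_age": 3600, "swr": 60, "sie": 86400},
    "price": {"max_age": 3600, "swr": 0, "sie": 0},
}


def serve_decision(age: int, max_age: int, swr: int, sie: int, origin_ok: bool) -> str:
    """How a shared cache may use a stored response of the given age (simplified)."""
    if age < max_age:
        return "fresh"
    staleness = age - max_age  # seconds past the freshness lifetime
    if staleness < swr:
        return "serve-stale-while-revalidating"
    if origin_ok:
        return "fetch-from-origin"
    if staleness < sie:
        return "serve-stale-on-error"
    return "error"


print(serve_decision(3630, **POLICY["image"], origin_ok=True))   # serve-stale-while-revalidating
print(serve_decision(3630, **POLICY["price"], origin_ok=True))   # fetch-from-origin
print(serve_decision(3630, **POLICY["price"], origin_ok=False))  # error
```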

Worked Path: Price Change Without Purging the World

The shop has these cached responses:

URL: /products/42
tags: product:42 category:running-shoes price-list:eu

URL: /category/running-shoes
tags: category:running-shoes product:42 product:77 product:81

URL: /homepage
tags: homepage-promo product:42

URL: /assets/product-42.abc123.jpg
tags: asset:product-42-image

At 10:00, the price changes. The content system emits:

{
  "type": "product_price_changed",
  "product_id": "42",
  "price_list": "eu",
  "new_price": 39
}

The invalidation worker converts this to:

purge surrogate keys:
product:42
price-list:eu
homepage-promo
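The effect of those three keys can be checked against the tag sets listed above. This sketch copies the worked example's data and applies the rule that a cached response is invalidated when it carries at least one purged key.

```python
# Tag sets copied from the worked example.
cached = {
    "/products/42": {"product:42", "category:running-shoes", "price-list:eu"},
    "/category/running-shoes": {"category:running-shoes", "product:42", "product:77", "product:81"},
    "/homepage": {"homepage-promo", "product:42"},
    "/assets/product-42.abc123.jpg": {"asset:product-42-image"},
}
purge_keys = {"product:42", "price-list:eu", "homepage-promo"}

# A response is invalidated if its tags intersect the purged keys.
purged = sorted(url for url, tags in cached.items() if tags & purge_keys)
kept = sorted(url for url, tags in cached.items() if not tags & purge_keys)
print(purged)  # ['/category/running-shoes', '/homepage', '/products/42']
print(kept)    # ['/assets/product-42.abc123.jpg']
```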

It does not purge every URL. It also does not purge the image asset, because the image did not change. Static assets use a different strategy: content-hashed URLs. If the image bytes change, the build can publish:

/assets/product-42.def456.jpg

The new URL becomes the new cache key. Old URLs can keep a long TTL because they refer to old immutable bytes. Versioned assets reduce the need for purges.
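A content-hashed name can be derived from the asset bytes. In this sketch the `abc123`-style suffix is replaced by the first six hex characters of a SHA-256 digest; the function name and digest length are assumptions, and build tools use their own naming schemes. Pages that reference the image still have to start emitting the new URL, so they keep their normal invalidation.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_url(path: str, data: bytes, digest_chars: int = 6) -> str:
    """Insert a short content hash before the extension: new bytes, new URL."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(data).hexdigest()[:digest_chars]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = hashed_asset_url("/assets/product-42.jpg", b"old image bytes")
new_url = hashed_asset_url("/assets/product-42.jpg", b"new image bytes")
print(old_url)
print(new_url)
```

Because the URL changes whenever the bytes change, the old URL can keep a long TTL without ever serving outdated bytes for the new version.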

After the purge is accepted, the worker samples edge reads:

GET /products/42              -> Age: 0, price 39
GET /category/running-shoes   -> Age: 0, tile price 39
GET /homepage                 -> Age: 0, promo price 39
GET /assets/product-42.abc123.jpg -> cache HIT, unchanged

That is the expected result. The content that depends on the price refilled. The image stayed cached. The purge targeted the content model, not the whole site.

Now imagine the worker had only purged /products/42. Category and homepage edges would still serve old prices until TTL expiry. Imagine it had purged everything. The price would update, but the origin might receive a global cold-cache surge for unrelated images, CSS, JavaScript, and pages. Both mistakes come from not separating content dependencies from URL lookup.

Operational Failure Modes

Failure: missing tags at render time. If /category/running-shoes includes product 42 but lacks the product:42 surrogate key, a product purge will miss that cached page. Treat tags as part of the response contract and test them.
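A tag contract check can be as small as a set difference. This sketch assumes the renderer can report which content objects it included (a hypothetical capability) and that the header is space-separated as in the earlier examples; a check like this could run in CI against rendered fixtures.

```python
def missing_surrogate_keys(included_objects: set[str], surrogate_key_header: str) -> set[str]:
    """Content objects that a page rendered but did not tag; should be empty."""
    emitted = set(surrogate_key_header.split())
    return included_objects - emitted


# The category page rendered three products but forgot to tag product:42.
included = {"category:running-shoes", "product:42", "product:77", "product:81"}
header = "category:running-shoes product:77 product:81"
print(missing_surrogate_keys(included, header))  # {'product:42'}
```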

Failure: purge accepted before origin is ready. A purge followed by an old origin render refills stale content. Order database writes, index updates, rendering readiness, and purge calls deliberately.

Failure: purge everything as a routine tool. Full purges destroy cache warmth, hide dependency modeling problems, and can overload origin. Reserve them for rare cases where the blast radius is intended.

Failure: URL purges miss variants. A single path may have language, currency, device, query, or header variants. Purging by logical key avoids chasing every concrete cache key by hand.

Failure: purge storms exceed control-plane limits. Large updates can produce many purge calls. Batch keys, deduplicate events, retry with backoff, and monitor rejected or delayed purge requests.
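Batching, deduplication, and retries can be sketched as two small helpers. The `max_batch` value and the boolean-returning `call` are assumptions; each CDN documents its own per-request key limits and rate limits.

```python
import random
import time
from typing import Callable, Iterable


def batch_keys(keys: Iterable[str], max_batch: int) -> list[list[str]]:
    """Deduplicate keys (first-seen order) and split them into purge batches."""
    unique = list(dict.fromkeys(keys))
    return [unique[i:i + max_batch] for i in range(0, len(unique), max_batch)]


def call_with_backoff(call: Callable[[], bool], attempts: int = 5, base_delay: float = 0.5) -> bool:
    """Retry a purge call that returns False when rejected, e.g. rate-limited."""
    for attempt in range(attempts):
        if call():
            return True
        if attempt < attempts - 1:
            # Exponential backoff with jitter so many workers do not retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return False


events = ["product:42", "price-list:eu", "product:42", "homepage-promo", "product:77"]
print(batch_keys(events, max_batch=2))
# [['product:42', 'price-list:eu'], ['homepage-promo', 'product:77']]
```

A batch that still fails after its retries should land in a dead-letter queue or trigger an alert, since its keys will otherwise stay stale until TTL expiry.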

Useful signals include purge requests by key, purge acceptance and error rate, queue depth in the invalidation worker, time from content change to first fresh edge response, top stale keys, Age header, cache status after purge, origin refill rate, shield miss rate, and sampled content correctness by region.

Readiness Check

Close the lesson and choose one cached page that combines several content objects. Write down:

page URL:
cache key dimensions:
content objects included:
surrogate keys emitted:
which update event should purge it:
which updates should not purge it:
how to prove the edge is fresh after purge:
what happens if the purge API is delayed:

If you cannot list the surrogate keys from the page's rendered content, invalidation will depend on luck. The goal is not to purge faster by hand. The goal is to make the dependency between cached representations and content objects visible enough that a small event can invalidate exactly the right edge objects.

Connections

The CDN lesson explained how cache keys decide whether a request matches a stored representation. This lesson adds the other index: surrogate keys and purge workflows decide which stored representations must stop being served after content changes.

The next lesson moves away from cached page delivery into long-lived client updates. The shared theme is still freshness, but the mechanism changes: instead of invalidating stored HTTP responses, the system keeps connections or polling loops alive enough to deliver new events.

Resources

[RFC] HTTP Caching RFC 9111
- Focus: Use it for freshness, validation, stored responses, stale behavior, and shared cache semantics.
[RFC] HTTP Semantics RFC 9110
- Focus: Use it for validators such as ETag, conditional requests, and representation metadata.
[DOC] Cloudflare: Purge cache
- Focus: Use it for concrete purge modes such as URL, tag, hostname, prefix, and purge everything, plus operational limits.
[DOC] MDN: Cache-Control
- Focus: Use it for practical meanings of public, private, max-age, s-maxage, stale-while-revalidate, and stale-if-error.
[DOC] Fastly: Getting started with surrogate keys
- Focus: Use it for a concrete implementation of tagging responses and purging groups of cached objects by content key.

Key Takeaways

Cache keys identify request variants; surrogate keys identify content dependencies for invalidation.
Purging by URL is useful but incomplete when one content object appears across many pages, APIs, and variants.
Hard purge, soft purge, revalidation, and versioned assets solve different freshness problems; use the smallest mechanism that matches the risk.
A good invalidation workflow has ordering, batching, retries, and evidence that edge content is fresh, not just an accepted purge API call.
