URLs, Authorities, Origins, and Resource Identity

Lesson 002 | 25 min | intermediate


The core idea: a URL is not just a route string; it is the public name a client, browser, cache, proxy, and server use to decide what is being addressed, which authority owns it, and which security or caching rules apply.

Core Insight

Imagine the commerce API from the previous lesson growing into a real web system. The product page loads from https://www.shop.test/products/42, JavaScript calls https://api.shop.test/products/42, images come from https://cdn.shop.test/products/42.jpg, and an internal gateway forwards all three to services in the same cluster. From the backend team's point of view, these requests may eventually touch the same product database. From HTTP's point of view, they are not the same thing.

The useful insight is that resource identity is a public contract. A URL tells the client what it is addressing. The authority tells the network and server side which host owns the request. The origin tells a browser which security boundary applies. The path and query help identify the resource or selected representation. A cache, redirect rule, cookie policy, CORS check, access log, and incident dashboard can all treat two strings differently even if your reverse proxy sends both to the same service.

A common mistake is to treat URLs as implementation paths: /service-a/table-products/read?id=42, /api/node-7/products/42, or /v1/getProduct?productId=42. Those names reveal deployment or handler shape instead of naming the thing the caller cares about. They are easy to generate from code and hard to keep stable. When the implementation moves, the public contract has to move with it, and every client, cache key, link, bookmark, metric, and runbook inherits the churn.

The central trade-off is stable names versus implementation leakage. Stable resource names require thought: what is the durable thing, which host is authoritative for it, and which parts of the request select a variant? Implementation-shaped names are faster to invent, but they make later routing, caching, security, and migration work more fragile.

The Pieces of Addressing

A URL has several pieces that participate in different decisions:

https://api.shop.test:443/products/42?view=summary#reviews
|___|   |_______________||__________| |__________| |_____|
scheme      authority        path        query    fragment

The scheme says which protocol family and security expectation the client is using. https and http are not cosmetic variants. They imply different security properties and different browser behavior. Later lessons will go deeper on HTTPS, but the naming decision starts here: a resource exposed over https://api.shop.test/products/42 has a different origin from http://api.shop.test/products/42.

The authority is the host and optional port that the client is asking for. In HTTP/1.1 it appears in the Host field; in HTTP/2 and HTTP/3 it appears as the :authority pseudo-header. This lets many domains share one IP address, one load balancer, or one gateway while still preserving distinct public names. api.shop.test and www.shop.test can route to the same cluster, but they are different authorities.
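The shared-gateway idea can be sketched as a lookup keyed by authority. This is only an illustration: the upstream names and the `upstream_for` helper are hypothetical, and a real gateway would also handle wildcards, IPv6 literals, and unknown hosts more carefully.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical virtual-host table: public authority -> internal upstream name.
# Three public names share one gateway but stay distinct authorities.
VIRTUAL_HOSTS = {
    "www.shop.test": "storefront-upstream",
    "api.shop.test": "api-upstream",
    "cdn.shop.test": "asset-upstream",
}


def upstream_for(host_value: str) -> Optional[str]:
    """Pick an upstream from a Host / :authority value such as 'api.shop.test:443'."""
    # Prefixing "//" makes urlsplit treat the value as an authority.
    # .hostname drops the port and lowercases the name.
    hostname = urlsplit("//" + host_value).hostname
    return VIRTUAL_HOSTS.get(hostname)


print(upstream_for("api.shop.test"))      # api-upstream
print(upstream_for("WWW.shop.test:443"))  # storefront-upstream
print(upstream_for("other.test"))         # None
```

The table is the only place that knows about upstreams; the public authority names stay the same if the upstreams are renamed or merged.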

The path is the hierarchical name inside that authority. HTTP does not know that /products/42 means a product row or that /orders/2024/abc means a checkout order. Your application gives those paths meaning. Good path design names durable resources and relationships; weak path design exposes handlers, shards, build versions, or temporary storage layout.

The query string refines the request. It is often used for filters, pagination, search terms, feature variants, or representation selection: /products?category=mugs&page=2. Query parameters are visible to caches and logs, so they should be treated as part of the public request contract. A query string that includes secrets, session tokens, or unstable implementation flags will leak into places you did not intend.

The fragment, the part after #, is different: browsers use it locally and do not send it in the HTTP request. #reviews may scroll the product page to a section, but the origin server normally never sees that fragment. This boundary is small but important. If server behavior depends on information after #, the design is already confused.
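As a rough illustration of the pieces described so far, Python's standard `urllib.parse.urlsplit` can decompose the example URL. The `request_target` helper is a hypothetical name used here to show that the fragment is not part of what goes on the wire; it is a sketch, not a full HTTP client.

```python
from urllib.parse import urlsplit

url = "https://api.shop.test:443/products/42?view=summary#reviews"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # api.shop.test:443  (the authority)
print(parts.hostname)  # api.shop.test
print(parts.port)      # 443
print(parts.path)      # /products/42
print(parts.query)     # view=summary
print(parts.fragment)  # reviews


def request_target(url: str) -> str:
    """Path plus query, as sent in the request line. The fragment is dropped."""
    p = urlsplit(url)
    target = p.path or "/"
    if p.query:
        target += "?" + p.query
    return target


print(request_target(url))  # /products/42?view=summary
```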

An origin is the browser security boundary formed by scheme, host, and port. https://www.shop.test and https://api.shop.test are different origins even if they are owned by the same company and served by the same gateway. That is why CORS, web storage, and script access are decided per origin. Cookies follow related but looser rules: they are scoped by domain and path, so a cookie can be shared across subdomains that are different origins. Backend routing and browser security are related, but they are not the same decision.
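A simplified origin comparison might look like the sketch below. It assumes only http and https URLs and fills in the default port when none is given; the `origin` and `same_origin` helpers are illustrative names, and the real browser algorithm in the WHATWG specifications covers more cases (opaque origins, other schemes).

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch supports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return (scheme, host, port), filling in the scheme's default port."""
    p = urlsplit(url)
    port = p.port if p.port is not None else DEFAULT_PORTS[p.scheme]
    return (p.scheme, p.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


# Different hosts: different origins, even behind one gateway.
print(same_origin("https://www.shop.test/products/42",
                  "https://api.shop.test/products/42"))          # False
# Path, query, and an explicit default port do not change the origin.
print(same_origin("https://api.shop.test/products/42",
                  "https://api.shop.test:443/products?page=2"))  # True
# A different scheme is a different origin.
print(same_origin("https://api.shop.test/", "http://api.shop.test/"))  # False
```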

Worked Path: One Product, Four Names

Take one product, 42, and compare four URLs:

1. https://www.shop.test/products/42
2. https://api.shop.test/products/42
3. https://api.shop.test/products/42?view=summary
4. https://cdn.shop.test/products/42.jpg

The first URL names a human-facing product page. It may return HTML, set browser-visible cookies, and include links or scripts. Its origin is https://www.shop.test.

The second URL names an API resource. It may return JSON, use CORS rules, and be called by JavaScript. Its origin is https://api.shop.test, not the same as the page origin. If the browser page at www.shop.test calls the API at api.shop.test, the browser must decide whether cross-origin access is allowed. That decision does not come from the product service's database schema; it comes from URL origins and response headers.

The third URL names either the same resource with a selected representation or a distinct resource view, depending on the API contract. A cache will usually treat the full URL, including the query string, as part of the cache key unless configured otherwise. If ?view=summary and ?view=full produce different JSON but the CDN ignores view, users may receive the wrong representation. If the query includes meaningless tracking parameters and the CDN includes all of them, the cache may fragment into many low-hit variants.
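One way to picture both failure cases is a cache key function that keeps only the query parameters the API contract says are significant. The parameter set, the `utm_source` tracking parameter, and the `cache_key` helper are assumptions for illustration; real CDNs express this through their own configuration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: only these parameters change the response for this API.
SIGNIFICANT_PARAMS = {"view", "category", "page"}


def cache_key(url: str) -> str:
    """Build a key from scheme, authority, path, and significant query params."""
    p = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(p.query, keep_blank_values=True)
        if k in SIGNIFICANT_PARAMS
    )
    key = f"{p.scheme}://{p.netloc.lower()}{p.path}"
    query = urlencode(kept)
    return f"{key}?{query}" if query else key


a = cache_key("https://api.shop.test/products/42?view=summary&utm_source=email")
b = cache_key("https://api.shop.test/products/42?utm_source=ad&view=summary")
c = cache_key("https://api.shop.test/products/42?view=full")

print(a == b)  # True: tracking noise does not fragment the cache
print(a == c)  # False: different representations keep different keys
```

Dropping `view` from the significant set would merge `a` and `c`, which is the wrong-representation failure; keeping every parameter would split `a` and `b`, which is the low-hit-rate failure.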

The fourth URL names an image, likely served from a CDN authority. It may be public and aggressively cached, while the API representation may be private or short-lived. All of these URLs may describe product 42, but they are not interchangeable HTTP resources. One is image bytes at a CDN authority; one is a JSON representation at an API authority; one is an HTML document at a web authority.

This is the core mechanism: URL pieces feed different downstream decisions.

scheme + authority -> origin, virtual host, TLS/certificate expectations
path               -> application resource selection
query              -> representation variant, filters, cache key pressure
fragment           -> client-side navigation, not server request input

Those decisions can happen in different places. A browser computes the origin before JavaScript can read a response. A TLS endpoint and gateway use the authority to select a certificate, virtual host, and route. A cache uses the request target and selected request fields to decide whether a stored response matches. The application handler finally interprets the path and query as product concepts. When a URL is vague or implementation-shaped, each layer inherits that vagueness in a different way.

Design Pressure

The first pressure is migration. Suppose the product service moves from a monolith to a separate service. If public URLs were shaped like /monolith/controllers/product?id=42, migration forces clients to learn the move. If public URLs were shaped like /products/42, the gateway can change internal routing while the public name stays stable. The URL should name the resource, not the current implementation owner.
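The migration case can be sketched as a gateway routing table in which only the internal side changes. The internal host names and the `resolve` helper are hypothetical; the point is that /products/42 resolves before and after the move without any client change.

```python
from typing import Dict, Optional

# Hypothetical routing tables: public path prefix -> internal upstream base URL.
ROUTES_BEFORE = {
    "/products": "http://monolith.internal:8080",
    "/orders": "http://monolith.internal:8080",
}
ROUTES_AFTER = {
    "/products": "http://product-service.internal:8080",  # moved out of the monolith
    "/orders": "http://monolith.internal:8080",
}


def resolve(routes: Dict[str, str], path: str) -> Optional[str]:
    """Map a public path to an internal URL; the longest matching prefix wins."""
    matches = [
        prefix for prefix in routes
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    return routes[max(matches, key=len)] + path


public_path = "/products/42"  # the name clients keep using
print(resolve(ROUTES_BEFORE, public_path))  # http://monolith.internal:8080/products/42
print(resolve(ROUTES_AFTER, public_path))   # http://product-service.internal:8080/products/42
```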

The second pressure is cache behavior. Cache keys are often built from scheme, authority, path, query, and selected request fields. If resource identity is sloppy, the cache either shares too much or too little. Sharing too much creates correctness and privacy failures. Sharing too little destroys hit rate and pushes load back to origin. Later caching lessons will cover this in detail, but the naming foundation starts here.

The third pressure is security boundaries. Browser origins are intentionally strict. Moving an endpoint from www.shop.test to api.shop.test changes same-origin assumptions. Moving from https to http changes security expectations. Adding a non-default port changes the origin; spelling out the default port (443 for https, 80 for http) does not. These details can decide whether cookies attach, whether JavaScript can read a response, and whether a redirect is safe.

The trade-off is that stable names require an abstraction layer. You have to choose public nouns and boundaries before the implementation is final. That can feel slower than exposing whatever route the framework generated. The payoff is that clients can keep using the same names while your internals evolve.

Failure Modes and Boundaries

One common failure mode is changing URLs when implementation changes. A path like /k8s/product-service-v2/products/42 might help one operator during a migration, but it turns deployment vocabulary into a client contract. Put that information in logs, traces, service discovery, or routing configuration instead.

Another failure mode is query parameter drift. Teams add ?debug=true, ?newCache=false, ?source=email, or ?token=... and forget that the URL is visible in logs, browser history, referrers, metrics, and cache keys. Some query parameters are legitimate resource selectors; others are operational hints that should live in headers, configuration, or request bodies instead.
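A small audit along these lines can flag drift before it reaches clients. This is a sketch under the assumption that the API declares its legitimate selectors up front; the `SELECTOR_PARAMS` set and the `audit_query` helper are hypothetical names.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: the documented resource selectors for this API.
SELECTOR_PARAMS = {"category", "page", "view"}


def audit_query(url: str) -> list:
    """Return query parameter names that are not declared resource selectors."""
    names = {k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return sorted(names - SELECTOR_PARAMS)


print(audit_query("https://api.shop.test/products?category=mugs&page=2"))
# []
print(audit_query("https://api.shop.test/products/42?view=summary&debug=true&token=abc"))
# ['debug', 'token']
```

Anything the audit reports is a candidate for a header, configuration, or a request body rather than the public URL.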

A third failure mode is confusing authority with ownership. The authority in the URL tells the client which host it is addressing; it does not prove which internal team owns the data or which service will serve the request. A gateway may route several authorities into one service, or one authority into many services. Keep public authority names stable and use internal routing to manage ownership changes.

The boundary sentence to remember: a URL identifies a resource from the client's point of view; it is not a promise about which database row, service process, pod, or file path will handle the request.

Readiness Check

Pick one endpoint and decompose its URL into scheme, authority, path, query, and fragment. Then answer four questions from memory: which origin would a browser assign, which pieces should be part of the cache key, which pieces name durable product concepts, and which pieces leak implementation detail?

Resources

[RFC] HTTP Semantics RFC 9110
- Focus: Use it for URI references, target resources, authority, and request semantics.
[SPEC] WHATWG URL Standard
- Focus: Use it for URL parsing, origins, hosts, paths, queries, and fragments as browsers understand them.
[ARTICLE] MDN Same-origin policy
- Focus: Use it for the browser security meaning of scheme, host, and port.
[RFC] HTTP Caching RFC 9111
- Focus: Use it for how request targets and selected fields influence cache behavior.

Key Takeaways

URLs are public resource names, not private implementation paths.
Scheme, authority, path, query, fragment, and origin each participate in different HTTP, browser, routing, and caching decisions.
Stable resource identity lets internals move without forcing every client, cache, link, and dashboard to move with them.
The main trade-off is stable names versus implementation leakage; prefer names that describe durable resources and hide service topology.
