Status Codes and Failure Contracts

LESSON

005 25 min intermediate

The core idea: HTTP status codes are boundary evidence. They tell clients, intermediaries, retry logic, alerting, and humans what the server believes happened, while a good failure contract adds the missing detail needed to act safely.

Core Insight

Imagine the checkout API from the previous lesson. A mobile client sends POST /payments with an idempotency key. The API calls a payment provider, waits too long, and returns a response to the client. The response body might contain a careful explanation, but the first signal most software sees is not the body. It is the status code: 201 Created, 202 Accepted, 409 Conflict, 429 Too Many Requests, 503 Service Unavailable, or perhaps a vague 500 Internal Server Error.

That number is not decoration. A status code is a compact claim made at the HTTP boundary. It says whether the request was understood, whether the server accepted or completed the work, whether the client must change something, whether the operation conflicts with current state, or whether the server is temporarily unable to serve the request. Clients use that claim to decide whether to retry, show validation errors, ask the user to sign in, poll a status resource, or stop.

The common mistake is to treat status codes as a thin mapping from exceptions to numbers: validation exception becomes 400, database exception becomes 500, and every business failure is hidden in a 200 OK envelope. That makes the handler easy to write, but it weakens the system. Load balancers, SDKs, logs, dashboards, retry libraries, and incident responders now have to parse private application bodies to know what happened.

The deeper mechanism is a failure contract. The status code gives the coarse class of outcome. Headers add machine-readable control signals such as Retry-After or Location. The error body gives stable details such as an error type, human-readable title, request identifier, and field-level validation information. The trade-off is semantic precision versus upfront simplicity: precise contracts require discipline and documentation, while loose 200 or 500 responses feel simpler until partial failure arrives.

What the Status Class Tells You

Start with the class before memorizing individual codes:

2xx means the request was successfully received, understood, and accepted in some way.
3xx means the client must take further action, usually following a redirect to another URI, to complete the request.
4xx means the server believes the request cannot be completed as sent.
5xx means the server failed to fulfill an apparently valid request.

This is a contract about what the server can claim from its side of the boundary. It is not a perfect statement about the whole world. A 504 Gateway Timeout from an API gateway does not prove the backend did nothing. It says only that the gateway did not receive a timely response from the upstream. A 201 Created is the server's claim that a resource was created, and the response should usually identify that resource. A 400 Bad Request says the client sent something the server cannot process as a valid request. Each one changes what the caller should do next.
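
The class-first reading can be sketched in a few lines of Python. The function name and the label strings are illustrative assumptions, not part of any standard library; the 1xx class is included only for completeness.

```python
def status_class(code: int) -> str:
    """Return a coarse label for the class of an HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    # Labels are illustrative; only the first digit decides the class.
    labels = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return labels[code // 100]
```

A generic client can branch on this label first and consult the specific code only when it recognizes it.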

In the checkout scenario, the client wants a decision table more than a number list:

response evidence       client action
------------------      ------------------------------------
201 Created             show payment success; keep receipt link
202 Accepted            show pending state; poll status resource
400 or 422              fix request data; do not retry blindly
401                     obtain credentials; repeat after auth
403                     stop or request permission; auth alone may not help
404                     stop; check the URI or identifier before repeating
409                     resolve state conflict or inspect existing operation
429                     wait, back off, and honor Retry-After if present
500                     uncertain server fault; retry only under policy
503                     temporary unavailability; Retry-After may guide delay
504                     upstream timeout; operation outcome may be unknown

The important move is not picking the fanciest code. It is making the next safe action visible.
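
One way a client might encode the decision table is a plain lookup with a class-level fallback. This is a minimal sketch: the action names are hypothetical labels, the 200 entry is an added assumption for plain success, and the fallback follows the general rule that an unrecognized code is treated like the x00 code of its class.

```python
# Hypothetical action labels; the mapping mirrors the decision table.
NEXT_ACTION = {
    200: "show_success",
    201: "show_success",
    202: "poll_status",
    400: "fix_request",
    422: "fix_request",
    401: "authenticate",
    403: "stop",
    404: "stop",
    409: "inspect_existing",
    429: "back_off",
    500: "retry_under_policy",
    503: "back_off",
    504: "check_outcome",
}


def next_action(status: int) -> str:
    """Pick the next safe action for a response status."""
    if status in NEXT_ACTION:
        return NEXT_ACTION[status]
    # Unknown code: fall back to the x00 code of the same class.
    class_default = (status // 100) * 100
    return NEXT_ACTION.get(class_default, "unhandled")
```

The table keeps the policy reviewable in one place, which matters more than the exact labels chosen here.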

Completion, Acceptance, and Uncertainty

The most useful success codes separate "done" from "started."

200 OK is a general success response. For a GET /orders/842, it usually means the body contains the current representation of the order. For a command-like request, 200 can be fine when the response includes the resulting state. 204 No Content means success with no response body. It is common after idempotent updates when the client does not need a representation back.

201 Created is stronger. It says the request created a new resource. A good 201 response includes a Location header pointing to that resource, or a body that identifies it clearly. If POST /payments returns 201 Created with Location: /payments/pay_901, the client now has durable evidence: there is a payment resource to inspect, log, and show to support.

202 Accepted is different. It says the request has been accepted for processing, but processing has not completed. This is not a polite way to say "success." It is a promise that the server owns some future work but is not yet claiming the final outcome. A useful 202 response should tell the client where to check progress:

HTTP/1.1 202 Accepted
Location: /payment-attempts/pay_901
Retry-After: 5

The status class is successful because the server accepted the request. The application outcome is still pending. That distinction matters for user experience and retries. If the client treats 202 as final success, it may show a paid order before the charge completes. If it treats 202 as failure, it may retry and create unnecessary pressure. The contract should say "accepted, check here, not before this delay."
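
A client-side reading of "accepted, check here, not before this delay" might look like the sketch below. It assumes the headers arrive as a plain dict with canonical names; `poll_plan` and `default_delay` are hypothetical names. Retry-After can carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Parse a Retry-After value (seconds or HTTP date) into seconds."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable hint: caller falls back to its default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def poll_plan(status, headers, default_delay=2.0):
    """Return (status_url, delay_seconds) for a 202 response, else None."""
    if status != 202 or "Location" not in headers:
        return None
    delay = retry_after_seconds(headers.get("Retry-After"))
    return headers["Location"], default_delay if delay is None else delay
```

Returning None for anything other than a usable 202 keeps the pending state from being confused with final success or with failure.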

Client Errors Are Not All the Same

The 4xx class means the request has a problem from the server's point of view. But "client error" covers several different fixes.

400 Bad Request is appropriate when the request syntax or basic shape is invalid: malformed JSON, a missing required field, or an impossible parameter format. 422 Unprocessable Content is often used when the syntax is valid but the domain validation fails, such as a payment amount that exceeds the order total. Some APIs use only 400 for both. That can be acceptable if the error body is precise, but the trade-off is that clients lose a coarse distinction between "cannot parse" and "understood but rejected."
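
The split between "cannot parse" and "understood but rejected" can be sketched on the server side as follows. The function name, the `amount` field, and the integer amount in minor units are assumptions made for illustration, not a prescribed schema.

```python
import json


def validation_error(raw_body: bytes, order_total: int):
    """Return (status, detail) for an invalid payment body, or None if valid."""
    try:
        data = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, "body is not valid JSON"
    # Shape problems: the request is not well-formed for this endpoint.
    if not isinstance(data, dict) or type(data.get("amount")) is not int:
        return 400, "amount is required and must be an integer"
    # Domain problems: well-formed, but rejected by business rules.
    if data["amount"] <= 0 or data["amount"] > order_total:
        return 422, "amount must be positive and not exceed the order total"
    return None
```

An API that uses only 400 could keep the same two branches and distinguish them through the error type in the body instead.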

401 Unauthorized is about authentication, despite the name. The caller needs valid credentials. 403 Forbidden means credentials may be known, but the server refuses the action. In a dashboard, 401 can trigger a login flow; 403 can show "you do not have permission." If every auth failure becomes 401, clients may send users through login loops that cannot fix the problem.

404 Not Found says the target resource was not found or is not being revealed. Sometimes an API intentionally returns 404 instead of 403 to avoid leaking that a private resource exists. That is a reasonable security choice, but it should be a conscious contract, not an accident.
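
Making the 401, 403, and 404 choice explicit in code is one way to keep it a conscious contract. The sketch below assumes the caller has already computed two booleans; the function and parameter names are hypothetical.

```python
def access_status(authenticated: bool, permitted: bool, hide_existence: bool = False):
    """Return the status for an access failure, or None when access is allowed."""
    if not authenticated:
        return 401  # credentials missing or invalid: a login flow can help
    if not permitted:
        # Deliberate choice: reveal the refusal, or hide that the resource exists.
        return 404 if hide_existence else 403
    return None
```

Because `hide_existence` is a named parameter, the security decision shows up in code review instead of emerging by accident.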

409 Conflict is where the previous lesson on method semantics starts to pay off. Suppose the client retries POST /payments with the same idempotency key but a different amount. The server can say:

HTTP/1.1 409 Conflict
Content-Type: application/problem+json

{
  "type": "https://api.shop.test/problems/idempotency-conflict",
  "title": "Idempotency key reused with different request data",
  "status": 409,
  "request_id": "req_7KQ9",
  "existing_payment": "/payments/pay_901"
}

That response is more useful than 500 and more honest than 200. The server understood the request, but the requested operation conflicts with stored evidence. Retrying the same conflicting request will not help. The client needs to inspect the existing operation or generate a new idempotency key for a genuinely new payment attempt.
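
A server could detect this conflict by storing a fingerprint of the request body next to each idempotency key. The following is a minimal in-memory sketch; the class, its method names, and the choice of SHA-256 over canonical JSON are assumptions, and a real service would need durable storage and concurrency control.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory sketch: maps idempotency key -> (fingerprint, payment URL)."""

    def __init__(self):
        self._seen = {}

    @staticmethod
    def fingerprint(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def submit(self, key: str, body: dict, new_payment_url: str):
        """Return (status, headers, problem_body_or_None)."""
        fp = self.fingerprint(body)
        if key not in self._seen:
            self._seen[key] = (fp, new_payment_url)
            return 201, {"Location": new_payment_url}, None
        stored_fp, existing_url = self._seen[key]
        if stored_fp == fp:
            # Same key, same data: replay the original result.
            return 201, {"Location": existing_url}, None
        problem = {
            "type": "https://api.shop.test/problems/idempotency-conflict",
            "title": "Idempotency key reused with different request data",
            "status": 409,
            "existing_payment": existing_url,
        }
        return 409, {"Content-Type": "application/problem+json"}, problem
```

Sorting keys before hashing means two bodies that differ only in field order count as the same request, which is usually what a retrying client intends.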

Server Errors Need Retry Evidence

The 5xx class means the server or an intermediary failed while handling a request that looked valid enough to attempt. This is where sloppy contracts create retry storms.

500 Internal Server Error is the generic fallback. It should not be the default for validation, authorization, rate limits, or business conflicts. If a client gets 500, it often assumes the server had a transient fault. Automated clients may retry. Operators may page the server team. Error budgets may burn. Returning 500 for a bad coupon code teaches the whole system the wrong lesson.

502 Bad Gateway, 503 Service Unavailable, and 504 Gateway Timeout usually involve an intermediary or dependency boundary. A reverse proxy returns 502 when an upstream response is invalid or unusable. A service returns 503 when it is temporarily unavailable, overloaded, in maintenance, or unable to accept traffic. A gateway returns 504 when it timed out waiting for upstream.

The dangerous case is 504 after a side-effecting request. The gateway timed out; it does not know whether the payment provider charged the card. The failure contract should preserve uncertainty instead of pretending to know:

HTTP/1.1 504 Gateway Timeout
Content-Type: application/problem+json
Retry-After: 3

{
  "type": "https://api.shop.test/problems/upstream-timeout",
  "title": "Payment result is not known yet",
  "status": 504,
  "request_id": "req_8PA2",
  "idempotency_key": "pay_842_2026_06_18_a",
  "status_url": "/payment-attempts/pay_901"
}

This does not make the failure pleasant, but it prevents dangerous guessing. The client can poll the status URL or retry with the same idempotency key after a delay. Logs and traces have a stable request ID. Support can search for the payment attempt. Alerting can distinguish upstream timeouts from local handler crashes.

429 Too Many Requests belongs near this discussion even though it is a 4xx code. It says the request may be valid, but the client has exceeded a rate policy. A useful 429 response includes Retry-After or rate-limit headers. Without that evidence, clients guess delays, synchronize retries, and turn throttling into repeated bursts.
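
On the client side, a delay calculation that honors the server hint and still desynchronizes callers might look like this. It is a sketch using exponential backoff with full jitter; the parameter names and defaults are assumptions, and `rng` is injectable only so the behavior can be tested.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    backoff = min(cap, base * (2 ** attempt))
    jittered = rng() * backoff  # full jitter: uniform in [0, backoff)
    if retry_after is not None:
        # Never wait less than the server asked; jitter goes on top.
        return retry_after + jittered
    return jittered
```

Adding jitter on top of Retry-After, instead of replacing it, keeps clients that received the same hint from retrying in the same instant.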

Worked Path: The Same Payment Through Five Outcomes

Trace the same request through five possible outcomes:

POST /payments
Idempotency-Key: pay_842_2026_06_18_a

First path: the API creates the payment resource and has final confirmation. It returns 201 Created, Location: /payments/pay_901, and a representation of the payment. The client can show success and store the payment URL.

Second path: the API stores the attempt but the provider is still processing. It returns 202 Accepted, Location: /payment-attempts/pay_901, and maybe Retry-After: 5. The client shows "processing" and polls, rather than creating a new payment.

Third path: the same idempotency key already exists with the same request body. The server returns the original result, perhaps again as 201 or 200, because this is a replay of the same operation. The client does not need to know whether it saw the first response or the replayed one.

Fourth path: the same idempotency key exists with different request data. The server returns 409 Conflict with an error type explaining the mismatch. The client must not retry blindly. It should either inspect the existing payment attempt or start a new operation with a new key.

Fifth path: the payment provider times out. Either the edge returns 504 or the API returns 503. The response includes Retry-After when the server has a safe delay recommendation, plus a request ID and status URL if the operation was recorded. The client now knows the difference between "the server rejected my request" and "the result is not known yet."

That worked path is the mental model: status code first, headers for machine control, body for stable detail, logs and traces for investigation.
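
The five paths can be condensed into one mapping from outcome to status and headers. This is a sketch from the API's own point of view, so the timeout path uses 503; the `Outcome` names and the `respond` function are hypothetical, and bodies are omitted to keep the focus on status and headers.

```python
from enum import Enum


class Outcome(Enum):
    CONFIRMED = "confirmed"
    PENDING = "pending"
    REPLAY = "replay"
    KEY_CONFLICT = "key_conflict"
    PROVIDER_TIMEOUT = "provider_timeout"


def respond(outcome: Outcome, payment_id: str):
    """Return (status, headers) for one of the five worked-path outcomes."""
    payment_url = f"/payments/{payment_id}"
    attempt_url = f"/payment-attempts/{payment_id}"
    if outcome in (Outcome.CONFIRMED, Outcome.REPLAY):
        # A replay returns the original result, so both paths look the same.
        return 201, {"Location": payment_url}
    if outcome is Outcome.PENDING:
        return 202, {"Location": attempt_url, "Retry-After": "5"}
    if outcome is Outcome.KEY_CONFLICT:
        return 409, {"Content-Type": "application/problem+json"}
    if outcome is Outcome.PROVIDER_TIMEOUT:
        return 503, {"Content-Type": "application/problem+json", "Retry-After": "3"}
    raise ValueError(f"unknown outcome: {outcome!r}")
```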

Failure Modes to Review in Real APIs

Returning 200 OK for every application error. This is tempting because clients can always parse one envelope shape. The cost is that HTTP-aware infrastructure loses the signal. Metrics show success while users fail. Load balancers cannot distinguish bad requests from server faults. Generic SDKs do not know when to retry or stop.

Returning 500 for client-correctable problems. If missing fields, invalid state transitions, duplicate commands, and permission failures all become 500, clients learn to retry work that will never succeed. Use 4xx when the request must change, and reserve 5xx for server-side inability.
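
One way to keep 500 as a true fallback is to map typed application errors to statuses explicitly. The exception classes below are hypothetical names invented for this sketch; the point is only that anything unlisted falls through to 500.

```python
class ValidationFailed(Exception):
    """The request data breaks a validation rule."""


class PermissionDenied(Exception):
    """The caller is known but not allowed to do this."""


class StateConflict(Exception):
    """The request conflicts with current resource state."""


class DependencyUnavailable(Exception):
    """An upstream dependency cannot serve the request right now."""


# Ordered list so subclasses can be placed before their parents if needed.
STATUS_FOR = [
    (ValidationFailed, 422),
    (PermissionDenied, 403),
    (StateConflict, 409),
    (DependencyUnavailable, 503),
]


def status_for(exc: Exception) -> int:
    """Map an application exception to a status; unknown errors become 500."""
    for exc_type, status in STATUS_FOR:
        if isinstance(exc, exc_type):
            return status
    return 500
```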

Using a precise status with an unstable body. A good status code cannot rescue an error body whose fields change every release. Error bodies are API surface. Give them stable type values, clear titles, a status, and correlation evidence such as request_id. Use field-level details when the user or client can fix input.
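
Treating the error body as API surface can start with a single typed constructor for it. This sketch follows the field names used in the examples earlier in this lesson; the `Problem` class and its `extensions` parameter are assumptions, not a library API.

```python
import json
from dataclasses import dataclass, field

_CORE_FIELDS = ("type", "title", "status", "request_id")


@dataclass(frozen=True)
class Problem:
    type: str
    title: str
    status: int
    request_id: str
    extensions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Extensions must not silently overwrite the stable core fields.
        clash = set(self.extensions) & set(_CORE_FIELDS)
        if clash:
            raise ValueError(f"extension names clash with core fields: {sorted(clash)}")

    def to_json(self) -> str:
        body = {name: getattr(self, name) for name in _CORE_FIELDS}
        body.update(self.extensions)
        return json.dumps(body)
```

Routing every error response through one constructor makes accidental field renames visible as code changes instead of surprises for clients.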

Omitting retry timing. 429 and 503 often need delay guidance. Retry-After is not perfect, and clients still need jitter and backoff, but an explicit hint is better than letting every client invent a tight retry loop.

Close the lesson and inspect one API response you depend on. Can you say what the status code claims, what header tells software what to do next, what body field remains stable across releases, and what identifier an operator would search during an incident? If not, the failure contract is still too implicit.

Connections

Method semantics say what kind of operation the client attempted. Status codes say what the server is willing to claim about that attempt. Together they define the retry surface: a timeout after POST is scary until the operation has identity, and a retry after 503 is safer when the server gives timing and correlation evidence.

The next lesson moves from outcome signals to representation signals. Once the client knows whether a request succeeded or failed, it still has to know what format the bytes use, which media type was negotiated, and what metadata makes the representation safe to interpret.

Resources

[RFC] HTTP Semantics RFC 9110
- Focus: Use the status code definitions and method semantics as the primary contract language.
[RFC] Problem Details for HTTP APIs RFC 9457
- Focus: Study how type, title, status, and extension fields make errors stable and machine-readable.
[DOC] MDN: HTTP response status codes
- Focus: Use it as a quick cross-check for common status meanings and neighboring codes.
[DOC] Stripe API: Errors
- Focus: Notice how a production API combines status, error type, request IDs, and client-facing detail.

Key Takeaways

A status code is boundary evidence: it tells generic clients and infrastructure what class of outcome the server claims.
A failure contract combines status, headers, stable error body fields, and correlation identifiers so the caller can act safely.
4xx usually means the request must change; 5xx means the server or an upstream dependency failed to fulfill an apparently valid request.
The best status code is the one that makes the next safe action visible without forcing clients to reverse-engineer private error text.
