Consistency Spectrum and API Semantics

LESSON

Consistency and Replication

048 30 min advanced

Day 477: Consistency Spectrum and API Semantics

The core idea: A consistency model matters only when an API states what a caller may observe after a write, because the real trade-off is coordination cost versus which anomalies the product is willing to expose.

Today's "Aha!" Moment

In 047.md, Harbor Point taught its reservation store how to resolve concurrent branches of hold H-8821. That still left support with a harder question. After a Lisbon agent extends the hold, which screen is allowed to show the old expiry time? Can the search grid lag for a second? Can the itinerary page lag after payment succeeds? Can the final booking confirmation ever lag? Those are not storage-engine questions by themselves. They are API questions.

At 14:05:02, POST /holds/H-8821/extend succeeds and returns version 918447. A follow-up GET /search/cabins?route=BCN-JFK can tolerate a slightly stale answer because search is advisory and the booking flow revalidates inventory before commit. A follow-up GET /holds/H-8821 for the same agent cannot safely return version 918442 without making the UI look broken. A follow-up POST /bookings/confirm is stricter still: it must not tell two regions that cabin C14 is theirs just because replicas have not converged yet.

The non-obvious insight is that "eventually consistent" is not a user-facing contract. It says replicas will converge at some point, but it does not tell a caller whether they will see their own write, whether reads can move backward, or whether a confirmation response represents a globally decisive check. If the API does not define those semantics explicitly, every client quietly assumes the strongest promise and discovers the weaker reality in production.

Why This Matters

By the time a system reaches Harbor Point's scale, one database label is not enough. The same replicated data set often serves at least four different jobs: browse availability quickly, show an agent the hold they just changed, sequence related side effects such as booking plus itinerary publication, and enforce a no-double-booking invariant. Treating all of those as "the consistency level of the database" forces one of two bad outcomes. Either every path pays the cost of the strongest guarantee, or some path silently inherits a weaker guarantee than the product can actually survive.

Making API semantics explicit fixes that. Search can declare bounded staleness. Session-bound views can promise read-your-write and monotonic reads. Cross-service workflows can preserve causality so that if the customer sees "payment succeeded," the trip page cannot omit the booking event that caused it. Final confirmation can use linearizable or transactional coordination and admit that it may be slower or temporarily unavailable during quorum loss. The trade-off becomes visible: stronger guarantees spend coordination budget, while weaker guarantees spend anomaly budget and require compensation logic.

This matters operationally because incidents rarely say "the system violated causal consistency." They show up as "I just extended the hold and the screen moved backward," "the email confirmation arrived before the itinerary page updated," or "two agents both thought cabin C14 was still free." Good API semantics turn those complaints into designed behaviors or clear bugs instead of surprises.

Core Walkthrough

Part 1: Start with the user promise, not the storage slogan

Harbor Point writes down the contract for each API surface instead of assigning one adjective to the whole platform:

| API surface | Consistency contract | What the caller is allowed to assume |
| --- | --- | --- |
| GET /search/cabins | Bounded staleness up to 2s | Results may lag slightly, but the booking flow will revalidate before committing |
| GET /holds/H-8821 after the same agent wrote to it | Read-your-write plus monotonic reads | Once the agent sees version 918447, later reads in that session cannot go backward |
| GET /customer-trips immediately after payment success | Causal consistency across booking and trip projection | If the caller has observed the payment-confirmed event, the trip view must include effects caused by that event |
| POST /bookings/confirm | Linearizable check-and-commit on inventory ownership | The success response means Harbor Point has definitively assigned the cabin, not merely queued an eventual reconciliation |

This is the spectrum in practice. Eventual consistency is the weakest meaningful end of it: the system converges eventually, but the caller gets no bound and no session guarantee. Bounded staleness adds a limit on how old the answer may be. Session guarantees such as read-your-write and monotonic reads narrow the anomalies one caller can observe. Causal consistency preserves order for related actions across services. Linearizability makes a single operation look as if it happened at one globally agreed point in time. If Harbor Point ever needs a multi-row invariant such as "confirm cabin and decrement upgrade inventory atomically," it may need a transaction boundary stronger than a linearizable single-key read.

The important point is that these are not academic labels to paste into docs. Each one answers a different product question. "Can the agent trust the hold details page right after editing?" is a session-guarantee question. "Can payment success race ahead of the itinerary projection?" is a causal-consistency question. "Can two regions both commit the same cabin?" is a linearizability or transaction-boundary question.
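
One way to keep those answers from living only in prose is to declare each endpoint's contract in one place that routing code can consult. The sketch below is illustrative only, assuming a gateway-level registry; ConsistencyLevel, EndpointContract, and ENDPOINT_CONTRACTS are hypothetical names, not Harbor Point's actual code.

# Illustrative per-endpoint contract registry mirroring the table above.
# All names here are assumptions for the sake of the sketch.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConsistencyLevel(Enum):
    BOUNDED_STALENESS = "bounded_staleness"
    SESSION = "read_your_write_plus_monotonic_reads"
    CAUSAL = "causal"
    LINEARIZABLE = "linearizable"


@dataclass(frozen=True)
class EndpointContract:
    level: ConsistencyLevel
    max_staleness_seconds: Optional[float] = None  # only meaningful for bounded staleness


ENDPOINT_CONTRACTS = {
    ("GET", "/search/cabins"): EndpointContract(ConsistencyLevel.BOUNDED_STALENESS, 2.0),
    ("GET", "/holds/{hold_id}"): EndpointContract(ConsistencyLevel.SESSION),
    ("GET", "/customer-trips"): EndpointContract(ConsistencyLevel.CAUSAL),
    ("POST", "/bookings/confirm"): EndpointContract(ConsistencyLevel.LINEARIZABLE),
}

The point of a single registry is that the read router, retry logic, and client SDK all make decisions from the same declared contract instead of each guessing independently.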

Part 2: The API needs mechanism, not just terminology

Once Harbor Point defines the contracts, each one needs an implementation path.

For the hold-details endpoint, the server returns a session token with the successful write:

{
  "hold_id": "H-8821",
  "version": 918447,
  "session_observed": {
    "holds": 918447
  }
}

Later reads carry that observed version implicitly in the agent session or explicitly in a header. The read router chooses a replica only if two conditions are true:

  1. the replica is fresh enough for the endpoint's staleness budget, and
  2. the replica has applied at least version 918447.

If no nearby replica qualifies, the server must route to a fresher replica or leader. Silently serving an older version would violate the API's advertised semantics even if the database itself is healthy.
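
A minimal sketch of that routing rule, assuming each replica reports the highest version it has applied and an estimated replication lag; Replica and choose_replica are hypothetical names for whatever the read router actually uses.

# Illustrative read-routing rule: a replica is eligible only if it is inside the
# endpoint's staleness budget AND has applied the version the session already observed.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replica:
    name: str
    applied_version: int          # highest write version this replica has applied
    estimated_lag_seconds: float  # how far behind the leader it is believed to be
    is_leader: bool = False


def choose_replica(replicas: Sequence[Replica],
                   session_observed_version: int,
                   max_staleness_seconds: Optional[float]) -> Replica:
    # Prefer the freshest nearby candidate that satisfies both conditions.
    for replica in sorted(replicas, key=lambda r: r.estimated_lag_seconds):
        fresh_enough = (max_staleness_seconds is None
                        or replica.estimated_lag_seconds <= max_staleness_seconds)
        has_session_write = replica.applied_version >= session_observed_version
        if fresh_enough and has_session_write:
            return replica
    # No nearby replica qualifies: fall back to the leader rather than silently
    # serving a version older than the one the session has already seen.
    return next(r for r in replicas if r.is_leader)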

For the trip view, Harbor Point does something slightly different. The itinerary page is built from a projection service fed by booking events. After POST /bookings/confirm succeeds, the response includes a dependency token representing the confirmed booking event. When the customer immediately requests GET /customer-trips, the gateway waits until the projection has applied that dependency or routes the request to a view that has. That is causal consistency operationalized: effects must not appear before their causes, and a caller who has already observed the cause must not be sent to a read model that predates it.
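
A rough sketch of that gateway behavior, assuming the dependency token carries the offset of the confirmed booking event and the projection exposes the highest offset it has applied; wait_for_projection and applied_offset are hypothetical names.

# Illustrative "wait for the cause before showing the effect" step at the gateway.
import time


class ProjectionNotCaughtUp(Exception):
    pass


def wait_for_projection(projection, dependency_offset: int,
                        timeout_seconds: float = 1.0,
                        poll_interval_seconds: float = 0.05) -> None:
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if projection.applied_offset() >= dependency_offset:
            return  # the cause has been applied; the trip view may now be served
        time.sleep(poll_interval_seconds)
    # Causality cannot be satisfied here: reroute to a read model that has the event,
    # rather than returning a trip view that omits a booking the caller already saw.
    raise ProjectionNotCaughtUp()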

Final confirmation uses the strongest path. Harbor Point performs a conditionally guarded write against the authoritative inventory shard:

def confirm_booking(cabin_id, hold_id, payment_id):
    # Check-and-commit against the authoritative inventory shard: the predicate and the
    # writes succeed or fail together, so a success response is decisive, not eventual.
    return linearizable_transaction(
        # Read the current owner of the cabin, not a possibly stale replica copy.
        read_key=("inventory", cabin_id),
        # Commit only if this hold still owns the cabin and it has not already been booked.
        assert_predicate=lambda row: row.hold_id == hold_id and row.status == "held",
        writes=[
            ("inventory", cabin_id, {"status": "booked", "booking_payment": payment_id}),
            ("bookings", hold_id, {"status": "confirmed"}),
        ],
    )

That flow is slower and less available during quorum loss than a nearby stale read, but it buys the semantic guarantee Harbor Point needs: a success response means the cabin is no longer merely "likely booked once replicas catch up." It is booked.
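
The lesson treats linearizable_transaction as a given primitive. As a rough sketch of the shape it implies, the toy below applies the predicate check and all writes atomically under one lock, standing in for a write executed on the authoritative leader or through a consensus round; the in-memory store and every name in it are hypothetical, not a real linearizability protocol.

# Toy check-and-commit primitive matching the call signature used by confirm_booking.
import threading
from types import SimpleNamespace

# (table, key) -> row object; a stand-in for the authoritative inventory shard,
# seeded with the lesson's example: cabin C14 currently held by H-8821.
_store = {("inventory", "C14"): SimpleNamespace(hold_id="H-8821", status="held")}
_lock = threading.Lock()


class TransactionAborted(Exception):
    pass


def linearizable_transaction(read_key, assert_predicate, writes):
    with _lock:                                   # serialize against the authoritative copy
        row = _store.get(read_key)
        if row is None or not assert_predicate(row):
            raise TransactionAborted(f"precondition failed for {read_key}")
        for table, key, values in writes:         # applied atomically with the check
            target = _store.setdefault((table, key), SimpleNamespace())
            for field, value in values.items():
                setattr(target, field, value)
    return {"status": "committed"}

With the store seeded this way, a first call to confirm_booking for cabin C14 commits, and a second attempt fails the predicate instead of double-booking, which is exactly the decisiveness the success response promises.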

Mechanism is why API semantics must be scoped endpoint by endpoint. The same replicated storage layer can support multiple contracts, but only if routing, tokens, retries, and write paths enforce the right one each time.

Part 3: Weakening or strengthening semantics must be explicit

The easiest way to break trust is to promise one consistency level and quietly deliver another under stress. Harbor Point therefore makes degradation rules part of the API design:

  1. If no nearby replica can satisfy a session token, the read routes to a fresher replica or the leader instead of silently serving an older version.
  2. If the trip projection has not yet applied an event the caller has already observed, the gateway waits or reroutes rather than returning a view that omits the cause.
  3. If quorum is lost, POST /bookings/confirm fails or returns an explicit pending state instead of quietly downgrading to an eventual write.
  4. If search replicas fall behind the 2-second staleness budget, the response says so rather than pretending the budget still holds.

This is where API semantics shape client code. Stronger contracts often need extra metadata such as ETag, observed-version tokens, or explicit pending states. Weaker contracts need compensating product behavior such as revalidation before commit, freshness indicators, or UI copy that frames a view as advisory. The trade-off is not only infrastructure cost. It is also how much uncertainty the product and client code must absorb.
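
As a client-side illustration of those points, the sketch below uses the requests library; the X-Observed-Version header, the 202 pending response, and the endpoint paths are assumptions about how such contracts might surface to callers, not Harbor Point's documented API.

# Illustrative client code: carry the observed-version token on session-bound reads,
# and treat a pending confirmation as pending rather than as success.
import requests


def get_hold(session: requests.Session, base_url: str, hold_id: str,
             observed_version: int) -> dict:
    # Session-consistent read: send the highest version this session has observed so the
    # server will not route the request to a replica that would make the view go backward.
    response = session.get(
        f"{base_url}/holds/{hold_id}",
        headers={"X-Observed-Version": str(observed_version)},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()


def confirm_booking_request(session: requests.Session, base_url: str,
                            hold_id: str, payment_id: str) -> dict:
    response = session.post(
        f"{base_url}/bookings/confirm",
        json={"hold_id": hold_id, "payment_id": payment_id},
        timeout=10,
    )
    if response.status_code == 202:
        # Explicit pending state: coordination could not complete, so the client must poll
        # or show "still confirming" rather than treating the cabin as booked.
        return {"status": "pending", "retry_after": response.headers.get("Retry-After")}
    response.raise_for_status()
    return response.json()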

These choices are also the bridge to partitioning. In 049.md, Harbor Point will split data across shards. Once that happens, the cost of a strong guarantee depends heavily on whether the operation stays inside one shard or spans several. The API contracts defined here are the reason shard-key design matters in the next lesson.

Failure Modes and Misconceptions

  1. Treating "eventually consistent" as a user-facing contract. It promises convergence, not read-your-write, not monotonic reads, and not a decisive confirmation.
  2. Assigning one consistency adjective to the whole platform, so every path either overpays for coordination or silently inherits anomalies it cannot survive.
  3. Quietly downgrading the advertised level under stress, which turns designed behavior back into surprises like a hold screen moving backward or two agents claiming cabin C14.

Connections

Connection 1: 046.md gave Harbor Point a way to talk about bounded staleness

Lag budgets turned replica freshness into an explicit number. This lesson widens that idea into a full API contract: freshness bounds are only one point on the spectrum.

Connection 2: 047.md showed what happens when semantics are too weak for the write path

Conflict resolution exists because the system accepted concurrent branches. API semantics decide when that is an acceptable choice, when a client must retry, and when the operation must take a stronger path up front.

Connection 3: 049.md will make these contracts more expensive or cheaper depending on shard boundaries

Once data is partitioned, "strong enough" can no longer be discussed without asking whether the relevant read or write stays on one shard or fans out across many.

Key Takeaways

  1. A consistency model becomes useful only when the API says what a successful write lets the caller observe next.
  2. Different endpoints over the same replicated data can legitimately need different guarantees, from bounded staleness to causal consistency to linearizable confirmation.
  3. Tokens, routing rules, and explicit fallback behavior are the mechanisms that turn consistency vocabulary into an enforceable contract.
  4. The stronger the guarantee, the more coordination cost you pay, which is why the next step is designing shard boundaries that keep expensive guarantees local.