Consistency Contracts and API Semantics

Consistency and Replication

002 30 min intermediate

The core idea: A consistency model is an API contract about which stories clients are allowed to observe, and every stronger story buys clarity by spending latency, coordination, metadata, or availability under failure.

Core Insight

Imagine an operations checklist used during a production incident. An engineer on mobile marks "database failover verified" as complete. A second engineer on the web dashboard refreshes the incident page. A third engineer opens the audit timeline to understand who did what. The same underlying replicated data now appears through three different user experiences.

If the mobile engineer immediately sees the item incomplete again, that feels broken. If the dashboard lags for two seconds but eventually catches up, the team may tolerate it. If the audit timeline shows a follow-up comment before the checklist item that caused it, people may draw the wrong conclusion during a stressful incident.

A consistency model names which of those observable histories are allowed. It is not just a storage-engine detail, and it is not the same thing as "fast" or "slow." It is the contract between the replicated system and the clients who are trying to make sense of its behavior.

The useful design move is to stop asking whether the whole product is "eventually consistent" and start asking what each API must rule out. Some endpoints need a strong single-copy story. Others only need convergence, session stability, or causal ordering. The trade-off is that stronger contracts usually require more coordination or more metadata, especially across replicas.

Client Stories, Not Database Labels

The incident checklist stores task state in multiple replicas so the product remains responsive across regions. A client does not see the replicas directly. It sees a sequence of observations:

t1  mobile client:  mark task complete
t2  mobile client:  refresh task
t3  web client:     refresh incident dashboard
t4  audit client:   load event timeline

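The consistency contract answers questions like:

- At t2, must the mobile client see the task it just completed?
- At t3, how stale may the web dashboard be, and may it ever move backward?
- At t4, must the audit timeline show a cause before its effect?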

Those are API semantics. A vague promise like "replication is asynchronous" does not tell an application developer what behavior is safe to build on. A precise promise does:

Endpoint                         Client-visible promise
-------------------------------  --------------------------------------------
PATCH /tasks/:id/complete        writer receives a commit token
GET /tasks/:id with token        must include at least that committed version
GET /incident/:id/dashboard      may be up to 3 seconds stale
GET /incident/:id/audit-log      must preserve cause-before-effect order

This is why consistency belongs in API design, not only in database selection. The model determines what counts as a bug for a caller.
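
One way to keep such promises reviewable is to record them as data next to the routes. The sketch below is only an illustration: the Guarantee enum, the EndpointContract dataclass, and the CONTRACTS table are hypothetical names invented here, not part of any framework, and the guarantee labels are one reading of the read-side rows in the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Guarantee(Enum):
    READ_YOUR_WRITES = "read-your-writes"
    BOUNDED_STALENESS = "bounded-staleness"
    CAUSAL_ORDER = "causal-order"


@dataclass(frozen=True)
class EndpointContract:
    guarantee: Guarantee
    max_staleness_s: Optional[float] = None  # only meaningful for bounded staleness


# Hypothetical contract table mirroring the read endpoints listed above.
CONTRACTS = {
    "GET /tasks/:id": EndpointContract(Guarantee.READ_YOUR_WRITES),
    "GET /incident/:id/dashboard": EndpointContract(
        Guarantee.BOUNDED_STALENESS, max_staleness_s=3.0
    ),
    "GET /incident/:id/audit-log": EndpointContract(Guarantee.CAUSAL_ORDER),
}


def describe(route: str) -> str:
    """Render a contract as a one-line, human-readable promise."""
    c = CONTRACTS[route]
    if c.max_staleness_s is not None:
        return f"{route}: {c.guarantee.value}, at most {c.max_staleness_s:g}s stale"
    return f"{route}: {c.guarantee.value}"
```

A table like this gives reviewers and tests a single place to check what a caller is allowed to rely on.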

The Consistency Ladder

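Different models rule out different surprising histories. A useful ladder is shown below. Read it as a teaching order rather than a strict hierarchy: read-your-writes and monotonic reads are independent session guarantees, and causal consistency implies both.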

weaker ------------------------------------------------------> stronger
eventual -> read-your-writes -> monotonic reads -> causal -> sequential -> linearizable

Eventual consistency promises convergence if writes stop for long enough. It does not promise that a client immediately sees its own write, that reads move forward, or that dependent actions appear in causal order. This is often enough for counters, cached summaries, or recommendation feeds.
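
Convergence can be illustrated with a last-writer-wins register. This is a minimal sketch under assumptions: last-writer-wins is only one possible merge rule, and the Versioned type, its logical timestamps, and the replica names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int   # logical timestamp assigned by the writer (assumption)
    replica_id: str  # tie-breaker so every replica picks the same winner


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep the entry with the larger (timestamp, replica_id)."""
    return max(a, b, key=lambda v: (v.timestamp, v.replica_id))


# Two replicas accept different writes while out of contact.
east = Versioned("incomplete", timestamp=4, replica_id="east")
west = Versioned("complete", timestamp=5, replica_id="west")

# Before the exchange, a reader can get either answer depending on the replica.
assert east.value != west.value

# After exchanging state in either direction, both replicas hold the same value.
east_after = merge(east, west)
west_after = merge(west, east)
assert east_after == west_after
```

Nothing in this merge rule says when the exchange happens or what a client sees in the meantime, which is exactly the gap the stronger models close.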

Read-your-writes says that after a client successfully writes, that same client can read the result. In the checklist app, the mobile engineer should not mark a task complete and then immediately see it incomplete on refresh.

Monotonic reads say that once a client has observed a version, later reads by that client should not go backward. This matters for dashboards that refresh repeatedly. Seeing "failover verified" appear, disappear, and then reappear can be more confusing than seeing a slightly stale page for a short time.
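
A monotonic-reads violation is easy to detect from one client's point of view. The sketch below assumes each read returns a comparable integer version for the same task; first_regression is a hypothetical helper name.

```python
from typing import Optional, Sequence


def first_regression(observed_versions: Sequence[int]) -> Optional[int]:
    """Return the index of the first read that went backward, or None."""
    highest: Optional[int] = None
    for i, version in enumerate(observed_versions):
        if highest is not None and version < highest:
            return i
        highest = version if highest is None else max(highest, version)
    return None


# "appear, disappear, reappear": the third read goes backward.
assert first_regression([17, 18, 17, 18]) == 2
# Stale but stable: repeating an old version is allowed.
assert first_regression([17, 17, 18]) is None
```

A client or gateway could use the same running maximum as a floor and refuse, retry, or reroute any read that falls below it.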

Causal consistency says effects should not appear before their causes. If a comment says "verified after failover," the client should not observe that comment without also being able to observe the failover task it depends on.
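
The rule can be written as a check over what one client observed. This is a sketch under assumptions: events carry string identifiers, and the depends_on mapping from an event to its causes is a hypothetical representation.

```python
from typing import Dict, List, Optional, Set


def first_causal_violation(
    observed: List[str], depends_on: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the first event shown before one of its causes, or None."""
    seen: Set[str] = set()
    for event in observed:
        missing = depends_on.get(event, set()) - seen
        if missing:
            return event
        seen.add(event)
    return None


deps = {"comment-verified": {"task-42-complete"}}

# The comment arrives before the task completion it refers to.
assert first_causal_violation(["comment-verified", "task-42-complete"], deps) == "comment-verified"
# Cause first, then effect: no violation.
assert first_causal_violation(["task-42-complete", "comment-verified"], deps) is None
```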

Sequential consistency gives one global order that respects each client's program order, but that order does not have to match real time. Linearizability adds the real-time rule: if operation A finished before operation B began, A must come before B in the global order that every observer sees.
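
The difference between the two rules can be sketched as two checks on a proposed global order. The code below checks only the ordering constraints, not whether each read returns the value that order implies. The Op type and its wall-clock start and end fields are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Op:
    name: str
    client: str
    start: float  # wall-clock time the call was issued
    end: float    # wall-clock time the response arrived


def respects_program_order(order: List[Op]) -> bool:
    """Each client's operations appear in the order that client issued them."""
    last_start: Dict[str, float] = {}
    for op in order:
        if op.client in last_start and op.start < last_start[op.client]:
            return False
        last_start[op.client] = op.start
    return True


def respects_real_time(order: List[Op]) -> bool:
    """If A finished before B began, A must come before B in the order."""
    for earlier, later in combinations(order, 2):
        # `later` is placed after `earlier`, so it must not have finished first.
        if later.end < earlier.start:
            return False
    return True


write = Op("mark task complete", client="mobile", start=0.0, end=1.0)
read = Op("read task, sees incomplete", client="web", start=2.0, end=3.0)

# Placing the read first explains why it still saw "incomplete".
candidate = [read, write]
assert respects_program_order(candidate)   # allowed under sequential consistency
assert not respects_real_time(candidate)   # ruled out by linearizability
```

The stale read is explainable only by an order that contradicts the clock, so this history can be sequentially consistent but is not linearizable.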

Each step rules out more confusion. Each step also tends to cost more: extra coordination, larger version vectors, session tokens, quorum reads, leader routing, or waiting for remote replicas.

Worked Example: Three Endpoints, Three Contracts

The incident checklist does not need one consistency model for everything. It needs different contracts for different risks.

For the task completion endpoint, read-your-writes is the minimum acceptable contract. When a client receives success from PATCH /tasks/42/complete, the response can include a version token:

PATCH /tasks/42/complete  ->  200 OK, version=task-42@v18
GET /tasks/42?min_version=task-42@v18

The server can satisfy that read by routing to a replica that has applied version v18, waiting briefly for the local replica to catch up, or falling back to the leader. That may add latency, but it prevents the most personal and obvious inconsistency: "I just did this, and the system forgot."
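
Those options can be combined into one routing decision. The following is a minimal sketch, not a production router: Replica, choose_replica, and parse_token are hypothetical names, the token format is taken from the example above, and the leader is assumed to have applied every committed version.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Replica:
    name: str
    applied_version: int  # highest version this replica has applied


def parse_token(token: str) -> Tuple[str, int]:
    """Split a token such as 'task-42@v18' into ('task-42', 18)."""
    key, _, version = token.rpartition("@v")
    return key, int(version)


def choose_replica(
    local: Replica,
    others: List[Replica],
    leader: Replica,
    min_version: int,
    wait_s: float = 0.05,
    poll_s: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> Replica:
    """Pick a replica that can serve a read at or above min_version."""
    # 1. Any replica that has already applied the version will do.
    for replica in [local, *others]:
        if replica.applied_version >= min_version:
            return replica
    # 2. Wait briefly for the local replica to catch up.
    waited = 0.0
    while waited < wait_s:
        sleep(poll_s)
        waited += poll_s
        if local.applied_version >= min_version:
            return local
    # 3. Fall back to the leader (assumed to hold every committed version).
    return leader
```

The wait budget is the knob that trades latency for leader load: a longer wait keeps more reads local, while a shorter one sends more of them to the leader.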

For the incident dashboard, the team may choose monotonic reads with a freshness budget. A stale dashboard is acceptable if it does not jump backward for the same viewer and if the UI knows the data may lag:

dashboard contract:
- may lag the write path by up to 3 seconds
- must not show an older version than this client already saw
- should expose last_refreshed_at for operational clarity
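
The dashboard contract above can be sketched as a single acceptance check. The Snapshot type and the lag_s field are assumptions made for illustration: the sketch presumes the serving layer can measure how far a snapshot trails the write path and that the session remembers the last version it showed.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LAG_S = 3.0  # freshness budget from the contract above


@dataclass(frozen=True)
class Snapshot:
    version: int
    lag_s: float            # how far this snapshot trails the write path (assumed measurable)
    last_refreshed_at: str  # surfaced to the UI for operational clarity


def acceptable(snapshot: Snapshot, last_seen_version: Optional[int]) -> bool:
    """True if the snapshot meets both the lag budget and the session floor."""
    if snapshot.lag_s > MAX_LAG_S:
        return False
    if last_seen_version is not None and snapshot.version < last_seen_version:
        return False
    return True


# Within budget and not older than what this viewer already saw.
assert acceptable(Snapshot(18, 1.2, "2024-01-01T00:00:00Z"), last_seen_version=18)
# Fresh enough, but it would move this viewer backward.
assert not acceptable(Snapshot(17, 0.5, "2024-01-01T00:00:00Z"), last_seen_version=18)
```

When a snapshot is rejected, the server can try another replica or keep showing the previous page along with its last_refreshed_at value.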

For the audit timeline, causal ordering matters more than raw freshness. Showing a dependent comment before the task completion can create a false story. The system might attach causal dependencies to events or build the audit log from one ordered stream so clients do not observe effects without causes.
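
If events carry explicit dependencies, a causally safe display order is a topological sort. This sketch uses graphlib.TopologicalSorter from the Python standard library (3.9 or later). The event identifiers and the depends_on mapping are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def causal_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order events so every event comes after the events it depends on.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


events = {
    "task-42-complete": set(),
    "comment-verified": {"task-42-complete"},
    "incident-closed": {"comment-verified"},
}

timeline = causal_order(events)
assert timeline.index("task-42-complete") < timeline.index("comment-verified")
assert timeline.index("comment-verified") < timeline.index("incident-closed")
```

Building the log from one ordered stream avoids the sort entirely, at the cost of funnelling every audit event through a single sequencer.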

The design is not "strong everywhere" or "weak everywhere." It is a set of promises matched to user harm.

Choosing the Weakest Useful Promise

The strongest model is not automatically the best model. Linearizability is excellent when clients need one real-time truth, such as acquiring a lock, assigning a unique incident commander, or authorizing an irreversible action. It is expensive when every dashboard read pays cross-region coordination just to avoid a harmless one-second lag.

A practical review asks four questions:

Question                                      Example answer
--------------------------------------------  -----------------------------------------
Who is the observer?                          the writer, one session, all clients
What surprise must be impossible?             seeing my own write disappear
How long can stale state be acceptable?       3 seconds for dashboard summaries
What cost can this endpoint pay?              wait on leader, quorum, token, or cache

The phrase "eventually consistent" is too broad unless it is paired with those answers. Eventual convergence might be fine for a derived count. It is usually too weak for a workflow step that a human just completed. Causal consistency might be enough for conversations and timelines. Linearizability may be reserved for authority-changing operations.

This is the core API lesson: consistency is not only about where replicas live. It is about what histories callers are allowed to depend on.

Failure Modes

Calling eventual consistency "slightly stale." Slightly stale is a freshness budget. Eventual consistency by itself also permits session surprises, non-monotonic reads, and temporary disagreement unless the system adds stronger guarantees.

Promising read-your-writes without routing or tokens. A client cannot rely on seeing its write if a later read can land on any lagging replica. The API needs a version token, sticky routing, leader reads, quorum reads, or another mechanism that makes the promise real.

Using linearizability as a default comfort blanket. Strong real-time order can simplify reasoning, but it may add latency and reduce availability during replica or network trouble. Spend it where the product needs that exact guarantee.

Forgetting that semantics cross service boundaries. If a task service promises causal order but the audit service consumes events asynchronously without dependencies, the user can still see effects before causes. The contract has to survive the full read path.

Key Takeaways

  1. A consistency model is a contract over client-visible histories, not a vague label for a database.
  2. Different APIs in the same product can need different guarantees: convergence, session stability, causal order, or real-time global order.
  3. Stronger contracts rule out more confusing stories, but they usually spend coordination, metadata, latency, or availability under failure.
  4. Good API design names the weakest promise that still prevents the user-visible surprise the product cannot tolerate.