Replication Lag and Read-Your-Writes
LESSON
Replication Lag and Read-Your-Writes
The core idea: A read replica is a real copy of the database, but it serves only the committed history it has already replayed, so read scaling is safe only when the application makes freshness explicit: accept staleness, wait for catch-up, or route the session to a node that has already applied the write.
Core Insight
Harbor Point has just moved a risk dashboard onto desk_issuer_exposure_mv, an incrementally maintained projection table. The query is cheaper now, but the desk finds a new failure mode. A trader releases reservation R-88421 for issuer CA-MUNI, the primary commits the release, and the next page refresh still shows the old 9.9M exposure. The projection logic is correct. The read path is not.
The missing mental model is that a read replica is not "the same database, but cheaper." It is a database that has replayed some prefix of the primary's durable history. If the release committed at log position 0/16B4A98 and the replica has only replayed through 0/16B42C0, that replica can answer quickly and still be wrong for this session. Replica lag is therefore not just an operations graph. It changes the meaning of a successful read.
That is why read-your-writes is not a generic feel-good property. It is a session contract. After Harbor Point writes a reservation change, the next read in that same workflow must observe a node whose replay position is at least as new as the write that just committed. If the system cannot prove that, it should not pretend the replica is fresh enough. It should wait, fall back to the primary, or tell the caller that bounded staleness is the actual contract.
Once that clicks, read routing stops looking like a load balancer tweak and starts looking like part of the storage design. The trade-off is direct: replicas reduce primary load and improve read capacity, but only if the application says which reads may be stale and which reads need proof that a particular write is visible.
Freshness Is a Contract
Harbor Point wants the obvious performance win: send more reads to replicas so the primary can spend its budget on writes, lock management, WAL generation, and projection maintenance. That is reasonable. The mistake is collapsing all reads into one category. A regional dashboard that refreshes every few seconds can usually tolerate bounded staleness. A trader who just reduced exposure and immediately checks whether the release took effect usually cannot.
Without an explicit freshness contract, teams create two bad outcomes at once. They keep the primary overloaded because nobody trusts replica answers during incidents, and they still serve stale state in user flows that assumed read-your-writes. Support then gets a class of tickets that sound like "the write half-worked" even though the write committed correctly and only the read path was behind.
The production goal is therefore not "use read replicas aggressively." It is "assign the right freshness rule to each workflow, then make routing prove that rule." For Harbor Point, that means separating stale-tolerant dashboards from session-sensitive position checks, attaching a commit token to writes, and monitoring whether replicas can satisfy those tokens without silently falling behind.
Workflow Freshness contract
------------------------------- -------------------------------------------
risk approval check primary_only / newest authoritative state
trader checks own release read-your-writes for this session
regional exposure dashboard bounded staleness is acceptable
historical analytics export stale snapshot may be intentional
This table is not just product labeling. It tells the router which evidence it must obtain before choosing a read node. A stale-tolerant dashboard can use a healthy replica that is slightly behind. A session-sensitive page needs a replica that has already replayed the user's commit, or it needs another behavior.
Replicas Serve a Prefix of History
When Harbor Point commits the release of reservation R-88421, the primary does not instantly make every replica equivalent. The primary first makes the write durable locally, then replication ships the relevant log records, then the replica receives them, flushes them, and finally replays them into visible table state. Until the replay step finishes, a query on the replica can still see the pre-release exposure row even though the commit is already durable on the primary.
That sequence is why lag needs a position-based mental model. A replica can be "connected" and still not be current enough for a given session. In PostgreSQL terms, the important distinction is between what has been sent, written, flushed, and replayed. In MySQL terms, the gap might be expressed through GTID execution state. In both cases, the user-visible question is the same: has this node applied the specific transaction my session depends on?
Harbor Point's path looks like this:
trader releases reservation R-88421
|
v
primary commits release at LSN 0/16B4A98
|
v
replica receives WAL and flushes it
|
v
replica replays through 0/16B4A98
|
v
reads from that replica can now show the lower exposure
This also explains why "replica lag in seconds" is often the wrong dashboard for session routing. A node can report low transport delay yet still be missing one large transaction or be stalled in apply because of I/O pressure, replay serialization, or a conflict on the standby. For read-your-writes, Harbor Point cares less about an average lag number than about whether the replica has crossed the exact commit position the session requires.
Treating replicas as historical prefixes instead of interchangeable copies keeps the price visible. It also gives the application a clean mechanism: compare the state a session needs with the state a replica has already applied.
Read-Your-Writes With Commit Tokens
Suppose Harbor Point's API processes POST /reservations/R-88421/release and the primary commits the change at LSN=0/16B4A98. If the API only returns 200 OK, the next read has no proof of what freshness it requires. A better contract is to return or record a commit token for the session: an LSN, GTID, hybrid timestamp, or another monotonic position that identifies the newest state the client must be able to observe.
The router can then treat read-your-writes as a simple comparison problem rather than a guess:
def choose_read_node(session_token, endpoint_policy, replicas):
if endpoint_policy == "primary_only":
return "primary"
fresh_enough = [r for r in replicas if r.replay_position >= session_token]
if fresh_enough:
return least_loaded(fresh_enough)
if endpoint_policy == "wait_then_replica":
waited = wait_for_any_replica(session_token, timeout_ms=40)
if waited is not None:
return waited
return "primary"
That pseudocode captures three common production strategies. The simplest is primary stickiness: after a write, keep the session on the primary for a short interval. It is easy to implement and often good enough inside one region with tightly bounded lag. It is also imprecise, because time-based stickiness guesses that replicas will have caught up by then. A token-based strategy is better because it proves freshness instead of assuming it from the clock.
The second strategy is wait-for-replica. Harbor Point can try to keep the read on a nearby replica, but only after that replica's replay position reaches the session token. This preserves more read offload than blanket primary stickiness, at the cost of occasional wait latency and more routing logic. The third strategy is explicit bounded staleness: for endpoints that do not require read-your-writes, the router skips the token check and serves from a replica that is merely "fresh enough" by a weaker SLA.
One more production detail matters during failover. A node-local position such as an LSN may not be comparable across leadership changes unless the system also carries a timeline, term, or another epoch identifier. That is why products with multi-node failover often prefer GTIDs, globally comparable commit timestamps, or per-epoch tokens. Read-your-writes works only when the token means the same thing to the router after topology changes as it did before them.
Routing Policies and Their Costs
Harbor Point's workflows do not all need the same consistency surface. The risk-control path that decides whether a new reservation can be approved should read from the primary or another node that can prove an equally strong view of the latest committed state. A trader refreshing the position they just edited usually needs session consistency, which can be implemented by token-aware replica routing with a short wait budget and primary fallback. A regional oversight dashboard may be perfectly acceptable on a bounded-staleness replica if its contract is "usually within two seconds of primary."
Writing those categories down is what turns replica routing into an operable system:
primary_only: approval checks, risk limits, and any workflow where a stale answer can trigger a wrong business actionsession_consistent: trader self-service pages and APIs that must satisfy read-your-writes without paying primary cost for every read foreverbounded_stale: dashboards, analytics, and exploratory screens where the product explicitly accepts lag in exchange for lower latency and more read scale
Once Harbor Point adopts those classes, the right metrics become obvious. The team should watch replay position distance, wait-on-replica latency, fallback-to-primary rate, the percentage of traffic in each freshness class, and token mismatches after failover. Those metrics show whether the design is really saving primary load or merely hiding that session-sensitive reads already route back to the leader most of the time.
This classification also prepares the ground for later lessons on sharding and secondary indexes. A global secondary index is another read-optimization structure whose visibility can trail the source-of-truth writes. The technical surface changes, but the discipline stays the same: do not claim a lookup is current unless the system can prove the supporting structure has caught up far enough.
Failure Modes
Trusting generic lag graphs for session freshness. Replica lag graphs may measure transport delay or sampled wall-clock lag instead of the replica's actual replay position relative to the transaction that just committed. A replica can receive WAL quickly and still not have replayed the specific write the session cares about. Route read-your-writes by commit token, not by a generic "replica lag < N ms" threshold.
Solving every stale read with primary stickiness. Sticky-primary reads can hide stale results, but blanket stickiness turns many harmless follow-up reads into primary traffic. During the 09:30 burst, this can erase the scaling benefit of replicas. Apply read-your-writes only to endpoints whose user contract requires it, and use token-aware wait-for-replica where a short wait can preserve read offload.
Using tokens that fail during topology changes. After failover, some tokens may become ambiguous if they were tied only to the old primary's local position format. Use tokens that survive topology changes, such as GTID sets or timeline-aware commit positions, and test those comparisons in failover drills.
Treating derived-state correctness and read freshness as the same promise. Incremental maintenance can keep desk_issuer_exposure_mv correct on the primary while replica routing still makes the dashboard look stale. Harbor Point needs both promises: the projection must be updated correctly, and the chosen read node must have replayed far enough to show the update.
Resources
- [DOC] PostgreSQL Monitoring Statistics
- Focus: Interpret replication positions such as sent, written, flushed, and replayed LSNs when diagnosing why a replica still serves stale reads.
- [DOC] MySQL Replication with GTIDs
- Focus: See how GTIDs provide a portable commit token for routing or waiting on session-sensitive reads.
- [DOC] MongoDB Causal Consistency
- Focus: Study a production API that exposes read-your-writes and monotonic reads as session guarantees rather than assuming every replica is equally current.
- [DOC] CockroachDB Follower Reads
- Focus: Compare explicit bounded-staleness reads with session-consistent or primary-only paths.
Key Takeaways
- A replica serves a position in history, not "the database in general"; freshness depends on how far replay has advanced.
- Read-your-writes is enforced with proof, not optimism; commit tokens let the router compare what a session needs against what a replica has applied.
- Read scale improves when freshness classes are explicit, because primary-only, session-consistent, and bounded-stale reads have different costs.
- Failover and derived read structures must preserve the same rule: do not claim freshness unless the system can prove the required commit is visible.