LESSON
Day 427: Read Replicas, Lag, and Read-Your-Writes
The core idea: A read replica is a real copy of the database, but it serves only the committed history it has already replayed, so read scaling is safe only when the application makes freshness explicit: accept staleness, wait for catch-up, or route the session to a node that has already applied the write.
Today's "Aha!" Moment
In 10.md, Harbor Point moved its risk dashboard onto desk_issuer_exposure_mv, an incrementally maintained projection table. The query got cheaper immediately. Then the desk found a new failure mode. A trader released reservation R-88421 for issuer CA-MUNI, the primary committed the release, and the next page refresh still showed the old 9.9M exposure. The view logic was correct. The read path was not.
The missing mental model is that a read replica is not "the same database, but cheaper." It is a database that has replayed some prefix of the primary's durable history. If the release committed at log position 0/16B4A98 and the replica has only replayed through 0/16B42C0, that replica can answer quickly and still be wrong for this session. Replica lag is therefore not just an operations graph. It changes the meaning of a successful read.
That is why read-your-writes is not a generic feel-good property. It is a session contract. After Harbor Point writes a reservation change, the next read in that same workflow must observe a node whose replay position is at least as new as the write that just committed. If the system cannot prove that, it should not pretend the replica is fresh enough. It should wait, fall back to the primary, or tell the caller that bounded staleness is the actual contract.
Once that click happens, read routing stops looking like a load balancer tweak and starts looking like part of the storage design. The same thinking will matter again in 12.md, where cross-shard secondary indexes become another derived read path whose freshness cannot be assumed.
Why This Matters
Harbor Point wants the obvious performance win: send more reads to replicas so the primary can spend its budget on writes, lock management, WAL generation, and projection maintenance. That is reasonable. The mistake is collapsing all reads into one category. A regional dashboard that refreshes every few seconds can usually tolerate bounded staleness. A trader who just reduced exposure and immediately checks whether the release took effect usually cannot.
Without an explicit freshness contract, teams create two bad outcomes at once. They keep the primary overloaded because nobody trusts replica answers during incidents, and they still serve stale state in user flows that assumed read-your-writes. Support then gets a class of tickets that sound like "the write half-worked" even though the write committed correctly and only the read path was behind.
The production goal is therefore not "use read replicas aggressively." It is "assign the right freshness rule to each workflow, then make routing prove that rule." For Harbor Point, that means separating stale-tolerant dashboards from session-sensitive position checks, attaching a commit token to writes, and monitoring whether replicas can satisfy those tokens without silently falling behind.
Learning Objectives
By the end of this session, you will be able to:
- Explain what replica lag means at the storage-engine level - Show why a replica serves an older prefix of committed history rather than a magically current copy of primary state.
- Trace how read-your-writes is enforced in a session-aware router - Use commit positions such as LSNs or GTIDs to decide when a replica is fresh enough for a client's next read.
- Evaluate read-routing policies by product contract and operational cost - Choose among primary reads, wait-for-replica, and bounded-staleness follower reads for Harbor Point's workflows.
Core Concepts Explained
Concept 1: Replica lag means a replica is behind in applied history, not just behind in wall-clock time
When Harbor Point commits the release of reservation R-88421, the primary does not instantly make every replica equivalent. The primary first makes the write durable locally, then replication ships the relevant log records, then the replica receives them, flushes them, and finally replays them into visible table state. Until the replay step finishes, a query on the replica can still see the pre-release exposure row even though the commit is already durable on the primary.
That sequence is why lag needs a position-based mental model. A replica can be "connected" and still not be current enough for a given session. In PostgreSQL terms, the important distinction is between what has been sent, written, flushed, and replayed. In MySQL terms, the gap might be expressed through GTID execution state. In both cases, the user-visible question is the same: has this node applied the specific transaction my session depends on?
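The question "has this node applied the specific transaction my session depends on" can be made mechanical by comparing log positions numerically. Below is a minimal sketch assuming PostgreSQL-style LSN strings of the form `major/minor` in hexadecimal; the helper names are illustrative, not a real driver API:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL-style LSN like '0/16B4A98' into one
    comparable integer: high 32 bits / low 32 bits, both hex."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def has_applied(replica_replay_lsn: str, commit_lsn: str) -> bool:
    """True if the replica has replayed at least through the commit."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(commit_lsn)

# The release committed at 0/16B4A98; this replica has replayed
# only through 0/16B42C0, so it is stale for this session.
print(has_applied("0/16B42C0", "0/16B4A98"))  # False
print(has_applied("0/16B4A98", "0/16B4A98"))  # True
```

The parse step matters because LSNs are positions, not durations: two positions either satisfy the session's requirement or they do not, with no averaging involved.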
Harbor Point's path looks like this:
trader releases reservation R-88421
|
v
primary commits release at LSN 0/16B4A98
|
v
replica receives WAL and flushes it
|
v
replica replays through 0/16B4A98
|
v
reads from that replica can now show the lower exposure
This also explains why "replica lag in seconds" is often the wrong dashboard for session routing. A node can report low transport delay yet still be missing one large transaction or be stalled in apply because of I/O pressure, replay serialization, or a conflict on the standby. For read-your-writes, Harbor Point cares less about an average lag number than about whether the replica has crossed the exact commit position the session requires.
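To make the seconds-versus-position distinction concrete, consider two hypothetical replicas: one reports low wall-clock lag but has not replayed the session's commit, the other has. A sketch with illustrative field names:

```python
replicas = [
    {"name": "replica-a", "lag_seconds": 0.2, "replay_position": 0x16B42C0},
    {"name": "replica-b", "lag_seconds": 1.8, "replay_position": 0x16B4A98},
]
session_token = 0x16B4A98  # commit position of the release

# Routing by "lowest seconds of lag" picks replica-a, which is stale
# for this session despite its healthy-looking graph.
by_seconds = min(replicas, key=lambda r: r["lag_seconds"])

# Routing by position keeps only replicas that have crossed the token.
fresh = [r for r in replicas if r["replay_position"] >= session_token]

print(by_seconds["name"])          # replica-a
print([r["name"] for r in fresh])  # ['replica-b']
```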
The trade-off is the whole reason read replicas exist. Harbor Point gets cheaper reads and failure isolation on the read path, but the price is that every replica read now carries a freshness question. Treating replicas as historical prefixes instead of interchangeable copies is what keeps that price visible.
Concept 2: Read-your-writes is a routing protocol built on commit tokens
Suppose Harbor Point's API processes POST /reservations/R-88421/release and the primary commits the change at LSN=0/16B4A98. If the API only returns 200 OK, the next read has no proof of what freshness it requires. A better contract is to return or record a commit token for the session: an LSN, GTID, hybrid timestamp, or another monotonic position that identifies the newest state the client must be able to observe.
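One way to make that contract concrete is to hand the commit position back to the session alongside the success response, so later reads in the same session carry proof of what they must observe. A minimal sketch; the session dict, the `FakePrimary` stand-in, and its `commit_release` method are hypothetical scaffolding, where a real system would use something like PostgreSQL's reported commit LSN or a MySQL GTID:

```python
def handle_release(session, primary, reservation_id):
    # The primary reports the commit position of this transaction.
    commit_lsn = primary.commit_release(reservation_id)
    # Record the newest state this session must be able to observe;
    # max() keeps the token monotonic across multiple writes.
    session["read_after"] = max(session.get("read_after", 0), commit_lsn)
    return {"status": "ok", "commit_token": commit_lsn}

class FakePrimary:
    def commit_release(self, reservation_id):
        return 0x16B4A98  # stand-in for the real commit position

session = {}
resp = handle_release(session, FakePrimary(), "R-88421")
print(session["read_after"] == resp["commit_token"])  # True
```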
The router can then treat read-your-writes as a simple comparison problem rather than a guess:
def choose_read_node(session_token, endpoint_policy, replicas):
    if endpoint_policy == "primary_only":
        return "primary"
    fresh_enough = [r for r in replicas if r.replay_position >= session_token]
    if fresh_enough:
        return least_loaded(fresh_enough)
    if endpoint_policy == "wait_then_replica":
        waited = wait_for_any_replica(session_token, timeout_ms=40)
        if waited is not None:
            return waited
    return "primary"
That pseudocode captures three common production strategies. The simplest is primary stickiness: after a write, keep the session on the primary for a short interval. It is easy to implement and often good enough inside one region with tightly bounded lag. It is also imprecise, because time-based stickiness guesses that replicas will have caught up by then. A token-based strategy is better because it proves freshness instead of assuming it from the clock.
The second strategy is wait-for-replica. Harbor Point can try to keep the read on a nearby replica, but only after that replica's replay position reaches the session token. This preserves more read offload than blanket primary stickiness, at the cost of occasional wait latency and more routing logic. The third strategy is explicit bounded staleness: for endpoints that do not require read-your-writes, the router skips the token check and serves from a replica that is merely "fresh enough" by a weaker SLA.
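The wait-for-replica middle ground amounts to polling replica replay positions against the session token with a deadline, then falling back to the primary when the budget is spent. A sketch under the assumption that replay positions can be polled cheaply; the helper and the `CatchingUpReplica` stand-in are illustrative:

```python
import time

def wait_for_any_replica(replicas, session_token, timeout_ms=40, poll_ms=5):
    """Return the first replica whose replay position reaches the token
    within the budget, or None so the caller falls back to the primary."""
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        for r in replicas:
            if r.replay_position() >= session_token:
                return r
        time.sleep(poll_ms / 1000)
    return None

class CatchingUpReplica:
    """Stand-in replica whose replay position advances on each poll."""
    def __init__(self, positions):
        self._positions = iter(positions)
        self._last = 0
    def replay_position(self):
        self._last = next(self._positions, self._last)
        return self._last

r = CatchingUpReplica([0x16B42C0, 0x16B42C0, 0x16B4A98])
print(wait_for_any_replica([r], 0x16B4A98) is r)  # True
```

The timeout is the policy knob: a short budget preserves read offload when replicas are close behind, while the `None` return forces the caller to make the primary fallback explicit rather than silent.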
One more production detail matters during failover. A node-local position such as an LSN may not be comparable across leadership changes unless the system also carries a timeline, term, or another epoch identifier. That is why products with multi-node failover often prefer GTIDs, globally comparable commit timestamps, or per-epoch tokens. Read-your-writes works only when the token means the same thing to the router after topology changes as it did before them.
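An epoch-aware token can be modeled as an (epoch, position) pair, so the router refuses to compare raw positions across leadership changes instead of guessing. A sketch; the tuple layout and the three-valued result are illustrative design choices, not a specific product's API:

```python
def is_fresh_enough(replica_token, session_token):
    """Tokens are (epoch, position) pairs. Positions are comparable
    only within one epoch; across a failover the router must not
    pretend an old position still means the same thing."""
    r_epoch, r_pos = replica_token
    s_epoch, s_pos = session_token
    if r_epoch != s_epoch:
        # Different timeline/term: signal "cannot prove freshness"
        # so the caller re-issues the token or reads the primary.
        return None
    return r_pos >= s_pos

print(is_fresh_enough((7, 0x16B4A98), (7, 0x16B42C0)))  # True
print(is_fresh_enough((8, 0x100), (7, 0x16B4A98)))      # None
```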
Concept 3: Good systems classify reads by freshness contract instead of giving every endpoint the same rule
Harbor Point's workflows do not all need the same consistency surface. The risk-control path that decides whether a new reservation can be approved should read from the primary or another node that can prove an equally strong view of the latest committed state. A trader refreshing the position they just edited usually needs session consistency, which can be implemented by token-aware replica routing with a short wait budget and primary fallback. A regional oversight dashboard may be perfectly acceptable on a bounded-staleness replica if its contract is "usually within two seconds of primary."
Writing those categories down is what turns replica routing into an operable system:
- primary_only: approval checks, risk limits, and any workflow where a stale answer can trigger a wrong business action
- session_consistent: trader self-service pages and APIs that must satisfy read-your-writes without paying primary cost for every read forever
- bounded_stale: dashboards, analytics, and exploratory screens where the product explicitly accepts lag in exchange for lower latency and more read scale
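These classes can be written down directly as a routing table the service loads at startup, so the freshness rule for each endpoint is code-reviewed rather than implicit. A sketch with hypothetical endpoint names:

```python
FRESHNESS_POLICY = {
    # stale answers here can trigger wrong business actions
    "POST /reservations/{id}/approve": "primary_only",
    "GET /risk-limits/{desk}":         "primary_only",
    # trader self-service: must satisfy read-your-writes
    "GET /reservations/{id}":          "session_consistent",
    "GET /positions/{trader}":         "session_consistent",
    # product explicitly accepts bounded lag
    "GET /dashboards/{region}":        "bounded_stale",
    "GET /analytics/exposure":         "bounded_stale",
}

def policy_for(endpoint):
    # Default to the strictest rule so an unclassified endpoint
    # fails safe instead of silently serving stale state.
    return FRESHNESS_POLICY.get(endpoint, "primary_only")

print(policy_for("GET /dashboards/{region}"))  # bounded_stale
print(policy_for("GET /unclassified"))         # primary_only
```

Defaulting an unknown endpoint to primary_only is the conservative choice: it costs primary load, never correctness, until someone classifies the endpoint deliberately.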
Once Harbor Point adopts those classes, the right metrics become obvious. The team should watch replay position distance, wait-on-replica latency, fallback-to-primary rate, the percentage of traffic in each freshness class, and token mismatches after failover. Those metrics show whether the design is really saving primary load or merely hiding that session-sensitive reads already route back to the leader most of the time.
This classification also prepares the ground for the next lesson. When Harbor Point starts maintaining global secondary indexes across shards in 12.md, those indexes become another read-optimization structure whose visibility can trail the source-of-truth writes. The technical surface changes, but the discipline stays the same: do not claim a lookup is current unless the system can prove the supporting structure has caught up far enough.
Troubleshooting
Issue: Replica lag graphs look low, but a trader still fails to see their own release immediately after commit.
Why it happens / is confusing: The graph may be measuring transport delay or sampled wall-clock lag instead of the replica's actual replay position relative to the transaction that just committed. A replica can receive WAL quickly and still not have replayed the specific write the session cares about.
Clarification / Fix: Route read-your-writes by commit token, not by a generic "replica lag < N ms" threshold. Monitor send, flush, and replay progress separately so the team can see where freshness is actually blocked.
Issue: Enabling sticky-primary reads solved stale results, but the primary became overloaded during the 09:30 burst.
Why it happens / is confusing: Blanket stickiness turned many harmless follow-up reads into primary traffic, including workflows that never needed session guarantees in the first place.
Clarification / Fix: Apply read-your-writes only to endpoints whose user contract requires it. For the middle ground, use token-aware wait-for-replica with a short timeout so replicas still carry most read volume when they are close behind.
Issue: After failover, some session tokens are rejected or cause confusing routing decisions.
Why it happens / is confusing: The token was tied to the old primary's local position format, and the new topology cannot compare it safely without an epoch or globally meaningful identifier.
Clarification / Fix: Use tokens that survive topology changes, such as GTID sets or timeline-aware commit positions, and test them in failover drills rather than assuming the same comparison logic still works after promotion.
Advanced Connections
Connection 1: 10.md made the projection table cheap to read; this lesson decides when replica reads of that projection are actually trustworthy
Incremental maintenance can keep desk_issuer_exposure_mv correct on the primary while replica routing still makes the dashboard look stale. Derived-state maintenance and read freshness are separate contracts. Harbor Point needs both: the projection must be updated correctly, and the chosen read node must have replayed far enough to show the update.
Connection 2: 12.md extends the same freshness reasoning to global secondary indexes
A global secondary index is another structure maintained away from the base write path so that reads become cheaper later. Once Harbor Point spreads data across shards, the index may lag or update asynchronously. The same question returns in a new form: what visibility proof is required before a lookup through that structure can be treated as current?
Resources
Optional Deepening Resources
- [DOC] PostgreSQL Monitoring Statistics
- Focus: Interpret replication positions such as sent, written, flushed, and replayed LSNs when diagnosing why a replica still serves stale reads.
- [DOC] MySQL Replication with GTIDs
- Focus: See how GTIDs provide a portable commit token for routing or waiting on session-sensitive reads.
- [DOC] MongoDB Causal Consistency
- Focus: Study a production API that exposes read-your-writes and monotonic reads as session guarantees rather than assuming every replica is equally current.
- [DOC] CockroachDB Follower Reads
- Focus: Compare explicit bounded-staleness reads with session-consistent or primary-only paths.
Key Insights
- A replica serves a position in history, not "the database in general" - Freshness depends on how far replay has advanced, not on the existence of a replica endpoint.
- Read-your-writes is enforced with proof, not optimism - Commit tokens let the router compare what a session needs against what a replica has already applied.
- Read scale improves when freshness classes are explicit - Primary-only, session-consistent, and bounded-stale paths each have a valid place, but only when the application names the contract clearly.