Log Shipping and Ordered Apply

LESSON

004 30 min advanced

The core idea: Log shipping makes a replica trustworthy by sending the primary's ordered recovery history, but operators must distinguish bytes received, bytes durably flushed, and changes actually applied before they can reason about freshness or failover.

Core Insight

Harbor Point has chosen a local voting topology for reservation writes and an asynchronous standby for dashboards and disaster recovery. At 09:30, the primary accepts a reservation for issuer MUNI-77. A trader sees success. A dashboard team wants that update on a standby. Operations wants the standby to be promotable if the primary dies.

The tempting mental model is that the primary sends "the changed row" to the standby. That is too weak for a database replica. The primary's real durable history is its write-ahead log: a sequence of records for heap changes, index updates, visibility metadata, and the commit record that makes the transaction real.

Log shipping works because the standby replays that same ordered history. The replica is not guessing what the primary meant. It is reconstructing the same prefix of state the primary would use during crash recovery.

The trade-off is that "the replica has the update" is not one state. A log record may have reached the standby over the network, been flushed to the standby's disk, or been replayed into query-visible data pages. Each milestone answers a different production question, and mixing them up causes stale dashboards, bad failover decisions, and confusing incident reports.

Replication as Remote Recovery

When Harbor Point approves reservation R-88421, the primary does not only change one logical row. A storage engine may update a summary row, insert a reservation row, maintain indexes, record visibility information, and finally write a commit record. The write-ahead log preserves that state transition in the order the engine requires.

Conceptually, a tiny WAL prefix might look like this:

LSN 8A/10  UPDATE issuer_exposure reserved_notional += 500000
LSN 8A/28  INSERT reservations(id='R-88421', issuer='MUNI-77', status='open')
LSN 8A/40  INSERT index entry for open reservations
LSN 8A/58  COMMIT tx=88421

The standby cannot safely expose the reservation after replaying only the row insert. It needs the ordered prefix that includes the supporting index changes and the commit record. That is why physical log shipping is so powerful: the standby consumes the exact history that defines the primary's storage state.
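The visibility rule above can be sketched in a few lines of Python. This is a toy model, not any engine's replay code: `WalRecord`, `visible_transactions`, and `parse_lsn` are hypothetical names, the "change"/"commit" classification is invented for illustration, and LSNs are assumed to follow a PostgreSQL-style hi/lo hexadecimal notation.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Turn 'hi/lo' hex text such as '8A/58' into one comparable integer.

    Assumption: PostgreSQL-style notation, high 32 bits before the slash.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass(frozen=True)
class WalRecord:
    lsn: int      # log position; records are stored in increasing order
    tx: int       # transaction that produced the record
    kind: str     # "change" or "commit" (toy classification)
    detail: str = ""


def visible_transactions(wal, replay_lsn):
    """Return the transactions whose commit record is inside the replayed prefix."""
    committed = set()
    for rec in wal:
        if rec.lsn > replay_lsn:
            break  # ordered apply: stop at the end of the replayed prefix
        if rec.kind == "commit":
            committed.add(rec.tx)
    return committed


# The conceptual WAL prefix from the text.
wal = [
    WalRecord(parse_lsn("8A/10"), 88421, "change", "UPDATE issuer_exposure"),
    WalRecord(parse_lsn("8A/28"), 88421, "change", "INSERT reservations R-88421"),
    WalRecord(parse_lsn("8A/40"), 88421, "change", "INSERT index entry"),
    WalRecord(parse_lsn("8A/58"), 88421, "commit"),
]

after_index = visible_transactions(wal, parse_lsn("8A/40"))
after_commit = visible_transactions(wal, parse_lsn("8A/58"))
print(after_index)   # set()
print(after_commit)  # {88421}
```

Replaying up to the index entry leaves the reservation invisible; only a prefix that reaches the commit record exposes it.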

The normal path has two distinct branches:

client transaction
   |
   v
primary generates WAL -> primary fsync -> commit visible on primary
   |
   `-> WAL sender -> standby receives WAL -> standby flushes WAL -> standby replays WAL

This is replication as remote crash recovery. It gives the standby a precise source of truth, but it also couples the standby to the primary's engine format, version rules, and ordered log stream. It is the right tool when the goal is "another trustworthy copy of the same database," not when the goal is a differently shaped business event stream.

Three Watermarks, Three Questions

A useful replica dashboard should not show one boolean called healthy. For log shipping, Harbor Point needs at least three positions:

Position       Meaning                                   Main question answered
------------   ----------------------------------------  -------------------------------
receive_lsn    standby has received WAL bytes            is transport keeping up?
flush_lsn      standby has durably stored WAL locally    what can survive standby crash?
replay_lsn     standby has applied WAL to data pages     what can standby queries see?

Those positions often diverge. Suppose the primary commits R-88421 at LSN 8A/58. The standby might receive the record quickly, flush it a few milliseconds later, and replay it only after a long dashboard query that conflicts with replay has finished.

During that window, all of these statements can be true:

the network path is healthy because receive_lsn is advancing
failover may be safer than the dashboard suggests because flush_lsn has passed the commit record
standby reads are still stale because replay_lsn has not reached the commit

This distinction is the heart of ordered apply. Received bytes are not necessarily durable. Durable bytes are not necessarily visible. Visible state is the replayed prefix, not the shipped prefix.

Different operational questions need different watermarks. "Can the dashboard show this reservation?" is a replay question. "Would the standby retain this history if it crashed now?" is a flush question. "Is the replication link saturated?" is a receive question.
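One way to keep the three questions separate is to answer each from its own watermark. The sketch below is illustrative only: `StandbyPositions`, `answer_questions`, and `parse_lsn` are hypothetical names, LSNs are assumed to use a PostgreSQL-style hi/lo hex notation, and the commit LSN is treated as the point at which the commit record is complete.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Convert 'hi/lo' hex text into a single integer (assumed notation)."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass(frozen=True)
class StandbyPositions:
    receive_lsn: int  # WAL bytes that reached the standby
    flush_lsn: int    # WAL bytes durably stored on the standby
    replay_lsn: int   # WAL bytes applied to data pages


def answer_questions(pos: StandbyPositions, commit_lsn: int) -> dict:
    """Answer each operational question from the watermark that owns it."""
    return {
        "transport_delivered_commit": pos.receive_lsn >= commit_lsn,
        "commit_survives_standby_crash": pos.flush_lsn >= commit_lsn,
        "commit_visible_to_standby_reads": pos.replay_lsn >= commit_lsn,
    }


# The window from the text: received and flushed, but replay is held back.
window = StandbyPositions(
    receive_lsn=parse_lsn("8A/58"),
    flush_lsn=parse_lsn("8A/58"),
    replay_lsn=parse_lsn("8A/40"),
)
status = answer_questions(window, commit_lsn=parse_lsn("8A/58"))
for question, answer in status.items():
    print(f"{question}: {answer}")
```

Here the first two answers are true and the third is false, which is exactly the state in which a dashboard looks stale while failover would still retain the commit.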

Worked Example: The Stale Dashboard Incident

At 09:31, support opens the dashboard and does not see R-88421, even though the trader received a successful reservation response. The incident channel starts with the wrong question: "Is replication connected?"

The replica is connected. In fact, its positions look like this:

primary latest LSN: 8A/90
standby receive_lsn: 8A/90
standby flush_lsn:   8A/90
standby replay_lsn:  8A/40

The log has arrived and is durable on the standby, but the commit record at 8A/58 is not yet replayed. A long-running dashboard query is holding a snapshot that delays apply. From the failover perspective, the commit may be recoverable. From the read-scaling perspective, the dashboard is stale.

The fix depends on the real question:

Symptom                          Correct signal       Likely response
-------------------------------  -------------------  ---------------------------------------
dashboard stale                  replay_lsn lag       reduce query conflicts or add read tier
remote durability uncertain      flush_lsn lag        inspect standby disk or sync policy
transport falling behind         receive_lsn lag      inspect WAL sender, network, retention
standby cannot resume after gap  retained WAL window  reseed or increase retention controls
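The triage in the table can be sketched as a small helper that finds the first stage of the pipeline that is behind. This is a hedged illustration, not a monitoring tool: `lag_bytes`, `first_lagging_stage`, and `parse_lsn` are hypothetical names, LSNs are assumed to be PostgreSQL-style hi/lo hex values, and any gap above zero counts as lag, whereas a real alert would use thresholds.

```python
def parse_lsn(text: str) -> int:
    """Convert 'hi/lo' hex text into a single integer (assumed notation)."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(ahead: str, behind: str) -> int:
    """Distance in bytes between two log positions."""
    return parse_lsn(ahead) - parse_lsn(behind)


def first_lagging_stage(primary: str, receive: str, flush: str, replay: str) -> str:
    """Name the earliest pipeline stage that has fallen behind its input."""
    if lag_bytes(primary, receive) > 0:
        return "receive"   # transport: WAL sender, network, retention
    if lag_bytes(receive, flush) > 0:
        return "flush"     # standby disk or sync policy
    if lag_bytes(flush, replay) > 0:
        return "replay"    # query conflicts or apply throughput
    return "caught up"


# Positions from the stale dashboard incident.
incident = {"primary": "8A/90", "receive": "8A/90", "flush": "8A/90", "replay": "8A/40"}
stage = first_lagging_stage(**incident)
replay_gap = lag_bytes(incident["primary"], incident["replay"])
print(stage, replay_gap)  # replay 80
```

For the incident positions the helper points at replay, not transport, which matches the diagnosis in the text. PostgreSQL exposes a comparable subtraction as `pg_wal_lsn_diff`. A byte gap says how far behind a stage is, not how long it will take to catch up.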

The useful lesson is not that log shipping is fragile. It is that log shipping gives clear internal positions, and those positions make the system diagnosable when the team reads them correctly.

Retention and Apply Pressure

Ordered apply depends on a continuous log prefix. If the standby disconnects for long enough and the primary recycles the needed WAL segment, the standby cannot infer the missing middle from later records. It must be reseeded from a base backup, snapshot, or another retained source.

That creates a retention trade-off. Keeping more WAL gives lagging replicas a longer recovery window, but it consumes storage and can create pressure on the primary. Retaining too little makes ordinary maintenance or a network outage turn into an expensive rebuild.
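The continuity requirement can be expressed as a simple check. The sketch below is deliberately simplified: `can_resume`, `retention_headroom_bytes`, and `parse_lsn` are hypothetical names, the example positions `89/0` and `8B/0` are invented, and it assumes the standby resumes from its flushed position, whereas real engines retain whole segments and may need to restart from an earlier point.

```python
def parse_lsn(text: str) -> int:
    """Convert 'hi/lo' hex text into a single integer (assumed notation)."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def can_resume(standby_flush_lsn: str, oldest_retained_lsn: str) -> bool:
    """True if the primary still holds the next WAL byte the standby needs."""
    return parse_lsn(oldest_retained_lsn) <= parse_lsn(standby_flush_lsn)


def retention_headroom_bytes(standby_flush_lsn: str, oldest_retained_lsn: str) -> int:
    """How much more WAL the primary could recycle before the standby falls off.

    A negative value means the needed history is already gone.
    """
    return parse_lsn(standby_flush_lsn) - parse_lsn(oldest_retained_lsn)


# Hypothetical positions: the standby stopped at 8A/90 during an outage.
still_ok = can_resume("8A/90", oldest_retained_lsn="89/0")
fell_off = can_resume("8A/90", oldest_retained_lsn="8B/0")
print(still_ok)  # True
print(fell_off)  # False: the standby must be reseeded
```

Tracking the headroom value over time turns the retention trade-off into a number: it shrinks as the primary recycles WAL while the standby is disconnected, and reseeding becomes necessary once it goes negative.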

Apply speed is the second pressure. Harbor Point's standby may receive WAL quickly but replay slowly because of:

insufficient standby I/O
long-running reads that conflict with replay
bursts of primary write volume at market open
replay work that touches hot indexes or large pages

Read scaling is therefore not free. A standby that serves expensive dashboard queries can make itself less fresh by delaying the same apply path that keeps it useful. The design must decide whether that standby is primarily for failover, analytics, low-latency regional reads, or some carefully isolated mix.

Failure Modes

Treating connection health as freshness. A connected standby can still be stale if replay is behind. Watch replay position for user-visible reads.

Using replay lag alone to judge failover safety. A standby may have flushed WAL that it has not replayed yet. Promotion can continue recovery from durable WAL, so failover analysis also needs flush position.

Letting a standby fall off the retained log. Log shipping requires a continuous history. If the primary discards segments before the standby consumes them, the standby usually needs reseeding.

Running heavy reads on the same replica that must stay fresh. Long queries and I/O saturation can delay apply. Separate freshness-sensitive replicas from heavy analytical reads when the workload demands it.

Resources

[DOC] PostgreSQL Documentation: Log-Shipping Standby Servers
- Focus: Read for the connection between WAL, standby recovery, and streaming replication.
[DOC] PostgreSQL Documentation: Replication Monitoring
- Focus: Compare sent, write, flush, and replay positions as real operational watermarks.
[DOC] MySQL Reference Manual: Replication Threads
- Focus: Notice the same split between fetching changes and applying them through relay-log machinery.
[BOOK] Designing Data-Intensive Applications
- Focus: Review replication logs, follower catch-up, and the idea of replicas as state machines consuming ordered history.

Key Takeaways

Log shipping is reliable because the standby replays the primary's ordered recovery history, not an informal list of changed rows.
Received, flushed, and replayed positions answer different questions about transport, durability, and read freshness.
A standby can be failover-useful before it is query-fresh, and query-freshness depends on ordered apply reaching the relevant commit.
Retention, replay throughput, and read conflicts determine whether a log-shipping replica remains trustworthy under real workload pressure.
