Watermarks, Late Data, and Window Guarantees

LESSON

Data Architecture and Platforms

027 30 min advanced

Day 499: Watermarks, Late Data, and Window Guarantees

The core idea: A watermark is the stream processor's claim about how far event time has advanced with data complete enough to act on; the watermark policy, allowed lateness, and sink behavior together define what a window result actually guarantees in production.

Today's "Aha!" Moment

In 026.md, PayLedger switched its payout-risk monitor from processing time to event time so a delayed bank message would not silently rewrite business meaning. That solved one question, "Which clock defines the truth?" It did not solve the next question from operations: "When is a 15-minute exposure window trustworthy enough to release a payout?" A reserve adjustment from the banking partner might arrive 40 seconds after the payout event it offsets. If the stream job emits the window as soon as wall-clock time passes 12:15, it is not making a principled decision. It is guessing that nothing earlier is still in flight.

Watermarks are the mechanism that turns that guess into an explicit contract. A watermark is not "the current event time" and it is not a promise that no older record will ever arrive. It is a lower bound on completeness derived from observed source progress. Once PayLedger chooses a watermark strategy and an allowed-lateness policy, every emitted window gains a real guarantee: on-time records should already be reflected, revisions are accepted until a named cutoff, and records that miss that cutoff follow a separate correction path. The non-obvious point is that late data does not become manageable when the team tweaks latency. It becomes manageable when the team states, in advance, what the window result means and what the system will do when reality arrives late.

Why This Matters

PayLedger uses the exposure window to decide whether same-day payouts can be released automatically for high-volume merchants. Support agents, finance analysts, and the automated payout service all read the same metric, but they use it differently. The payout service wants a fast answer. Finance wants the answer to match end-of-day replay. Support needs to explain why a merchant was held at 12:16 and released at 12:18 without sounding like the system made up numbers.

That is why window guarantees matter. If the lesson stops at "use event time," downstream consumers still do not know whether a result is provisional, final, or subject to correction. A conservative watermark gives PayLedger fewer revisions and more stable payout decisions, but it increases latency and state retention. An aggressive watermark gives faster decisions, but it increases late updates, retractions, or drops. The production trade-off is not abstract. It decides whether the pipeline behaves like an auditable financial control or like a race between transport jitter and business logic.

Core Walkthrough

Part 1: Grounded Situation

PayLedger keeps a rolling 15-minute exposure window per merchant. The window combines three event streams: ledger events, partner reserve adjustments, and payout events.

Each event has an event_time taken from the source-of-truth system and an ingest_time added when the stream job receives the record. During a quiet period the values are close together. During a partner queue backup they diverge:

12:14:52  event_time   reserve_adjusted   +$40,000
12:15:03  event_time   payout_submitted   -$35,000
12:15:05  ingest_time  payout_submitted arrives
12:15:41  ingest_time  reserve_adjusted arrives after partner delay

At 12:15:05 the processor has enough information to say that a payout request happened. It does not yet have enough information to say the whole 12:00-12:15 exposure window is complete. That distinction matters because the payout guardrail is attached to the window, not to an individual record. If the job closes the window too early, the merchant is held incorrectly. If it waits forever, the "real-time" product stops being useful.

Watermarks provide the missing completeness frontier. PayLedger configures each source to emit a watermark derived from the highest event time seen so far from that source. In a simple bounded-out-of-orderness policy, the source says something like, "I have seen events through 12:15:30, and my normal disorder budget is 60 seconds, so treat 12:14:30 as the safe frontier for on-time data." The processor then combines those frontiers across active partitions and inputs, usually by taking the minimum. The minimum matters because the joined window is only as complete as its slowest input.
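The frontier arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not any particular engine's API: the class name, the `combined_watermark` function, and the `at` helper (which pins times to an arbitrary calendar day) are all assumptions made for the example.

```python
from datetime import datetime, timedelta


class BoundedOutOfOrdernessWatermark:
    """Tracks the highest event time seen and subtracts a fixed disorder budget."""

    def __init__(self, max_out_of_orderness: timedelta):
        self.max_out_of_orderness = max_out_of_orderness
        self.max_event_time = None

    def observe(self, event_time: datetime) -> None:
        # Out-of-order records never move the frontier backwards.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    def current(self):
        if self.max_event_time is None:
            return None  # no evidence yet, so no watermark
        return self.max_event_time - self.max_out_of_orderness


def combined_watermark(sources):
    """Minimum across inputs; None while any input has no watermark yet."""
    marks = [s.current() for s in sources]
    if any(m is None for m in marks):
        return None
    return min(marks)


def at(hms: str) -> datetime:
    # Hypothetical helper: the calendar date is arbitrary, only the time matters.
    return datetime.strptime("2024-01-01 " + hms, "%Y-%m-%d %H:%M:%S")


budget = timedelta(seconds=60)
payouts = BoundedOutOfOrdernessWatermark(budget)
reserves = BoundedOutOfOrdernessWatermark(budget)

payouts.observe(at("12:15:30"))
reserves.observe(at("12:14:52"))

print(payouts.current().time())                         # 12:14:30
print(reserves.current().time())                        # 12:13:52
print(combined_watermark([payouts, reserves]).time())   # 12:13:52
```

The payout source alone would put the frontier at 12:14:30, but the combined value is held back to 12:13:52 by the slower reserve source.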

Part 2: Mechanism

For PayLedger, the operational path looks like this:

ledger events ----\
partner reserves --- normalize timestamps -> partition by merchant_id
payout events ----/                              |
                                                 v
                                 event-time windows + watermark timers
                                                 |
                              first result, revisions, or late-event side output

Each record is assigned to a window by event_time, not by arrival order. The operator keeps state for that merchant and window while the window moves through three phases:

  1. Open: the watermark has not crossed the window end, so the result is still collecting on-time records.
  2. Emitted but revisable: the watermark crossed the window end, so the first result can be published, but allowed lateness still keeps state alive for corrections.
  3. Expired: the allowed-lateness deadline passed, state is evicted, and any further record for that window must go to a late-data side output or a backfill workflow.

That lifecycle is the real meaning of a window guarantee. A first emission does not automatically mean "final forever." It means "complete according to the current watermark contract." If PayLedger allows two extra minutes for late partner adjustments, then a 12:00-12:15 window might first emit at watermark 12:15:00, accept revisions until watermark 12:17:00, and only become immutable after that.
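As a rough sketch, the three phases reduce to two comparisons against the watermark. The code assumes tumbling 15-minute windows aligned to the hour with an exclusive end, plus the two-minute allowed lateness from the example; `Phase`, `window_for`, `phase`, and `at` are hypothetical names, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta
from enum import Enum

WINDOW_SIZE = timedelta(minutes=15)
ALLOWED_LATENESS = timedelta(minutes=2)


class Phase(Enum):
    OPEN = "open"
    REVISABLE = "emitted but revisable"
    EXPIRED = "expired"


def window_for(event_time: datetime):
    """Return (start, end) of the aligned window containing event_time."""
    day_start = event_time.replace(hour=0, minute=0, second=0, microsecond=0)
    index = (event_time - day_start) // WINDOW_SIZE  # timedelta // timedelta -> int
    start = day_start + index * WINDOW_SIZE
    return start, start + WINDOW_SIZE


def phase(window_end: datetime, watermark: datetime) -> Phase:
    if watermark < window_end:
        return Phase.OPEN
    if watermark < window_end + ALLOWED_LATENESS:
        return Phase.REVISABLE
    return Phase.EXPIRED


def at(hms: str) -> datetime:
    # Hypothetical helper: the calendar date is arbitrary.
    return datetime.strptime("2024-01-01 " + hms, "%Y-%m-%d %H:%M:%S")


start, end = window_for(at("12:14:52"))
print(start.time(), end.time())  # 12:00:00 12:15:00

for wm in ("12:14:30", "12:15:00", "12:16:59", "12:17:00"):
    print(wm, phase(end, at(wm)).value)
# 12:14:30 open
# 12:15:00 emitted but revisable
# 12:16:59 emitted but revisable
# 12:17:00 expired
```

Whether the boundary instants count as inclusive or exclusive is an engine-specific detail; the comparisons here simply mirror the `>=` checks in the lesson's pseudocode.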

The logic is small enough to sketch:

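def handle_event(event, watermark):
    # Assign by event time, never by arrival order.
    window = window_for(event.event_time, minutes=15)

    # Past the allowed-lateness cutoff: window state is gone, so route the
    # record to the late-data side output instead of touching the window.
    if watermark >= window.end + ALLOWED_LATENESS:
        emit_late_side_output(event)
        return

    update_window_state(window, event)

    # The watermark already crossed the window end, so the first result has
    # been published (by a watermark timer, not shown). Publish a revision.
    if watermark >= window.end:
        emit_window_result(window, mode="upsert")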

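This pseudocode hides the hard parts, but it captures the contract:

  1. Records are assigned to windows by event_time, not by arrival order.
  2. A record that arrives after the allowed-lateness cutoff goes to the late-data side output instead of changing the window.
  3. Once the watermark has crossed the window end, every accepted record produces a revised result, so the downstream sink must be able to apply an upsert.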

That last point is where many production bugs actually live. If the downstream store supports upserts or retractions, PayLedger can publish early and repair later. If the downstream sink is a one-shot alerting channel, early publication may be unacceptable because there is no clean correction path. Window guarantees are therefore not just stream-engine settings. They are part of the end-to-end product contract.
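The difference between the two sink types can be made concrete with a toy example. Both classes, the merchant id `m-42`, and the exposure figures are illustrative assumptions rather than PayLedger's actual components.

```python
class UpsertSink:
    """Keyed store: a later emission for the same key replaces the earlier one."""

    def __init__(self):
        self.rows = {}

    def write(self, merchant_id, window_end, exposure, revision):
        self.rows[(merchant_id, window_end)] = {
            "exposure": exposure,
            "revision": revision,
        }


class AppendOnlyAlerts:
    """One-shot channel: every emission is delivered and cannot be recalled."""

    def __init__(self):
        self.sent = []

    def write(self, merchant_id, window_end, exposure, revision):
        self.sent.append((merchant_id, window_end, exposure, revision))


store = UpsertSink()
alerts = AppendOnlyAlerts()

# First emission when the watermark crosses the window end, then one
# revision after a late reserve adjustment is folded in (hypothetical values).
emissions = [
    ("m-42", "12:15:00", 120_000, 0),
    ("m-42", "12:15:00", 85_000, 1),
]
for emission in emissions:
    store.write(*emission)
    alerts.write(*emission)

print(store.rows[("m-42", "12:15:00")])  # {'exposure': 85000, 'revision': 1}
print(len(alerts.sent))                  # 2
```

The keyed store converges on the corrected value, while the alert channel has delivered both the stale and the revised figure. For a sink of the second kind, it may be safer to hold the result until the allowed-lateness cutoff has passed.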

Part 3: Implications and Trade-offs

The cleanest way to think about the trade-off is to ask what kind of mistake PayLedger can afford. A conservative watermark and a longer allowed-lateness period make the exposure window closer to replayed truth. That reduces false payout holds and makes finance reconciliation easier, but it increases decision latency, memory usage, checkpoint size, and operator recovery time because more window state stays live.

An aggressive watermark does the opposite. PayLedger can release payouts faster because windows close sooner, but now the team must handle a larger stream of late revisions. Some of those corrections are operationally fine, such as a dashboard cell updating from $120,000 to $85,000. Some are not, such as an already-sent "merchant blocked" notification that no downstream system can retract cleanly. The trade-off is therefore between latency and completeness, but also between repairable and non-repairable downstream effects.
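A small simulation shows both sides of the trade-off on one window. The records are hypothetical, times are seconds after 12:00:00, and the function considers only the window that ends at 12:15:00 (900 s) with a single source. It is a sketch of the arithmetic, not of a real engine.

```python
WINDOW_END = 900  # 12:15:00 expressed as seconds after 12:00:00

# (arrival_s, event_s) pairs; hypothetical records in seconds after 12:00:00.
RECORDS = [
    (882, 880),
    (896, 895),
    (905, 903),
    (916, 915),
    (941, 892),  # reserve adjustment delayed by the partner queue
    (966, 965),
]


def simulate(budget_s):
    """Return (arrival time of the first emission, number of late revisions)."""
    max_event = None
    first_emit_at = None
    revisions = 0
    for arrival, event in sorted(RECORDS):
        # Watermark as it stood before this record was processed.
        watermark = None if max_event is None else max_event - budget_s
        closed = watermark is not None and watermark >= WINDOW_END
        if event < WINDOW_END and closed:
            revisions += 1  # record for a window that was already emitted
        max_event = event if max_event is None else max(max_event, event)
        if first_emit_at is None and max_event - budget_s >= WINDOW_END:
            first_emit_at = arrival
    return first_emit_at, revisions


print(simulate(10))  # (916, 1)
print(simulate(60))  # (966, 0)
```

With a 10-second budget the window is published 50 seconds sooner but has to be revised once. With a 60-second budget the delayed adjustment is absorbed before the first emission.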

This is also why one slow partition can dominate the system. If the banking-partner topic goes quiet or one partition stops advancing its watermark, the minimum watermark across the joined computation stalls. Windows stop closing even if the ledger and payout streams are healthy. Production teams often misread that symptom as "the job is slow" when the real issue is "the engine no longer has evidence that event time is complete." Idle-partition handling, per-source observability, and explicit late-data channels are not polish. They are part of making the guarantee intelligible.
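The stall, and one common mitigation, can be sketched as follows. The function treats a source as idle when it has produced no record within a timeout and leaves it out of the minimum. The source names, the numbers, and the `idle_timeout_s` parameter are assumptions for illustration, with times in seconds after 12:00:00.

```python
def combined_watermark(sources, now_s, idle_timeout_s=None):
    """sources maps name -> (watermark_s, last_record_seen_s).

    With idle_timeout_s set, sources that have been silent for longer than
    the timeout are excluded from the minimum.
    """
    active = [
        watermark
        for watermark, last_seen in sources.values()
        if idle_timeout_s is None or now_s - last_seen <= idle_timeout_s
    ]
    return min(active) if active else None


sources = {
    "ledger": (905, 966),
    "payouts": (903, 965),
    "partner_reserves": (832, 890),  # went quiet after 12:14:50
}

# Without idleness handling, the quiet source pins the frontier below 900,
# so the 12:00-12:15 window cannot close.
print(combined_watermark(sources, now_s=1000))                     # 832

# With a 60-second idle timeout, the quiet source is left out of the minimum.
print(combined_watermark(sources, now_s=1000, idle_timeout_s=60))  # 903
```

Excluding an idle source is itself a policy decision. It unblocks the window, but anything the quiet source delivers later is more likely to land as a revision or in the late-data channel.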

The next lesson, 028.md, picks up exactly here. Watermarks and allowed lateness tell the engine how long state must live and when timers should fire. Stateful operators and checkpoint recovery explain how that state survives failure without losing the guarantee you just defined.

Failure Modes and Misconceptions

Connections

Resources

Key Takeaways
