Day 499: Watermarks, Late Data, and Window Guarantees

The core idea: A watermark is the stream processor's claim about the point up to which event time is complete enough to act on; the watermark policy, allowed lateness, and sink behavior together define what a window result actually guarantees in production.

Today's "Aha!" Moment

In 026.md, PayLedger switched its payout-risk monitor from processing time to event time so a delayed bank message would not silently rewrite business meaning. That solved one question, "Which clock defines the truth?" It did not solve the next question from operations: "When is a 15-minute exposure window trustworthy enough to release a payout?" A reserve adjustment from the banking partner might arrive 40 seconds after the payout event it offsets. If the stream job emits the window as soon as wall-clock time passes 12:15, it is not making a principled decision. It is guessing that nothing earlier is still in flight.

Watermarks are the mechanism that turns that guess into an explicit contract. A watermark is not "the current event time" and it is not a promise that no older record will ever arrive. It is a lower bound on completeness derived from observed source progress. Once PayLedger chooses a watermark strategy and an allowed-lateness policy, every emitted window gains a real guarantee: on-time records should already be reflected, revisions are accepted until a named cutoff, and records that miss that cutoff follow a separate correction path. The non-obvious point is that late data does not become manageable when the team tweaks latency. It becomes manageable when the team states, in advance, what the window result means and what the system will do when reality arrives late.

Why This Matters

PayLedger uses the exposure window to decide whether same-day payouts can be released automatically for high-volume merchants. Support agents, finance analysts, and the automated payout service all read the same metric, but they use it differently. The payout service wants a fast answer. Finance wants the answer to match end-of-day replay. Support needs to explain why a merchant was held at 12:16 and released at 12:18 without sounding like the system made up numbers.

That is why window guarantees matter. If the lesson stops at "use event time," downstream consumers still do not know whether a result is provisional, final, or subject to correction. A conservative watermark gives PayLedger fewer revisions and more stable payout decisions, but it increases latency and state retention. An aggressive watermark gives faster decisions, but it increases late updates, retractions, or drops. The production trade-off is not abstract. It decides whether the pipeline behaves like an auditable financial control or like a race between transport jitter and business logic.

Core Walkthrough

Part 1: Grounded Situation

PayLedger keeps a rolling 15-minute exposure window per merchant. The window combines three event streams:

authorization_captured from the card ledger
payout_submitted from the payout service
reserve_adjusted from the banking-partner integration

Each event has an event_time taken from the source-of-truth system and an ingest_time added when the stream job receives the record. During a quiet period the values are close together. During a partner queue backup they diverge:

12:14:52  event_time   reserve_adjusted   +$40,000
12:15:03  event_time   payout_submitted   -$35,000
12:15:05  ingest_time  payout_submitted arrives
12:15:41  ingest_time  reserve_adjusted arrives after partner delay

At 12:15:05 the processor has enough information to say that a payout request happened. It does not yet have enough information to say the whole 12:00-12:15 exposure window is complete. That distinction matters because the payout guardrail is attached to the window, not to an individual record. If the job closes the window too early, the merchant is held incorrectly. If it waits forever, the "real-time" product stops being useful.
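The gap between the two clocks in that timeline can be made concrete with a few lines of Python. This is a minimal sketch using only the four timestamps listed above; the dictionary layout, the `at` helper, and the calendar date are assumptions chosen for illustration.

```python
from datetime import datetime


def at(hour, minute, second):
    # Arbitrary calendar date; only the time of day comes from the timeline.
    return datetime(2024, 1, 1, hour, minute, second)


records = [
    {"type": "payout_submitted", "event_time": at(12, 15, 3), "ingest_time": at(12, 15, 5)},
    {"type": "reserve_adjusted", "event_time": at(12, 14, 52), "ingest_time": at(12, 15, 41)},
]

# Order in which the stream job receives the records.
arrival_order = [r["type"] for r in sorted(records, key=lambda r: r["ingest_time"])]

# Order in which the business events actually happened.
event_order = [r["type"] for r in sorted(records, key=lambda r: r["event_time"])]

# Delay between the source-of-truth timestamp and arrival at the job, in seconds.
lag_seconds = {
    r["type"]: (r["ingest_time"] - r["event_time"]).total_seconds() for r in records
}

print(arrival_order)  # ['payout_submitted', 'reserve_adjusted']
print(event_order)    # ['reserve_adjusted', 'payout_submitted']
print(lag_seconds)    # {'payout_submitted': 2.0, 'reserve_adjusted': 49.0}
```

The two orderings disagree, which is exactly why arrival order cannot decide when the 12:00-12:15 window is complete.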

Watermarks provide the missing completeness frontier. PayLedger configures each source to emit a watermark based on how far event time has advanced relative to the most recent records seen from that source. In a simple bounded-out-of-orderness policy, the source says something like, "I have seen events through 12:15:30, and my normal disorder budget is 60 seconds, so treat 12:14:30 as the safe frontier for on-time data." The processor then combines those frontiers across active partitions and inputs, usually by taking the minimum. The minimum matters because the joined window is only as complete as its slowest input.
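The bounded-out-of-orderness rule and the minimum across inputs can be sketched as follows. This is an illustration, not any engine's API: the class name `BoundedOutOfOrderness`, the `combined_watermark` helper, and the ledger source's 5-second budget and timestamps are assumptions. Only the 12:15:30 observation and the 60-second budget come from the example above.

```python
from datetime import datetime, timedelta


class BoundedOutOfOrderness:
    """Tracks the highest event time seen and trails it by a fixed disorder budget."""

    def __init__(self, max_disorder: timedelta):
        self.max_disorder = max_disorder
        self.max_event_time = None  # no records observed yet

    def observe(self, event_time: datetime) -> None:
        # Out-of-order records never move the frontier backwards.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    def watermark(self):
        # Returns None until the source has produced at least one record.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_disorder


def combined_watermark(sources):
    """The joined computation is only as complete as its slowest input."""
    marks = [source.watermark() for source in sources]
    if any(mark is None for mark in marks):
        return None  # an input with no progress gives no evidence of completeness
    return min(marks)


day = datetime(2024, 1, 1)  # arbitrary date; only the time of day matters here

partner = BoundedOutOfOrderness(timedelta(seconds=60))
partner.observe(day.replace(hour=12, minute=15, second=30))
partner.observe(day.replace(hour=12, minute=14, second=52))  # older record, no effect

ledger = BoundedOutOfOrderness(timedelta(seconds=5))  # hypothetical faster source
ledger.observe(day.replace(hour=12, minute=15, second=40))

print(partner.watermark().time())                    # 12:14:30
print(combined_watermark([partner, ledger]).time())  # 12:14:30
```

The ledger source alone would report 12:15:35, but the combined frontier stays at the partner's 12:14:30, so the 12:00-12:15 window does not close yet.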

Part 2: Mechanism

For PayLedger, the operational path looks like this:

ledger events ----\
partner reserves --- normalize timestamps -> partition by merchant_id
payout events ----/                              |
                                                 v
                                 event-time windows + watermark timers
                                                 |
                              first result, revisions, or late-event side output

Each record is assigned to a window by event_time, not by arrival order. The operator keeps state for that merchant and window while the window moves through three phases:

Open: the watermark has not crossed the window end, so the result is still collecting on-time records.
Emitted but revisable: the watermark crossed the window end, so the first result can be published, but allowed lateness still keeps state alive for corrections.
Expired: the allowed-lateness deadline passed, state is evicted, and any further record for that window must go to a late-data side output or a backfill workflow.

That lifecycle is the real meaning of a window guarantee. A first emission does not automatically mean "final forever." It means "complete according to the current watermark contract." If PayLedger allows two extra minutes for late partner adjustments, then a 12:00-12:15 window might first emit at watermark 12:15:00, accept revisions until watermark 12:17:00, and only become immutable after that.
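The three phases reduce to two comparisons against the watermark. The sketch below assumes hypothetical names (`Phase`, `window_phase`) and an arbitrary calendar date; the 12:15 window end and the two-minute allowed lateness are the values from the example above.

```python
from datetime import datetime, timedelta
from enum import Enum


class Phase(Enum):
    OPEN = "open"
    REVISABLE = "emitted but revisable"
    EXPIRED = "expired"


def window_phase(watermark: datetime, window_end: datetime,
                 allowed_lateness: timedelta) -> Phase:
    if watermark < window_end:
        return Phase.OPEN       # still collecting on-time records
    if watermark < window_end + allowed_lateness:
        return Phase.REVISABLE  # first result published, corrections accepted
    return Phase.EXPIRED        # state evicted, late records go elsewhere


window_end = datetime(2024, 1, 1, 12, 15)
lateness = timedelta(minutes=2)

for hour, minute, second in [(12, 14, 30), (12, 15, 0), (12, 16, 59), (12, 17, 0)]:
    watermark = datetime(2024, 1, 1, hour, minute, second)
    print(watermark.time(), window_phase(watermark, window_end, lateness).value)
# 12:14:30 open
# 12:15:00 emitted but revisable
# 12:16:59 emitted but revisable
# 12:17:00 expired
```

Whether the boundaries are inclusive or exclusive varies by engine, so the exact comparison operators here are a modeling choice rather than a universal rule.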

The logic is small enough to sketch:

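def handle_event(event, watermark):
    # Assign by event time, never by arrival order.
    window = window_for(event.event_time, minutes=15)

    # Expired: the allowed-lateness deadline has passed and state is gone.
    if watermark >= window.end + ALLOWED_LATENESS:
        emit_late_side_output(event)
        return

    # Open or revisable: fold the record into the per-merchant window state.
    update_window_state(window, event)

    # Already published: re-emit so the sink sees the revision.
    # (The first publish normally comes from a watermark timer, not shown here.)
    if watermark >= window.end:
        emit_window_result(window, mode="upsert")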

This pseudocode hides the hard parts, but it captures the contract:

the watermark decides when on-time completeness is good enough for the first publish
ALLOWED_LATENESS decides how long a published window remains revisable
the sink mode decides what downstream sees when a late record changes a prior answer

That last point is where many production bugs actually live. If the downstream store supports upserts or retractions, PayLedger can publish early and repair later. If the downstream sink is a one-shot alerting channel, early publication may be unacceptable because there is no clean correction path. Window guarantees are therefore not just stream-engine settings. They are part of the end-to-end product contract.
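To see how the sink mode shapes what downstream observes, here is a small runnable model of the handler with an in-memory upsert sink. Everything in it is a simplifying assumption made for illustration: the `Event` dataclass, the dictionaries standing in for operator state and the sink, the `on_watermark` helper that plays the role of a watermark timer, the merchant id, and the dollar amounts. It also skips state eviction, which a real engine would perform when a window expires.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)
ALLOWED_LATENESS = timedelta(minutes=2)


@dataclass(frozen=True)
class Event:
    merchant_id: str
    event_time: datetime
    amount: int  # signed dollars, illustrative only


def window_start(event_time: datetime) -> datetime:
    # Align to 15-minute boundaries within the hour (tumbling windows).
    aligned_minute = event_time.minute - event_time.minute % 15
    return event_time.replace(minute=aligned_minute, second=0, microsecond=0)


state = {}        # (merchant_id, window_start) -> running total
upsert_sink = {}  # same key -> latest published total; a revision overwrites it
late_output = []  # records that missed the allowed-lateness cutoff


def handle_event(event: Event, watermark: datetime) -> None:
    start = window_start(event.event_time)
    end = start + WINDOW
    if watermark >= end + ALLOWED_LATENESS:
        late_output.append(event)  # separate correction path
        return
    key = (event.merchant_id, start)
    state[key] = state.get(key, 0) + event.amount
    if watermark >= end:
        upsert_sink[key] = state[key]  # revision of an already-published result


def on_watermark(watermark: datetime) -> None:
    # First publish for every window whose end the watermark has crossed.
    for (merchant_id, start), total in state.items():
        if watermark >= start + WINDOW and (merchant_id, start) not in upsert_sink:
            upsert_sink[(merchant_id, start)] = total


def at(hour, minute, second=0):
    return datetime(2024, 1, 1, hour, minute, second)  # arbitrary date


key = ("m-1", at(12, 0))

handle_event(Event("m-1", at(12, 5), 120_000), watermark=at(12, 4))
on_watermark(at(12, 15))
first = upsert_sink[key]  # 120000, the first published result

handle_event(Event("m-1", at(12, 14, 52), -35_000), watermark=at(12, 15, 30))
revised = upsert_sink[key]  # 85000, the same key overwritten in place

handle_event(Event("m-1", at(12, 10), -1_000), watermark=at(12, 17))
print(first, revised, len(late_output))  # 120000 85000 1
```

With a keyed store the revision is just an overwrite. If `upsert_sink` were an append-only alert channel instead, the first value would already have been delivered with no way to replace it, which is the case where publishing before the allowed-lateness cutoff becomes risky.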

Part 3: Implications and Trade-offs

The cleanest way to think about the trade-off is to ask what kind of mistake PayLedger can afford. A conservative watermark and a longer allowed-lateness period make the exposure window closer to replayed truth. That reduces false payout holds and makes finance reconciliation easier, but it increases decision latency, memory usage, checkpoint size, and operator recovery time because more window state stays live.

An aggressive watermark does the opposite. PayLedger can release payouts faster because windows close sooner, but now the team must handle a larger stream of late revisions. Some of those corrections are operationally fine, such as a dashboard cell updating from $120,000 to $85,000. Some are not, such as an already-sent "merchant blocked" notification that no downstream system can retract cleanly. The trade-off is therefore between latency and completeness, but also between repairable and non-repairable downstream effects.
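One way to make the choice between a conservative and an aggressive watermark concrete is to measure how far behind the running maximum event time each record arrived, then read candidate lag values off that distribution. The sketch below is an assumption-laden illustration: the sample event times are hypothetical offsets in seconds (not PayLedger data), and `disorder_seconds` and `nearest_rank_percentile` are helper names invented for the example.

```python
import math


def disorder_seconds(event_times_in_arrival_order):
    """For each record, how far it trails the highest event time seen so far."""
    max_seen = None
    disorder = []
    for event_time in event_times_in_arrival_order:
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        disorder.append(max_seen - event_time)
    return disorder


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value covering pct percent of the sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical event times (seconds from an arbitrary origin), in arrival order.
arrivals = [0, 2, 1, 5, 7, 3, 9, 12, 11, 60, 8, 61]

disorder = disorder_seconds(arrivals)
print(disorder)                                # [0, 0, 1, 0, 0, 4, 0, 0, 1, 0, 52, 0]
print(nearest_rank_percentile(disorder, 90))   # 4
print(nearest_rank_percentile(disorder, 100))  # 52
```

In this toy sample a 4-second lag would treat the record that trailed by 52 seconds as late, while a 52-second lag would admit every record at the cost of holding each window open much longer. Real decisions need a far larger sample, measured per source.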

This is also why one slow partition can dominate the system. If the banking-partner topic goes quiet or one partition stops advancing its watermark, the minimum watermark across the joined computation stalls. Windows stop closing even if the ledger and payout streams are healthy. Production teams often misread that symptom as "the job is slow" when the real issue is "the engine no longer has evidence that event time is complete." Idle-partition handling, per-source observability, and explicit late-data channels are not polish. They are part of making the guarantee intelligible.
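The stall, and the effect of idle detection, can be modeled in a few lines. This is a hedged sketch rather than a description of any particular engine: `effective_watermark`, the five-minute idle timeout, and all timestamps are hypothetical values chosen to show the behavior.

```python
from datetime import datetime, timedelta


def effective_watermark(inputs, now, idle_timeout=None):
    """Minimum watermark across inputs, optionally skipping idle ones.

    inputs maps a name to (watermark, wall-clock time its last record arrived).
    """
    marks = [
        watermark
        for watermark, last_seen in inputs.values()
        if idle_timeout is None or now - last_seen < idle_timeout
    ]
    return min(marks) if marks else None


def at(hour, minute, second=0):
    return datetime(2024, 1, 1, hour, minute, second)  # arbitrary date


now = at(12, 21)
inputs = {
    "ledger": (at(12, 20, 55), at(12, 20, 59)),
    "payouts": (at(12, 20, 50), at(12, 20, 58)),
    "partner": (at(12, 14, 30), at(12, 15, 41)),  # quiet for more than five minutes
}

stalled = effective_watermark(inputs, now)
with_idle = effective_watermark(inputs, now, idle_timeout=timedelta(minutes=5))

print(stalled.time())    # 12:14:30, so the 12:00-12:15 window still cannot close
print(with_idle.time())  # 12:20:50, the quiet partner input no longer holds it back
```

Excluding an idle input is itself a policy decision: if the partner topic wakes up with older records, those records are more likely to land in the revisable or expired phase, so idle detection trades stalled windows for a higher chance of late data.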

The next lesson, 028.md, picks up exactly here. Watermarks and allowed lateness tell the engine how long state must live and when timers should fire. Stateful operators and checkpoint recovery explain how that state survives failure without losing the guarantee you just defined.

Failure Modes and Misconceptions

"A watermark is just wall-clock time minus a fixed delay." That shortcut is tempting because it sounds simple, but a real watermark is tied to observed event-time progress per source or partition. The fix is to instrument source watermarks directly and treat idle inputs as a first-class condition instead of burying them in latency graphs.
"Once a window emits, the answer is final." That is only true if the policy sets allowed lateness to zero and the sink never accepts revisions. The corrective mental model is that a first result is often provisional. Document whether the stream emits upserts, retractions, or a single final result after a longer wait.
"Late data means the producer is broken." Sometimes it does, but in PayLedger late data is often the normal shape of partner delivery. The design response is to measure lateness distributions from production history and choose watermark lag from those measurements, not from hope.
"Dropping very late records is harmless because the window is already closed." Dropping them silently only hides data loss. A production pipeline needs a side output, dead-letter topic, or backfill job so the business can decide whether the missed event should repair history offline.
"A single slow input only affects that one source." In joined or co-grouped windows, the global watermark often follows the slowest active input. The operational fix is to monitor per-input watermark lag and decide when idle detection, source isolation, or architecture changes are required.

Connections

026.md established that event time is the clock that gives a streaming computation stable business meaning. This lesson adds the completeness frontier that tells the system when a slice of that clock is actionable.
028.md explains the storage side of the same story: once you keep window state and timers alive for watermarks and late data, checkpointing and recovery become part of correctness, not just durability.
../consistency-and-replication/065.md is the storage-oriented analog. Event logs preserve the history of facts; watermarks and window policies determine when a streaming projection is willing to treat that history as complete enough to publish.

Resources

[PAPER] The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- Focus: Read the sections on watermarks, triggers, and accumulation modes as one design space. They make the latency-versus-completeness trade-off explicit.
[PAPER] MillWheel: Fault-Tolerant Stream Processing at Internet Scale
- Focus: Pay attention to low watermarks, timers, and per-key state. The paper shows how event-time progress becomes an operational mechanism.
[DOC] Apache Beam Programming Guide
- Focus: The windowing and late-data sections are useful because they separate first results, trigger behavior, and allowed lateness instead of treating "window closed" as a single event.
[DOC] Apache Flink Event Time and Watermarks
- Focus: Look at how Flink derives watermarks, handles idle sources, and defines the interaction between event time and timers in real jobs.
[DOC] Kafka Streams Grace Period and Windowing Semantics
- Focus: This is a good counterpoint because it makes the "how long can a window still change?" question visible in an API that many application teams use directly.

Key Takeaways

A watermark is a completeness claim, not a timestamp oracle. It tells the engine the point up to which event time is complete enough to publish on-time results.
A window guarantee comes from three choices together. Watermark policy, allowed lateness, and sink behavior decide whether results are fast, revisable, final, or dropped.
Late data is a product-contract problem as much as a stream-engine problem. The right policy depends on whether downstream systems can absorb corrections cleanly.
State and recovery are already implicit in this lesson. The moment you keep windows open for late records, checkpointing and timer recovery become part of correctness.
