LESSON
Day 503: Materialized Views and Incremental Recompute
The core idea: A materialized view stores the answer to an expensive query, but the production problem is not the first computation. It is keeping that answer correct as new events, late corrections, and logic changes arrive, which is why incremental recompute has to be a deterministic, replayable maintenance process instead of a series of ad hoc table updates.
Today's "Aha!" Moment
In 030.md, PayLedger gave its merchant dashboard a serving table called merchant_balance_current so /balance could return in well under the product latency budget. That solved the read path, not the data problem. Every capture_settled, refund_posted, chargeback_opened, and reserve-policy change now has to move that row toward the same answer finance would get by replaying canonical history. The materialized view is therefore not just a precomputed table. It is a promise that fast reads and durable facts will keep converging.
The non-obvious point is that incremental recompute does not mean "run the whole query more often" and it does not mean "apply deltas forever." It means the system can identify exactly which keys or partitions became invalid, recompute only that scope from trustworthy inputs, and publish the result without exposing a half-updated view. Teams get into trouble when they treat the view as a cache that can be patched manually during incidents. PayLedger needs the balance view to be deterministic enough that the same business logic works both event by event and during replay after a bug fix.
Why This Matters
The pressure is easy to make concrete. The merchant UI needs a balance answer on every page load, so scanning raw settlement and refund tables on demand is too slow and too expensive. Finance still needs an auditable explanation of that same balance at the end of yesterday, after late files and corrections are applied. If PayLedger recomputes the whole balance table every time a payment event lands, compute cost explodes. If it only issues narrow UPDATE statements against the serving table, the answer eventually drifts when retries, out-of-order events, or rule changes hit production.
The trade-off is direct: materialization buys predictable read latency, but it pushes complexity into dependency tracking, invalidation, replay tooling, and reconciliation. That extra machinery is worth paying for only if the view can be repaired from canonical data. Otherwise the platform has traded a slow query for a fast-but-untrustworthy product surface.
Core Walkthrough
Part 1: Grounded Situation
PayLedger maintains a materialized view keyed by (merchant_id, currency). Each row contains at least available_balance, pending_balance, reserve_hold, last_event_time, and the source watermark that produced the row. The view depends on two upstream inputs:
- payment events such as captures, refunds, payouts, and chargebacks
- versioned reserve policies that can change by merchant segment and effective date
That dependency mix is what makes materialized views interesting. A newly settled capture for merchant m_482 usually affects one row and can be applied cheaply. A reserve-policy change for all high-risk marketplace merchants is different. It can invalidate thousands of rows, and those rows cannot be patched safely unless the recompute path can replay the affected history with the new policy version.
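The row shape described above can be sketched as a small schema. This is a minimal sketch, not PayLedger's actual table: the balance fields and watermark come from the description above, while the exact types, default values, and the `policy_version` field name are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class BalanceRow:
    # View key: one row per (merchant_id, currency).
    merchant_id: str
    currency: str
    # Balance components maintained by the fold over payment events.
    available_balance: int = 0   # minor units, e.g. cents
    pending_balance: int = 0
    reserve_hold: int = 0
    # Maintenance metadata: what produced this row.
    last_event_time: datetime = datetime.min
    source_watermark: int = 0    # position in the canonical event log
    policy_version: str = ""     # reserve policy used to compute the row
```

Carrying the watermark and policy version on the row itself is what later lets an operator answer "which input produced this balance?" during an incident.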
The naive designs fail in opposite directions. A full refresh every minute is mechanically simple, but it repeatedly scans far more history than the change warrants. Direct row updates look efficient, but they silently encode assumptions about ordering, deduplication, and policy history. The first duplicated Kafka delivery or retroactive rule change exposes that those assumptions were never modeled explicitly.
Part 2: Mechanism
For PayLedger, materialized-view maintenance starts with a dependency graph:
```
canonical payment events + policy snapshots
                    |
                    v
            change classifier
                    |
         +----------+-----------+
         |                      |
         v                      v
  local delta path     scoped replay path
         |                      |
         +----------+-----------+
                    |
                    v
  versioned materialized view snapshot
```
The local delta path handles changes whose impact is narrow and well understood. When a refund_posted event arrives, the maintainer can map that event to one (merchant_id, currency) key, apply the balance transition, and advance the view watermark. That path is fast because the dependency surface is small and the fold function is deterministic.
```python
def apply_event(row, event):
    # Deterministic fold: one normalized event maps to one row transition.
    if event.type == "capture_settled":
        row.pending_balance += event.amount
    elif event.type == "refund_posted":
        row.available_balance -= event.amount
    elif event.type == "chargeback_opened":
        row.available_balance -= event.amount
        row.reserve_hold += event.amount
    row.last_event_time = max(row.last_event_time, event.event_time)
    return row
```
That function is only safe if the inputs are already normalized around identity, ordering, and correction semantics. In practice PayLedger stores dedup keys, source offsets, and policy versions alongside the view state so a retry or failover does not double-apply a change. This is where the earlier lessons on event-time handling and checkpoint recovery matter: the view update and the consumed source position need to move forward as one atomic unit.
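That idempotent, atomic pattern can be sketched with an in-memory stand-in for the transactional store. The store class, field names, and `Event` shape here are assumptions for illustration; the point is that the dedup check, the watermark check, and the single commit of row plus source position together are what make a retry harmless.

```python
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass
class Event:
    dedup_key: str      # stable identity for deduplication
    source_offset: int  # position in the canonical log
    amount: int

class ViewStore:
    """Toy in-memory store; a real system would use a transactional DB."""
    def __init__(self):
        self.rows = defaultdict(int)              # row_key -> balance
        self.applied = defaultdict(set)           # row_key -> dedup keys seen
        self.watermark = defaultdict(lambda: -1)  # row_key -> last offset

    @contextmanager
    def transaction(self):
        # Stand-in for committing row state + source position atomically.
        yield

def apply_event_once(store, row_key, event):
    """Apply one delta idempotently; skip duplicates and stale offsets."""
    if event.dedup_key in store.applied[row_key]:
        return False  # retry or redelivery: already folded in
    if event.source_offset <= store.watermark[row_key]:
        return False  # behind the watermark: belongs to scoped replay
    with store.transaction():  # row update + position advance as one unit
        store.rows[row_key] += event.amount
        store.applied[row_key].add(event.dedup_key)
        store.watermark[row_key] = event.source_offset
    return True

store = ViewStore()
e = Event(dedup_key="evt-1", source_offset=0, amount=500)
assert apply_event_once(store, ("m_482", "USD"), e) is True
assert apply_event_once(store, ("m_482", "USD"), e) is False  # duplicate ignored
assert store.rows[("m_482", "USD")] == 500
```

Without the dedup set and watermark, the second delivery of `evt-1` would double the balance, which is exactly the failure mode the lesson warns about.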
The scoped replay path exists for invalidations that are too wide or too subtle for direct deltas. Suppose risk changes the reserve formula from 10% to 15% for a merchant segment effective three days ago. The existing rows do not retain enough intermediate detail to patch every balance safely with a simple arithmetic adjustment, because some funds have since been paid out, refunded, or already moved into hold states. The correct response is to mark the affected merchants and time range as dirty, reload canonical events plus policy snapshots from the effective boundary, fold them again, and write replacement rows to a new snapshot. Readers switch only after the replacement slice is complete.
That is what incremental recompute means in practice: not a global rebuild, but a bounded replay over the exact part of the graph that lost validity. Good systems make the boundary explicit. PayLedger tracks which source tables and policy versions contributed to each published slice, so operators can answer a concrete question during an incident: "Which balances are stale, and what input change invalidated them?"
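A scoped replay of the reserve-rate change might look like the following sketch. The event shape, field names, and the simplification that only `capture_settled` contributes to the reserve are assumptions; the structural point is that only dirty keys inside the invalidated time range are refolded, into a staged slice that readers see only once it is complete.

```python
def scoped_replay(canonical_events, dirty_keys, from_time, reserve_rate):
    """Rebuild only the invalidated slice from canonical history (sketch).
    The new reserve_rate applies across the whole replayed range; results
    are staged and published as a complete replacement slice."""
    replacement = {}  # staged slice, invisible to readers until done
    for ev in sorted(canonical_events, key=lambda e: e["event_time"]):
        key = (ev["merchant_id"], ev["currency"])
        if key not in dirty_keys or ev["event_time"] < from_time:
            continue  # outside the dirty scope: published rows stay valid
        row = replacement.setdefault(key, {"available": 0, "reserve": 0})
        if ev["type"] == "capture_settled":
            hold = int(ev["amount"] * reserve_rate)  # new policy version
            row["available"] += ev["amount"] - hold
            row["reserve"] += hold
    return replacement  # caller swaps this slice in atomically

events = [
    {"merchant_id": "m_482", "currency": "USD", "type": "capture_settled",
     "amount": 1000, "event_time": 1},
    {"merchant_id": "m_999", "currency": "USD", "type": "capture_settled",
     "amount": 1000, "event_time": 2},
]
# Replay only m_482 with the reserve raised from 10% to 15%.
slice_ = scoped_replay(events, {("m_482", "USD")}, from_time=0,
                       reserve_rate=0.15)
assert slice_[("m_482", "USD")] == {"available": 850, "reserve": 150}
assert ("m_999", "USD") not in slice_  # untouched keys are not recomputed
```

Note that the replay recomputes the affected rows from canonical events rather than patching published balances arithmetically, which is what makes it safe after a retroactive policy change.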
Part 3: Implications and Trade-offs
This design changes what "cheap" means. Reads become cheap because the dashboard is no longer paying the full aggregation cost. Writes and changes become more expensive because the platform now maintains lineage between inputs and view rows. Every new upstream dependency makes the serving path richer for reads and wider for recompute.
The granularity of the view is the first major trade-off. A very narrow view keyed only by merchant and currency keeps the read path simple, but any rule that depends on settlement age, payout state, or fraud status forces the maintainer to recover that context during replay. A wider view with extra dimensions can make recompute more precise, but it also increases storage cost and the number of rows that must be kept consistent.
The second trade-off is between pure delta maintenance and replayability. Delta application is efficient for append-only facts whose effects are local. Replay is safer when logic changes, late data lands outside the incremental window, or reference data fans out broadly. Mature teams keep both paths. If PayLedger can only process new events and cannot rebuild an affected slice, the materialized view eventually turns into operational debt disguised as performance optimization.
Operationally, the view also needs its own correctness signals. Freshness is not only "how old is the last update?" It is also "what is the oldest dirty key?" and "which policy snapshot is this row based on?" That is why production maintenance typically includes periodic reconciliation jobs that compare a sample or partition of the materialized view against a trusted recomputation from canonical history. Exactly-once delivery reduces one failure mode; it does not prove the projection logic is still correct.
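A periodic reconciliation job can be sketched as a sampled comparison between the serving view and a trusted recomputation from canonical history. The sampling strategy and the flat `sum` used as the trusted fold are simplifying assumptions; a real job would fold with full policy history and alert on any drift.

```python
import random

def reconcile_sample(view, canonical_events, sample_size, recompute):
    """Recompute a random sample of view keys from canonical history
    and report any rows that have drifted (sketch)."""
    keys = random.sample(sorted(view), min(sample_size, len(view)))
    drifted = []
    for key in keys:
        expected = recompute(canonical_events, key)
        if view[key] != expected:
            drifted.append((key, view[key], expected))
    return drifted  # non-empty => projection bug or missed invalidation

def recompute_balance(events, key):
    # Trusted recomputation: fold every canonical event for this key.
    return sum(e["amount"] for e in events
               if (e["merchant_id"], e["currency"]) == key)

events = [{"merchant_id": "m_482", "currency": "USD", "amount": 500},
          {"merchant_id": "m_482", "currency": "USD", "amount": -200}]
view = {("m_482", "USD"): 400}  # drifted: canonical history says 300
report = reconcile_sample(view, events, sample_size=10,
                          recompute=recompute_balance)
assert report == [(("m_482", "USD"), 400, 300)]
```

The useful output is not just "a row is wrong" but the pair of values, which tells the operator whether the drift came from a missed invalidation or a buggy fold.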
Failure Modes and Misconceptions
- "A materialized view is just a cache." A cache can often be dropped and repopulated lazily. A balance or reserve projection needs deterministic rebuild rules, source lineage, and atomic publication because product decisions depend on its exact contents.
- "Incremental recompute means applying deltas forever." Some invalidations are not local. Policy rewrites, schema fixes, and historical corrections often require replay over a bounded slice instead of another incremental patch.
- "If the source tables are right, the view is right." The maintainer can still drift because of buggy folds, missing invalidations, or non-atomic cutovers. Reconciliation is part of the design, not optional cleanup.
- "One upsert per event is enough." Retries, out-of-order delivery, and backfills force the system to reason about identity, event time, and prior source position. Without that metadata, the same event can change the view twice.
- "Rule changes only affect future rows." Any policy with an effective date in the past can invalidate already published answers. The maintenance path has to support replay from the correct historical boundary.
Connections
- 030.md separated canonical history, analytical surfaces, and serving tables. This lesson explains how the serving table stays aligned with canonical facts once it exists.
- 029.md introduced the idea that batch replay and streaming updates must share one semantic definition of derived state. Materialized-view maintenance is where that requirement becomes concrete row by row.
- 032.md follows from this directly. Once you understand how to maintain and repair a projection, you can evaluate which serving-layer pattern is appropriate for dashboards, APIs, and feature computation.
Resources
- [BOOK] Designing Data-Intensive Applications
- Focus: Read the chapters on derived data and stream processing for the distinction between canonical facts, cached projections, and replay after corrections.
- [DOC] PostgreSQL Materialized Views
- Focus: Use this as a baseline for how simple refresh-based materialization works before comparing it with incremental maintenance in streaming or hybrid systems.
- [PAPER] The Dataflow Model
- Focus: Pay attention to how correctness, latency, and cost are balanced when updates arrive out of order and recomputation must stay bounded.
- [DOC] Apache Flink Checkpointing
- Focus: Review how operator state and source positions are persisted together, because atomic state advancement is central to trustworthy incremental view maintenance.
- [DOC] dbt Incremental Models
- Focus: Contrast warehouse-style incremental table maintenance with serving-table maintenance, especially around invalidation boundaries and when a full refresh is still required.
Key Takeaways
- A materialized view is a maintenance contract, not just stored query output. The hard problem is preserving correctness as inputs and business logic change.
- Incremental recompute is about bounded invalidation. Safe systems know which keys or partitions became stale and can replay only that scope from canonical inputs.
- Fast reads move complexity upstream. You trade query latency for lineage, replay tooling, and reconciliation work, so the view must remain rebuildable.
- Delta updates and replay are complementary. Event-by-event maintenance handles local change cheaply, while scoped recompute repairs the cases that simple deltas cannot model safely.