LESSON
Day 426: Materialized Views and Incremental Maintenance
The core idea: A materialized view stores a query result as real state, and incremental maintenance keeps that state current by applying only the base-table deltas that matter, but the design is trustworthy only when inserts, updates, deletes, replay, and freshness are modeled explicitly.
Today's "Aha!" Moment
In 09.md, Harbor Point built a WAL-based CDC pipeline so committed reservation changes could leave PostgreSQL in commit order instead of through a fragile poller. That solved the extraction problem. It did not solve the read problem. Risk officers still need a dashboard that answers a question like "how much open municipal-bond exposure does desk ALPHA currently hold with issuer CA-MUNI?" in a few milliseconds, even during the 09:30 market burst.
The naive query is easy to write and painful to run repeatedly:
SELECT desk_id, issuer_id, SUM(notional) AS open_notional, COUNT(*) AS reservation_count
FROM bond_reservations
WHERE status = 'open'
GROUP BY desk_id, issuer_id;
That query is fine when an engineer runs it once. It is a bad serving plan when dozens of dashboards and alerts run it every few seconds against hot OLTP tables. A normal view does not help, because it still reruns the query each time. A materialized view changes the contract: the system persists the answer itself, so reads become cheap.
The second realization is the important one. If Harbor Point recomputes the whole view every minute, freshness becomes a sawtooth: very fresh right after a refresh, clearly stale just before the next one, while every rebuild pays the cost of scanning the source tables again. Incremental maintenance avoids that by converting each committed base change into a delta against the derived state. Reserve CA-MUNI-77 for ALPHA, and the row for (ALPHA, CA-MUNI) should move by +2_000_000 notional and +1 reservation. Release it, and the same row should move back down.
That is why materialized views are not just "cached queries." In production they are maintained state machines. The hard part is not persisting the result once. The hard part is guaranteeing that every change to the source tables is reflected in the derived tables exactly as the chosen freshness contract promises. The next lesson, 11.md, matters because even a correct materialized view can still look stale if clients read it from replicas with lag.
Why This Matters
Harbor Point's trading workflow generates a steady stream of small writes: reserve a bond, partially fill it, release the remainder, transfer exposure between desks, cancel stale holds. The dashboard query that risk officers use is read-heavy, but its answer depends on those writes being incorporated quickly and correctly. If every dashboard request scans bond_reservations, joins issuer metadata, and recomputes aggregates from scratch, the read path competes directly with the write path that keeps market activity moving.
Materialized views matter because they let the system spend work once on the write side so many readers can reuse the result cheaply. For Harbor Point, that means the dashboard can hit a compact table such as desk_issuer_exposure_mv keyed by (desk_id, issuer_id) instead of repeating a large aggregation over live reservation rows. Query latency becomes predictable, and the primary database stops doing the same grouping work over and over.
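To make that contrast concrete, the dashboard read collapses to an indexed point lookup; a minimal sketch, assuming the (desk_id, issuer_id) key described later in this lesson:

-- Point lookup on the derived table: an index probe, not an aggregation.
SELECT open_notional, reservation_count, last_applied_lsn
FROM desk_issuer_exposure_mv
WHERE desk_id = 'ALPHA'
  AND issuer_id = 'CA-MUNI';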
But the operational gain comes with a tighter correctness burden. A materialized view is useful only if its update rules are complete. If an open -> released transition forgets to subtract notional, the dashboard lies. If a replayed CDC batch applies the same delta twice, exposure doubles. If the team promises "under two seconds stale" but cannot expose the view's last applied log position, operators have no way to distinguish a truly fresh dashboard from one that simply answers quickly with old data.
The production question is therefore not "should we cache this query?" It is "which derived state is valuable enough to maintain continuously, and what consistency contract can we actually defend?" That framing turns materialized views from a performance trick into a storage design choice.
Learning Objectives
By the end of this session, you will be able to:
- Explain what a materialized view stores and why full recomputation is often the wrong serving strategy - Distinguish ordinary views, periodic refreshes, and persisted derived state in Harbor Point's risk workflow.
- Trace incremental maintenance from a committed base-table change to a view-row update - Show how inserts, updates, deletes, and replay-safe offsets become deltas against desk_issuer_exposure_mv.
- Evaluate the production trade-offs of different maintenance strategies - Choose among immediate, asynchronous, and periodic refresh approaches based on write cost, freshness targets, and recovery complexity.
Core Concepts Explained
Concept 1: A materialized view stores the answer, not just the query
An ordinary SQL view is a saved query definition. When Harbor Point reads it, PostgreSQL still has to execute the joins, filters, and aggregates against the base tables at read time. That can be perfectly reasonable for light workloads or exploratory analysis. It does nothing to reduce repeated read cost for a high-traffic dashboard.
A materialized view stores the derived rows themselves. If Harbor Point persists one row per (desk_id, issuer_id) with open_notional, reservation_count, and a freshness marker such as last_applied_lsn, dashboard reads can become a simple indexed lookup instead of a repeated aggregation over the write-heavy bond_reservations table. The system has effectively moved work from query time to maintenance time.
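A minimal sketch of that derived table, with illustrative column types:

CREATE TABLE desk_issuer_exposure_mv (
    desk_id           text    NOT NULL,
    issuer_id         text    NOT NULL,
    open_notional     numeric NOT NULL DEFAULT 0,
    reservation_count bigint  NOT NULL DEFAULT 0,
    last_applied_lsn  pg_lsn,                       -- freshness watermark
    PRIMARY KEY (desk_id, issuer_id)                -- the upsert key used later
);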
That distinction explains why built-in materialized views and incrementally maintained projection tables are often discussed together. In PostgreSQL, REFRESH MATERIALIZED VIEW rebuilds the stored result by rerunning the defining query. In other systems, or in custom application tables, the derived rows may be updated continuously as base data changes. Both patterns persist derived state. The real difference is how the stored result becomes fresh again.
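As a sketch of the built-in pattern (the view name here is illustrative), PostgreSQL persists the naive query and rebuilds it on demand:

CREATE MATERIALIZED VIEW desk_issuer_exposure AS
SELECT desk_id, issuer_id,
       SUM(notional) AS open_notional,
       COUNT(*)      AS reservation_count
FROM bond_reservations
WHERE status = 'open'
GROUP BY desk_id, issuer_id;

-- A unique index lets the refresh run without locking out readers.
CREATE UNIQUE INDEX ON desk_issuer_exposure (desk_id, issuer_id);

-- Reruns the defining query and swaps the new result into place.
REFRESH MATERIALIZED VIEW CONCURRENTLY desk_issuer_exposure;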
For Harbor Point, a once-per-minute full refresh is already too coarse for live exposure monitoring. The read path becomes fast, but freshness oscillates and the refresh job still scans the whole source dataset. That is acceptable for overnight reporting. It is the wrong fit for a market-open dashboard that must reflect a release or fill almost immediately.
The trade-off is now visible. Fast reads are purchased with extra storage, write-side maintenance, and rebuild procedures. If the view definition changes from "sum open notional" to "sum only notional above risk threshold," Harbor Point must either backfill the view from the source tables or replay the relevant history. Persisted answers are valuable, but they create their own lifecycle.
Concept 2: Incremental maintenance applies deltas, not full recomputations
Incremental maintenance starts from one simple question: after a single committed change, which derived rows are now different, and by how much? For Harbor Point's desk_issuer_exposure_mv, an inserted open reservation for desk_id='ALPHA', issuer_id='CA-MUNI', notional=2_000_000 means one group should change by (+2_000_000, +1). A release of that same reservation means the group should change by (-2_000_000, -1).
Updates require more care than inserts because an update is usually two logical actions at once: remove the old contribution and add the new one. If a reservation moves from desk ALPHA to desk BETA, the correct delta is not "add to BETA." It is "subtract from (ALPHA, CA-MUNI) and add to (BETA, CA-MUNI) in the same committed change set." If status changes from open to released, the old row contributed to the aggregate but the new row does not, so the net effect is a subtraction only.
That logic is why production systems often normalize every change into before and after records before touching the view. The maintenance code does not ask "what SQL statement did the application run?" It asks "what contribution did the old row make to the view, and what contribution does the new row make now?" The delta is contribution(after) - contribution(before).
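That rule can be written as one netting query; a sketch, assuming a staging table cdc_events holding before/after row images (table and column names are illustrative):

WITH contributions AS (
    -- Subtract what the old row image contributed, if it was counted.
    SELECT before_desk_id    AS desk_id,
           before_issuer_id  AS issuer_id,
           -before_notional  AS notional_delta,
           -1                AS count_delta
    FROM cdc_events
    WHERE op IN ('UPDATE', 'DELETE') AND before_status = 'open'
    UNION ALL
    -- Add what the new row image contributes, if it is counted.
    SELECT after_desk_id, after_issuer_id, after_notional, 1
    FROM cdc_events
    WHERE op IN ('INSERT', 'UPDATE') AND after_status = 'open'
)
SELECT desk_id, issuer_id,
       SUM(notional_delta) AS delta_notional,
       SUM(count_delta)    AS delta_count
FROM contributions
GROUP BY desk_id, issuer_id;

Because the final GROUP BY nets every contribution per key, an open -> released transition yields a pure subtraction, a desk move subtracts from one group and adds to another, and several changes to the same key inside one transaction collapse into a single delta.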
Harbor Point can drive this from the ordered CDC stream introduced in 09.md:
bond_reservations commit
|
v
WAL / CDC event with before+after images
|
v
delta builder groups net changes by (desk_id, issuer_id)
|
v
upsert into desk_issuer_exposure_mv
|
v
advance last_applied_lsn / freshness watermark
The upsert itself can be small and mechanical:
INSERT INTO desk_issuer_exposure_mv (
desk_id,
issuer_id,
open_notional,
reservation_count,
last_applied_lsn
)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (desk_id, issuer_id) DO UPDATE
SET open_notional = desk_issuer_exposure_mv.open_notional + EXCLUDED.open_notional,
reservation_count = desk_issuer_exposure_mv.reservation_count + EXCLUDED.reservation_count,
last_applied_lsn = GREATEST(
desk_issuer_exposure_mv.last_applied_lsn,
EXCLUDED.last_applied_lsn
);
Here EXCLUDED.open_notional and EXCLUDED.reservation_count are the already computed deltas, not full recomputed totals. That distinction matters. If Harbor Point replays the same source transaction, it must detect that replay before applying the delta again. If several reservation rows in one transaction hit the same (desk_id, issuer_id) key, it is usually better to net those changes before updating the view so the view reflects the transaction as one state transition instead of a burst of intermediate states.
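One way to make the replay check and the delta application atomic is to claim the source transaction inside the same database transaction as the upsert; a sketch, with applied_source_txns and source_txid as illustrative names:

WITH claimed AS (
    -- Claim this source transaction exactly once; on replay the insert
    -- conflicts and returns zero rows.
    INSERT INTO applied_source_txns (source_txid)
    VALUES ($6)
    ON CONFLICT (source_txid) DO NOTHING
    RETURNING source_txid
)
INSERT INTO desk_issuer_exposure_mv (
    desk_id, issuer_id, open_notional, reservation_count, last_applied_lsn
)
SELECT $1, $2, $3, $4, $5
FROM claimed   -- empty on replay, so the delta is never applied twice
ON CONFLICT (desk_id, issuer_id) DO UPDATE
SET open_notional     = desk_issuer_exposure_mv.open_notional + EXCLUDED.open_notional,
    reservation_count = desk_issuer_exposure_mv.reservation_count + EXCLUDED.reservation_count,
    last_applied_lsn  = GREATEST(desk_issuer_exposure_mv.last_applied_lsn,
                                 EXCLUDED.last_applied_lsn);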
Concept 3: Freshness and recovery are part of the design, not afterthoughts
There are three common maintenance styles, and each changes the contract Harbor Point can offer. Immediate maintenance updates the view in the same transaction as the base write, often through engine support or trigger-like logic. That gives the strongest consistency because once the base row commits, the view is already correct. The cost is higher write latency, more contention on hot aggregate keys, and tighter coupling between OLTP writes and read-model maintenance.
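As a sketch of the immediate style (function and trigger names are illustrative, and only the INSERT case is shown; real logic must decompose UPDATE and DELETE as in Concept 2):

CREATE OR REPLACE FUNCTION maintain_exposure_mv() RETURNS trigger AS $$
BEGIN
    -- Only open reservations contribute to the aggregate.
    IF NEW.status = 'open' THEN
        INSERT INTO desk_issuer_exposure_mv (desk_id, issuer_id, open_notional, reservation_count)
        VALUES (NEW.desk_id, NEW.issuer_id, NEW.notional, 1)
        ON CONFLICT (desk_id, issuer_id) DO UPDATE
        SET open_notional     = desk_issuer_exposure_mv.open_notional + EXCLUDED.open_notional,
            reservation_count = desk_issuer_exposure_mv.reservation_count + 1;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bond_reservations_exposure
AFTER INSERT ON bond_reservations
FOR EACH ROW EXECUTE FUNCTION maintain_exposure_mv();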
Asynchronous maintenance decouples the two. Harbor Point can commit the reservation first, stream the change through CDC, and let a projection worker update desk_issuer_exposure_mv moments later. That usually improves write throughput and operational separation, but it creates a staleness window. The window is only manageable if the view exposes a watermark such as last_applied_lsn or last_applied_commit_ts. A "fast" dashboard response is not enough; operators need to know how current that answer is.
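A freshness probe can then be a single query on the primary; this sketch assumes a one-row-per-view watermark table named view_watermarks (illustrative) holding the worker's last applied position as pg_lsn:

-- pg_wal_lsn_diff returns the gap in bytes between the current WAL
-- position and the last position the projection worker applied.
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), last_applied_lsn) AS lag_bytes
FROM view_watermarks
WHERE view_name = 'desk_issuer_exposure_mv';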
Periodic full refresh is the simplest operationally and often the weakest semantically. It can be enough for end-of-day reporting, where a fifteen-minute lag is acceptable and rebuild cost is predictable. It is a poor fit for a risk screen that must reflect a cancelled hold before another trader relies on the old exposure number. Incremental maintenance exists because many production uses sit between those extremes: readers need low latency and bounded staleness, but full synchronous maintenance on every write would cost too much.
Recovery planning makes the trade-off sharper. If Harbor Point discovers a bug in the delta logic or changes the aggregation definition, the materialized view may need a full reconciliation from source state plus a replay of newer log positions. Incremental maintenance does not remove the need for periodic truth checks. It makes them cheaper to run between rebuilds, but the source tables remain authoritative.
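A periodic truth check can be a direct comparison between the view and a fresh aggregation of the source; a sketch (mismatches may be real drift or just maintenance lag, so interpret them alongside the watermark):

SELECT COALESCE(v.desk_id, s.desk_id)     AS desk_id,
       COALESCE(v.issuer_id, s.issuer_id) AS issuer_id,
       v.open_notional                    AS view_notional,
       s.open_notional                    AS source_notional
FROM desk_issuer_exposure_mv v
FULL OUTER JOIN (
    SELECT desk_id, issuer_id, SUM(notional) AS open_notional
    FROM bond_reservations
    WHERE status = 'open'
    GROUP BY desk_id, issuer_id
) s ON v.desk_id = s.desk_id AND v.issuer_id = s.issuer_id
WHERE v.open_notional IS DISTINCT FROM s.open_notional;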
This is also where query shape matters. Aggregates keyed by a small, stable grouping set are usually good candidates for incremental maintenance. Views with wide fan-out joins, non-deterministic functions, or heavy reclassification logic may be much harder to maintain correctly because one base-row change can force many derived-row changes. The right question is not whether a query can be materialized. It is whether the dependency graph and delta surface stay understandable under failure and change.
Troubleshooting
Issue: The dashboard totals drift upward after a projection worker restart.
Why it happens / is confusing: Replay is normal in an at-least-once CDC pipeline. If the worker republishes or reapplies a source transaction without deduplicating by source position or event identity, the same exposure delta is added twice even though the base tables are correct.
Clarification / Fix: Store replay-safe source metadata such as transaction ID, event ID, or last applied LSN, and make delta application idempotent at that boundary. Recompute a sample of groups from the source tables regularly to catch drift early.
Issue: The materialized view answers quickly, but risk officers still complain that recent releases are missing.
Why it happens / is confusing: Query latency and freshness are different properties. The view table may be indexed and fast to read while the maintenance worker is several seconds behind the source commit stream.
Clarification / Fix: Monitor freshness directly with backlog depth and last applied source position. If the product promise is "near real time," expose the watermark in dashboards and alerts instead of assuming low query latency means fresh data.
Issue: Moving a reservation between desks creates a negative count in one group and an inflated count in another.
Why it happens / is confusing: The maintenance code treated an update as an overwrite instead of decomposing it into "remove old contribution, add new contribution." Aggregates hide this bug until specific key changes occur.
Clarification / Fix: Compute contributions from both the before and after row images. Any column that participates in filtering, grouping, or aggregation must be part of the delta logic, not just the numeric measure being summed.
Advanced Connections
Connection 1: 09.md provides the ordered change stream that makes incremental maintenance reliable
CDC from WAL streams gives Harbor Point committed, ordered base-table changes with replay positions attached. Incremental view maintenance consumes that same stream and turns it into derived state. Without ordered source deltas, the view layer would have to infer correctness from periodic rescans or application-side hooks.
Connection 2: 11.md asks what happens when the materialized view is served from lagging replicas
A perfectly maintained view on the primary can still look inconsistent to a client if the client reads from a replica that has not applied the newest changes. Once Harbor Point has a fast derived table, the next design problem is session consistency: when does a user need read-your-writes, and how should routing react to replica lag?
Resources
Optional Deepening Resources
- [DOC] PostgreSQL Materialized Views
- Focus: What PostgreSQL materialized views are, how refresh works, and why built-in refresh semantics differ from true incremental maintenance.
- [DOC] CREATE MATERIALIZED VIEW | Materialize Documentation
- Focus: A production system that keeps views incrementally updated and makes the maintenance/storage trade-offs explicit.
- [DOC] pg_ivm
- Focus: A practical PostgreSQL extension that shows how immediate incremental view maintenance works and where its limitations appear.
- [PAPER] DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views
- Focus: The algebraic idea behind propagating deltas through derived queries instead of recomputing entire results.
Key Insights
- A materialized view is stored state with a freshness contract - It helps because readers reuse a persisted answer, not because the SQL text became shorter.
- Incremental maintenance is delta propagation - Correct systems translate before and after source records into precise additions and subtractions on derived rows.
- The write path becomes responsible for read performance - Faster reads come from paying maintenance, replay handling, and reconciliation costs somewhere else in the system.