LESSON
Day 416: Monthly Capstone: Pick Engine, Indexes, and Isolation
The core idea: Engine choice, index choice, and isolation choice are one design decision. If they do not protect the same workload and the same invariants, the database will be fast in benchmarks and unreliable in production.
Today's "Aha!" Moment
In 15.md, Harbor Point proved that adding risk_bucket safely was not a matter of one clever ALTER TABLE. The migration had to respect hot partitions, cold history, replica readiness, and mixed application versions. The capstone question is broader: what should Harbor Point's reservation database actually be, now that the month has exposed the real pressures on it?
The workload is no longer vague. Between 09:30 and 09:40, the desk sees a burst of short reservation transactions, each of which either consumes issuer headroom or gets rejected. Traders repeatedly ask for "open reservations for issuer MUNI-77, newest first." Operations looks up individual reservations by identifier during incident triage. Compliance runs same-day surveillance queries grouped by risk_bucket, while seven years of older partitions must stay queryable on a colder tier. The system also has to survive online schema changes like the one from 15.md without freezing order intake.
That combination rules out the common mistake of optimizing one dimension in isolation. A write-optimized engine with weak secondary-index ergonomics may look attractive until compaction-driven tail latency makes the desk dashboard erratic. A read-friendly index portfolio may look perfect until every approval path still relies on a predicate scan that admits write skew. A globally strict isolation level may sound safest until it punishes read-heavy traffic for an invariant that actually lives on one issuer-level summary row.
The useful shift is to treat the design as one contract. Pick the engine that makes Harbor Point's dominant point lookups, ordered scans, and online changes operationally boring. Pick the smallest set of indexes that match those read paths and retention tiers. Then place isolation exactly where the exposure invariant lives. That gives Harbor Point a database it can defend today and a WAL stream that can feed replication in ../27/01.md.
Why This Matters
Harbor Point is deciding between plausible systems, not between a correct answer and an obviously broken one. A log-structured engine would absorb writes well, but it would also make secondary-index maintenance, compaction scheduling, and multi-version read amplification part of the market-open risk budget. A generic key-value store with application-managed indexes would let the team ship quickly, but every schema change and every crash-recovery question would leak into application code. A page-oriented OLTP engine with MVCC and B-tree indexes imposes more up-front structure, yet it matches the way Harbor Point actually reads, writes, and debugs the system.
That is the production relevance of this capstone. By the end of the month, the team is not trying to memorize isolated internals. It is learning to make an architecture decision that survives hotspots, vacuum pressure, online DDL, and recovery. If the choice is coherent, the next module can extend it with replication. If the choice is incoherent, replication will only spread the confusion to more nodes.
Learning Objectives
By the end of this session, you will be able to:
- Choose a storage-engine shape from a concrete workload - Explain why Harbor Point's reservation system fits a page-oriented MVCC engine better than an append-only or LSM-first design.
- Design an index portfolio around real query prefixes - Pick a small set of B-tree indexes that make the desk dashboard, incident lookup, and surveillance queries cheap without bloating every partition.
- Place isolation on the real conflict surface - Show why the issuer-limit invariant should be enforced through a short locked transaction on a summary row instead of a broad predicate scan or a blanket global isolation policy.
Core Concepts Explained
Concept 1: Pick the engine by the shape of reads, updates, and operational change
Harbor Point's reservation store is not an append-mostly event log and it is not a pure analytical warehouse. Rows change status several times during their short hot life: open, routed, partially_filled, filled, or canceled. The desk needs point lookups by reservation identifier, recent ordered scans by issuer, and same-day surveillance filters that became cheaper only after risk_bucket was added in 15.md. The workload is therefore mixed OLTP with ordered reads, not just high write throughput.
That is why the best fit is a page-oriented engine with WAL, MVCC, and B-tree indexes, closer to PostgreSQL or InnoDB than to an LSM-centric design. Harbor Point already cares about stable ordered scans, online DDL behavior, and hot/cold partition control from 14.md. A B-tree engine pays for page splits and random dirty-page writeback, but it gives the desk predictable point and range access paths, mature secondary-index behavior, and a recovery story that lines up cleanly with 07.md and 08.md.
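For concreteness, a minimal schema sketch consistent with that shape is shown below. Only the column names come from this lesson; the exact types, the composite primary key, and range partitioning by trading_day are assumptions:

CREATE TABLE reservations (
    reservation_id bigint      NOT NULL,
    trading_day    date        NOT NULL,
    issuer         text        NOT NULL,
    trader_id      text        NOT NULL,
    status         text        NOT NULL,   -- open, routed, partially_filled, filled, canceled
    risk_bucket    smallint,               -- backfilled gradually per 15.md
    notional       numeric     NOT NULL,
    submitted_at   timestamptz NOT NULL,
    -- In PostgreSQL a primary key on a partitioned table must include the
    -- partition column, hence the composite key.
    PRIMARY KEY (trading_day, reservation_id)
) PARTITION BY RANGE (trading_day);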
The rejected alternatives fail for specific reasons, not because they are "bad databases":
| Candidate | Where it helps | Why Harbor Point rejects it |
| --- | --- | --- |
| Heap + B-tree + MVCC | Stable point/range reads | Chosen baseline |
| LSM + secondary indexes | High write throughput | Compaction and read amplification collide with dashboard latency |
| Append-only log + application-managed indexes | Very simple ingest | Schema evolution and crash consistency leak into service code |
The decisive point is operational change. Harbor Point needs to create indexes online, backfill risk_bucket gradually, demote old partitions to colder storage, and still explain what happens after a crash. A page-oriented OLTP engine is not selected because it wins every benchmark. It is selected because its read path, migration path, and recovery path all fit the real workload contract.
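Those moves map onto ordinary online commands in a PostgreSQL-style engine. A sketch, assuming the partitioned layout above; the partition names and the cold_tier tablespace are hypothetical:

-- Build a new index on the hot partition without blocking writes
-- (CREATE INDEX CONCURRENTLY applies per partition, not to the parent).
CREATE INDEX CONCURRENTLY idx_res_hot_risk_bucket
    ON reservations_hot (risk_bucket);

-- Demote a cold partition: detach with minimal locking (PostgreSQL 14+),
-- then move it to slower storage.
ALTER TABLE reservations DETACH PARTITION reservations_2017_q1 CONCURRENTLY;
ALTER TABLE reservations_2017_q1 SET TABLESPACE cold_tier;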
Concept 2: Build the smallest index portfolio that matches the real query prefixes
Once the engine is chosen, the next temptation is to index every column that ever appears in a WHERE clause. Harbor Point should resist that. Every extra secondary index increases write amplification, vacuum work, cache pressure, and migration cost. The right question is which reads recur often enough, and with stable enough prefixes, that they deserve their own physical ordering.
Harbor Point has three recurring query families:
- Incident lookup by reservation_id or external order reference.
- Trader dashboard: "open reservations for one issuer, newest first, limit 50."
- Same-day surveillance: "for one trading day, show reservations in a risk bucket range, newest first."
That leads to a tight portfolio:
CREATE UNIQUE INDEX idx_reservations_id
ON reservations (reservation_id);
CREATE INDEX idx_reservations_open_issuer_time
ON reservations (issuer, submitted_at DESC)
INCLUDE (trader_id, notional, risk_bucket)
WHERE status = 'open';
CREATE INDEX idx_reservations_day_bucket_time
ON reservations (trading_day, risk_bucket, submitted_at DESC);
The second index is the most important design move. Harbor Point does not put status in the key because the hot dashboard only cares about open reservations; a partial index makes that predicate part of the structure and keeps the hot path smaller. It also orders by submitted_at DESC, which means the desk can stop after fifty rows instead of scanning deep into one issuer's history. The included columns let the dashboard stay index-only for its most common fields without forcing every query to visit heap pages.
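The dashboard query that index serves, as a sketch:

-- Served by idx_reservations_open_issuer_time: the partial predicate
-- matches, equality on issuer seeks directly, the DESC key order matches
-- the ORDER BY, and the scan stops after 50 entries.
SELECT trader_id, notional, risk_bucket, submitted_at
FROM reservations
WHERE issuer = $1
  AND status = 'open'
ORDER BY submitted_at DESC
LIMIT 50;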
The third index is intentionally different. Surveillance queries are less latency-sensitive and already partition-pruned by trading_day, so Harbor Point keeps that index narrower and more mechanical. It is meant to locate the right slice of same-day data after the risk_bucket migration, not to become a universal covering index for every compliance screen. Older cold partitions may not even keep the hot partial dashboard index once no rows can remain open; that is exactly the kind of tier-aware trimming 14.md made possible.
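The surveillance shape it serves, as a sketch:

-- Partition pruning on trading_day plus the leading index keys narrow the
-- slice; with a range over risk_bucket the final newest-first ordering may
-- still need a sort, which surveillance latency budgets can absorb.
SELECT reservation_id, issuer, risk_bucket, notional, submitted_at
FROM reservations
WHERE trading_day = $1
  AND risk_bucket BETWEEN $2 AND $3
ORDER BY submitted_at DESC;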
Concept 3: Choose isolation where the invariant actually lives
The hardest mistake would be to use the new indexes to speed up a logically unsafe transaction. Harbor Point's true invariant is still "total open exposure for one issuer must not exceed its limit." If the approval path checks that rule by scanning open reservations under plain snapshot semantics, the system can still admit write skew just as it did in 02.md. Indexes make the scan faster. They do not make the invariant safe.
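Concretely, the unsafe shape is a predicate-scan check that two sessions can run at the same time; a sketch:

BEGIN;  -- plain snapshot semantics
-- Both sessions read committed state and both see headroom under the limit...
SELECT COALESCE(SUM(notional), 0)
FROM reservations
WHERE issuer = $1
  AND status = 'open';
-- ...and both insert, so the combined exposure exceeds the limit: write skew.
INSERT INTO reservations (..., issuer, status, notional)
VALUES (..., $1, 'open', $2);
COMMIT;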
The fix is to move the invariant onto a lockable object and serialize only that critical section. Harbor Point keeps an issuer_exposure row per issuer and forces every approval transaction to lock it before checking headroom:
BEGIN;
-- Serialize every approval for this issuer on one summary row.
-- Placeholders: $1 = issuer, $2 = risk_bucket, $3 = notional.
SELECT reserved_notional, limit_notional
FROM issuer_exposure
WHERE issuer = $1
FOR UPDATE;
-- Application check: if reserved_notional + $3 > limit_notional,
-- ROLLBACK and reject the reservation.
UPDATE issuer_exposure
SET reserved_notional = reserved_notional + $3
WHERE issuer = $1;
INSERT INTO reservations (..., issuer, status, risk_bucket, notional)
VALUES (..., $1, 'open', $2, $3);
COMMIT;
This is the isolation choice for Harbor Point: use an MVCC engine so dashboard and surveillance reads do not block the trading path, but make the approval path itself a short lock-based transaction on one summary row. That borrows the clarity of 03.md without forcing a global strict-2PL policy onto every query. It also avoids paying blanket SERIALIZABLE cost for readers that are not enforcing the issuer-limit invariant. Isolation is therefore not one checkbox in a connection string. It is the rule that the one dangerous business invariant must pass through one deterministic conflict point.
That choice has consequences. If some service bypasses issuer_exposure and writes directly to reservations, the design is broken no matter how good the engine or indexes are. If long-running analytics stay on the primary, Harbor Point can still accumulate MVCC cleanup debt from 13.md. The coherent answer is to keep the primary transactionally narrow, let MVCC serve short reads, and carry the ordered WAL forward into the replication primitives of ../27/01.md.
Troubleshooting
Issue: The team chooses an LSM-oriented engine and market-open writes look great, but the trader dashboard becomes erratic at p99.
Why it happens / is confusing: The write benchmark measured ingest, not the full read path. Dashboard queries now pay for compaction side effects, more scattered secondary-index lookups, and higher read amplification right when traders want the newest rows first.
Clarification / Fix: Re-evaluate the engine against the real mixed workload. For Harbor Point's query shapes, predictable ordered B-tree access is more valuable than maximizing raw write throughput.
Issue: Harbor Point adds several indexes, yet the dashboard query still reads too many pages.
Why it happens / is confusing: An index can contain the right columns and still have the wrong prefix. If the key order does not match issuer first and recent time second, or if the hot predicate is not captured as a partial index, the engine still scans a wide slice.
Clarification / Fix: Align the leading key columns with equality predicates and the stop-early ordering. Keep the hot-path index small enough that it stays resident and cheap to maintain.
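A minimal illustration of the prefix rule, using hypothetical index names:

-- Wrong prefix: the newest-first key leads, so the scan walks recent rows
-- across every issuer and discards most of them.
CREATE INDEX idx_dashboard_wrong ON reservations (submitted_at DESC, issuer);

-- Right prefix: equality on issuer seeks directly, and the DESC key lets
-- the scan stop as soon as the LIMIT is satisfied.
CREATE INDEX idx_dashboard_right ON reservations (issuer, submitted_at DESC);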
Issue: Exposure occasionally exceeds the issuer limit even though every approval runs inside a transaction.
Why it happens / is confusing: The transaction boundary is real, but the conflict surface is wrong. A predicate read over open reservations can still admit non-serializable histories if different sessions do not coordinate on the same lockable object.
Clarification / Fix: Require every approval to lock and update the issuer_exposure row before inserting the reservation. If a broader invariant appears later, revisit the design and consider a stronger serializable path for that workflow only.
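If that broader workflow arrives, one option in a PostgreSQL-style engine is to raise isolation for that path alone, as a sketch:

-- Only the workflow whose invariant spans a predicate pays the cost.
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... predicate-level checks and writes for that workflow ...
COMMIT;
-- On SQLSTATE 40001 (serialization_failure) the application retries the
-- whole transaction.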
Advanced Connections
Connection 1: 25.md built the storage substrate; this capstone decides how much database behavior to layer on top
The month 25 capstone produced a minimal durable engine with WAL, B-trees, and bounded recovery. This lesson keeps that storage substrate but adds the realities that a single-process engine could postpone: mixed query shapes, online schema change pressure, and overlapping transactions. The connection is important because transaction design does not replace storage-engine design. It depends on it.
Connection 2: 15.md and ../27/01.md form the operational boundary around this decision
Safe online DDL in 15.md showed that Harbor Point cannot treat storage shape as static. The next module starts using the WAL stream for replication. This capstone sits between them: it chooses an engine and transaction model whose commit order is trustworthy enough to evolve locally now and ship remotely next.
Resources
Optional Deepening Resources
- [DOC] PostgreSQL Documentation: Transaction Isolation
- Focus: How MVCC isolation levels behave in practice, and where stronger guarantees still introduce blocking or retry cost.
- [DOC] PostgreSQL Documentation: Multicolumn Indexes
- Focus: Why index key order must match equality predicates and ordering needs instead of following a generic "most selective first" slogan.
- [DOC] PostgreSQL Documentation: Partial Indexes
- Focus: How to shrink a hot-path index by making a stable predicate part of the index definition.
- [DOC] PostgreSQL Documentation: Declarative Partitioning
- Focus: How partition pruning and local index strategy interact with hot/cold data management.
- [DOC] PostgreSQL Wiki: Serializable
- Focus: A concrete view of serializable snapshot isolation and why it is best reserved for workflows that truly need predicate-level protection.
Key Insights
- The engine choice is really a workload contract - Harbor Point needs stable point reads, ordered scans, online change safety, and bounded recovery more than it needs maximum raw write throughput.
- Indexes should encode recurring query prefixes, not every imaginable filter - A small hot partial index and a narrow surveillance index are more useful than a forest of expensive secondary structures.
- Isolation belongs on the invariant surface - The issuer-limit rule becomes tractable when every approval passes through one locked summary row instead of relying on faster but logically unsafe predicate scans.