LESSON
Day 409: Secondary Indexes and Covering Strategies
The core idea: Once 08.md has made secondary-index maintenance crash-safe, the next question is why the engine keeps those extra trees at all: a secondary index gives the table another search order, and a covering index pays off only when carrying extra payload in the leaf pages is cheaper than repeated trips back to the base rows.
Today's "Aha!" Moment
At Harbor Point, the reservations table is physically organized by reservation_id because inserts arrive in commit order and the primary key is stable. That layout is fine for fetching one reservation by ID. It is terrible for the desk lead's opening dashboard query: "show all open reservations for issuer MUNI-77, newest first, with quantity and limit price." The heap pages are not arranged by issuer or status, so without another access path the engine must scan large parts of the table just to find a small working set.
A secondary index is that extra access path. It is not a duplicate table and it is not merely a faster lookup cache. It is a second ordered structure whose leaf entries map a different key, such as (issuer, status), back to the base row. The index lets the engine jump directly to "all open MUNI-77 reservations," but the leaf entry usually contains only the search key plus a row locator, so the engine still has to visit the heap or clustered primary record to fetch the rest of the columns and confirm visibility.
That is where covering changes the design. If Harbor Point extends the leaf entry to also carry quantity, limit_price, and submitted_at, many dashboard reads can be satisfied from the index pages alone. The read path gets shorter, but the index pages become wider, splits happen sooner, memory residency changes, and every write now has more bytes to maintain and recover. The win is real only when those read savings outweigh the extra write and storage cost.
The misconception to drop is "using an index means the table is no longer involved." Sometimes it still is, because the index only narrows candidate rows. Sometimes it is not, because the index covers the query. Production tuning depends on knowing which of those two worlds you are actually in.
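One way to make those two worlds concrete is a toy model in plain Python (all names and structures here are illustrative, not any engine's real formats): a heap keyed by row locator, a non-covering index whose entries hold only the key plus a locator, and a covering variant whose entries also carry the projected columns. Counting heap visits exposes the difference.

```python
# Toy model of the two read paths. Dicts and sorted lists stand in for
# B-tree pages; the row locators and column values are made up.

# Base table ("heap"): row locator -> full row.
heap = {
    101: {"issuer": "MUNI-77", "status": "open",   "quantity": 500, "limit_price": 99.1},
    102: {"issuer": "MUNI-77", "status": "open",   "quantity": 250, "limit_price": 98.7},
    103: {"issuer": "MUNI-77", "status": "closed", "quantity": 100, "limit_price": 99.9},
    104: {"issuer": "CORP-12", "status": "open",   "quantity": 900, "limit_price": 101.2},
}

# Non-covering secondary index: (issuer, status) -> locator only.
narrow_index = sorted((r["issuer"], r["status"], tid) for tid, r in heap.items())

# Covering variant: leaf entries also carry the projected columns.
covering_index = sorted(
    (r["issuer"], r["status"], tid, r["quantity"], r["limit_price"])
    for tid, r in heap.items()
)

def query_narrow(issuer, status):
    """Index narrows the candidates, but every match still costs a heap visit."""
    heap_visits = 0
    out = []
    for iss, st, tid in narrow_index:
        if (iss, st) == (issuer, status):
            heap_visits += 1                      # back to the base row
            row = heap[tid]
            out.append((row["quantity"], row["limit_price"]))
    return out, heap_visits

def query_covering(issuer, status):
    """The projected columns already live in the leaf entry: zero heap visits."""
    out = [(qty, px) for iss, st, tid, qty, px in covering_index
           if (iss, st) == (issuer, status)]
    return out, 0

print(query_narrow("MUNI-77", "open"))    # two rows, two heap visits
print(query_covering("MUNI-77", "open"))  # same rows, zero heap visits
```

Same result set both times; what changes is how many times the read path leaves the index.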
Why This Matters
Harbor Point's market-open dashboard is read-heavy for fifteen minutes and then mostly irrelevant for the rest of the day. During that burst, traders repeatedly filter by issuer and status, sort by recency, and project only a handful of fields. If the engine answers that query through a secondary index that still forces thousands of heap visits, the plan can look "indexed" in the query output while tail latency remains dominated by random page reads and buffer misses.
The obvious reaction is to add more indexes or to widen the existing one until the dashboard becomes index-only. That helps reads, but it pushes cost onto the write path. Every new reservation, cancellation, or quantity update must now modify the base row and one or more secondary structures. Those modifications generate WAL, can trigger page splits, enlarge the working set, and make recovery responsible for more page families than the heap alone.
This trade-off is production-critical because the wrong index design often fails asymmetrically. Benchmarks over a warm cache make the read path look excellent, while the live system later shows slower inserts, more vacuum or purge work, or unstable plans once the index grows beyond memory. Understanding secondary indexes and covering strategies is how you decide whether you are solving the real bottleneck or moving it.
Learning Objectives
By the end of this session, you will be able to:
- Explain what a secondary index stores and why it needs indirection - Describe how a non-primary search structure maps alternate keys back to base rows.
- Analyze when an index actually covers a query - Distinguish between an index-assisted plan that still needs heap lookups and a plan that can return projected columns from the index leaf pages.
- Evaluate the production trade-off - Reason about when lower read latency is worth the added write amplification, storage growth, and maintenance burden.
Core Concepts Explained
Concept 1: A secondary index is an alternate order over the table, not a second copy of it
The Harbor Point heap is organized for durable row storage, not for every predicate the application might issue. When the desk asks for open reservations by issuer, the engine needs a way to navigate by (issuer, status) even though the base table is laid out by reservation_id. A secondary index solves that by storing entries ordered by the secondary key and pointing each entry back to the row.
For a B-tree implementation, the leaf entry usually looks conceptually like this:
secondary key                           row locator
---------------------------------------------------------
("MUNI-77", "open", 2026-03-31) -> heap TID 842:17
("MUNI-77", "open", 2026-03-31) -> heap TID 844:03
("MUNI-77", "open", 2026-03-31) -> heap TID 850:91
The row locator is engine-specific. PostgreSQL stores a tuple ID that points into the heap. InnoDB stores the primary-key value because the primary key itself defines the clustered storage order. The important mechanism is the same: the secondary structure narrows the candidate set, then another lookup reaches the full row.
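The two locator conventions can be sketched side by side (dicts standing in for B-trees; the TID values and table contents are hypothetical): a PostgreSQL-style locator addresses a heap position directly, while an InnoDB-style locator is a primary-key value whose resolution is itself a second traversal of the clustered structure.

```python
# Sketch of the two locator conventions. "TID" layout and all values are
# illustrative, not real on-disk formats.

# PostgreSQL-style: the heap is addressed by tuple ID (page, offset).
heap_by_tid = {(842, 17): {"reservation_id": 9001, "quantity": 500}}
pg_secondary = {("MUNI-77", "open"): [(842, 17)]}   # key -> heap TIDs

# InnoDB-style: rows live inside the clustered primary-key tree, so the
# secondary index stores the primary-key value instead of a position.
clustered_by_pk = {9001: {"reservation_id": 9001, "quantity": 500}}
innodb_secondary = {("MUNI-77", "open"): [9001]}    # key -> PK values

def pg_lookup(key):
    # One index probe, then direct heap fetches by position.
    return [heap_by_tid[tid] for tid in pg_secondary[key]]

def innodb_lookup(key):
    # One index probe, then a second lookup through the clustered PK tree.
    return [clustered_by_pk[pk] for pk in innodb_secondary[key]]

assert pg_lookup(("MUNI-77", "open")) == innodb_lookup(("MUNI-77", "open"))
```

Either way the secondary structure only narrows the search; the final hop to the full row is an extra step with its own cost.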
That indirection exists because duplicating the whole row into every secondary index would make writes and recovery explode. Harbor Point already learned in 08.md that every page the engine mutates must participate in WAL and recovery. If each issuer/status index entry tried to be a full record copy, a simple quantity change would fan out into much heavier page churn. Secondary indexes stay compact by default and pay the extra heap visit only when the query needs more data.
The production trade-off is therefore structural. A secondary index speeds predicates on columns that are not the storage order, but it introduces another tree whose pages must be cached, split, vacuumed, checkpointed, and recovered. That is usually worthwhile, but it is never free.
Concept 2: Covering is about eliminating base-row fetches for a specific query shape
Harbor Point's dashboard projects four columns (issuer, status, quantity, and limit_price) and sorts recent open reservations by submitted_at. If the index key is only (issuer, status), the engine can find candidate rows quickly but still has to fetch each matching reservation from the heap or clustered primary record to get quantity, limit_price, and often submitted_at. That means thousands of random lookups when the result set is large.
A covering strategy changes the leaf payload so the requested columns are already present where the search lands. In SQL terms, Harbor Point might use an index shaped like:
CREATE INDEX idx_reservations_issuer_status
ON reservations (issuer, status, submitted_at DESC)
INCLUDE (quantity, limit_price);
Now the index leaf can satisfy the filter, the order, and the projected columns for that dashboard query. The engine can walk the leaf range for ("MUNI-77", "open") and emit rows directly from index pages instead of bouncing back into the heap for every match.
But "covering" is not identical across engines. PostgreSQL can perform an index-only scan only when the needed columns are present and the visibility map says the referenced heap pages are all-visible; otherwise it still consults the heap for MVCC visibility. InnoDB stores the primary key in every secondary leaf, so a secondary lookup still becomes a second B-tree traversal unless the projected columns are fully available in the secondary index entry. The mechanical lesson is that coverage is about the physical read path, not merely the SQL definition.
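SQLite, the simplest of the engines in the resources below, makes the read-path distinction directly observable: its EXPLAIN QUERY PLAN output says COVERING INDEX only when no table lookup is needed. A small sketch using the lesson's schema (SQLite has no INCLUDE clause, so the extra columns go into the key suffix, and it has no MVCC visibility map to consult):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE reservations (
    reservation_id INTEGER PRIMARY KEY,
    issuer TEXT, status TEXT, submitted_at TEXT,
    quantity INTEGER, limit_price REAL)""")

def plan(sql):
    # Each plan row is (id, parent, notused, detail); detail names the access path.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

q = ("SELECT quantity, limit_price FROM reservations "
     "WHERE issuer = 'MUNI-77' AND status = 'open'")

# Narrow index: finds candidates fast, but the projection needs the table.
con.execute("CREATE INDEX idx_narrow ON reservations (issuer, status)")
narrow_plan = plan(q)

# Wider index: every projected column lives in the index leaf entries.
con.execute("DROP INDEX idx_narrow")
con.execute("""CREATE INDEX idx_cover
    ON reservations (issuer, status, quantity, limit_price)""")
cover_plan = plan(q)

print(narrow_plan)  # ... USING INDEX idx_narrow ...
print(cover_plan)   # ... USING COVERING INDEX idx_cover ...
```

The same query flips from an index-assisted plan to a covered one purely because the leaf payload changed, which is exactly the physical-read-path point above.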
The trade-off shows up inside the leaf pages. Adding quantity and limit_price makes each entry wider, which reduces fan-out, increases tree height sooner, and makes cache pressure worse. A covering index is excellent for a small, repetitive query surface like Harbor Point's desk dashboard. It is a poor default if the included columns are wide, volatile, or rarely queried together.
Concept 3: The read win from covering must be balanced against write amplification and lifecycle cost
Suppose Harbor Point adds the covering index above and the dashboard p95 drops from 180 ms to 22 ms. That is a real gain, but the engine has also changed the write path for every reservation event. Inserts add a new heap row plus a new secondary leaf entry. Updates that touch indexed key columns may delete and reinsert index entries. Even updates that touch only included columns still have to rewrite the leaf payload. Each of those changes generates WAL and may trigger page splits or extra purge work later.
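A toy accounting of that write path makes the fan-out visible. The unit "structure touches" below is a stand-in for the WAL-generating page modifications a real engine would make; actual costs also include splits, purge or vacuum work, and checkpoint traffic, so treat this as an illustration, not a measurement.

```python
# Toy write-amplification accounting; the counts are illustrative.

def writes_for(event, n_secondary_indexes, touches_key_columns=False,
               touches_included_columns=False):
    touches = 1                                 # the heap (or clustered) row
    if event in ("insert", "delete"):
        touches += n_secondary_indexes          # one new/removed leaf entry each
    elif event == "update":
        if touches_key_columns:
            touches += 2 * n_secondary_indexes  # delete + reinsert index entries
        elif touches_included_columns:
            touches += n_secondary_indexes      # leaf payload still rewritten
    return touches

# Heap only: one structure per insert.
print(writes_for("insert", 0))                                  # 1
# With the covering index: every insert now maintains two structures.
print(writes_for("insert", 1))                                  # 2
# Status change (a key column): delete + reinsert in the index.
print(writes_for("update", 1, touches_key_columns=True))        # 3
# Quantity change (an included column): the leaf is still rewritten.
print(writes_for("update", 1, touches_included_columns=True))   # 2
```

Each additional secondary index adds another term to every one of those counts, which is why the lesson treats index design as a write-path decision, not only a read-path one.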
This is why secondary-index design belongs in storage-engine reasoning, not just query tuning. Wider leaf pages mean fewer entries per page, which makes checkpoint I/O and cache residency less forgiving. More secondary structures also widen the blast radius of recovery: a crash during a quantity update must reconcile both heap and index pages correctly, or Harbor Point risks an index that points to an old value or a row that has no matching search entry. The recovery invariants from 08.md are still active here; there are simply more pages participating.
Operationally, the best covering strategy is tightly scoped. Harbor Point should cover the market-open dashboard because the read pattern is stable, latency-sensitive, and projects a narrow set of columns. It should not automatically cover ad hoc back-office queries with ten wide attributes, because that would drag a specialized access path into the hot write workload. Production index design is selective on purpose.
This lesson also sets up the next one. Once you accept that the index key defines the search order and that coverage is a separate leaf-payload decision, the next design question is how to order multiple key columns so the planner gets good selectivity, sorting support, and prefix reuse. That is the job of 10.md.
Troubleshooting
Issue: The query plan says Index Scan, but latency is still high.
Why it happens / is confusing: An index scan can still perform thousands of heap or clustered-primary lookups after finding candidate keys. The plan is using the index to narrow the search, not to answer the query entirely from index pages.
Clarification / Fix: Check whether the projected columns and ordering are covered by the index, and inspect buffer activity. If heap fetches dominate, a covering strategy may help. If the predicate is simply not selective enough, the real issue may be key order, which 10.md addresses.
Issue: Adding included columns made writes slower and the index much larger.
Why it happens / is confusing: Covering improves reads by widening leaf entries. Wider entries reduce fan-out, increase split frequency, and enlarge the memory footprint of the tree.
Clarification / Fix: Cover only the columns needed by a stable, high-value query shape. Remove speculative included columns that do not save enough heap reads to justify their storage and maintenance cost.
Issue: A supposedly index-only plan still touches the heap in PostgreSQL.
Why it happens / is confusing: Coverage of projected columns is necessary but not sufficient. PostgreSQL also needs visibility information proving the heap pages are all-visible; otherwise it must check the heap tuple for MVCC correctness.
Clarification / Fix: Inspect visibility-map health and vacuum behavior. The index may be structurally covering, but MVCC visibility rules from 05.md can still require heap access.
Issue: Secondary indexes seem to multiply recovery time after a crash.
Why it happens / is confusing: Every insert, delete, and update affecting indexed columns creates additional page updates and WAL records beyond the heap change itself. Recovery has more history to replay and more page state to reconcile.
Clarification / Fix: Audit whether each secondary index serves a real workload. Unused or marginal indexes are not harmless metadata; they are recurring write and recovery work.
Advanced Connections
Connection 1: 05.md explains why "covering" does not always mean "heap-free"
MVCC snapshots decide which row versions are visible, and some engines cannot prove visibility from the secondary index entry alone. That is why a query can be structurally covered yet still touch the base row. Secondary-index design and visibility rules are coupled, not independent topics.
Connection 2: 08.md explains why secondary indexes must be maintained with the same crash-safety discipline as heap pages
An insert into Harbor Point's reservations table is not durable merely because the heap row can be redone. If the secondary leaf entry for (issuer, status) is missing or half-applied after recovery, the query path is wrong even though the base row exists. Secondary indexes are part of the committed state, so they inherit WAL, redo, and undo obligations.
Resources
Optional Deepening Resources
- [DOC] PostgreSQL Documentation: Index-Only Scans and Covering Indexes
- Focus: How covering works in PostgreSQL, including why MVCC visibility can still force heap checks.
- [DOC] MySQL 8.4 Reference Manual: Clustered and Secondary Indexes
- Focus: How InnoDB stores secondary entries and why a secondary lookup typically requires a second traversal through the clustered primary key.
- [DOC] SQLite Query Planner: Covering Indexes
- Focus: A compact explanation of why returning data directly from the index can remove extra table lookups.
- [BOOK] Database Internals
- Focus: Storage-engine trade-offs around B-trees, page layout, and index maintenance cost.
Key Insights
- A secondary index is an alternate search order with indirection - Its normal job is to narrow the candidate rows quickly, not to replace the base table entirely.
- Covering is a physical read-path optimization - A query is truly covered only when the engine can satisfy filtering, ordering, projection, and visibility requirements without repeated base-row fetches.
- Read speed is purchased with write and recovery work - Every extra or wider secondary index adds page churn, WAL volume, cache pressure, and operational maintenance cost.