LESSON

Database Engine Internals and Implementation

033 30 min advanced

Day 410: Composite Index Ordering and Selectivity

The core idea: A composite index is one lexicographic order over several columns, so the leading columns determine which predicates can prune the search space early and whether the engine can stream rows in the order the query actually needs.

Today's "Aha!" Moment

In 09.md, Harbor Point learned that a covering index can remove expensive heap lookups for the market-open dashboard. That still leaves a harder question: if the desk query filters by issuer, status, and then wants the newest reservations first, what should the actual key order be? The index (issuer, status, submitted_at DESC) and the index (submitted_at DESC, issuer, status) contain the same columns, but they are not minor variants of the same structure. They are different physical orderings of the leaf pages.

That physical order is the whole lesson. A B-tree does not "understand" that all three columns matter equally. It compares the first column first, then the second only when the first ties, then the third only when the first two tie. If Harbor Point puts submitted_at first, the newest rows for every issuer are interleaved together. If it puts issuer first, every reservation for MUNI-77 sits in one contiguous slice, and the engine can jump directly into that slice before it starts reading timestamps.

The misleading heuristic to drop is "always put the most selective column first." Selectivity matters, but only in the context of actual predicates and ordering requirements. For Harbor Point's dashboard query, both issuer and status are equality predicates, so either can participate in the search prefix. The costly mistake is usually placing the range or sort column too early, or choosing a leading prefix that does not match the real workload the index is meant to serve.

Once that clicks, composite-index design stops looking like folklore and starts looking like a contract between the query shape and the storage layout. The next lesson, 11.md, will show how the planner tries to cost those competing contracts.

Why This Matters

Harbor Point has three recurring read paths during the first ten minutes of trading. The desk dashboard asks for "open reservations for one issuer, newest first, limit 50." The issuer drill-down asks for "all recent reservations for one issuer" regardless of status. Operations also has a cross-issuer screen that asks for "all open reservations, newest first." All three queries touch the same columns, but they do not need the same prefix.

If the team picks the wrong composite order, the symptoms are subtle. The plan still says "using index," but the engine may scan a much larger slice than expected, perform an extra sort, or abandon the index entirely once the working set no longer fits comfortably in memory. A benchmark over yesterday's warm cache can make the design look fine, while the live market-open workload later exposes long scans across low-cardinality prefixes such as status = 'open'.

That is why index ordering is a production decision rather than a syntactic detail. It determines which queries get a narrow seek, which queries inherit scan debt, and whether one generalized index is good enough or separate specialized indexes are justified. It also sets up the planner behavior discussed next: once multiple candidate orders exist, the optimizer has to estimate which one is cheapest for a given query.

Learning Objectives

By the end of this session, you will be able to:

Explain how composite key order becomes physical search order - Describe the leftmost-prefix rule and why trailing columns only help after leading columns have narrowed the scan.
Analyze selectivity in the context of real predicates - Distinguish between equality, range, and ordering requirements instead of applying "most selective first" as a universal rule.
Choose a defensible column order for a workload - Evaluate when one composite index can serve several queries and when a second specialized index is the cleaner production choice.

Core Concepts Explained

Concept 1: Composite indexes are lexicographic structures, not unordered bags of useful columns

Harbor Point's reservations table now has enough volume that it cannot afford to scan broadly at market open. The desk query is:

SELECT issuer, status, quantity, limit_price, submitted_at
FROM reservations
WHERE issuer = 'MUNI-77'
  AND status = 'open'
ORDER BY submitted_at DESC
LIMIT 50;

If the index key is (issuer, status, submitted_at DESC), the B-tree leaf entries are grouped first by issuer, then by status, then by time. Conceptually, a small slice of the tree looks like this:

("ALPHA-01", "open",   09:58) -> row
("ALPHA-01", "settled",09:41) -> row
("MUNI-77", "open",   10:05) -> row
("MUNI-77", "open",   10:04) -> row
("MUNI-77", "routed", 10:03) -> row
("ZETA-88", "open",   10:02) -> row

That layout gives the engine a narrow entry point. It can seek directly to the first ("MUNI-77", "open", ...) tuple and then walk downward in timestamp order until it has fifty rows. The search stays inside one contiguous band of leaf pages because the leading columns already partitioned the tree into the relevant issuer and status slice.

The leftmost-prefix rule falls straight out of that layout. The same index can also support queries on (issuer) alone, or (issuer, status), because those predicates still identify a contiguous prefix range. It is much less useful for WHERE status = 'open' without issuer, because status is not the left edge of the ordering. The tree is not grouped by status first, so the engine cannot jump to one compact "all open rows" region.

This is the first production trade-off: the leading prefix is a scarce design choice. Whatever you put first becomes the cheapest way to enter the tree. Columns placed later may still help, but only after the earlier columns have already shaped the search region.

Concept 2: Selectivity is only meaningful when you account for equality, range, and ordering together

Harbor Point's team initially argues that issuer should lead because it is far more selective than status: there are thousands of issuers but only a handful of statuses. That instinct is often directionally right, but on its own it is too crude. The query above has equality predicates on both issuer and status, then an ordering requirement on submitted_at. For that query shape, the most important structural decision is to keep the equality columns before the timestamp so the engine can land on one narrow prefix and then read rows in the requested order.

Compare three candidate indexes:

1. (issuer, status, submitted_at DESC)
2. (status, issuer, submitted_at DESC)
3. (submitted_at DESC, issuer, status)

Indexes 1 and 2 can both serve the dashboard query reasonably well because both equality columns are fixed before the ordered timestamp scan begins. Index 3 is the troublemaker. With time first, the tree is organized around "newest rows overall," not "rows for one issuer and status." The engine may still use the index, but it now has to inspect recent entries across many issuers until it accumulates fifty matches for MUNI-77. The order looks attractive for ORDER BY submitted_at DESC, yet it destroys the pruning power of the equality predicates.

The more interesting distinction is between 1 and 2, because that is where workload selectivity actually matters. If Harbor Point frequently runs WHERE issuer = ? ORDER BY submitted_at DESC LIMIT 50, then (issuer, status, submitted_at DESC) preserves a useful prefix for those issuer-only lookups. If operations mostly runs WHERE status = 'open' ORDER BY submitted_at DESC LIMIT 50, then (status, issuer, submitted_at DESC) gives that screen the better entry point. The correct order depends on which prefixes recur across the workload, not on a slogan about selectivity divorced from query shape.

Concept 3: Column order is a workload contract, so one composite index rarely serves every query equally well

Suppose Harbor Point keeps the covering strategy from 09.md and chooses:

CREATE INDEX idx_reservations_issuer_status_time
ON reservations (issuer, status, submitted_at DESC)
INCLUDE (quantity, limit_price);

This is an excellent fit for the desk dashboard and a decent fit for issuer drill-downs. It is a poor fit for the cross-issuer operations screen because status is buried behind issuer. The team now has three options: accept that the operations query is second-class, reorder the index to favor operations instead, or add a second specialized index such as (status, submitted_at DESC).

There is no abstract "best" answer because each option spends resources differently. A second index improves plan quality for another slice of the workload, but it adds write amplification, WAL volume, cache pressure, and vacuum or purge work. Reordering the first index helps one audience while degrading another. Leaving one query under-served might be fine if that query is rare and non-interactive. Composite-index design is therefore a workload contract: decide which access paths deserve to be cheap, then make that decision explicit in the physical order of the tree.

This is also where data skew starts to matter. If status = 'open' matches ninety-five percent of the table during market open, then a status-led index can look plausible on paper and still scan a huge fraction of the hot tree in production. The planner will eventually try to account for that with statistics and cost models, but the structural limitation comes first. A low-cardinality leading column cannot create a narrow prefix if most rows share the same value.

Troubleshooting

Issue: The query uses a composite index but still performs an expensive sort or reads far more rows than expected.

Why it happens / is confusing: Teams often notice that the right columns are "in the index" and assume the physical order must therefore be correct. The problem is usually that the ordered or range column appears too early, so the engine cannot exploit the equality predicates to prune first and stream rows second.

Clarification / Fix: Compare the query's equality predicates, range predicates, and ORDER BY against the index prefix from left to right. For top-N queries like Harbor Point's dashboard, place the equality columns before the ordered timestamp so the scan can start inside one narrow slice.

Issue: Engineers keep debating whether issuer or status should come first, and both sides have benchmarks showing small wins.

Why it happens / is confusing: For one equality-equality query, both orders may perform similarly enough to hide the real question. The difference usually shows up in the additional queries that can reuse the prefix, not in the headline query alone.

Clarification / Fix: Inventory the real workload. If issuer-only lookups are common, favor an issuer-led prefix. If status-led screens are latency-critical and frequent, either lead with status or create a separate status-focused index instead of forcing one compromise tree to serve both audiences poorly.

Issue: An index that looked sensible six months ago now scans too much data at market open.

Why it happens / is confusing: Selectivity is not static. Data growth and workload changes can turn a once-useful prefix into a low-pruning one, especially when the leading column has few distinct values or heavy skew.

Clarification / Fix: Re-evaluate the actual cardinalities and query mix. If the leading prefix no longer narrows the search effectively, redesign the composite order or split the workload into separate indexes before blaming the planner alone.

Advanced Connections

Connection 1: 09.md separates "which leaf range is cheap to reach" from "whether the leaf range already contains enough data"

Composite ordering decides how the engine enters and walks the tree. Covering decides whether the engine can answer the query from those leaf pages alone or still has to bounce back to the base rows. You need both decisions to match the same workload, not just one of them.

Connection 2: 11.md explains why a structurally good index can still be ignored

Once Harbor Point has multiple plausible orders, the optimizer has to estimate cardinality, sort cost, and page reads. Good structure narrows the candidate plans; the cost model determines which of those candidates the engine believes is cheapest for one specific query instance.

Resources

Optional Deepening Resources

[DOC] PostgreSQL Documentation: Multicolumn Indexes
- Focus: How the leftmost-prefix rule affects filtering and why multicolumn order changes which predicates are usable.
[DOC] MySQL 8.4 Reference Manual: Multiple-Column Indexes
- Focus: How InnoDB uses composite keys, including the interaction between index order and matching predicates.
[DOC] SQLite Query Planner: Multi-Column Indices
- Focus: A compact visual explanation of prefix narrowing and why index order also affects sorting work.
[GUIDE] Use The Index, Luke!: Concatenated Keys
- Focus: Query-pattern-driven advice for ordering equality, range, and sort columns in practice.

Key Insights

Column order is physical order - A composite index groups rows by its leading columns first, so those columns determine where the engine can seek cheaply.
Selectivity is conditional, not absolute - Equality predicates, range predicates, and ordering requirements matter together; "most selective first" is only a heuristic, not a law.
Every composite index chooses winners - One order makes some prefixes cheap and leaves others expensive, which is why workload analysis matters more than generic indexing folklore.

← Back to Database Engine Internals and Implementation

← Back to Learning Hub