B-Tree Index Internals and Access Paths

LESSON

Database Engine Internals and Implementation

Lesson 003 · 30 min · intermediate

Day 275: B-Tree Index Internals and Access Paths

A B-Tree is really a strategy for touching as few pages as possible to answer an ordered lookup.


Today's "Aha!" Moment

The insight: A B-Tree is not just "a sorted structure." It is a page-oriented navigation system designed to keep the tree shallow, preserve key order, and make point lookups and range scans efficient on real storage.

Why this matters: Teams often reduce indexes to a simple rule like "indexes make queries faster." That misses the important part: which queries, why, and at what maintenance cost. A B-Tree buys fast ordered access because it carefully uses page fanout and sorted keys to avoid scanning the whole table.

The universal pattern: trade a small amount of ordered navigation structure for a dramatic reduction in the data each query must touch.

Concrete anchor: A users table has 50 million rows. Without an index on email, checking whether ana@example.com exists may require scanning huge portions of the table. With a B-Tree, the engine navigates a few internal pages, lands on the relevant leaf page, and then finds the exact key or the first matching range in sorted order.
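The "few internal pages" claim follows from rough arithmetic: tree height grows with the logarithm of the row count, base fanout. A quick sketch (the fanout of 300 is an assumed, illustrative value; real fanout depends on page and key size):

```python
import math

rows = 50_000_000
fanout = 300  # assumed keys-plus-pointers per internal page; varies by engine

# Tree height is roughly log base fanout of the row count.
height = math.ceil(math.log(rows) / math.log(fanout))
print(height)  # 4 -- a handful of page touches even for 50M rows
```

Even doubling or halving the assumed fanout barely moves the height, which is why B-Tree lookups stay cheap as tables grow.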

How to recognize when this applies: the query filters or sorts on a key prefix the index is ordered by, and it needs far fewer rows than the table holds.

Common misconceptions: that indexes make every query faster, that a B-Tree is just a binary tree stored on disk, and that index maintenance is free.

Real-world examples:

  1. Point lookup: Find one user by primary key or unique email.
  2. Range scan: Read all orders from one day in key order using an index on created_at.

Why This Matters

The problem: Tables can grow large enough that "just scan it" becomes too expensive for many common access patterns. But indexes are not free either. They duplicate key information, consume pages, and make inserts, updates, and deletes more expensive because the navigation structure must stay ordered.

Before: a lookup on an unindexed column risks scanning a huge fraction of the table's pages.

After: the engine walks a few internal pages from root to leaf and touches only what the query needs.

Real-world impact: This understanding prevents cargo-cult indexing and makes later lessons on engine trade-offs much clearer, especially when comparing B-Trees with LSM-based designs.


Learning Objectives

By the end of this session, you will be able to:

  1. Explain why B-Trees are page-oriented rather than binary - Understand fanout and shallow height as the core performance win.
  2. Describe how lookups and range scans use the tree - Reason about internal nodes, leaf nodes, and sorted access paths.
  3. Evaluate B-Tree trade-offs - Recognize why ordered reads are fast and why writes pay maintenance cost.

Core Concepts Explained

Concept 1: B-Trees Win by Maximizing Fanout, Not by Being "Tree-Shaped"

If you stored keys in a naive binary tree on disk, each comparison might require another pointer chase and another page read. That would be terrible.

B-Trees solve this by making each node wide enough to hold many sorted keys and child pointers inside one page.

That means one page read buys many routing decisions at once.
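Routing within a single page can be sketched as a binary search over the page's separator keys; one page read then yields one of many possible children (keys, child ids, and layout here are illustrative, not any engine's actual format):

```python
import bisect

# One internal page: N separator keys route to N+1 children.
separators = [100, 200, 300]           # sorted keys held in the page
children = ["p0", "p1", "p2", "p3"]    # child page ids (illustrative)

def route(key):
    # bisect_right picks the child whose key range contains `key`:
    # keys below 100 go to p0, [100, 200) to p1, and so on.
    return children[bisect.bisect_right(separators, key)]

print(route(150))  # p1
print(route(999))  # p3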

So instead of a tall skinny tree, you get a short, wide one.

This is the real performance trick.

Suppose an internal page can hold hundreds of keys and child pointers. Then each level narrows the search space enormously, and the full path from root to leaf may only take a handful of page touches, even for large tables.

That is why B-Trees are designed around page layout: page size, key width, and pointer encoding all influence the fanout and therefore the height.

So the key mental model is: a B-Tree minimizes the number of page hops, not merely the number of comparisons.

Concept 2: Leaves Store Ordered Access Paths, Which Is Why Range Queries Work

The B-Tree is valuable not only because it finds one key quickly, but because it preserves order.

That order is what makes range predicates, ordered results, and prefix scans so useful with a B-Tree.

The lookup pattern is usually:

  1. start at the root
  2. compare the target key against separator keys
  3. follow the chosen child pointer
  4. repeat until the leaf
  5. inspect the leaf entry or scan forward through nearby leaf entries

For point lookup, the engine stops at the exact match or the correct leaf location.

For range access, the engine finds the first qualifying leaf entry and then continues in sorted order.
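The five-step pattern above can be sketched with a toy two-level tree: a root of separators over leaf pages chained in key order. The layout and names are illustrative, not any engine's actual format:

```python
import bisect

# Leaf pages hold sorted keys and link to the next leaf in key order.
leaves = [
    {"keys": [10, 20, 30], "next": 1},
    {"keys": [40, 50, 60], "next": 2},
    {"keys": [70, 80, 90], "next": None},
]
root_separators = [40, 70]  # routes a key to the leaf that may hold it

def find_leaf(key):
    return bisect.bisect_right(root_separators, key)

def point_lookup(key):
    # Root -> leaf, then inspect the leaf for an exact match.
    return key in leaves[find_leaf(key)]["keys"]

def range_scan(lo, hi):
    # Find the first qualifying leaf, then walk forward in sorted order.
    i = find_leaf(lo)
    out = []
    while i is not None:
        for k in leaves[i]["keys"]:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        i = leaves[i]["next"]
    return out

print(point_lookup(50))    # True
print(range_scan(25, 75))  # [30, 40, 50, 60, 70]
```

Note that the range scan pays the root-to-leaf navigation cost only once; after that it reads leaves sequentially, which is exactly why ordered access is cheap.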

This is why B-Trees are such a strong default for point lookups, range scans, and ordered results.

It is also why they are less helpful when the predicate does not align with key order.

For example, an index on (last_name, first_name) helps queries that filter on last_name (optionally narrowed by first_name), but not necessarily queries that filter on first_name alone, because the index order is not arranged around that second column by itself.
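The asymmetry can be sketched with a sorted list of composite keys (names are made up for illustration):

```python
import bisect

# A composite index is ordered by the whole (last_name, first_name) tuple.
index = sorted([
    ("garcia", "ana"), ("garcia", "luis"),
    ("smith", "ana"), ("zhou", "mei"),
])

# Leading-column lookup: binary search finds the run of matching entries,
# because all "garcia" rows are contiguous in key order.
lo = bisect.bisect_left(index, ("garcia",))
run = []
for entry in index[lo:]:
    if entry[0] != "garcia":
        break
    run.append(entry)
print(run)  # [('garcia', 'ana'), ('garcia', 'luis')]

# Second-column-only lookup: matching rows are scattered, so the sort
# order does not help and every entry must be inspected.
matches = [e for e in index if e[1] == "ana"]
print(matches)  # [('garcia', 'ana'), ('smith', 'ana')]
```

The same data, the same order, but only the predicate aligned with the leading column gets a cheap access path.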

So access-path thinking matters as much as tree structure.

Concept 3: Fast Ordered Reads Are Bought With Write Maintenance

Every insert, delete, or key-changing update has to preserve B-Tree invariants: keys stay sorted within each page, pages stay within capacity, and all leaves stay at the same depth.

That means writes may cause page splits, page merges, and separator updates that ripple up toward the root.
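A page split can be sketched in a few lines. The capacity of 4 is an artificially small assumed value so the split is easy to see; real pages hold far more keys:

```python
import bisect

CAPACITY = 4  # assumed max keys per leaf page (illustratively tiny)

def insert_key(leaf, key):
    """Insert in sorted position; split the page when it overflows."""
    bisect.insort(leaf, key)
    if len(leaf) <= CAPACITY:
        return leaf, None, None          # in place, no split
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    # The first key of the right page is promoted as a separator; the
    # parent may overflow in turn, so split cost can ripple up the tree.
    return left, right, right[0]

left, right, sep = insert_key([10, 20, 30, 40], 25)
print(left, right, sep)  # [10, 20] [25, 30, 40] 25
```

This is the maintenance bill in miniature: one logical insert turned into two page writes plus a parent update.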

This is the central trade-off: fast, ordered reads are paid for with maintenance work on every write.

The cost gets more visible when a table carries several indexes, keys are wide, or inserts land in random key order.

This also explains why engines sometimes separate row storage from index storage.

The table may hold full row payloads, while the B-Tree holds keys plus references. That keeps navigation compact enough to preserve high fanout.
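That separation can be sketched as a heap of full rows addressed by row references, with a compact index holding only (key, reference) pairs. Page ids, slot numbers, and field names here are illustrative assumptions:

```python
import bisect

# Heap pages hold full row payloads, addressed by (page, slot).
heap = {
    ("page3", 0): {"id": 7, "email": "ana@example.com", "bio": "..."},
    ("page9", 2): {"id": 4, "email": "bo@example.com", "bio": "..."},
}

# The index holds only compact (key, row_reference) entries,
# which is what keeps fanout high and the tree shallow.
email_index = sorted((row["email"], rid) for rid, row in heap.items())

# Lookup: binary search in the index, then one hop to the heap row.
i = bisect.bisect_left(email_index, ("ana@example.com",))
_key, rid = email_index[i]
row = heap[rid]
print(row["id"])  # 7
```

The extra hop from index entry to heap row is the price of keeping index entries small; some engines avoid it for primary keys by storing rows inside the B-Tree leaves themselves.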

And it sets up the next lesson cleanly: LSM trees restructure the same trade-off, deferring maintenance from write time to compaction time.


Troubleshooting

Issue: "We added an index, but the query is still slow."

Why it happens / is confusing: Teams assume every predicate benefits from every index.

Clarification / Fix: Check whether the predicate matches the index key order and selectivity. A B-Tree only helps when the access pattern aligns with its sorted path.

Issue: "Writes became slower after we added several indexes."

Why it happens / is confusing: The read improvement is visible, but index maintenance cost is less obvious.

Clarification / Fix: Remember that every indexed write may now update multiple ordered structures, potentially causing page splits, extra WAL, and more cache churn.

Issue: "Why are B-Trees usually shallow even on very large tables?"

Why it happens / is confusing: People imagine binary trees with many levels.

Clarification / Fix: Each page can hold many keys and child pointers, so fanout is large and height stays low. That is the main reason the structure works well on storage devices.


Advanced Connections

Connection 1: B-Trees <-> Pages and Record Layout

The parallel: The previous lesson showed that storage engines think in pages. B-Tree nodes are just specially organized pages optimized for navigation rather than plain row storage.

Real-world case: Node fanout, split cost, and pointer layout are direct consequences of how much a page can hold.

Connection 2: B-Trees <-> LSM Trees

The parallel: This lesson explains the classic ordered-read, in-place-maintenance design. The next lesson will contrast it with LSM trees, which accept heavier read/compaction trade-offs to make writes cheaper and more sequential.

Real-world case: Choosing B-Tree versus LSM is often choosing where you want to pay the maintenance bill: incrementally during writes or later during compaction and reads.



Key Insights

  1. B-Trees optimize page hops, not just comparisons - Wide nodes keep the tree shallow, which is what makes large indexes practical on storage.
  2. Sorted leaves are what make range access powerful - Equality and ordered scans both come from preserving key order through the tree.
  3. Read speed is bought with write maintenance - Splits, ordered inserts, and multi-index upkeep are the price of fast navigable access paths.

PREVIOUS Storage Engine Foundations: Pages, Records, and Layout NEXT LSM Trees, SSTables, and Compaction Trade-offs
