LESSON
Day 392: B-Tree Insert, Split, and Merge Maintenance
The core idea: A B-tree stays searchable only because every insert and delete repairs page boundaries locally, so parent separators, sibling links, and durable page images continue to describe the same key ranges after the write lands.
Today's "Aha!" Moment
In 07.md, Harbor Point's municipal-bond index was a clean routing structure: each internal page described a key interval, and one root-to-leaf walk found the right quote range quickly. That picture is true only between writes. The hard part is not searching a stable tree; it is changing the tree while traders are still inserting new quotes and retention jobs are deleting old ones.
Consider Harbor Point during a rate announcement. Quotes for a handful of hot CUSIPs arrive in bursts, all targeting the same small region of the index on (cusip, quote_ts DESC). When the target leaf has room, the insert is just a page-local shuffle. When it does not, the engine has to split the leaf, choose a separator that cleanly divides the key space, update the parent, and sometimes repeat that work all the way to the root. Every one of those steps is structural maintenance, not optional optimization.
Deletes create the mirror image. Harbor Point's nightly retention job removes expired quotes in timestamp order. Leaves that were once packed can become sparse, which wastes space and deepens the tree unless the engine redistributes entries with a sibling or merges pages entirely. The main realization is that insert, split, redistribution, and merge are all ways of preserving the search contract from the previous lesson. The next lesson on 09.md matters because B-trees pay that maintenance cost on the write path, while LSM trees move the same maintenance burden into flushes and compaction.
Why This Matters
Harbor Point has two conflicting goals. During market hours, the trading desk wants the newest quote for a bond with very low tail latency. Overnight, the platform wants to purge expired rows without letting the index become bloated. A B-tree can serve both needs, but only if structural changes stay local, correct, and cheap enough that the write path does not collapse under hot-key pressure.
This is where many production mistakes happen. Teams often observe a symptom such as rising write latency or index bloat and reach for a tuning knob without asking which structural maintenance path is dominating. If leaf pages keep splitting on one hot range, the problem is very different from a tree that is too sparse because delete-heavy workloads never trigger reclamation effectively. Knowing the maintenance mechanics lets you tell apart healthy rebalancing, pathological churn, and outright correctness risk.
Learning Objectives
By the end of this session, you will be able to:
- Trace a B-tree insert path - Explain when an insert stays inside one leaf and when it must trigger a split and parent update.
- Explain split and merge mechanics - Follow how separator keys, sibling relationships, and tree height change during structural modification.
- Reason about production consequences - Connect page occupancy, latch pressure, write amplification, and index bloat to real workload behavior.
Core Concepts Explained
Concept 1: Most inserts are local page edits until free space runs out
Harbor Point's index leaf for CUSIP 13042AB7 stores recent quotes in descending timestamp order. When a new quote arrives, the engine follows the search path from 07.md to the target leaf and checks whether the leaf still has enough free space for one more entry. If it does, the structural work is limited: shift a suffix of entries, insert the new key in sorted position, update slot metadata, mark the page dirty, and eventually log the change for recovery.
That sounds simple, but even the "cheap" case depends on an invariant: after insertion, the page must still contain a sorted subset of the same key interval the parent expects it to cover. The parent separator does not change because the leaf's boundary did not move. This is why engines try hard to keep some free space on pages with fill-factor settings or page compaction. A little slack turns many writes into local edits instead of structural events.
For Harbor Point, that distinction shows up directly in latency. A page-local insert usually means one hot leaf latch, one WAL record, and no parent traffic. The write still pays durability cost, but it does not force a chain reaction through upper levels of the tree. On a workload with many distinct CUSIPs, most inserts look like this. On a workload concentrated on a few hot bonds, the free space for those leaves disappears quickly, and the "local edit" fast path stops being the common case.
The trade-off is straightforward: leaving more free space delays splits and reduces contention spikes, but it also lowers leaf density and makes the index larger. B-tree tuning is therefore not just about read depth. It is about deciding how much spare capacity to reserve so that the write path remains smooth under the expected skew.
Concept 2: A split rewrites key boundaries and may propagate upward
When Harbor Point receives another quote for a full leaf, the engine can no longer preserve sorted order by shuffling entries in place. It allocates a new sibling page, redistributes the leaf's entries across the old and new pages, and picks a separator key that tells the parent where the boundary now lies.
An abstract split looks like this:
before
parent: [... | 13042AB7@09:34 | ...]
|
v
leaf: [09:37 09:36 09:35 09:34 09:33 09:32] (full)
after
parent: [... | 13042AB7@09:35 | ...]
/ \
v v
left leaf: [09:37 09:36 09:35]
right leaf: [09:34 09:33 09:32]
The exact promoted separator depends on the engine's B-tree variant, but the rule is constant: every key in the left child must compare on one side of the separator, and every key in the right child must compare on the other. If Harbor Point inserts many quotes into the same hot range, the parent page that receives new separators can fill too. Then the split cascades upward. In the extreme case, the old root splits and the tree grows by one level under a newly created root.
This is where search correctness and concurrency meet. Readers descending the tree must never observe a parent boundary that points to the wrong child interval, even while the split is happening. Engines use careful latch ordering, WAL logging, and page-link conventions so that a reader can either see the old valid structure or the new valid structure, but not a broken hybrid. The split path is therefore one of the main places where B-tree implementations earn their reputation for subtlety.
For operations, the split mechanism explains why write amplification in a B-tree is bursty rather than uniform. Most inserts touch one leaf; a minority suddenly touch the leaf, its parent, possibly several ancestors, and the log. That burstiness is exactly the pressure that motivates the comparison with LSM trees in 09.md: B-trees keep data ordered in place, so they occasionally pay expensive structural maintenance on the critical write path.
Concept 3: Deletes can redistribute or merge pages, but engines are conservative for good reasons
Harbor Point's nightly retention job deletes quotes older than the compliance window. A delete usually starts as a local change, just like an insert: find the leaf entry, remove it, compact the page if needed, and preserve sorted order among what remains. The harder question is what to do when many deletes leave a leaf half empty or nearly empty.
One option is redistribution. If the left leaf is sparse and its right sibling still has slack, the engine can move a few entries across the boundary and update the parent separator so both pages remain reasonably full. Redistribution keeps the tree height stable and avoids deallocating pages immediately. Another option is merge: if two siblings together fit comfortably in one page, move all surviving entries into one page, delete the other page, and remove the redundant separator from the parent.
Those operations sound like the obvious mirror image of splitting, but production systems often apply them more cautiously than textbook descriptions suggest. Merge-heavy delete paths can cause oscillation when a workload alternates between removing and reinserting keys in the same range. Harbor Point would feel that if overnight retention merged pages aggressively and the morning quote burst forced the same range to split again. Some engines therefore tolerate underfull pages for a while, preferring mild space inefficiency over repeated churn.
The practical trade-off is between space discipline and write stability. Aggressive merging shrinks the tree and improves cache density, but it adds foreground work and can increase contention on hot ranges. Conservative merging avoids thrash, but it leaves behind bloat that raises scan cost and storage footprint. Once you understand that tension, index maintenance metrics such as split rate, page occupancy, and deleted-page reuse stop looking like obscure internals and start looking like direct workload signals.
Troubleshooting
Issue: Insert latency spikes even though the average insert rate has not changed much.
Why it happens / is confusing: A small number of hot key ranges can keep hitting full leaves, so the system pays repeated split cascades while the global workload still looks moderate.
Clarification / Fix: Measure split frequency by index and by key range, not just total write volume. Fill factor changes, workload partitioning, or a different key design can matter more than raw hardware.
Issue: The index keeps growing after large delete jobs.
Why it happens / is confusing: Deleting rows does not guarantee immediate page reclamation. Underfull pages may remain in place if the engine chooses not to merge them aggressively.
Clarification / Fix: Distinguish logical deletion from structural compaction. Inspect page occupancy and the engine's maintenance policy before assuming the purge job is ineffective.
Issue: A correctness bug appears only during crash recovery or high concurrency.
Why it happens / is confusing: Split and merge paths update multiple related pages, so ordering mistakes may stay invisible until recovery replays the log or readers race with a structural change.
Clarification / Fix: Audit the invariant boundaries explicitly: child key ranges, parent separators, sibling links, and WAL ordering. Structural bugs are rarely fixed by retry logic alone.
Advanced Connections
Connection 1: B-tree maintenance ↔ WAL and crash recovery
Page splits and merges are multi-page updates, so their correctness depends on logging enough information for recovery to rebuild the same valid tree shape after a crash. That is why upcoming lessons on WAL ordering and redo logic are not separate concerns; they are the durability half of structural maintenance.
Connection 2: B-tree maintenance ↔ LSM-tree write design
Harbor Point's hot-leaf split problem illustrates the central B-tree trade-off: writes preserve sorted structure immediately. LSM trees relax that requirement by buffering writes in memory and restoring global order later during flush and compaction. The next lesson on 09.md is easier to understand once this B-tree maintenance cost is concrete.
Resources
Optional Deepening Resources
- [DOC] PostgreSQL Documentation: B-Tree Indexes
- Focus: Read the sections on page splits, deduplication, and implementation behavior to see how a production engine preserves B-tree invariants under concurrent writes.
- [DOC] SQLite File Format: B-Tree Pages
- Focus: Inspect how interior and leaf pages are laid out on disk so split and merge discussion stays tied to concrete page structures.
- [BOOK] Database Internals by Alex Petrov
- Focus: Use the chapters on B-trees and storage-engine trade-offs to compare local page maintenance with log-structured alternatives.
Key Insights
- A cheap B-tree insert is a page-local event - The write becomes expensive only when free space is exhausted and the page boundary has to move.
- Splits repair the routing contract, not just space pressure - Promoted separators and sibling pages exist so every root-to-leaf search still reaches the correct key interval.
- Delete handling is a policy choice as much as a data-structure rule - Merge aggressively and you risk churn; merge cautiously and you accept temporary bloat.