Page Layouts and Slotted Pages

LESSON

Database Engine Internals and Implementation

Lesson 009 · 30 min · Advanced

Day 386: Page Layouts and Slotted Pages

The core idea: A slotted page separates a record's stable identity from its byte position, so a database can update, delete, and compact tuples inside a fixed-size page without breaking references from indexes or higher layers.

Today's "Aha!" Moment

In 01.md, the page became the DBMS's basic unit of I/O, caching, and recovery. That still leaves a harder question: once an 8 KB page is in memory, how do rows actually fit inside it? Harbor Point's municipal_quotes table makes the pressure obvious. Some rows only carry CUSIP, bid, ask, and timestamp. Others also carry a longer desk note explaining why the quote was widened during a storm-driven selloff. If the engine stored those rows as a simple packed byte array, a longer update to one row could force every later row to move.

That becomes a correctness problem, not just a packing problem. Higher layers want stable record references such as "page 812, slot 7." Index entries, lock targets, and recovery records cannot afford to mean "whatever happens to start at byte 3184 right now." A slotted page solves this by putting a small directory near the front of the page and letting each slot entry point to the current location of a tuple body elsewhere in the page.

The important mental shift is that the slot number, not the byte offset, is the durable name of the record within the page. Tuple bytes may move during compaction or after an update that changes size. The slot entry is what stays stable enough for the rest of the engine to follow.

That is why page layout sits between high-level storage architecture (01.md) and the record formats covered next in 03.md. Before you can reason about null bitmaps, varlen headers, or alignment, you need the physical contract that says where a record lives, how free space is tracked, and which identifiers are allowed to remain stable while bytes move.

Why This Matters

Suppose Harbor Point ingests a new wave of municipal quotes while analysts keep querying the freshest page range. Every insert, delete, and quote revision happens inside fixed-size pages. If the page layout cannot absorb variable-length rows and local reshuffling, the engine pays for that weakness everywhere else: more page splits, more index maintenance, more write-ahead log traffic, and harder crash recovery.

A slotted-page design localizes the damage. Adding a record usually means carving tuple bytes from one side of the free space region and adding a slot entry on the other side. Deleting or moving a record means updating metadata in the slot directory rather than rewriting every external reference. The page is still a constrained physical unit, but its internal bookkeeping is now flexible enough to survive realistic workloads.

Production teams feel this directly. Page fragmentation changes how full a relation really is. Free-space accounting determines whether inserts can reuse old pages or must allocate new ones. Compaction policies change write amplification and vacuum pressure. If you treat page layout as an implementation detail, those behaviors show up later as mysterious bloat, tail-latency spikes, or "why did this small update touch so much data?" incidents.

Learning Objectives

By the end of this session, you will be able to:

  1. Explain the anatomy of a slotted page - Identify the roles of the page header, slot directory, free space region, and tuple bodies.
  2. Trace how inserts, deletes, updates, and compaction work - Describe which bytes move, which identifiers stay stable, and why that distinction matters.
  3. Evaluate the trade-offs of slotted pages in production - Connect indirection, fragmentation, and local page maintenance to broader storage-engine behavior.

Core Concepts Explained

Concept 1: A slotted page gives the page an internal naming system

Return to Harbor Point's municipal_quotes page. The buffer manager has already pulled the page into memory because the executor needs several recent quotes. Inside that page, the engine needs more structure than "records laid out one after another." It needs a way to refer to a record even if the bytes later move.

A typical slotted page uses four regions: a header, a slot directory, a free-space gap, and the tuple bodies themselves. The header stores page-level facts such as a checksum or LSN, plus the free-space boundaries. The slot directory sits just after the header and grows toward higher addresses as slots are added; each slot entry stores metadata such as the tuple's offset, length, and status. The tuple bodies start at the end of the page and grow toward lower addresses. Free space sits between the two.

low addresses
+-------------------------------+
| page header                   |
| lower = 96, upper = 8000      |
+-------------------------------+
| slot 1 -> off 8128 len 40     |
| slot 2 -> off 8080 len 48     |
| slot 3 -> DEAD                |
| slot 4 -> off 8000 len 80     |
+-------------------------------+  <- lower
|           free space          |
+-------------------------------+  <- upper
| tuple 4 bytes                 |
| tuple 2 bytes                 |
| tuple 1 bytes                 |
+-------------------------------+
high addresses

The key idea is that slot 4 is the stable handle. If tuple 4 later moves from offset 8000 to offset 7920 during compaction, the rest of the system can still refer to the same record by the same page-and-slot identifier. Only the slot entry changes. This is the local indirection layer that makes page maintenance possible.

The trade-off is small but fundamental. Every record now costs some slot metadata and one extra lookup step inside the page. In exchange, the engine gains the freedom to reorganize payload bytes without forcing every higher layer to rewrite its references.
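The anatomy above can be sketched in a few lines of Python. This is a toy model, not any engine's actual layout: the 24-byte header, 4-byte slot entries, and the names `lower`, `upper`, and `slots` are all illustrative assumptions.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 24  # room for checksum, LSN, and boundaries (illustrative)
SLOT_SIZE = 4     # per-slot metadata: offset + length (illustrative)

class SlottedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []           # (offset, length) per slot; the index is the slot id
        self.lower = HEADER_SIZE  # end of the slot directory, grows toward high addresses
        self.upper = PAGE_SIZE    # start of tuple bodies, grows toward low addresses

    def free_space(self):
        # Only the contiguous gap between directory and bodies counts here.
        return self.upper - self.lower

    def insert(self, payload: bytes):
        if self.free_space() < len(payload) + SLOT_SIZE:
            return None                      # page full: caller compacts or picks another page
        self.upper -= len(payload)           # carve tuple bytes from the high end
        self.data[self.upper:self.upper + len(payload)] = payload
        self.slots.append((self.upper, len(payload)))
        self.lower += SLOT_SIZE              # directory grew at the low end
        return len(self.slots) - 1           # the stable slot id handed to callers

    def read(self, slot_id):
        off, length = self.slots[slot_id]
        return bytes(self.data[off:off + length])
```

A caller never sees the offset; it holds only the slot id, which is exactly the indirection the lesson describes:

```python
page = SlottedPage()
sid = page.insert(b"bid=98.12;ask=98.40")
assert page.read(sid) == b"bid=98.12;ask=98.40"
```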

Concept 2: Inserts, updates, and deletes are page-maintenance operations, not just row operations

Now imagine that Harbor Point receives another quote for a bond already on the page, but the new version carries a longer desk note because liquidity is deteriorating. On insert, the engine checks whether the page has enough space for both the tuple body and, if needed, a new slot entry. If it does, the tuple bytes are written near the upper boundary, upper moves downward, the slot entry is filled in, and lower moves upward if a new slot was added.

A delete is usually lazier than newcomers expect. The engine often marks the slot or tuple as deleted, dead, or reusable rather than immediately sliding every later tuple forward. That choice protects correctness and reduces short-term work. Concurrent readers, MVCC visibility rules, or recovery logic may still need to reason about the old entry. Immediate compaction would turn every delete into a much larger rewrite.

An update depends on whether the new tuple version still fits cleanly. If the new bytes fit in place, the engine may overwrite the payload directly. If the tuple grows, the engine may move the payload to a different position in the same page and update the slot offset, or it may place a new version elsewhere and leave a redirect or dead entry behind, depending on the engine's concurrency model. The stable part is still the slot-level identity, not the old byte position.

That distinction is why slotted pages are so common. A logical "row update" is really a physical page-maintenance operation governed by free space, visibility rules, and the cost of moving bytes. The page layout decides how much of that maintenance stays local and how much spills outward into index churn or page allocation.

Concept 3: Fragmentation is the price of flexibility, and compaction is how the engine buys space back

After enough quote updates and deletions, Harbor Point's page starts to look untidy. Slot 3 is dead, tuple 2 was shortened, and tuple 4 moved. Total free bytes may look healthy, but they can be scattered across several holes. A new 120-byte tuple cannot be placed unless there is one sufficiently large contiguous region or the engine compacts the page.

Compaction rewrites the live tuple bodies into a tighter arrangement, typically packing them back toward one end of the page and updating slot offsets as it goes. Because the slots remain the stable names, the compaction can restore contiguous free space without changing the page-and-slot identifiers used elsewhere. That is the hidden payoff of the design.

But compaction is not free. It consumes CPU, dirties the page, and can increase WAL volume because many bytes and metadata fields change at once. Engines therefore make policy choices about when to compact eagerly, when to defer cleanup to background work, and when to leave some fragmentation in place because the page is not yet under enough pressure to justify the rewrite.

This is also the handoff to 03.md. Once you understand how a page tracks slots and free space, the next question is why one tuple body is 40 bytes while another is 120. That answer lives inside the record format: null bitmaps, length prefixes, alignment, and variable-length attributes.

Troubleshooting

Issue: A page appears to have free bytes, but inserts still spill to another page.

Why it happens / is confusing: Slotted pages need contiguous space for the tuple body and room in the slot directory. Scattered holes from deletes do not automatically count as usable insertion space.

Clarification / Fix: Inspect contiguous free space and slot availability, not just total free bytes. If fragmentation is the limit, compaction, fillfactor tuning, or background cleanup may recover the page.
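A toy calculation makes the distinction concrete. The hole list and the 4-byte `slot_overhead` are illustrative assumptions, not any engine's accounting.

```python
def total_free(contiguous_gap, holes):
    # What naive accounting reports: every reclaimable byte on the page.
    return contiguous_gap + sum(holes)

def can_insert(contiguous_gap, holes, tuple_len, slot_overhead=4):
    # What actually gates an insert: holes left by lazy deletes are not
    # usable until compaction merges them into the contiguous gap.
    return contiguous_gap >= tuple_len + slot_overhead

gap, holes = 100, [60, 60]
assert total_free(gap, holes) == 220          # looks roomy
assert not can_insert(gap, holes, 120)        # needs 124 contiguous, has 100
```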

Issue: After defragmentation, engineers expect all existing record references to become invalid.

Why it happens / is confusing: They are thinking in raw offsets rather than slots. Compaction changes tuple locations, but the slot directory can preserve the logical record identifier within the page.

Clarification / Fix: Verify what higher layers actually reference. In slotted-page engines, indexes or heap pointers generally target (page_id, slot_id) or an equivalent indirection, not a byte offset.

Issue: Deleting rows does not immediately shrink the table file.

Why it happens / is confusing: Reclaiming space inside a page is different from returning pages to the filesystem. A dead tuple can make space reusable for future inserts without reducing file size.

Clarification / Fix: Distinguish page-local reuse, relation-level free-space tracking, and file truncation. Vacuum or cleanup may make space reusable long before storage on disk becomes smaller.

Advanced Connections

Connection 1: Slotted pages <-> buffer pools and dirty-page policy

The buffer manager only knows that page 812 became dirty; it does not flush "just slot 4." That means frequent tuple movement inside slotted pages can affect eviction pressure and write amplification. PostgreSQL's line pointers and its pd_lower and pd_upper page-header fields are a concrete example of how page layout and buffer management meet.

Connection 2: Slotted pages <-> B-tree pages and record formats

Heap pages, B-tree nodes, and SQLite B-tree pages all use variants of slot or cell directories because internal references must survive local byte movement. The exact bytes inside each tuple or cell are the next problem, which is why 03.md follows immediately.


Key Insights

  1. A page layout is a contract about identity and movement - The engine needs a stable way to name records even when their bytes are reorganized.
  2. Slots make local rewrites safe - Updates, deletes, and compaction can stay page-local because external references target slot identities rather than raw offsets.
  3. Flexibility creates fragmentation pressure - Slotted pages trade simple packed storage for indirection, cleanup work, and free-space policy decisions.