LESSON
Day 274: Storage Engine Foundations: Pages, Records, and Layout
A storage engine does not really store "rows." It stores bytes inside pages, and the way those bytes are arranged quietly determines read cost, update cost, fragmentation, and recovery complexity.
Today's "Aha!" Moment
The insight: Once data leaves the logical schema and hits disk or durable storage, the engine stops thinking in terms of "table with columns" and starts thinking in terms of:
- fixed-size pages
- record headers and offsets
- free space management
- pointers or slot directories
- movement, deletion, and reuse
That physical layout is the hidden reason later topics like indexes, WAL, MVCC, and query execution behave the way they do.
Why this matters: Developers often speak about rows as if they exist directly on disk. But an engine has to answer harder questions:
- how much do we read at once?
- where inside a page does a record live?
- how do we find it if nearby records move?
- how do variable-length columns fit without rewriting everything?
- how do we update one record without destroying locality or recovery correctness?
Those are page-layout problems, not abstract schema problems.
The universal pattern:
- storage is organized into pages
- pages contain record metadata plus payload
- record movement and free space require indirection or bookkeeping
- every later access path is built on top of these page-level decisions
Concrete anchor: A customers table may look like simple rows in SQL, but the engine may actually store each row inside an 8 KB page, with a header, null bitmap, variable-length offsets, and a slot directory. A single update to a long text field may no longer fit in the original position, forcing page reorganization or redirection.
How to recognize when this applies:
- You care about why point reads, scans, or updates cost what they cost.
- You need to understand fragmentation, vacuuming, overflow pages, or page splits later.
- You want to reason about how indexes point to data physically.
Common misconceptions:
- [INCORRECT] "A row is just stored contiguously exactly as the SQL table suggests."
- [INCORRECT] "Page layout is an implementation detail with little effect on performance."
- [CORRECT] The truth: Page and record layout are first-order performance and correctness decisions because they shape locality, write amplification, indirection, and recovery behavior.
Real-world examples:
- Short fixed-width records: Cheap to pack densely and scan predictably.
- Wide or variable-length records: Harder to update in place and more likely to cause fragmentation or overflow handling.
Why This Matters
The problem: Logical data models describe what the data means, but storage engines must decide how it is physically placed. Poor physical layout decisions create hidden costs:
- extra I/O
- more page splits or rewrites
- unstable record addresses
- poor cache locality
- complicated free-space reuse
- harder crash recovery bookkeeping
Before:
- Rows are imagined as abstract objects that happen to live on disk.
- Updates are assumed to touch only the changed field.
- Performance problems are blamed on queries alone, not on physical layout.
After:
- Pages are understood as the real unit of storage and I/O.
- Record layout is treated as a design choice affecting update and scan cost.
- Later lessons on B-Trees, WAL, buffer pools, and MVCC have a clear physical foundation.
Real-world impact: This mental model helps explain why engines make trade-offs like slotted pages, indirection, append-friendly structures, or overflow storage. It also prevents hand-wavy explanations of "database performance" that ignore the actual storage substrate.
Learning Objectives
By the end of this session, you will be able to:
- Explain why storage engines work in pages rather than logical rows - Understand page size as the basic unit of durable I/O and caching.
- Describe how records are physically organized inside pages - Reason about headers, slot directories, fixed vs variable-length fields, and free space.
- Evaluate layout trade-offs - Recognize how page organization affects locality, updates, fragmentation, and future index design.
Core Concepts Explained
Concept 1: Pages Are the Real Unit of I/O and Caching
Storage engines do not typically read one row at a time from disk. They read and write in larger chunks called pages.
A page is usually:
- fixed-size
- aligned with the engine's buffering strategy
- the unit the buffer pool caches and the WAL/recovery logic reasons about
Typical sizes might be:
- 4 KB
- 8 KB
- 16 KB
depending on the engine.
Why pages?
Because storage devices and operating systems reward chunked access much more than tiny random byte reads. Also, the engine needs a stable unit for:
- caching
- eviction
- dirty tracking
- flushing
- latching
- recovery bookkeeping
So the first deep shift is:
- the engine does not think "load row 42"
- it thinks "bring page P into memory, then find the record inside it"
That means all higher-level access costs are downstream of page behavior:
- point lookup cost
- scan cost
- update cost
- cache hit rate
This is also why later lessons on buffer pools and indexes will care so much about locality.
Concept 2: Records Need Layout Metadata, Not Just Payload Bytes
Once a page is loaded, the engine still needs to locate records inside it.
That is why pages usually include some combination of:
- page header
- slot directory or item pointer array
- free space region
- record bodies
One common pattern is the slotted page:
- the page header tracks metadata
- records are stored somewhere in the page body
- a slot directory maps logical positions to physical offsets
This indirection is valuable because records may move as the page changes.
Why movement matters:
- deletes create holes
- variable-length updates may no longer fit in place
- compaction may repack the page
If external code or indexes pointed directly to the byte offset of every record, any compaction would break them. Slot indirection makes layout more flexible.
Records themselves also need internal metadata, such as:
- null bitmap
- column offsets
- length info for variable-width fields
- visibility or version metadata in some engines
So a "row" on disk is usually:
- header + metadata + encoded field values
not a naive concatenation of columns.
This becomes especially important once fields are:
- nullable
- variable length
- compressed
- versioned
Concept 3: Layout Trade-offs Shape Updates, Fragmentation, and Future Access Paths
Physical layout is always a trade-off.
If records are packed tightly:
- scans may be faster
- cache locality may improve
- but updates may require movement or page reorganization
If the engine leaves extra space:
- updates may fit more often in place
- but storage density drops
Variable-length columns make this harder:
- a short update may fit
- a long update may force the record to move
- moved records create indirection, overflow, or page churn
Deletes create another pressure:
- reclaim free space immediately and pay maintenance cost
- or leave fragmentation and clean up later
This is why layout decisions ripple outward into later database behavior:
- B-Trees depend on stable record references or tuple IDs
- WAL and recovery need to describe page changes correctly
- MVCC may add version chains or visibility metadata to record layout
- vacuum/compaction exist because page-level free space degrades over time
So this lesson is not just about "how bytes are arranged." It is the physical base layer under:
- index maintenance
- update amplification
- recovery design
- buffer-pool efficiency
Once that clicks, later lessons stop feeling like separate topics. They become consequences of one physical model.
Troubleshooting
Issue: "Why does updating one field sometimes seem much more expensive than expected?"
Why it happens / is confusing: At the logical level it looks like a tiny change.
Clarification / Fix: The updated record may no longer fit in its current physical space. Check whether variable-length growth, page compaction, overflow handling, or page splits are involved.
Issue: "Why do databases care so much about page size?"
Why it happens / is confusing: SQL abstractions hide the page boundary.
Clarification / Fix: Page size affects I/O granularity, cache efficiency, scan behavior, and how many records or index entries fit together physically. It is a foundational design knob, not arbitrary trivia.
Issue: "Why can deletes and updates make performance worse over time even when row count is stable?"
Why it happens / is confusing: The logical table size may look unchanged.
Clarification / Fix: Physical pages may be accumulating holes, overflow chains, or fragmented free space. The engine may need vacuuming, compaction, or page reorganization to recover locality.
Advanced Connections
Connection 1: Pages and Layout <-> B-Tree Indexes
The parallel: The next lesson will show how B-Tree nodes are pages too. Index design is not separate from page layout; it is another use of the same physical storage unit, optimized for navigation rather than row payload.
Real-world case: Leaf entries and internal nodes both live inside pages, so fanout, split behavior, and pointer stability all inherit page-layout constraints.
Connection 2: Pages and Layout <-> WAL and Recovery
The parallel: Later lessons on WAL and crash recovery depend on page-level reasoning. Recovery replays changes against pages, not against abstract rows in the SQL sense.
Real-world case: Correct redo/undo logic requires stable page IDs, change ordering, and careful tracking of how records physically changed inside a page.
Resources
Optional Deepening Resources
- [BOOK] Database Internals
- Link: https://www.databass.dev/
- Focus: Use it for practical explanations of slotted pages, record layout, buffer management, and storage-engine trade-offs.
- [DOCS] PostgreSQL Documentation: Database Physical Storage
- Link: https://www.postgresql.org/docs/current/storage.html
- Focus: A useful real-world reference for pages, tuples, free space, and the physical view below SQL tables.
- [DOCS] SQLite File Format
- Link: https://www.sqlite.org/fileformat.html
- Focus: Great for seeing a concrete page-based storage format in detail.
- [BOOK] Designing Data-Intensive Applications
- Link: https://dataintensive.net/
- Focus: Helpful for tying physical storage choices back to higher-level database trade-offs.
Key Insights
- Pages are the true storage unit - Engines optimize around page-sized I/O, caching, and recovery, not around abstract logical rows.
- Records need metadata and indirection - Variable lengths, deletes, movement, and free-space reuse force structure beyond simple field concatenation.
- Physical layout drives future behavior - Indexes, WAL, MVCC, fragmentation, and scan cost all inherit the consequences of page and record organization.