LESSON

Database Engine Internals and Implementation

002 30 min intermediate

Day 274: Storage Engine Foundations: Pages, Records, and Layout

A storage engine does not really store "rows." It stores bytes inside pages, and the way those bytes are arranged quietly determines read cost, update cost, fragmentation, and recovery complexity.

Today's "Aha!" Moment

The insight: Once data leaves the logical schema and hits disk or durable storage, the engine stops thinking in terms of "table with columns" and starts thinking in terms of:

fixed-size pages
record headers and offsets
free space management
pointers or slot directories
movement, deletion, and reuse

That physical layout is the hidden reason later topics like indexes, WAL, MVCC, and query execution behave the way they do.

Why this matters: Developers often speak about rows as if they exist directly on disk. But an engine has to answer harder questions:

how much do we read at once?
where inside a page does a record live?
how do we find it if nearby records move?
how do variable-length columns fit without rewriting everything?
how do we update one record without destroying locality or recovery correctness?

Those are page-layout problems, not abstract schema problems.

The universal pattern:

storage is organized into pages
pages contain record metadata plus payload
record movement and free space require indirection or bookkeeping
every later access path is built on top of these page-level decisions

Concrete anchor: A customers table may look like simple rows in SQL, but the engine may actually store each row inside an 8 KB page, with a header, null bitmap, variable-length offsets, and a slot directory. A single update to a long text field may no longer fit in the original position, forcing page reorganization or redirection.

How to recognize when this applies:

You care about why point reads, scans, or updates cost what they cost.
You need to understand fragmentation, vacuuming, overflow pages, or page splits later.
You want to reason about how indexes point to data physically.

Common misconceptions:

[INCORRECT] "A row is just stored contiguously exactly as the SQL table suggests."
[INCORRECT] "Page layout is an implementation detail with little effect on performance."
[CORRECT] The truth: Page and record layout are first-order performance and correctness decisions because they shape locality, write amplification, indirection, and recovery behavior.

Real-world examples:

Short fixed-width records: Cheap to pack densely and scan predictably.
Wide or variable-length records: Harder to update in place and more likely to cause fragmentation or overflow handling.

Why This Matters

The problem: Logical data models describe what the data means, but storage engines must decide how it is physically placed. Poor physical layout decisions create hidden costs:

extra I/O
more page splits or rewrites
unstable record addresses
poor cache locality
complicated free-space reuse
harder crash recovery bookkeeping

Before:

Rows are imagined as abstract objects that happen to live on disk.
Updates are assumed to touch only the changed field.
Performance problems are blamed on queries alone, not on physical layout.

After:

Pages are understood as the real unit of storage and I/O.
Record layout is treated as a design choice affecting update and scan cost.
Later lessons on B-Trees, WAL, buffer pools, and MVCC have a clear physical foundation.

Real-world impact: This mental model helps explain why engines make trade-offs like slotted pages, indirection, append-friendly structures, or overflow storage. It also prevents hand-wavy explanations of "database performance" that ignore the actual storage substrate.

Learning Objectives

By the end of this session, you will be able to:

Explain why storage engines work in pages rather than logical rows - Understand page size as the basic unit of durable I/O and caching.
Describe how records are physically organized inside pages - Reason about headers, slot directories, fixed vs variable-length fields, and free space.
Evaluate layout trade-offs - Recognize how page organization affects locality, updates, fragmentation, and future index design.

Core Concepts Explained

Concept 1: Pages Are the Real Unit of I/O and Caching

Storage engines do not typically read one row at a time from disk. They read and write in larger chunks called pages.

A page is usually:

fixed-size
aligned with the engine's buffering strategy
the unit the buffer pool caches and the WAL/recovery logic reasons about

Typical sizes might be:

4 KB
8 KB
16 KB

depending on the engine.

Why pages?

Because storage devices and operating systems reward chunked access much more than tiny random byte reads. Also, the engine needs a stable unit for:

caching
eviction
dirty tracking
flushing
latching
recovery bookkeeping

So the first deep shift is:

the engine does not think "load row 42"
it thinks "bring page P into memory, then find the record inside it"

That means all higher-level access costs are downstream of page behavior:

point lookup cost
scan cost
update cost
cache hit rate

This is also why later lessons on buffer pools and indexes will care so much about locality.

Concept 2: Records Need Layout Metadata, Not Just Payload Bytes

Once a page is loaded, the engine still needs to locate records inside it.

That is why pages usually include some combination of:

page header
slot directory or item pointer array
free space region
record bodies

One common pattern is the slotted page:

the page header tracks metadata
records are stored somewhere in the page body
a slot directory maps logical positions to physical offsets

This indirection is valuable because records may move as the page changes.

Why movement matters:

deletes create holes
variable-length updates may no longer fit in place
compaction may repack the page

If external code or indexes pointed directly to the byte offset of every record, any compaction would break them. Slot indirection makes layout more flexible.

Records themselves also need internal metadata, such as:

null bitmap
column offsets
length info for variable-width fields
visibility or version metadata in some engines

So a "row" on disk is usually:

header + metadata + encoded field values

not a naive concatenation of columns.

This becomes especially important once fields are:

nullable
variable length
compressed
versioned

Concept 3: Layout Trade-offs Shape Updates, Fragmentation, and Future Access Paths

Physical layout is always a trade-off.

If records are packed tightly:

scans may be faster
cache locality may improve
but updates may require movement or page reorganization

If the engine leaves extra space:

updates may fit more often in place
but storage density drops

Variable-length columns make this harder:

a short update may fit
a long update may force the record to move
moved records create indirection, overflow, or page churn

Deletes create another pressure:

reclaim free space immediately and pay maintenance cost
or leave fragmentation and clean up later

This is why layout decisions ripple outward into later database behavior:

B-Trees depend on stable record references or tuple IDs
WAL and recovery need to describe page changes correctly
MVCC may add version chains or visibility metadata to record layout
vacuum/compaction exist because page-level free space degrades over time

So this lesson is not just about "how bytes are arranged." It is the physical base layer under:

index maintenance
update amplification
recovery design
buffer-pool efficiency

Once that clicks, later lessons stop feeling like separate topics. They become consequences of one physical model.

Troubleshooting

Issue: "Why does updating one field sometimes seem much more expensive than expected?"

Why it happens / is confusing: At the logical level it looks like a tiny change.

Clarification / Fix: The updated record may no longer fit in its current physical space. Check whether variable-length growth, page compaction, overflow handling, or page splits are involved.

Issue: "Why do databases care so much about page size?"

Why it happens / is confusing: SQL abstractions hide the page boundary.

Clarification / Fix: Page size affects I/O granularity, cache efficiency, scan behavior, and how many records or index entries fit together physically. It is a foundational design knob, not arbitrary trivia.

Issue: "Why can deletes and updates make performance worse over time even when row count is stable?"

Why it happens / is confusing: The logical table size may look unchanged.

Clarification / Fix: Physical pages may be accumulating holes, overflow chains, or fragmented free space. The engine may need vacuuming, compaction, or page reorganization to recover locality.

Advanced Connections

Connection 1: Pages and Layout <-> B-Tree Indexes

The parallel: The next lesson will show how B-Tree nodes are pages too. Index design is not separate from page layout; it is another use of the same physical storage unit, optimized for navigation rather than row payload.

Real-world case: Leaf entries and internal nodes both live inside pages, so fanout, split behavior, and pointer stability all inherit page-layout constraints.

Connection 2: Pages and Layout <-> WAL and Recovery

The parallel: Later lessons on WAL and crash recovery depend on page-level reasoning. Recovery replays changes against pages, not against abstract rows in the SQL sense.

Real-world case: Correct redo/undo logic requires stable page IDs, change ordering, and careful tracking of how records physically changed inside a page.

Resources

Optional Deepening Resources

[BOOK] Database Internals
- Link: https://www.databass.dev/
- Focus: Use it for practical explanations of slotted pages, record layout, buffer management, and storage-engine trade-offs.
[DOCS] PostgreSQL Documentation: Database Physical Storage
- Link: https://www.postgresql.org/docs/current/storage.html
- Focus: A useful real-world reference for pages, tuples, free space, and the physical view below SQL tables.
[DOCS] SQLite File Format
- Link: https://www.sqlite.org/fileformat.html
- Focus: Great for seeing a concrete page-based storage format in detail.
[BOOK] Designing Data-Intensive Applications
- Link: https://dataintensive.net/
- Focus: Helpful for tying physical storage choices back to higher-level database trade-offs.

Key Insights

Pages are the true storage unit - Engines optimize around page-sized I/O, caching, and recovery, not around abstract logical rows.
Records need metadata and indirection - Variable lengths, deletes, movement, and free-space reuse force structure beyond simple field concatenation.
Physical layout drives future behavior - Indexes, WAL, MVCC, fragmentation, and scan cost all inherit the consequences of page and record organization.

← Back to Database Engine Internals and Implementation

← Back to Learning Hub