File Organization: Heap, Sorted, and Hash


Database Engine Internals and Implementation

Lesson 013 · 30 min · Advanced

Day 390: File Organization: Heap, Sorted, and Hash

The core idea: Once the write-ahead logging from 05.md has made a change durable, file organization decides where that row lives on disk, and that placement decides whether your workload pays mostly during writes, range reads, or exact-key lookups.

Today's "Aha!" Moment

In 05.md, Harbor Point's municipal-bond database learned how to make a quote durable before every affected data page had been rewritten in place. That answered the durability question. It did not answer the placement question. After the WAL record is safe, where should the new quote row actually go? The answer matters because future reads and updates will inherit whatever physical layout choice the storage engine makes today.

Suppose Harbor Point stores live quotes as rows keyed by cusip and quote_ts. A heap file can drop a new row onto any page with free space, so the market-open ingest path stays cheap. A sorted file can keep rows physically ordered by (cusip, quote_ts), so compliance sweeps and range scans become efficient. A hash-organized file can route each row straight to a bucket based on cusip, so exact lookups avoid scanning unrelated pages. The same logical row therefore creates three very different I/O patterns even before query planning enters the picture.

The misconception to discard is that physical layout is a minor implementation detail because "we can always add an index later." In practice, the base file organization determines how much page movement inserts cause, how stable row locations are, how much locality the buffer pool can exploit, and why the next lesson on 07.md needs B-trees at all. File organization is the first access-path decision, not an afterthought.

Why This Matters

Harbor Point has three competing access patterns against the same municipal_quotes relation. The ingest service appends thousands of quote revisions at market open. Trader dashboards ask for the latest quote on one cusip at a time. Compliance analysts run range scans over issuers and time windows to reconstruct how pricing moved during a market event.

Those requests want different physical guarantees. The ingest service wants inserts that touch as few pages as possible. The compliance team wants related rows packed together so one sequential scan can answer a wide query without random seeks. The trader dashboard wants a direct path to one instrument without wading through neighboring rows that happen to sit nearby on disk.

This is why file organization is a production decision, not a classroom taxonomy. If the table is heap-organized, writes stay cheap, but selective reads depend on secondary indexes or broad scans. If the table is kept fully sorted, range queries get beautiful locality, but steady inserts start paying for page shifts, overflow pages, and background reorganization. If the table is hash-organized, equality probes can be fast, but any query that cares about order or neighboring keys becomes much more expensive. Real engines are built around these trade-offs, which is why many of them combine a heap table with ordered indexes rather than pretending one layout solves every workload.

Learning Objectives

By the end of this session, you will be able to:

  1. Explain what each file organization optimizes - Describe why heap, sorted, and hash layouts make different operations cheap.
  2. Trace the mechanics behind each layout - Follow how inserts, lookups, and reorganizations interact with pages, free space, and buckets.
  3. Choose a layout using workload evidence - Match a production access pattern to the layout whose trade-offs actually fit it.

Core Concepts Explained

Concept 1: Heap files minimize movement and keep the write path simple

For Harbor Point's ingest service, the most attractive property of a heap file is that the storage engine does not need to preserve any global row order. Once the WAL record is durable, the engine can place the new quote on any page that still has space or append a new page if none do. That means one logical insert usually dirties a small, predictable set of pages: the target data page, some free-space metadata, and whatever indexes exist above the heap.

The mechanics are simple on purpose. A heap-organized relation is typically just a collection of pages plus a free-space directory or bitmap that helps the engine find candidate pages for future inserts. Each row is identified by a physical locator such as (page_id, slot_id), so the page can manage variable-length tuples internally without the whole table needing to stay ordered. Deletes and updates may leave holes behind, and compaction or vacuum later turns those holes back into usable space.
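As a toy illustration of those mechanics, here is a minimal heap file in Python. `HeapFile`, `PAGE_CAPACITY`, and the list-based free-space directory are illustrative stand-ins for this lesson, not any real engine's structures:

```python
PAGE_CAPACITY = 4  # rows per page, kept tiny for the example

class HeapFile:
    def __init__(self):
        self.pages = []          # each page is a list of rows
        self.free_pages = []     # directory of page_ids with free slots

    def insert(self, row):
        # Reuse any page with space; otherwise append a fresh page.
        if self.free_pages:
            page_id = self.free_pages[0]
        else:
            self.pages.append([])
            page_id = len(self.pages) - 1
            self.free_pages.append(page_id)
        page = self.pages[page_id]
        slot_id = len(page)
        page.append(row)
        if len(page) == PAGE_CAPACITY:
            self.free_pages.remove(page_id)
        return (page_id, slot_id)   # physical row locator

    def scan(self, predicate):
        # Without an index, a lookup must check every tuple on every page.
        for page_id, page in enumerate(self.pages):
            for slot_id, row in enumerate(page):
                if predicate(row):
                    yield (page_id, slot_id), row

heap = HeapFile()
rids = [heap.insert({"cusip": c, "px": 100 + c}) for c in (13007, 13001, 13099)]
# Rows land wherever space exists, in arrival order rather than key order.
```

Notice that the insert path never looks at the key: that is exactly why out-of-order ingest stays cheap, and why the only read primitive the base file offers is a full scan.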

That simplicity is why heap organization is common for general-purpose OLTP tables. Harbor Point can accept quote revisions quickly even when inserts arrive out of key order, because the base table is not trying to keep related cusip values adjacent at all times. The trade-off is just as important: without an index, a heap file has no direct story for "find all rows for this issuer" beyond scanning pages and checking tuples. Heap layout makes writes cheap by refusing to encode lookup structure into the table itself.

You can think of the heap file as saying, "durability comes from WAL, identity comes from the row locator, and locality will be supplied by some other structure if we need it." That is a strong production default when write volume is high and the access pattern is mixed, but it is not magic. A heap table that serves many range queries will usually need an ordered index to recover the locality it gave up at the base-file layer.

Concept 2: Sorted files turn physical order into an access path

Now switch Harbor Point's thought experiment. Suppose the dominant query is not single-row ingest but repeated scans such as "show every quote for issuers between CUSIP 13000 and CUSIP 13999 from 09:30 to 09:45." A sorted file makes that query attractive because the rows are physically maintained in key order. The engine can binary-search a page directory or fence keys to find the first relevant page, then read forward through mostly sequential pages while the key range remains relevant.
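That search pattern can be sketched with one fence key per page and a binary search over the fence keys. The page contents and the `range_scan` helper here are assumptions for illustration, not a real engine's on-disk format:

```python
from bisect import bisect_right

# Each page stores (cusip, quote_ts) rows in key order; the fence key
# is the first key on each page.
pages = [
    [(13001, "09:30"), (13002, "09:31")],
    [(13050, "09:30"), (13099, "09:44")],
    [(14200, "09:30"), (14500, "09:32")],
]
fence_keys = [page[0][0] for page in pages]

def range_scan(lo, hi):
    # Binary-search the fence keys to find the first candidate page,
    # then read forward sequentially while keys stay in range.
    start = max(bisect_right(fence_keys, lo) - 1, 0)
    out = []
    for page in pages[start:]:
        if page[0][0] > hi:
            break   # every later page is out of range too
        out.extend(row for row in page if lo <= row[0] <= hi)
    return out

# "issuers between CUSIP 13000 and 13999": touches two pages, skips the rest
print(range_scan(13000, 13999))
```

The key property is that the scan stops at the first page whose fence key exceeds `hi`; a heap file has no such early exit, because order carries no information there.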

That layout buys real locality. Nearby keys tend to live on nearby pages, so large ordered scans become cache-friendly and device-friendly in a way that a heap file cannot match by itself. If Harbor Point bulk-loads yesterday's quote archive sorted by (cusip, quote_ts), the result is almost ideal for analytical replay work and end-of-day reporting.

The pain begins when fresh quotes arrive continuously. An insert into a sorted file is not "write the row wherever space exists." It is "find the correct page for this key and preserve the invariant that keys remain ordered afterward." If the destination page is full, the engine must shift records to neighboring pages, create overflow pages, or periodically rebuild larger regions. All of those options add write amplification. Overflow pages are especially revealing: they preserve correctness, but they chip away at the locality advantage that justified the sorted file in the first place.
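A toy sketch of that insert path makes the cost visible. The `pages` structure, `PAGE_CAPACITY`, and the per-page overflow list are illustrative assumptions, not a real engine's format:

```python
PAGE_CAPACITY = 2  # kept tiny so pages fill up immediately

# Sorted file: page 0 covers low keys, page 1 covers higher keys.
pages = [
    {"rows": [13001, 13002], "overflow": []},
    {"rows": [13050, 13099], "overflow": []},
]

def insert_sorted(key):
    # Locate the page whose key range should contain this key
    # (keys below every fence key default to the first page).
    target = 0
    for i, page in enumerate(pages):
        if page["rows"] and page["rows"][0] <= key:
            target = i
    page = pages[target]
    if len(page["rows"]) < PAGE_CAPACITY:
        page["rows"].append(key)
        page["rows"].sort()       # preserve the ordering invariant
    else:
        # Full page: an overflow page preserves correctness, but rows
        # in this key range no longer sit in one sequential run, so the
        # locality that justified the sorted file starts to erode.
        page["overflow"].append(key)

insert_sorted(13075)              # lands in page 1's overflow chain
```

Every out-of-order insert into a full region pays this tax, which is why the write bill of a sorted file grows with exactly the kind of steady revision traffic Harbor Point sees at market open.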

That maintenance burden is why fully sorted files are common for bulk-loaded or append-in-order workloads, but much less common as the primary structure for hot OLTP tables. The next lesson on 07.md exists because engineers wanted the read benefits of sorted order without forcing every insert to rewrite an ever-growing suffix of the file. B-trees solve that by storing order hierarchically and localizing change to one root-to-leaf path plus occasional node splits.

Concept 3: Hash organization buys direct equality access by giving up order

A hash-organized file makes the opposite bet. Instead of preserving global order, it computes a bucket from the search key, such as bucket = h(cusip), and stores the row in the pages assigned to that bucket. If Harbor Point's hottest operation is "fetch the latest quote for exactly this one cusip," hashing can skip a lot of irrelevant pages because the engine does not need to preserve any relationship between neighboring key values.

The attractive case is straightforward. Equality lookups can jump directly to a bucket, examine only the rows or pointers stored there, and ignore the rest of the file. Inserts are also naturally distributed when hash values are well balanced, because new rows spread across many buckets instead of fighting for the right place in one global order. That is why hash organization shows up in key-value systems and in equality-focused access methods.
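The bucket-routing idea fits in a few lines. `NUM_BUCKETS` and the modulo hash are arbitrary illustrative choices, and real systems would store pages per bucket rather than Python lists:

```python
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(cusip):
    # Route every row for a given cusip to one fixed bucket.
    return hash(cusip) % NUM_BUCKETS

def insert(cusip, quote):
    buckets[h(cusip)].append((cusip, quote))

def lookup(cusip):
    # Equality probe: jump straight to one bucket, ignore all others.
    return [q for c, q in buckets[h(cusip)] if c == cusip]

insert(13007, {"px": 101.2})
insert(13099, {"px": 99.8})
print(lookup(13007))   # touches exactly one bucket
```

Note what is missing: there is no way to answer "CUSIP 13000 through 13999" here without probing every bucket, because adjacent keys hash to unrelated buckets by design.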

The limitations are structural, not accidental. Hashing destroys order, so the database cannot use bucket adjacency to answer "all quotes from CUSIP 13000 to CUSIP 13999" efficiently. It also depends heavily on distribution. If one issuer becomes unusually hot, its bucket can develop overflow chains, extra locking pressure, and latency cliffs that did not exist in the balanced case. Dynamic hashing schemes reduce the need for full-table rehashing, but they do not change the basic bargain: you are trading away ordered access to make equality probes cheap.

For Harbor Point, that means hash organization only wins when the workload is dominated by exact-key access and the team is willing to pair it with something else for range queries. In mixed workloads, a pure hash-organized base file often feels too narrow. The engine either adds an ordered secondary structure anyway or adopts a different primary organization whose weaknesses are easier to balance across many query shapes.

Troubleshooting

Issue: Market-open ingest is healthy, but range scans over quote history read far more pages than expected.

Why it happens / is confusing: A heap file kept inserts cheap by ignoring order. WAL and buffer tuning cannot create locality that the base file never encoded.

Clarification / Fix: Add an ordered access path for the range predicate, or change the physical organization for workloads where scans dominate. Do not expect heap layout alone to cluster related cusip and time ranges.

Issue: A sorted-file prototype looks excellent in bulk-load benchmarks but falls apart once real-time quote revisions start arriving.

Why it happens / is confusing: Benchmarks often measure read locality after an ideal load, not the cost of preserving order under continuous out-of-order inserts. Page shifts and overflow chains become the hidden write bill.

Clarification / Fix: Measure insert amplification and overflow growth, not just scan speed. If steady writes matter, a B-tree or LSM-style design is usually a better long-term answer than a strictly sorted base file.

Issue: Exact cusip lookups are fast in a hash layout, but compliance queries over issuer or time ranges become unpredictable.

Why it happens / is confusing: Hashing groups rows by bucket identity, not by key adjacency, so range predicates lose the physical order they need.

Clarification / Fix: Use hash organization only when equality probes dominate, or pair it with an ordered secondary structure that can answer the range side of the workload.

Advanced Connections

Connection 1: Sorted files -> B-tree node structure

A sorted file is the simplest way to say "disk pages should reflect key order," but it becomes expensive once inserts must preserve that invariant continuously. B-trees keep the same idea of ordered leaves while reducing maintenance from "rewrite long stretches of the file" to "search one path and sometimes split one node." That is why 07.md is the natural continuation of this lesson rather than a separate topic.

Connection 2: File organization -> write amplification and recovery pressure

The physical layout does not change the WAL rule from 05.md, but it changes how many pages a logical write tends to dirty. Heap insertion often touches a small number of pages. Sorted files may cascade across neighbors to keep order intact. Hash layouts can trigger bucket overflows or splits. Those differences show up later as checkpoint pressure, cache churn, and recovery work, which is why file organization belongs in the same mental model as logging and buffer management rather than in a separate "storage trivia" box.


Key Insights

  1. A file organization is a physical promise about locality - Heap promises flexibility, sorted files promise order, and hash files promise direct equality access.
  2. Cheap writes and cheap reads rarely come from the same invariant - The work you avoid during insertion usually reappears during scans, reorganization, or overflow handling.
  3. Modern storage structures are responses to these trade-offs - B-trees, clustered indexes, and LSM designs exist because pure heap, pure sorted, and pure hash layouts each leave an important workload unsolved.
