Day 279: Buffer Pools, Caching, and I/O Scheduling

Database Engine Internals and Implementation
Lesson 007 · 30 min · Intermediate

The core idea: a good query plan still needs the right pages in memory at the right time, and a storage engine lives or dies by how it manages that working set and schedules background I/O around foreground reads.


Today's "Aha!" Moment

The insight: Query cost is not just about operators. It is also about whether the pages those operators need are already hot, whether dirty pages are blocking eviction, and whether background writeback is quietly competing with foreground latency.

Why this matters: Two identical queries can behave very differently depending on cache warmth, eviction pressure, and I/O scheduling.

Concrete anchor: Yesterday's query plan looked fine in staging, but production is slower. The SQL did not change. The data did not change much. What changed was the memory working set and the fact that checkpoint writeback, scans, and index probes are now competing for the same storage path.

The practical sentence to remember:
Execution plans decide what work is needed; buffer pools and I/O scheduling decide how painful that work becomes.


Why This Matters

The problem: Storage is orders of magnitude slower than CPU, so engines try hard to keep useful pages in memory. But memory is limited, some pages are dirty, some are pinned by active work, and scans can easily evict data that short point-lookups needed.

Without this model: a plan that was fast yesterday looks randomly slow today, cache misses and checkpoint stalls get blamed on the query text, and tuning turns into guesswork.

With this model: you ask the right questions first. Is the working set still resident? Are dirty pages piling up faster than they are flushed? Is background writeback competing with foreground reads?

Operational payoff: Better sizing of memory, better tuning of shared buffers and flush policy, fewer checkpoint spikes, and a clearer understanding of why some workloads get hurt by scans while others get hurt by writeback.


Learning Objectives

By the end of this lesson, you should be able to:

  1. Explain the role of the buffer pool as the managed in-memory working set for data pages.
  2. Describe the main cache-management decisions around hits, misses, dirty pages, eviction, and scan pollution.
  3. Reason about I/O scheduling trade-offs between foreground latency and background maintenance work.

Core Concepts Explained

Concept 1: The Buffer Pool Is the Engine's Working Memory

Concrete example / mini-scenario: A query needs a B-Tree root page, then a few internal pages, then a leaf page, then finally the target row page. If all of them are already in memory, the query feels instant. If not, each miss may trigger storage I/O.

Intuition: A buffer pool is not just "a cache." It is the engine's actively managed set of database pages in memory.

What lives there: table pages holding rows, index pages such as the B-Tree internal and leaf pages from the scenario above, and per-buffer metadata like pin counts and dirty flags.

How it works mechanically (a minimal sketch follows this list):

  1. An operator requests a page.
  2. If the page is already resident, the engine gets a buffer hit.
  3. If not, the engine triggers a buffer miss and loads the page from storage.
  4. The page is pinned while in use so it cannot be evicted mid-operation.
  5. If modified, the page becomes dirty and must be written back before it can be evicted and before a checkpoint that covers it can complete.
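
To make the hit, miss, pin, dirty cycle concrete, here is a minimal Python sketch of that flow. It is an illustration under simplifying assumptions, not any real engine's API: the names (BufferPool, fetch_page, the storage_read/storage_write callbacks) are made up, and a production pool would add latching, a real replacement policy, and WAL ordering before flushes.

```python
class Frame:
    """One in-memory slot holding a database page plus bookkeeping."""
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.pin_count = 0    # > 0 means an operator is using the page right now
        self.dirty = False    # modified in memory, not yet written back


class BufferPool:
    def __init__(self, capacity, storage_read, storage_write):
        self.capacity = capacity
        self.frames = {}                    # page_id -> Frame
        self.storage_read = storage_read    # page_id -> bytes (the slow path)
        self.storage_write = storage_write  # (page_id, bytes) -> None

    def fetch_page(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:                           # buffer miss
            if len(self.frames) >= self.capacity:
                self._evict_one()
            frame = Frame(page_id, self.storage_read(page_id))
            self.frames[page_id] = frame
        frame.pin_count += 1                        # pin while in use
        return frame

    def unpin(self, frame, modified=False):
        frame.dirty = frame.dirty or modified       # remember pending writeback
        frame.pin_count -= 1

    def _evict_one(self):
        # Take any unpinned frame; a real engine consults its replacement
        # policy here (see Concept 2) instead of the first candidate found.
        victim = next((pid for pid, f in self.frames.items() if f.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned; nothing can be evicted")
        frame = self.frames.pop(victim)
        if frame.dirty:                             # dirty pages are flushed first
            self.storage_write(victim, frame.data)
```

The usual call pattern is fetch_page, read or modify the bytes, then unpin with modified=True if the page changed; the actual flush happens later, on eviction or during a checkpoint.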

Why engines manage this directly: a generic cache cannot see which pages are pinned by running operators, which are dirty and must respect write-ahead ordering, or which access patterns (an index probe versus a sequential scan) deserve protection, so the engine tracks that state itself.

Important subtlety: In many systems, the database cache and the OS page cache can interact awkwardly. This is why some engines prefer direct I/O or careful buffer sizing: too many layers of caching can waste memory and hide pressure until it hurts.


Concept 2: Cache Policy Is About Protecting the Right Working Set

Concrete example / mini-scenario: A reporting query sequentially scans a huge table. Meanwhile, OLTP traffic keeps asking for the same small hot index pages. If the scan pollutes the cache, point lookups suddenly slow down.

Intuition: The hard question is not "should we cache?" It is "which pages deserve to stay?"

Common pressures on eviction policy: hot index pages that are re-read constantly, dirty pages that cannot be dropped until they are flushed, pinned pages that cannot be evicted at all, and large scans that touch each page exactly once.

Why a naive policy fails: plain LRU treats "touched most recently" as "most valuable," so a single sequential scan can march through the pool and push out the small hot set that point lookups depend on.

What engines typically add: scan-resistant replacement (for example, clock or second-chance schemes, or admitting first-touch pages at a cold midpoint), separate ring buffers for bulk scans, and pin and dirty awareness in the victim search; a minimal sketch of the midpoint idea follows the key idea below.

The key systems idea:
Eviction policy is really admission control for memory pressure.

If the engine protects the wrong pages, I/O explodes even though nothing about the query text changed.
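
To illustrate the "which pages deserve to stay" question, here is a minimal sketch of one scan-resistance idea: a two-segment LRU where first-touch pages enter a cold segment and are promoted to the hot segment only on a second access. It is similar in spirit to midpoint-insertion schemes some engines use, but the class name, the hot_fraction knob, and the structure are assumptions for illustration only.

```python
from collections import OrderedDict


class ScanResistantLRU:
    """Two-segment LRU: one pass of a big scan recycles its own cold pages."""
    def __init__(self, capacity, hot_fraction=0.6):
        self.capacity = capacity
        self.hot_capacity = int(capacity * hot_fraction)
        self.hot = OrderedDict()   # pages touched more than once (protected)
        self.cold = OrderedDict()  # first-touch pages (scan candidates)

    def access(self, page_id):
        if page_id in self.hot:
            self.hot.move_to_end(page_id)          # refresh recency
        elif page_id in self.cold:
            del self.cold[page_id]                 # second touch: promote
            self.hot[page_id] = True
            if len(self.hot) > self.hot_capacity:  # demote the coldest hot page
                demoted, _ = self.hot.popitem(last=False)
                self.cold[demoted] = True
        else:
            self.cold[page_id] = True              # first touch: admit as cold
        while len(self.hot) + len(self.cold) > self.capacity:
            victims = self.cold if self.cold else self.hot
            victims.popitem(last=False)            # evict from the cold end first
```

A one-pass table scan touches each page once, so its pages never leave the cold segment and mostly evict each other, while the OLTP index pages that are touched repeatedly stay protected in the hot segment.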

Connection to yesterday's lesson: A great query plan can still run badly if the pages it expects to probe cheaply keep getting evicted by unrelated work.


Concept 3: I/O Scheduling Is a Latency Policy, Not Just Plumbing

Concrete example / mini-scenario: Foreground user queries need low-latency random reads, but a checkpoint starts flushing large numbers of dirty pages and saturates the device queue.

Intuition: Once reads, writeback, checkpointing, log flushes, and scans all share the same storage path, "I/O scheduling" becomes a policy question about whose latency matters most right now.

Main sources of I/O pressure: foreground random reads from index probes, sequential reads from scans and readahead, log (WAL) flushes on commit, dirty-page writeback, and checkpoint flushing.

Why background work is dangerous: it is easy to defer, so it tends to arrive in bursts. A checkpoint that flushes a large backlog can fill the device queue and push foreground read latency up exactly when users notice it.

Typical balancing mechanisms: rate-limiting or pacing background writeback, spreading checkpoint writes across the interval instead of flushing at the deadline, dedicated background writer threads, and giving latency-sensitive reads priority in the submission queue.

The trade-off: throttle background work too aggressively and dirty-page debt grows, which lengthens recovery and eventually forces flushes on the eviction path; throttle it too little and writeback steals bandwidth from foreground queries.

Mental model:
The buffer pool decides what deserves scarce memory.
The I/O scheduler decides which debt gets paid first when memory is no longer enough.
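
As a minimal sketch of that policy question, here is a toy scheduler that serves foreground reads first and lets background flushes proceed only up to a per-tick budget. The queue names, the tick model, and the flush_budget_per_tick knob are assumptions for illustration, not how any particular engine exposes this.

```python
from collections import deque


class IoScheduler:
    """Toy policy: foreground reads first, background writeback on a budget."""
    def __init__(self, submit_io, flush_budget_per_tick=4):
        self.submit_io = submit_io    # callable that issues one I/O request
        self.foreground = deque()     # latency-sensitive reads
        self.background = deque()     # dirty-page writeback, checkpoint writes
        self.flush_budget = flush_budget_per_tick

    def enqueue_read(self, request):
        self.foreground.append(request)

    def enqueue_flush(self, request):
        self.background.append(request)

    def tick(self):
        # Foreground reads drain first: their latency is what users see.
        while self.foreground:
            self.submit_io(self.foreground.popleft())
        # Then pay down a bounded amount of background debt; the rest waits.
        for _ in range(min(self.flush_budget, len(self.background))):
            self.submit_io(self.background.popleft())
```

The budget is the policy lever: set it too low and dirty-page debt accumulates until a checkpoint forces a burst, set it too high and writeback competes with the very reads it was supposed to stay out of the way of.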


Troubleshooting

Issue: Queries are fast after warm-up but very slow after restart or failover.

Why it happens: The working set is no longer resident, so operators that used to hit memory now miss and fetch from storage.

Clarification / Fix: Distinguish cold-cache from warm-cache performance. Capacity planning should consider both, especially for failover scenarios.

Issue: Checkpoints cause periodic latency spikes.

Why it happens: Dirty-page flushing may be too bursty, so checkpoint work competes with foreground reads and writes in large bursts.

Clarification / Fix: Smooth writeback, revisit checkpoint cadence, and measure how much dirty-page debt accumulates between flush phases.
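
As a back-of-the-envelope check with hypothetical numbers, the sketch below estimates the sustained flush rate a smoothed checkpoint would need; the point is that spreading the same volume over the whole interval replaces one burst with a steady, budgetable rate.

```python
# Hypothetical numbers for illustration only.
dirty_pages = 2_000_000       # dirty pages accumulated per checkpoint cycle
page_size_kib = 8             # page size in KiB
interval_s = 300              # checkpoint interval in seconds

pages_per_s = dirty_pages / interval_s
mib_per_s = pages_per_s * page_size_kib / 1024
print(f"sustained flush rate needed: {pages_per_s:.0f} pages/s (~{mib_per_s:.0f} MiB/s)")
# -> roughly 6667 pages/s, about 52 MiB/s, instead of one burst at the deadline
```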

Issue: Cache hit rate looks acceptable, but user-facing latency is still unstable.

Why it happens: Averages can hide the fact that the pages being missed are exactly the ones on latency-sensitive paths, or that background writes are saturating the storage device anyway.

Clarification / Fix: Look beyond overall hit rate. Segment by workload class, page type, and tail latency.
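
A minimal sketch of that segmentation, assuming your instrumentation can emit per-access events tagged with workload class, page type, hit/miss, and latency (the event format here is an assumption, not a standard counter):

```python
from collections import defaultdict


def segment_metrics(events):
    """events: iterable of (workload_class, page_type, hit: bool, latency_ms)."""
    buckets = defaultdict(list)
    for workload, page_type, hit, latency_ms in events:
        buckets[(workload, page_type)].append((hit, latency_ms))

    report = {}
    for key, rows in buckets.items():
        hits = sum(1 for hit, _ in rows if hit)
        latencies = sorted(ms for _, ms in rows)
        p99 = latencies[int(0.99 * (len(latencies) - 1))]   # crude percentile
        report[key] = {"hit_rate": hits / len(rows), "p99_ms": p99}
    return report
```

A global 95% hit rate can coexist with an OLTP index-leaf segment sitting at 70% with a 40 ms p99, which is exactly the kind of detail an overall average hides.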

Issue: Large scans hurt unrelated OLTP traffic.

Why it happens: Scan pages may pollute the buffer pool and evict hot pages needed by point lookups.

Clarification / Fix: Use scan-aware caching behavior, readahead policy, or separate analytical workloads when needed.


Advanced Connections

Connection 1: Buffer Pools <-> WAL and Checkpoints

The parallel: Dirty pages can stay in memory for performance, but WAL is what makes that safe. Checkpoints then decide how much of that dirty state should be pushed out to bound recovery time.

Why this matters: Memory management, durability, and recovery are one system, not three separate topics.
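
The rule that makes this safe is often summarized as "log before data": before a dirty page is written back, the WAL must be durable at least up to that page's last change. A minimal sketch, with the LSN fields and the wal.flush_to call as illustrative assumptions:

```python
def write_back(page, wal, storage):
    """Write a dirty page only after the WAL covering its changes is durable."""
    if not page.dirty:
        return
    if wal.durable_lsn < page.last_modified_lsn:
        wal.flush_to(page.last_modified_lsn)    # log first, always
    storage.write(page.page_id, page.data)      # then the data page
    page.dirty = False
```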

Connection 2: Buffer Pools <-> Concurrency Control

The parallel: Once many queries touch the same in-memory pages, coordination matters too. That is why the next lesson moves naturally into locks, latches, and deadlocks.

Why this matters: Hot pages are good for I/O, but they can also become hot contention points.


Key Insights

  1. The buffer pool is the engine's managed working set, not just a generic cache.
  2. Eviction policy is a workload decision because the wrong pages in memory can ruin an otherwise good plan.
  3. I/O scheduling is part of latency control because background maintenance and foreground queries compete for the same storage path.
