Day 280: Concurrency Control Fundamentals: Locks, Latches, and Deadlocks

The core idea: once many transactions and engine threads touch the same data and pages at the same time, the database must coordinate two different things: logical correctness of transactions and physical safety of internal structures.


Today's "Aha!" Moment

The insight: lock and latch are not interchangeable words. A lock protects logical data consistency between transactions. A latch protects an in-memory structure for a short critical section inside the engine.

Why this matters: Teams often debug blocking in production without knowing whether they are looking at transactional contention or internal engine contention. Those are related, but they are not the same problem.

Concrete anchor: Two transactions both try to update the same account row. At the same time, the engine must also safely modify a shared B-Tree page structure in memory. One kind of coordination is about business correctness. The other is about not corrupting the internal page tree.

The practical sentence to remember:
Locks protect the meaning of data. Latches protect the safety of the engine's internal memory structures.


Why This Matters

The problem: Concurrency is the reason a database is useful, but it is also the reason it gets hard. Without coordination, updates race, reads observe inconsistent state, and internal structures can be corrupted. With too much coordination, throughput collapses and transactions wait on each other.

Without this model:

  • Transactional blocking and internal engine contention blur into one vague "the database is slow" complaint.
  • Latency spikes get blamed on storage or hardware when the real cause is coordination.

With this model:

  • You ask first what is being protected: user-visible data semantics or an internal engine structure.
  • Lock waits, latch contention, and deadlocks each point to a different, specific remedy.

Operational payoff: Better diagnosis of blocking, better design of transaction access order, and fewer cases where latency spikes are blamed on storage when the real issue is concurrency control.


Learning Objectives

By the end of this lesson, you should be able to:

  1. Explain why databases need multiple layers of coordination, not just one generic mutex everywhere.
  2. Distinguish locks from latches in purpose, scope, and duration.
  3. Reason about deadlocks and blocking as natural consequences of pessimistic coordination, not as random failures.

Core Concepts Explained

Concept 1: Locks Protect Transactional Correctness

Concrete example / mini-scenario: Transaction A reads an inventory row and wants to decrement stock. Transaction B wants to update that same row at the same time.

Intuition: Locks exist so concurrent transactions do not both behave as if they were alone when that would violate consistency.

What locks are about:

  • Logical objects with business meaning: rows, key ranges, whole tables.
  • Acquired on behalf of transactions, not engine threads.

Typical lock goals:

  • Prevent lost updates when two writers target the same row.
  • Keep reads from observing states the chosen isolation level forbids.

How they work mechanically:

  • A lock manager records which transaction holds which object in which mode, commonly shared for reads and exclusive for writes.
  • Conflicting requests wait in a queue until the holder releases, usually at commit or rollback.

Important property: Locks often live as long as part or all of a transaction, not just for a tiny machine-level critical section.

Why this matters: That lifetime is exactly why locks can create user-visible blocking and why transaction design affects concurrency so strongly.
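The mechanics above can be sketched as a toy lock manager: shared and exclusive row locks, conflicting requests waiting, and locks released only when the transaction finishes. The `LockManager` name, the S/X modes, and the single condition variable are all illustrative simplifications, not how any particular engine implements this:

```python
import threading
from collections import defaultdict

class LockManager:
    """Toy lock manager: shared (S) and exclusive (X) row locks.

    Locks are held until the transaction releases them, typically at
    commit -- which is exactly why they cause user-visible blocking.
    """
    def __init__(self):
        self._cond = threading.Condition()
        self._holders = defaultdict(dict)  # row -> {txn_id: mode}

    def _compatible(self, row, txn, mode):
        # S is compatible with S; X conflicts with everything.
        for other, held in self._holders[row].items():
            if other == txn:
                continue
            if mode == "X" or held == "X":
                return False
        return True

    def acquire(self, txn, row, mode):
        with self._cond:
            while not self._compatible(row, txn, mode):
                self._cond.wait()  # block until a conflicting holder releases
            self._holders[row][txn] = mode

    def release_all(self, txn):
        # Called once, at commit or rollback -- not per statement.
        with self._cond:
            for holders in self._holders.values():
                holders.pop(txn, None)
            self._cond.notify_all()
```

Note how `release_all` happens once per transaction: the lock's lifetime is the transaction's lifetime, not a tiny critical section.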


Concept 2: Latches Protect Internal Engine Structures

Concrete example / mini-scenario: The engine is splitting a B-Tree page in memory while another thread wants to traverse or update the same structure.

Intuition: Latches are short-lived guards that keep internal pages, hash tables, queues, and metadata structures from being torn apart by concurrent engine threads.

What latches are about:

  • Physical in-memory structures: buffer pool frames, index pages, hash tables, queues, metadata.
  • Taken by engine threads for one short critical section, then released immediately.

How they differ from locks:

  • Duration: microseconds, not the lifetime of a transaction.
  • Weight: typically a lightweight mutex or reader-writer primitive, with no lock modes, no wait-queue bookkeeping, and no deadlock detection.

Why engines keep the distinction:

  • Latches must stay cheap enough to take millions of times per second.
  • Full lock machinery, with its bookkeeping and detection, would crush that hot path.

Mental model:
A lock is reserving a meeting room for a business process.
A latch is holding the door while furniture is being moved so nobody gets hurt.
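The furniture-moving analogy can be sketched as a plain mutex guarding one short structural change, such as a page split. `BTreePage` here is a toy stand-in, not a real engine structure:

```python
import threading

class BTreePage:
    """Toy in-memory page guarded by a latch (a plain mutex here)."""

    def __init__(self, keys):
        self.keys = keys
        self._latch = threading.Lock()

    def split(self):
        # Latch scope = exactly this structural mutation. It is released
        # the moment the page is consistent again, regardless of what
        # any surrounding transaction is doing.
        with self._latch:
            mid = len(self.keys) // 2
            right = BTreePage(self.keys[mid:])
            self.keys = self.keys[:mid]
            return right
```

Contrast the scope with the lock example: the latch protects a few lines of pointer surgery, while a lock would protect the meaning of the rows across the whole transaction.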


Concept 3: Deadlocks Are a Normal Cost of Pessimistic Coordination

Concrete example / mini-scenario:
Transaction A locks row X, then requests row Y. Transaction B already holds row Y and now requests row X.

Now each transaction is waiting for the other.

Intuition: A deadlock is not a bug in the sense of memory corruption. It is a cycle in the wait graph. The system must detect it or prevent it.

How deadlocks emerge:

  • Transactions hold locks while requesting more (hold and wait).
  • Different transactions touch the same objects in different orders.
  • The resulting waits form a cycle that no participant can break on its own.

Common responses:

  1. Detection

    • Build or inspect the wait-for graph
    • Detect cycles
    • Abort one transaction as the victim
  2. Timeouts

    • Simpler, but less precise
    • Can confuse long waits with deadlocks
  3. Prevention through design

    • Consistent access order
    • Smaller transactions
    • Reduced lock footprint
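Detection, the first response above, can be sketched as cycle-finding over the wait-for graph. This simplified version assumes each blocked transaction waits on exactly one other, so the graph is a dict:

```python
def find_deadlock(wait_for):
    """Detect a cycle in a wait-for graph: txn -> txn it is blocked on.

    Returns one transaction on the cycle (a candidate victim), or None.
    Simplification: each transaction waits on at most one other.
    """
    for start in wait_for:
        seen = set()
        node = start
        while node in wait_for:     # follow the chain of waits
            if node in seen:
                return node         # revisited a node: cycle found
            seen.add(node)
            node = wait_for[node]
    return None
```

A real engine walks a multi-edge graph and picks its victim by cost (work done, locks held), but the core idea is the same: a deadlock is literally a cycle, so aborting any transaction on it breaks the wait.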

Important nuance:
Deadlocks are mostly a lock-level phenomenon. Latches can also suffer from bad ordering internally, but engines try hard to design latch protocols that keep those critical sections extremely short and well ordered.

The bigger lesson: Once you accept pessimistic coordination, waits and occasional forced aborts are part of the design space. The real question is whether they are rare and controlled.


Troubleshooting

Issue: Transactions are blocking each other more than expected.

Why it happens: The lock footprint may be larger than expected, transactions may be touching rows in inconsistent order, or an index choice may be forcing broader scans and therefore broader locking.

Clarification / Fix: Inspect lock waits, reduce transaction scope, and look for ways to narrow the read/write set earlier.

Issue: Throughput drops even though lock waits do not look dramatic.

Why it happens: The real problem may be latch contention inside hot internal structures such as buffer descriptors, lock tables, or index root pages.

Clarification / Fix: Distinguish transactional blocking from internal engine contention. They need different remedies.

Issue: Deadlocks appear after adding a new feature.

Why it happens: The feature may have introduced a new access order that conflicts with existing transaction flows.

Clarification / Fix: Compare lock acquisition order across transaction types. The fix is often to normalize access order, not to retry harder.
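Normalizing access order can be as simple as sorting the keys every transaction will lock, so all flows climb the same ladder and no cycle can form. The `acct:` keys are hypothetical:

```python
def lock_order(rows):
    """Normalize lock acquisition order: every transaction locks its
    rows by sorted key, so two transactions can never each hold a row
    the other wants -- the ordering makes a wait cycle impossible."""
    return sorted(rows)

# Two flows touching the same rows in different "natural" orders:
flow_a = ["acct:42", "acct:7"]   # e.g. transfer source first
flow_b = ["acct:7", "acct:42"]   # e.g. audit scan order
# After normalization both acquire in the same sequence.
```

This is the prevention strategy from the list above: it costs a sort, and in exchange the deadlock detector has nothing to find.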

Issue: Retries are increasing but correctness is still fine.

Why it happens: The system may be resolving deadlocks correctly, but at an application-level cost in tail latency and wasted work.

Clarification / Fix: Treat frequent deadlocks as a design smell. Recovery behavior is working, but the concurrency pattern is probably wrong for the workload.
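A retry loop with jittered backoff is the usual application-side companion to victim aborts. `DeadlockVictim` is a hypothetical exception standing in for whatever error a given driver actually raises:

```python
import random
import time

class DeadlockVictim(Exception):
    """Hypothetical: raised when the engine aborts us to break a cycle."""

def run_with_retry(txn_fn, attempts=3):
    """Retry a transaction aborted as a deadlock victim, with jittered
    exponential backoff. Correctness is preserved, but every retry is
    wasted work and added tail latency -- the 'design smell' above."""
    for i in range(attempts):
        try:
            return txn_fn()
        except DeadlockVictim:
            if i == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(random.uniform(0, 0.01 * 2 ** i))
```

If this loop fires often, the fix belongs in the access pattern (order, scope, footprint), not in a larger `attempts` value.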


Advanced Connections

Connection 1: Locks/Latches <-> Buffer Pools

The parallel: Hot data pages are great for avoiding I/O, but they also attract more concurrent access. That means a successful cache can turn into a contention hot spot.

Why this matters: This is the bridge from the previous lesson. Memory locality can improve latency while also increasing coordination pressure.

Connection 2: Locks/Latches <-> Optimistic Concurrency

The contrast: Pessimistic locking coordinates by waiting before conflicts can cause damage. Optimistic approaches do more work first, then validate or order later.

Why this matters: This is exactly why the next lesson matters. Once you understand the costs of pessimistic coordination, optimistic alternatives make much more sense.




Key Insights

  1. Locks and latches solve different problems even though both are coordination mechanisms.
  2. Deadlocks are expected under pessimistic coordination and must be detected, prevented, or resolved.
  3. Contention diagnosis starts by asking what is being protected: user-visible data semantics or internal engine structure.
