LESSON
Day 280: Concurrency Control Fundamentals: Locks, Latches, and Deadlocks
The core idea: once many transactions and engine threads touch the same data and pages at the same time, the database must coordinate two different things: logical correctness of transactions and physical safety of internal structures.
Today's "Aha!" Moment
The insight: lock and latch are not interchangeable words. A lock protects logical data consistency between transactions. A latch protects an in-memory structure for a short critical section inside the engine.
Why this matters: Teams often debug blocking in production without knowing whether they are looking at transactional contention or internal engine contention. Those are related, but they are not the same problem.
Concrete anchor: Two transactions both try to update the same account row. At the same time, the engine must also safely modify a shared B-Tree page structure in memory. One kind of coordination is about business correctness. The other is about not corrupting the internal page tree.
The practical sentence to remember:
Locks protect the meaning of data. Latches protect the safety of the engine's internal memory structures.
Why This Matters
The problem: Concurrency is the reason a database is useful, but it is also the reason it gets hard. Without coordination, updates race, reads observe inconsistent state, and internal structures can be corrupted. With too much coordination, throughput collapses and transactions wait on each other.
Without this model:
- Blocking gets treated as one generic problem.
- Deadlocks look mysterious instead of predictable.
- Teams tune indexes or hardware when the real issue is contention on rows, pages, or internal structures.
With this model:
- You can distinguish transaction-level waits from engine-level short critical sections.
- You can explain why deadlocks happen even in correct systems.
- You can reason about why some contention should be reduced with schema or query changes, while other contention points to engine hot spots.
Operational payoff: Better diagnosis of blocking, better design of transaction access order, and fewer cases where latency spikes are blamed on storage when the real issue is concurrency control.
Learning Objectives
By the end of this lesson, you should be able to:
- Explain why databases need multiple layers of coordination, not just one generic mutex everywhere.
- Distinguish locks from latches in purpose, scope, and duration.
- Reason about deadlocks and blocking as natural consequences of pessimistic coordination, not as random failures.
Core Concepts Explained
Concept 1: Locks Protect Transactional Correctness
Concrete example / mini-scenario: Transaction A reads an inventory row and wants to decrement stock. Transaction B wants to update that same row at the same time.
Intuition: Locks exist so concurrent transactions do not both behave as if they were alone when that would violate consistency.
What locks typically protect:
- Rows
- Key ranges
- Pages, in some engines
- Entire tables, in coarser-grained cases
Typical lock goals:
- Prevent lost updates
- Prevent conflicting writes
- Control whether reads can see in-flight changes
- Support isolation guarantees
How they work mechanically:
- A transaction asks for a lock mode such as shared or exclusive.
- If compatible with existing locks, it proceeds.
- If incompatible, it waits, aborts, or is handled by another concurrency-control rule.
Important property: Locks often live as long as part or all of a transaction, not just for a tiny machine-level critical section.
Why this matters: That lifetime is exactly why locks can create user-visible blocking and why transaction design affects concurrency so strongly.
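The mechanics above can be sketched as a tiny lock table with a shared/exclusive compatibility check. This is an illustrative toy, not any specific engine's implementation; the mode names, the `LockTable` class, and its methods are all invented for the sketch.

```python
# Minimal sketch of a lock manager's compatibility check.
# Modes ("S" shared, "X" exclusive) and the lock-table layout
# are illustrative, not any real engine's design.

from collections import defaultdict

# True means the requested mode is compatible with a held mode.
COMPATIBLE = {
    ("S", "S"): True,   # two readers can share a row
    ("S", "X"): False,  # a reader conflicts with a writer
    ("X", "S"): False,
    ("X", "X"): False,  # two writers always conflict
}

class LockTable:
    def __init__(self):
        # resource id -> list of (transaction id, mode)
        self.held = defaultdict(list)

    def try_acquire(self, txn, resource, mode):
        """Grant the lock if compatible with all current holders;
        otherwise the caller must wait, abort, or retry."""
        for holder, held_mode in self.held[resource]:
            if holder != txn and not COMPATIBLE[(mode, held_mode)]:
                return False
        self.held[resource].append((txn, mode))
        return True

    def release_all(self, txn):
        """Released only at commit/abort: locks live as long as
        the transaction, which is why they cause visible blocking."""
        for resource in self.held:
            self.held[resource] = [
                (t, m) for t, m in self.held[resource] if t != txn
            ]

table = LockTable()
assert table.try_acquire("A", "row:42", "S")      # A reads row 42
assert table.try_acquire("B", "row:42", "S")      # B can read too
assert not table.try_acquire("C", "row:42", "X")  # C must wait to write
table.release_all("A")
table.release_all("B")
assert table.try_acquire("C", "row:42", "X")      # now C can write
```

Note how `release_all` runs at transaction end, not at the end of each statement: that lifetime is the mechanical source of user-visible blocking.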
Concept 2: Latches Protect Internal Engine Structures
Concrete example / mini-scenario: The engine is splitting a B-Tree page in memory while another thread wants to traverse or update the same structure.
Intuition: Latches are short-lived guards that keep internal pages, hash tables, queues, and metadata structures from being torn apart by concurrent engine threads.
What latches typically protect:
- Buffer frames
- B-Tree page structures
- Internal linked lists
- Lock tables
- Metadata caches
How they differ from locks:
- Purpose:
  - Locks protect transaction semantics
  - Latches protect memory safety and structural correctness
- Duration:
  - Locks may last for large parts of a transaction
  - Latches should be held briefly
- Visibility:
  - Lock waits often surface as transactional blocking
  - Latch contention may show up as CPU stalls, internal waits, or throughput collapse
Why engines keep the distinction:
- If every internal step used long-lived transactional locking, the engine would grind to a halt.
- If logical correctness relied only on short in-memory latches, transaction isolation would break immediately.
Mental model:
A lock is reserving a meeting room for a business process.
A latch is holding the door while furniture is being moved so nobody gets hurt.
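A minimal sketch of the "holding the door" idea, using a `threading.Lock` as the latch around a toy page split. The `Page` class, its capacity, and the split logic are invented for illustration; real B-Tree latch protocols (latch coupling, lock-free variants) are far more sophisticated.

```python
# Sketch: a latch as a short-lived guard around an in-memory
# structure. The "page" and its split logic are illustrative only.

import threading

class Page:
    def __init__(self, keys, capacity=4):
        self.keys = keys
        self.capacity = capacity
        self.latch = threading.Lock()  # guards this page's memory

def insert_key(page, key):
    # The latch is held only for the structural change itself,
    # never for the duration of a transaction.
    with page.latch:
        page.keys.append(key)
        page.keys.sort()
        if len(page.keys) > page.capacity:
            # "Split": move the upper half to a new sibling page,
            # so concurrent threads never see a torn structure.
            mid = len(page.keys) // 2
            sibling = Page(page.keys[mid:], page.capacity)
            page.keys = page.keys[:mid]
            return sibling
    return None

page = Page([10, 20, 30, 40])
sibling = insert_key(page, 25)  # overflows capacity, triggers a split
```

The key contrast with the lock sketch in Concept 1: the latch is released the moment the structure is consistent again, regardless of whether any transaction has committed.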
Concept 3: Deadlocks Are a Normal Cost of Pessimistic Coordination
Concrete example / mini-scenario:
- Transaction A locks row X and wants row Y
- Transaction B locks row Y and wants row X
Now each transaction is waiting for the other.
Intuition: A deadlock is not a bug in the sense of memory corruption. It is a cycle in the wait-for graph: each transaction in the cycle is waiting on the next. The system must detect it or prevent it.
How deadlocks emerge:
- Different transactions acquire resources in different orders
- Long transactions hold locks while doing other work
- Multiple resources are locked incrementally instead of atomically
Common responses:
- Detection
  - Build or inspect the wait-for graph
  - Detect cycles
  - Abort one transaction as the victim
- Timeouts
  - Simpler, but less precise
  - Can confuse long waits with deadlocks
- Prevention through design
  - Consistent access order
  - Smaller transactions
  - Reduced lock footprint
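The detection response is, at heart, a cycle search. Here is a hedged sketch of deadlock detection as depth-first search over a wait-for graph; the graph representation and function name are invented for the example.

```python
# Sketch of deadlock detection over a wait-for graph: an edge
# A -> B means "transaction A waits for a lock held by B".
# A deadlock is simply a cycle in this graph.

def find_cycle(wait_for):
    """DFS over the wait-for graph; returns one cycle or None."""
    visiting, done = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for nxt in wait_for.get(node, []):
            if nxt in visiting:                # back edge: a cycle
                return path[path.index(nxt):]
            if nxt not in done:
                cycle = dfs(nxt, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for node in list(wait_for):
        if node not in done:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None

# The classic two-transaction deadlock from the scenario above:
# A waits for B, B waits for A; C waits on A but is not deadlocked.
graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
cycle = find_cycle(graph)  # a detector would abort one member as victim
```

A real engine must also pick the victim (often by rollback cost) and handle the graph changing while it searches, which this sketch ignores.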
Important nuance:
Deadlocks are mostly a lock-level phenomenon. Latches can also suffer from bad ordering internally, but engines try hard to design latch protocols that keep those critical sections extremely short and well ordered.
The bigger lesson: Once you accept pessimistic coordination, waits and occasional forced aborts are part of the design space. The real question is whether they are rare and controlled.
Troubleshooting
Issue: Transactions are blocking each other more than expected.
Why it happens: The lock footprint may be larger than expected, transactions may be touching rows in inconsistent order, or an index choice may be forcing broader scans and therefore broader locking.
Clarification / Fix: Inspect lock waits, reduce transaction scope, and look for ways to narrow the read/write set earlier.
Issue: Throughput drops even though lock waits do not look dramatic.
Why it happens: The real problem may be latch contention inside hot internal structures such as buffer descriptors, lock tables, or index root pages.
Clarification / Fix: Distinguish transactional blocking from internal engine contention. They need different remedies.
Issue: Deadlocks appear after adding a new feature.
Why it happens: The feature may have introduced a new access order that conflicts with existing transaction flows.
Clarification / Fix: Compare lock acquisition order across transaction types. The fix is often to normalize access order, not just retry harder.
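Normalizing access order can be shown with a small sketch: if every transaction acquires its row locks in one global order, no wait-for cycle can form. The helpers and the transfer scenario below are invented for illustration, using plain `threading.Lock` objects as stand-ins for row locks.

```python
# Sketch of deadlock prevention by normalizing acquisition order:
# every "transaction" locks its rows in one global order (here,
# sorted by row id), so no wait-for cycle can form.

import threading

def lock_rows_in_order(row_locks, row_ids):
    """Acquire locks in sorted id order; return them for release."""
    acquired = []
    for rid in sorted(row_ids):
        row_locks[rid].acquire()
        acquired.append(row_locks[rid])
    return acquired

row_locks = {1: threading.Lock(), 2: threading.Lock()}
results = []

def transfer(src, dst):
    # Both "1 -> 2" and "2 -> 1" lock row 1 first, so neither can
    # hold one row while waiting on the other.
    held = lock_rows_in_order(row_locks, [src, dst])
    results.append((src, dst))
    for lock in reversed(held):
        lock.release()

t1 = threading.Thread(target=transfer, args=(1, 2))
t2 = threading.Thread(target=transfer, args=(2, 1))
t1.start(); t2.start()
t1.join(); t2.join()
# Both complete; with unordered acquisition this pair could deadlock.
```

This is the same idea as "consistent access order" above, applied at the application layer: the ordering rule removes cycles instead of detecting them.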
Issue: Retries are increasing but correctness is still fine.
Why it happens: The system may be resolving deadlocks correctly, but at an application-level cost in tail latency and wasted work.
Clarification / Fix: Treat frequent deadlocks as a design smell. Recovery behavior is working, but the concurrency pattern is probably wrong for the workload.
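An application-level retry wrapper makes the cost concrete: correctness holds, but every retry is wasted work. The `DeadlockError` class and `run_with_retry` helper below are hypothetical stand-ins for whatever deadlock or serialization error a real driver raises.

```python
# Sketch of an application-level deadlock-retry loop. The engine
# resolving the deadlock keeps things correct, but each retry is
# wasted work and tail latency, which is why frequent deadlocks
# are a design smell rather than something to retry harder.

import random
import time

class DeadlockError(Exception):
    """Stand-in for a driver's deadlock/serialization failure."""

def run_with_retry(txn_fn, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # surface it: retries were masking a pattern
            # Small jittered backoff to de-synchronize competing retries.
            time.sleep(random.uniform(0, 0.01 * attempt))
```

Capping attempts and re-raising is deliberate: an uncapped loop hides the contention pattern that schema or access-order changes should be fixing.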
Advanced Connections
Connection 1: Locks/Latches <-> Buffer Pools
The parallel: Hot data pages are good for I/O, but they also attract more concurrent access. That means a successful cache can turn into a contention hot spot.
Why this matters: This is the bridge from the previous lesson. Memory locality can improve latency while also increasing coordination pressure.
Connection 2: Locks/Latches <-> Optimistic Concurrency
The contrast: Pessimistic locking coordinates by waiting before conflicts can cause damage. Optimistic approaches do more work first, then validate or order later.
Why this matters: This is exactly why the next lesson matters. Once you understand the costs of pessimistic coordination, optimistic alternatives make much more sense.
Resources
- [DOC] PostgreSQL Explicit Locking - Documentation
  Focus: concrete lock modes, conflicts, and deadlock behavior in a production engine.
- [DOC] InnoDB Locking - Documentation
  Focus: practical row, gap, and next-key locking behavior in MySQL.
- [BOOK] Database Internals - Book site
  Focus: deeper engine-level intuition for latches, locks, and internal contention.
Key Insights
- Locks and latches solve different problems even though both are coordination mechanisms.
- Deadlocks are expected under pessimistic coordination and must be detected, prevented, or resolved.
- Contention diagnosis starts by asking what is being protected: user-visible data semantics or internal engine structure.