Data Models for Transactional and Analytical Systems

LESSON

Database Engine Internals and Implementation

001 · 30 min · intermediate

Day 273: Data Models for Transactional and Analytical Systems

A data model is not just how data looks. It is a bet about what work matters: tiny consistent updates, or large historical reads and aggregates.


Today's "Aha!" Moment

The insight: Transactional systems and analytical systems are usually asking different questions of the same business reality, so they often need different data models even when they describe the same domain.

Why this matters: Teams often treat "the database schema" as if there should be one natural representation for all workloads. In practice:

  1. Transactional work rewards small, targeted writes that keep the current state strictly consistent.
  2. Analytical work rewards large scans and aggregates over long historical ranges.

Those pressures push the model in different directions.

The universal pattern: one domain, two shapes. A system of record holds the operational truth and is optimized for updates; a derived analytical model reshapes that truth for scans and aggregation.

Concrete anchor: An ecommerce platform stores an orders table in the transactional system because it needs to create, update, cancel, and validate orders in real time. The analytics team, however, wants revenue by day, region, campaign, and product category across two years. For that, a fact table plus dimensions or another analytical model often fits better than the original OLTP shape.
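To make the anchor concrete, here is a minimal sketch of the two shapes side by side, using SQLite. All table and column names are illustrative assumptions, not a prescribed schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP shape (hypothetical names): one mutable row per order, with
# constraints that protect operational invariants at write time.
cur.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status      TEXT NOT NULL CHECK (status IN ('created', 'paid', 'cancelled')),
        total_cents INTEGER NOT NULL,
        updated_at  TEXT NOT NULL
    )
""")

# Analytical shape (hypothetical names): append-only fact rows plus a
# date dimension, sized for scanning and grouping rather than updates.
cur.execute("""
    CREATE TABLE fact_order (
        date_key      INTEGER NOT NULL,
        region        TEXT NOT NULL,
        campaign      TEXT NOT NULL,
        category      TEXT NOT NULL,
        revenue_cents INTEGER NOT NULL
    )
""")
cur.execute("""
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        day      TEXT NOT NULL,
        month    TEXT NOT NULL,
        year     INTEGER NOT NULL
    )
""")

tables = {row[0] for row in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```

Note that the fact table repeats region, campaign, and category on every row: that duplication is a deliberate trade for cheap grouping, not an accident.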

How to recognize when this applies:

  1. Dashboards and reports run against the transactional database and make it slower.
  2. The warehouse mirrors production tables, yet analytical queries still feel awkward.
  3. The same business entity (such as "customer") exists in several representations.

Common misconceptions:

  1. That there is one natural schema that serves every workload equally well.
  2. That copying data into a warehouse is the same as reshaping it for analysis.
  3. That multiple representations of one entity are always duplication or inconsistency.

Real-world examples:

  1. Order service: Needs strong correctness and fast writes for one customer or order at a time.
  2. Revenue dashboard: Needs to scan many rows efficiently and aggregate across long time ranges.
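These two workloads translate into very different queries over the very same rows. A minimal SQLite sketch (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        day TEXT, region TEXT, total_cents INTEGER, status TEXT
    )
""")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-01-01", "EU", 1000, "paid"),
        (2, "2024-01-01", "US", 2500, "paid"),
        (3, "2024-01-02", "EU", 900,  "created"),
    ],
)

# Order service: touch exactly one row, located by primary key.
cur.execute("UPDATE orders SET status = 'cancelled' WHERE order_id = 3")

# Revenue dashboard: scan many rows, aggregate across time and region.
revenue = cur.execute("""
    SELECT day, region, SUM(total_cents)
    FROM orders
    WHERE status = 'paid'
    GROUP BY day, region
    ORDER BY day, region
""").fetchall()
```

The first statement wants an index and a fast point write; the second wants a cheap scan over many rows. That tension is the whole lesson in miniature.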

Why This Matters

The problem: Many data-platform issues start with modeling confusion. Teams try to force one representation to serve incompatible goals, and the system pushes back through slow queries, awkward pipelines, brittle ETL, or transactional databases overloaded by analytical workloads.

Before: one schema tries to serve both operational and analytical work, and the system pushes back with slow queries, brittle pipelines, and a transactional database overloaded by reporting.

After: each workload gets the shape it rewards, with a system of record for operational truth and a derived analytical model for scans and aggregates.

Real-world impact: This framing prevents expensive misfits later. It is the reason row stores, column stores, star schemas, append-only logs, and materialized views all exist: not because one is universally better, but because the work differs.


Learning Objectives

By the end of this session, you will be able to:

  1. Explain why transactional and analytical workloads pull data models in different directions - Understand how workload shape drives representation.
  2. Describe the main modeling differences between OLTP-style and OLAP-style systems - Reason about entities, facts, dimensions, normalization, denormalization, and history.
  3. Evaluate when one representation should feed another - Recognize why systems of record and analytical stores are often separated and linked by derived pipelines.

Core Concepts Explained

Concept 1: Transactional and Analytical Systems Optimize Different Questions

A transactional system is usually concerned with:

  1. Correctness of individual records: creating, updating, and validating one entity at a time.
  2. Fast, small writes and point lookups against the current state.
  3. Enforcing invariants, such as "an order belongs to a real customer."

Typical examples: an order service, a payment system, an inventory tracker, a user account store.

An analytical system is usually concerned with:

  1. Scanning many rows efficiently.
  2. Aggregating across long time ranges and many entities.
  3. Answering historical questions: trends, totals, comparisons.

Typical examples: a revenue dashboard, a data warehouse, a reporting pipeline.

So even if both systems describe "orders," the underlying question is different:

  1. The transactional system asks, "What is the current state of this order, and can I change it safely?"
  2. The analytical system asks, "What happened to revenue by day, region, campaign, and category over the last two years?"

That difference is what starts to separate the models.

Concept 2: The Same Domain Often Gets Two Different Shapes

Transactional modeling often prefers:

  1. Normalization: each fact stored in one place.
  2. Narrow, constraint-protected rows that represent current state.
  3. Foreign keys and checks that guard invariants at write time.

This keeps writes targeted and helps protect invariants without duplicating too much mutable data.
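For example, a normalized model can make the database itself reject a write that would break an invariant. A sketch assuming SQLite with foreign keys enabled (schema names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 1500)")  # valid: customer 1 exists

# The normalized model rejects a write that would orphan an order:
try:
    conn.execute("INSERT INTO orders VALUES (11, 999, 500)")  # no such customer
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because "customer" lives in exactly one table, the engine can check the invariant cheaply on each small write; duplicating customer data across tables would make that guarantee much harder.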

Analytical modeling often prefers:

  1. Denormalization: fact tables whose rows already carry the attributes queries group by.
  2. Explicit dimensions for time, region, campaign, and product.
  3. Append-only history rather than in-place mutation.

This makes large scans and aggregations cheaper and easier to express.
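As a sketch of that, a single wide fact table lets a dashboard question become one scan with no joins (SQLite; the `fact_order` layout is an assumption for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical denormalized fact table: each row already carries the
# dimension attributes the dashboard groups by, so no joins are needed.
cur.execute("""
    CREATE TABLE fact_order (
        day TEXT, region TEXT, campaign TEXT, category TEXT, revenue_cents INTEGER
    )
""")
cur.executemany("INSERT INTO fact_order VALUES (?, ?, ?, ?, ?)", [
    ("2024-01-01", "EU", "spring", "books", 1000),
    ("2024-01-01", "EU", "spring", "games", 700),
    ("2024-01-02", "US", "spring", "books", 2500),
])

# Revenue by region: a single scan and group-by over one table.
by_region = dict(cur.execute(
    "SELECT region, SUM(revenue_cents) FROM fact_order GROUP BY region"))
```

The repeated region and campaign values are redundant by OLTP standards, but they are exactly what keeps the aggregation a one-table scan.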

So the same business idea may appear differently:

  1. In the transactional system, an order is one mutable row whose state changes over its lifecycle.
  2. In the analytical system, an order becomes one or more immutable fact rows joined to date, region, and product dimensions.

Neither is "the real one." They are optimized views over the same business world.

This is the key shift: stop asking "What does the data look like?" and start asking "What work is this model betting on?"

That matters because the storage engine feels these choices directly:

  1. Row-oriented layouts reward small targeted writes and point lookups.
  2. Column-oriented layouts reward wide scans and aggregates.
  3. Indexing, logging, and versioning strategies all follow from the same bet.

That is why this lesson is the right opener for the month. The later lessons on pages, indexes, WAL, MVCC, and query execution are all downstream consequences of what kind of work the data model is trying to support.

Concept 3: System of Record and Derived Analytical Models Often Need Separation

A common mature architecture pattern is:

  1. A transactional system of record that owns operational truth.
  2. One or more derived analytical stores, fed by pipelines, that reshape that truth for scans and aggregates.

This happens because asking one system to do everything well is expensive.

If you run heavy analytical scans on the transactional database, you may get:

  1. Slow dashboards and degraded latency for operational writes.
  2. A database overloaded by work its model and storage were never shaped for.

If you keep the analytical side too close to the normalized OLTP model, you may get:

  1. Awkward, join-heavy queries that still feel slow.
  2. No explicit modeling of time or history, even though analysis is mostly about both.

So separation often looks like:

  1. A pipeline (batch ETL or change capture) that derives analytical tables from the system of record.
  2. Facts, dimensions, and materialized views shaped for the questions the business actually asks.

This introduces its own trade-offs:

  1. Staleness: derived data lags the operational truth.
  2. More moving parts: pipelines must be built, owned, and monitored.

But it also buys:

  1. Each system can be optimized for its own workload.
  2. Workloads are isolated, and ownership and derivation can be documented clearly.

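One minimal way to picture the separation is a derivation step that rebuilds the analytical table from the system of record. This toy full rebuild (SQLite, hypothetical names) ignores incremental loading and change capture, but it shows the direction of derivation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational side (hypothetical): mutable, normalized orders.
cur.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        day TEXT, region TEXT, total_cents INTEGER, status TEXT
    )
""")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1, "2024-01-01", "EU", 1000, "paid"),
    (2, "2024-01-01", "US", 2500, "cancelled"),
])

# Derived analytical side: append-only facts, rebuilt from the record.
cur.execute("CREATE TABLE fact_order (day TEXT, region TEXT, revenue_cents INTEGER)")

# A toy "pipeline" run: wipe and re-derive. Real systems would load
# incrementally, but the ownership arrow (orders -> fact_order) is the point.
cur.execute("DELETE FROM fact_order")
cur.execute("""
    INSERT INTO fact_order (day, region, revenue_cents)
    SELECT day, region, total_cents FROM orders WHERE status = 'paid'
""")
facts = cur.execute("SELECT * FROM fact_order").fetchall()
```

The analytical table never receives writes of its own; it is always reproducible from the system of record, which is what makes the duplication safe.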
The right question is not:

  "Which model is the real one?"

It is:

  "What work does each representation need to support, and which system owns the truth?"
That question will keep showing up throughout the month.


Troubleshooting

Issue: "Our transactional database is getting slower every time product asks for new dashboards."

Why it happens / is confusing: The team assumes the operational schema should also be the reporting schema.

Clarification / Fix: Separate operational and analytical workload needs. The issue may not be SQL skill; it may be that the same storage and model are serving incompatible jobs.

Issue: "Our warehouse has the same business data as production, but queries still feel awkward and slow."

Why it happens / is confusing: Data was copied, but not reshaped for analytical access.

Clarification / Fix: Review whether the warehouse still mirrors OLTP entities too literally. Analytical systems often need facts, dimensions, and explicit time-oriented modeling.

Issue: "Why do we have two or three versions of 'customer' data?"

Why it happens / is confusing: Multiple representations look like duplication or inconsistency at first glance.

Clarification / Fix: Check whether each version exists for a different workload boundary: operational truth, analytical slicing, or serving layer optimization. The key is to document ownership and derivation clearly.


Advanced Connections

Connection 1: Data Models <-> Storage Engine Foundations

The parallel: The next lesson will move below the schema layer into pages, records, and layout. That physical design only makes sense once the workload is clear: small updates and point lookups reward different layouts than wide scans and aggregate-heavy reads.

Real-world case: A row store and a column store are responses to different access patterns, not simply different implementation styles.

Connection 2: Data Models <-> Query Execution and Cost

The parallel: Later in the month, query execution, indexing, and buffer-pool behavior will show how the engine pays for the model you choose.

Real-world case: A normalized transactional schema may be elegant for updates but expensive for large analytical joins, while a denormalized fact/dimension model may reverse that trade-off.



Key Insights

  1. The right data model depends on the workload - Transactional and analytical systems ask different questions and reward different shapes.
  2. One business domain often needs multiple representations - Operational truth and analytical truth are usually connected, but not physically identical.
  3. Physical design follows model intent - Later choices about pages, indexes, WAL, and execution only make sense once the workload boundary is explicit.

NEXT Storage Engine Foundations: Pages, Records, and Layout
