LESSON
Day 273: Data Models for Transactional and Analytical Systems
A data model is not just how data looks. It is a bet about what work matters: tiny consistent updates, or large historical reads and aggregates.
Today's "Aha!" Moment
The insight: Transactional systems and analytical systems are usually asking different questions of the same business reality, so they often need different data models even when they describe the same domain.
Why this matters: Teams often treat "the database schema" as if there should be one natural representation for all workloads. In practice:
- a transactional system wants fast point reads and writes, tight invariants, and predictable latency
- an analytical system wants scans, joins, aggregations, and historical slices across large volumes of data
Those pressures push the model in different directions.
The universal pattern:
- operational workflows prefer entities and invariant-protecting updates
- analytical workflows prefer facts, dimensions, denormalized access paths, or append-friendly history
- the same business concept may be represented differently in each system because the work being optimized is different
Concrete anchor: An ecommerce platform stores an orders table in the transactional system because it needs to create, update, cancel, and validate orders in real time. The analytics team, however, wants revenue by day, region, campaign, and product category across two years. For that, a fact table plus dimensions or another analytical model often fits better than the original OLTP shape.
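To make that contrast concrete, here is a minimal SQL sketch. The table and column names (orders, fact_orders, dim_time, dim_customer, and so on) are illustrative assumptions, not a schema defined in this lesson:

    -- OLTP question: the current valid state of one order (point read, one row)
    SELECT status, total_amount
    FROM orders
    WHERE order_id = 48291;

    -- OLAP question: revenue by day and region across two years
    -- (scan and aggregate over many fact rows)
    SELECT t.calendar_date, c.region, SUM(f.revenue) AS revenue
    FROM fact_orders f
    JOIN dim_time t     ON f.date_key = t.date_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    WHERE t.calendar_date >= DATE '2023-01-01'
    GROUP BY t.calendar_date, c.region;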
How to recognize when this applies:
- One workload is dominated by single-row or small-transaction updates.
- Another workload is dominated by scans, aggregations, reporting, or ML feature extraction.
- The same schema seems elegant for one team and painfully inefficient for another.
Common misconceptions:
- [INCORRECT] "If the schema represents the business entities correctly, it should work equally well for every workload."
- [INCORRECT] "Analytics is just running bigger queries against the transactional database."
- [CORRECT] The truth: Different workloads reward different physical layouts, access patterns, and levels of normalization or denormalization.
Real-world examples:
- Order service: Needs strong correctness and fast writes for one customer or order at a time.
- Revenue dashboard: Needs to scan many rows efficiently and aggregate across long time ranges.
Why This Matters
The problem: Many data-platform issues start with modeling confusion. Teams try to force one representation to serve incompatible goals, and the system pushes back through slow queries, awkward pipelines, brittle ETL, or transactional databases overloaded by analytical workloads.
Before:
- One schema is treated as universally correct.
- Transactional systems receive ad hoc analytical queries that wreck operational latency.
- Analytical platforms inherit OLTP structures that are correct but painful to query at scale.
After:
- Workload type is treated as part of the data-model decision.
- The platform distinguishes system-of-record shape from analytical read shape.
- Storage-engine and execution choices become easier to reason about because the workload intent is explicit.
Real-world impact: This framing prevents expensive misfits later. It is the reason row stores, column stores, star schemas, append-only logs, and materialized views all exist: not because one is universally better, but because the work differs.
Learning Objectives
By the end of this session, you will be able to:
- Explain why transactional and analytical workloads pull data models in different directions - Understand how workload shape drives representation.
- Describe the main modeling differences between OLTP-style and OLAP-style systems - Reason about entities, facts, dimensions, normalization, denormalization, and history.
- Evaluate when one representation should feed another - Recognize why systems of record and analytical stores are often separated and linked by derived pipelines.
Core Concepts Explained
Concept 1: Transactional and Analytical Systems Optimize Different Questions
A transactional system is usually concerned with:
- creating or updating one business object at a time
- enforcing invariants
- keeping latency low under concurrent writes
- answering small, specific queries quickly
Typical examples:
- create order
- reserve inventory
- update user email
- fetch account balance
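In SQL, those operations might look like the sketch below. The schema is hypothetical; the point is that each statement touches one or a few rows, and the writes run inside a transaction that protects an invariant:

    BEGIN;

    -- create order: one new row
    INSERT INTO orders (customer_id, status, total_amount)
    VALUES (1042, 'pending', 59.90);

    -- reserve inventory: the WHERE clause enforces the non-negative-stock invariant
    UPDATE inventory
    SET reserved = reserved + 1
    WHERE product_id = 77 AND available - reserved >= 1;

    COMMIT;

    -- update user email: single-row write
    UPDATE users SET email = 'new@example.com' WHERE user_id = 1042;

    -- fetch account balance: point read
    SELECT balance FROM accounts WHERE account_id = 1042;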
An analytical system is usually concerned with:
- reading lots of records together
- reconstructing trends over time
- aggregating across many entities
- joining multiple domains for reporting, exploration, or modeling
Typical examples:
- revenue by region over 12 months
- conversion by campaign and device
- churn rate by cohort
- anomaly detection across billions of events
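The analytical versions read very differently; again, the fact and dimension names below are assumptions for illustration:

    -- revenue by region over 12 months: scans a large slice of history
    SELECT c.region, SUM(f.revenue) AS revenue
    FROM fact_orders f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_time t     ON f.date_key = t.date_key
    WHERE t.calendar_date >= CURRENT_DATE - INTERVAL '12 months'
    GROUP BY c.region;

    -- conversion by campaign and device: aggregates across many sessions
    SELECT campaign, device,
           AVG(CASE WHEN converted THEN 1 ELSE 0 END) AS conversion_rate
    FROM fact_sessions
    GROUP BY campaign, device;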
So even if both systems describe "orders," the underlying question is different:
- OLTP asks: what is the current valid state of this order?
- OLAP asks: what patterns emerge across many orders over time?
That difference is what starts to separate the models.
Concept 2: The Same Domain Often Gets Two Different Shapes
Transactional modeling often prefers:
- normalized entities
- explicit relations
- smaller updates
- current-state focus
This keeps writes targeted and helps protect invariants without duplicating too much mutable data.
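A minimal sketch of that normalized shape, with assumed names and only a few columns:

    -- Each fact lives in exactly one place, so a small targeted UPDATE
    -- cannot leave duplicated copies inconsistent.
    CREATE TABLE customers (
        customer_id BIGINT PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );

    CREATE TABLE orders (
        order_id    BIGINT PRIMARY KEY,
        customer_id BIGINT NOT NULL REFERENCES customers (customer_id),
        status      TEXT NOT NULL,
        created_at  TIMESTAMP NOT NULL
    );

    CREATE TABLE order_items (
        order_id   BIGINT REFERENCES orders (order_id),
        product_id BIGINT NOT NULL,
        quantity   INT NOT NULL CHECK (quantity > 0),
        unit_price NUMERIC(10, 2) NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );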
Analytical modeling often prefers:
- append-friendly facts
- dimensions for slicing and grouping
- denormalized or column-friendly layouts
- time as a first-class axis
This makes large scans and aggregations cheaper and easier to express.
So the same business idea may appear differently:
- in OLTP: orders, order_items, customers, payments
- in analytics: fact_orders, dim_customer, dim_product, dim_time
Neither is "the real one." They are optimized views over the same business world.
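A comparable sketch of the analytical shape, with assumed columns (a star schema in the Kimball style):

    -- One append-friendly fact table plus dimensions for slicing.
    -- Measures sit on the fact row so aggregates avoid wide joins.
    CREATE TABLE dim_time (
        date_key      INT PRIMARY KEY,   -- e.g. 20240115
        calendar_date DATE NOT NULL,
        year          INT NOT NULL,
        month         INT NOT NULL
    );

    CREATE TABLE dim_customer (
        customer_key BIGINT PRIMARY KEY,
        region       TEXT NOT NULL
    );

    CREATE TABLE dim_product (
        product_key BIGINT PRIMARY KEY,
        category    TEXT NOT NULL
    );

    CREATE TABLE fact_orders (
        date_key     INT NOT NULL REFERENCES dim_time (date_key),
        customer_key BIGINT NOT NULL REFERENCES dim_customer (customer_key),
        product_key  BIGINT NOT NULL REFERENCES dim_product (product_key),
        quantity     INT NOT NULL,
        revenue      NUMERIC(12, 2) NOT NULL
    );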
This is the key shift:
- a data model is partly about truth, but also about access pattern
That matters because the storage engine feels these choices directly:
- row-oriented layouts tend to help point reads and localized writes
- column-oriented layouts tend to help scans, compression, and vectorized aggregation
That is why this lesson is the right opener for the month. The later lessons on pages, indexes, WAL, MVCC, and query execution are all downstream consequences of what kind of work the data model is trying to support.
Concept 3: System of Record and Derived Analytical Models Often Need Separation
A common mature architecture pattern is:
- one system is the source of truth for operational updates
- other systems derive analytical or serving-friendly shapes from it
This happens because asking one system to do everything well is expensive.
If you run heavy analytical scans on the transactional database, you may get:
- lock or resource contention
- cache churn
- worse p99 latency for operational traffic
- fragile reporting jobs that fight business workflows
If you keep the analytical side too close to the normalized OLTP model, you may get:
- expensive joins for simple reports
- awkward historical reasoning
- poor compression or scan efficiency
- duplicated business logic in query code
So separation often looks like:
- transactional DB -> CDC / ETL / stream pipeline -> warehouse, lakehouse, or analytical mart
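The transformation step in that pipeline is often just a reshaping query run incrementally. A hedged sketch, assuming the normalized and star-schema tables from earlier; :last_loaded_at is a placeholder for a pipeline-managed watermark, and the surrogate-key lookups a real pipeline would do are elided:

    -- Derive new fact rows from the system of record, e.g. as a nightly
    -- batch or as the apply step of a CDC stream.
    INSERT INTO fact_orders (date_key, customer_key, product_key, quantity, revenue)
    SELECT
        CAST(EXTRACT(YEAR  FROM o.created_at) * 10000
           + EXTRACT(MONTH FROM o.created_at) * 100
           + EXTRACT(DAY   FROM o.created_at) AS INT) AS date_key,
        o.customer_id                                 AS customer_key,
        i.product_id                                  AS product_key,
        i.quantity,
        i.quantity * i.unit_price                     AS revenue
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
    WHERE o.created_at >= :last_loaded_at;  -- only rows since the last load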
This introduces its own trade-offs:
- extra complexity
- freshness delay
- multiple representations of the same business concept
But it also buys:
- workload isolation
- better query performance
- clearer ownership of operational vs analytical concerns
The right question is not:
- should we ever duplicate data?
It is:
- where does duplication or transformation create a better workload boundary?
That question will keep showing up throughout the month.
Troubleshooting
Issue: "Our transactional database is getting slower every time product asks for new dashboards."
Why it happens / is confusing: The team assumes the operational schema should also be the reporting schema.
Clarification / Fix: Separate operational and analytical workload needs. The issue may not be SQL skill; it may be that the same storage and model are serving incompatible jobs.
Issue: "Our warehouse has the same business data as production, but queries still feel awkward and slow."
Why it happens / is confusing: Data was copied, but not reshaped for analytical access.
Clarification / Fix: Review whether the warehouse still mirrors OLTP entities too literally. Analytical systems often need facts, dimensions, and explicit time-oriented modeling.
Issue: "Why do we have two or three versions of 'customer' data?"
Why it happens / is confusing: Multiple representations look like duplication or inconsistency at first glance.
Clarification / Fix: Check whether each version exists for a different workload boundary: operational truth, analytical slicing, or serving layer optimization. The key is to document ownership and derivation clearly.
Advanced Connections
Connection 1: Data Models <-> Storage Engine Foundations
The parallel: The next lesson will move below the schema layer into pages, records, and layout. That physical design only makes sense once the workload is clear: small updates and point lookups reward different layouts than wide scans and aggregate-heavy reads.
Real-world case: A row store and a column store are responses to different access patterns, not simply different implementation styles.
Connection 2: Data Models <-> Query Execution and Cost
The parallel: Later in the month, query execution, indexing, and buffer-pool behavior will show how the engine pays for the model you choose.
Real-world case: A normalized transactional schema may be elegant for updates but expensive for large analytical joins, while a denormalized fact/dimension model may reverse that trade-off.
Resources
Optional Deepening Resources
- [BOOK] Designing Data-Intensive Applications
- Link: https://dataintensive.net/
- Focus: Use it for the broad conceptual split between transactional and analytical workloads and why storage/processing choices follow from them.
- [DOCS] The Data Warehouse Toolkit Overview
- Link: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/
- Focus: Good reference for fact/dimension thinking and why analytical models differ from normalized operational schemas.
- [DOCS] DuckDB Documentation
- Link: https://duckdb.org/docs/
- Focus: Useful for seeing modern analytical/columnar assumptions in practice.
- [DOCS] PostgreSQL Documentation
- Link: https://www.postgresql.org/docs/current/index.html
- Focus: A practical reference for a row-oriented transactional engine and the workload assumptions it reflects.
Key Insights
- The right data model depends on the workload - Transactional and analytical systems ask different questions and reward different shapes.
- One business domain often needs multiple representations - Operational truth and analytical truth are usually connected, but not physically identical.
- Physical design follows model intent - Later choices about pages, indexes, WAL, and execution only make sense once the workload boundary is explicit.