LESSON
Day 273: Data Models for Transactional and Analytical Systems
A data model is not just how data looks. It is a bet about what work matters: tiny consistent updates, or large historical reads and aggregates.
Today's "Aha!" Moment
The insight: Transactional systems and analytical systems are usually asking different questions of the same business reality, so they often need different data models even when they describe the same domain.
Why this matters: Teams often treat "the database schema" as if there should be one natural representation for all workloads. In practice:
- a transactional system wants fast point reads and writes, tight invariants, and predictable latency
- an analytical system wants scans, joins, aggregations, and historical slices across large volumes of data
Those pressures push the model in different directions.
The universal pattern:
- operational workflows prefer entities and invariant-protecting updates
- analytical workflows prefer facts, dimensions, denormalized access paths, or append-friendly history
- the same business concept may be represented differently in each system because the work being optimized is different
Concrete anchor: An ecommerce platform stores an orders table in the transactional system because it needs to create, update, cancel, and validate orders in real time. The analytics team, however, wants revenue by day, region, campaign, and product category across two years. For that, a fact table plus dimensions or another analytical model often fits better than the original OLTP shape.
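To make that contrast concrete, here is a minimal SQL sketch. The table and column names (orders, fact_orders, dim_time, dim_customer, and so on) are illustrative assumptions, not a schema defined in this lesson:

    -- OLTP question: the current valid state of one order (point read, one row)
    SELECT status, total_amount
    FROM orders
    WHERE order_id = 48291;

    -- OLAP question: revenue by day and region across two years
    -- (scan and aggregate over many fact rows)
    SELECT t.calendar_date, c.region, SUM(f.revenue) AS revenue
    FROM fact_orders f
    JOIN dim_time t     ON f.date_key = t.date_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    WHERE t.calendar_date >= DATE '2023-01-01'
    GROUP BY t.calendar_date, c.region;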
How to recognize when this applies:
- One workload is dominated by single-row or small-transaction updates.
- Another workload is dominated by scans, aggregations, reporting, or ML feature extraction.
- The same schema seems elegant for one team and painfully inefficient for another.
Common misconceptions:
- [INCORRECT] "If the schema represents the business entities correctly, it should work equally well for every workload."
- [INCORRECT] "Analytics is just running bigger queries against the transactional database."
- [CORRECT] The truth: Different workloads reward different physical layouts, access patterns, and levels of normalization or denormalization.
Real-world examples:
- Order service: Needs strong correctness and fast writes for one customer or order at a time.
- Revenue dashboard: Needs to scan many rows efficiently and aggregate across long time ranges.
Why This Matters
The problem: Many data-platform issues start with modeling confusion. Teams try to force one representation to serve incompatible goals, and the system pushes back through slow queries, awkward pipelines, brittle ETL, or transactional databases overloaded by analytical workloads.
Before:
- One schema is treated as universally correct.
- Transactional systems receive ad hoc analytical queries that wreck operational latency.
- Analytical platforms inherit OLTP structures that are correct but painful to query at scale.
After:
- Workload type is treated as part of the data-model decision.
- The platform distinguishes system-of-record shape from analytical read shape.
- Storage-engine and execution choices become easier to reason about because the workload intent is explicit.
Real-world impact: This framing prevents expensive misfits later. It is the reason row stores, column stores, star schemas, append-only logs, and materialized views all exist: not because one is universally better, but because the work differs.
Learning Objectives
By the end of this session, you will be able to:
- Explain why transactional and analytical workloads pull data models in different directions - Understand how workload shape drives representation.
- Describe the main modeling differences between OLTP-style and OLAP-style systems - Reason about entities, facts, dimensions, normalization, denormalization, and history.
- Evaluate when one representation should feed another - Recognize why systems of record and analytical stores are often separated and linked by derived pipelines.
Core Concepts Explained
Concept 1: Transactional and Analytical Systems Optimize Different Questions
A transactional system is usually concerned with:
- creating or updating one business object at a time
- enforcing invariants
- keeping latency low under concurrent writes
- answering small, specific queries quickly
Typical examples:
- create order
- reserve inventory
- update user email
- fetch account balance
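In SQL, those operations might look like the sketch below. The schema is hypothetical; the point is that each statement touches one or a few rows, and the writes run inside a transaction that protects an invariant:

    BEGIN;

    -- create order: one new row
    INSERT INTO orders (customer_id, status, total_amount)
    VALUES (1042, 'pending', 59.90);

    -- reserve inventory: the WHERE clause enforces the non-negative-stock invariant
    UPDATE inventory
    SET reserved = reserved + 1
    WHERE product_id = 77 AND available - reserved >= 1;

    COMMIT;

    -- update user email: single-row write
    UPDATE users SET email = 'new@example.com' WHERE user_id = 1042;

    -- fetch account balance: point read
    SELECT balance FROM accounts WHERE account_id = 1042;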
An analytical system is usually concerned with:
- reading lots of records together
- reconstructing trends over time
- aggregating across many entities
- joining multiple domains for reporting, exploration, or modeling
Typical examples:
- revenue by region over 12 months
- conversion by campaign and device
- churn rate by cohort
- anomaly detection across billions of events
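The analytical versions read very differently; again, the fact and dimension names below are assumptions for illustration:

    -- revenue by region over 12 months: scans a large slice of history
    SELECT c.region, SUM(f.revenue) AS revenue
    FROM fact_orders f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_time t     ON f.date_key = t.date_key
    WHERE t.calendar_date >= CURRENT_DATE - INTERVAL '12 months'
    GROUP BY c.region;

    -- conversion by campaign and device: aggregates across many sessions
    SELECT campaign, device,
           AVG(CASE WHEN converted THEN 1 ELSE 0 END) AS conversion_rate
    FROM fact_sessions
    GROUP BY campaign, device;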
So even if both systems describe "orders," the underlying question is different:
- OLTP asks: what is the current valid state of this order?
- OLAP asks: what patterns emerge across many orders over time?
That difference is what starts to separate the models.
Concept 2: The Same Domain Often Gets Two Different Shapes
Transactional modeling often prefers:
- normalized entities
- explicit relations
- smaller updates
- current-state focus
This keeps writes targeted and helps protect invariants without duplicating too much mutable data.
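A minimal sketch of that normalized shape, with assumed names and only a few columns:

    -- Each fact lives in exactly one place, so a small targeted UPDATE
    -- cannot leave duplicated copies inconsistent.
    CREATE TABLE customers (
        customer_id BIGINT PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );

    CREATE TABLE orders (
        order_id    BIGINT PRIMARY KEY,
        customer_id BIGINT NOT NULL REFERENCES customers (customer_id),
        status      TEXT NOT NULL,
        created_at  TIMESTAMP NOT NULL
    );

    CREATE TABLE order_items (
        order_id   BIGINT REFERENCES orders (order_id),
        product_id BIGINT NOT NULL,
        quantity   INT NOT NULL CHECK (quantity > 0),
        unit_price NUMERIC(10, 2) NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );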
Analytical modeling often prefers:
- append-friendly facts
- dimensions for slicing and grouping
- denormalized or column-friendly layouts
- time as a first-class axis
This makes large scans and aggregations cheaper and easier to express.
So the same business idea may appear differently:
- in OLTP: orders, order_items, customers, payments
- in analytics: fact_orders, dim_customer, dim_product, dim_time
Neither is "the real one." They are optimized views over the same business world.
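A comparable sketch of the analytical shape, with assumed columns (a star schema in the Kimball style):

    -- One append-friendly fact table plus dimensions for slicing.
    -- Measures sit on the fact row so aggregates avoid wide joins.
    CREATE TABLE dim_time (
        date_key      INT PRIMARY KEY,   -- e.g. 20240115
        calendar_date DATE NOT NULL,
        year          INT NOT NULL,
        month         INT NOT NULL
    );

    CREATE TABLE dim_customer (
        customer_key BIGINT PRIMARY KEY,
        region       TEXT NOT NULL
    );

    CREATE TABLE dim_product (
        product_key BIGINT PRIMARY KEY,
        category    TEXT NOT NULL
    );

    CREATE TABLE fact_orders (
        date_key     INT NOT NULL REFERENCES dim_time (date_key),
        customer_key BIGINT NOT NULL REFERENCES dim_customer (customer_key),
        product_key  BIGINT NOT NULL REFERENCES dim_product (product_key),
        quantity     INT NOT NULL,
        revenue      NUMERIC(12, 2) NOT NULL
    );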
This is the key shift:
- a data model is partly about truth, but also about access pattern
That matters because the storage engine feels these choices directly:
- row-oriented layouts tend to help point reads and localized writes
- column-oriented layouts tend to help scans, compression, and vectorized aggregation
That is why this lesson is the right opener for the month. The later lessons on pages, indexes, WAL, MVCC, and query execution are all downstream consequences of what kind of work the data model is trying to support.
Concept 3: System of Record and Derived Analytical Models Often Need Separation
A common mature architecture pattern is:
- one system is the source of truth for operational updates
- other systems derive analytical or serving-friendly shapes from it
This happens because asking one system to do everything well is expensive.
If you run heavy analytical scans on the transactional database, you may get:
- lock or resource contention
- cache churn
- worse p99 latency for operational traffic
- fragile reporting jobs that fight business workflows
If you keep the analytical side too close to the normalized OLTP model, you may get:
- expensive joins for simple reports
- awkward historical reasoning
- poor compression or scan efficiency
- duplicated business logic in query code
So separation often looks like:
- transactional DB -> CDC / ETL / stream pipeline -> warehouse, lakehouse, or analytical mart
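The transformation step in that pipeline is often just a reshaping query run incrementally. A hedged sketch, assuming the normalized and star-schema tables from earlier; :last_loaded_at is a placeholder for a pipeline-managed watermark, and the surrogate-key lookups a real pipeline would do are elided:

    -- Derive new fact rows from the system of record, e.g. as a nightly
    -- batch or as the apply step of a CDC stream.
    INSERT INTO fact_orders (date_key, customer_key, product_key, quantity, revenue)
    SELECT
        CAST(EXTRACT(YEAR  FROM o.created_at) * 10000
           + EXTRACT(MONTH FROM o.created_at) * 100
           + EXTRACT(DAY   FROM o.created_at) AS INT) AS date_key,
        o.customer_id                                 AS customer_key,
        i.product_id                                  AS product_key,
        i.quantity,
        i.quantity * i.unit_price                     AS revenue
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
    WHERE o.created_at >= :last_loaded_at;  -- only rows since the last load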
This introduces its own trade-offs:
- extra complexity
- freshness delay
- multiple representations of the same business concept
But it also buys:
- workload isolation
- better query performance
- clearer ownership of operational vs analytical concerns
The right question is not:
- should we ever duplicate data?
It is:
- where does duplication or transformation create a better workload boundary?
That question will keep showing up throughout the month.
Troubleshooting
Issue: "Our transactional database is getting slower every time product asks for new dashboards."
Why it happens / is confusing: The team assumes the operational schema should also be the reporting schema.
Clarification / Fix: Separate operational and analytical workload needs. The issue may not be SQL skill; it may be that the same storage and model are serving incompatible jobs.
Issue: "Our warehouse has the same business data as production, but queries still feel awkward and slow."
Why it happens / is confusing: Data was copied, but not reshaped for analytical access.
Clarification / Fix: Review whether the warehouse still mirrors OLTP entities too literally. Analytical systems often need facts, dimensions, and explicit time-oriented modeling.
Issue: "Why do we have two or three versions of 'customer' data?"
Why it happens / is confusing: Multiple representations look like duplication or inconsistency at first glance.
Clarification / Fix: Check whether each version exists for a different workload boundary: operational truth, analytical slicing, or serving layer optimization. The key is to document ownership and derivation clearly.
Advanced Connections
Connection 1: Data Models <-> Storage Engine Foundations
The parallel: The next lesson will move below the schema layer into pages, records, and layout. That physical design only makes sense once the workload is clear: small updates and point lookups reward different layouts than wide scans and aggregate-heavy reads.
Real-world case: A row store and a column store are responses to different access patterns, not simply different implementation styles.
Connection 2: Data Models <-> Query Execution and Cost
The parallel: Later in the month, query execution, indexing, and buffer-pool behavior will show how the engine pays for the model you choose.
Real-world case: A normalized transactional schema may be elegant for updates but expensive for large analytical joins, while a denormalized fact/dimension model may reverse that trade-off.
Resources
Optional Deepening Resources
- [BOOK] Designing Data-Intensive Applications
- Link: https://dataintensive.net/
- Focus: Use it for the broad conceptual split between transactional and analytical workloads and why storage/processing choices follow from them.
- [DOCS] The Data Warehouse Toolkit Overview
- Link: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/
- Focus: Good reference for fact/dimension thinking and why analytical models differ from normalized operational schemas.
- [DOCS] DuckDB Documentation
- Link: https://duckdb.org/docs/
- Focus: Useful for seeing modern analytical/columnar assumptions in practice.
- [DOCS] PostgreSQL Documentation
- Link: https://www.postgresql.org/docs/current/index.html
- Focus: A practical reference for a row-oriented transactional engine and the workload assumptions it reflects.
Key Insights
- The right data model depends on the workload - Transactional and analytical systems ask different questions and reward different shapes.
- One business domain often needs multiple representations - Operational truth and analytical truth are usually connected, but not physically identical.
- Physical design follows model intent - Later choices about pages, indexes, WAL, and execution only make sense once the workload boundary is explicit.