Day 462: Systems of Record vs Derived Data

Data Architecture and Platforms · Lesson 014 · 30 min · advanced

The core idea: A system of record is the place where the business is allowed to decide what happened; derived data exists to make that decision searchable, analyzable, or faster to consume without acquiring the right to redefine it.

Today's "Aha!" Moment

In Polyglot Persistence and Data Integration, PayLedger split one workload across PostgreSQL, Elasticsearch, and a warehouse because each store serves a different access pattern. That lesson answered the "why do we have multiple stores?" question. This lesson answers the more dangerous follow-up: when those stores disagree, which one is allowed to settle the argument?

The distinction matters because real incidents rarely begin with a database engine crash. They begin when a support agent sees "processing" in search while the settlement service has already committed "settled" in PostgreSQL, or when finance notices a warehouse dashboard that still reflects an older fee calculation after a replay. The operational pain is not merely that lag exists; the pain is that teams often have not defined whether the stale copy is an inconvenience, a bug, or a reason to stop a business workflow.

For PayLedger, the recurring rule is simple but non-negotiable: payout state transitions become authoritative only when they commit in PostgreSQL. Elasticsearch may expose those states faster to human operators, and the warehouse may aggregate them better for finance, but neither store gets to invent or override settlement truth. Once that line is explicit, the architecture becomes easier to reason about. Without it, every projection quietly becomes a second decision-maker.

Why This Matters

Production systems accumulate copies of the same facts for good reasons. Search indexes exist because support needs fast filtering. Warehouses exist because analysts need long scans and aggregations. Caches exist because hot reads should not hit the transactional database for every page load. None of those copies are inherently dangerous. They become dangerous when a product flow starts acting on them as if they were all equally authoritative.

PayLedger feels this most sharply during payroll settlement cutoffs. A payroll coordinator approves a batch, support investigates failed transfers, and finance prepares end-of-day reporting from the same underlying payouts. If the UI, background jobs, and reporting stack all assume their local copy is "the truth," disagreements turn into duplicate retries, inconsistent customer messaging, or manual spreadsheet reconciliation. That is exactly how derived data leaks back into the write path and corrupts the system the team thought it was merely observing.

The practical shift is from "where can I read this fact?" to "where is this fact decided, and what are other stores allowed to do with it?" Once that question is answered, freshness budgets, repair workflows, and user-facing disclaimers become engineering details instead of philosophical debates.

Learning Objectives

By the end of this session, you will be able to:

  1. Explain what makes a store authoritative - Distinguish between the system that commits business state and the stores that only project it.
  2. Analyze how derived data is built and repaired - Trace how search, analytics, and cache views inherit lag, replay behavior, and reconciliation rules.
  3. Apply authority boundaries to product workflows - Decide which reads can use projections and which actions must confirm against the system of record.

Core Concepts Explained

Concept 1: A system of record is defined by mutation rights and business accountability

In PayLedger, a payout moves through draft, approved, processing, and settled. Those transitions affect real money movement, audit trails, and customer commitments, so the team cannot let any component update them casually. PostgreSQL is the system of record not because relational databases are morally superior, but because that is where the platform accepts the authoritative write, enforces constraints, and stores the audit trail that other services trust.

That authority has three operational consequences. First, writes that change payout truth must pass through the source system's validation rules, transaction boundary, and concurrency controls. Second, identifiers and versions created there become the canonical handles that every derived store must carry forward. Third, when a disagreement appears, the platform resolves it by consulting the authoritative store rather than averaging across copies or trusting whichever surface updated last.
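The third consequence can be made concrete in code. The sketch below is illustrative only: `PayoutRecord` and `resolve_status` are hypothetical names, not part of any real PayLedger codebase. It shows canonical identifiers and versions minted by the system of record being carried forward, and disagreement being settled by returning the source record rather than trusting whichever copy updated last.

```python
from dataclasses import dataclass

# Hypothetical sketch: PayoutRecord and resolve_status are illustrative
# names, not a real PayLedger API.

@dataclass(frozen=True)
class PayoutRecord:
    payout_id: str   # canonical handle minted by the system of record
    version: int     # monotonically increasing per committed transition
    status: str

def resolve_status(source: PayoutRecord,
                   projections: list[PayoutRecord]) -> PayoutRecord:
    """When copies disagree, the authoritative store settles the argument.

    We never average across copies or trust the most recently updated
    surface; we return the source record and report stale projections.
    """
    for p in projections:
        if p.version < source.version:
            print(f"stale projection: {p.payout_id} at v{p.version}, "
                  f"source at v{source.version}")
    return source

source = PayoutRecord("po_98421", version=7, status="settled")
search_copy = PayoutRecord("po_98421", version=5, status="processing")
authoritative = resolve_status(source, [search_copy])
# authoritative.status == "settled" regardless of what search showed
```

The version field is what makes the comparison mechanical instead of a judgment call: a projection is never "wrong" in an interesting way, it is simply behind.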

This is why "system of record" is not just a naming convention. If support changes payout status directly in Elasticsearch to make a dashboard look fresh, Elasticsearch is no longer a projection; it is now a second write authority with no proper transaction model for the business workflow. If analysts patch warehouse rows manually to fix a report without reconciling the source record, the warehouse starts to drift from the business ledger it is supposed to explain.

The easiest mental test is to ask, "If auditors or downstream automation challenge this fact tomorrow, which store is allowed to defend it?" In PayLedger, the answer for payout state, transfer references, and approval history is PostgreSQL. That answer drives design decisions everywhere else.

Concept 2: Derived data is a projection with an explicit freshness and rebuild contract

Once PostgreSQL commits a payout update, PayLedger publishes a durable change stream. Elasticsearch consumes that stream to power operational search. The warehouse consumes it to build daily and monthly finance models. A Redis cache may materialize the most recent payout summaries for dashboard speed. All three are derived data because they exist downstream of the authoritative write and can, in principle, be rebuilt from the source plus retained history.

The mechanism matters more than the label:

authoritative transaction
        |
        v
  source tables + outbox
        |
        v
   CDC / event stream
     /      |       \
    v       v        v
search   warehouse   cache
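The "source tables + outbox" step above can be sketched as a transactional outbox. This is a minimal illustration using sqlite3 as a stand-in for PostgreSQL; the table and column names are assumptions, not a real schema. The point is that the state change and its change event commit in one transaction, so no consumer can ever observe one without the other.

```python
import json
import sqlite3

# Transactional-outbox sketch; sqlite3 stands in for PostgreSQL, and the
# payouts/outbox schema is an illustrative assumption.

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payouts (id TEXT PRIMARY KEY, status TEXT, version INTEGER);
CREATE TABLE outbox  (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
""")

def commit_transition(payout_id: str, new_status: str) -> None:
    """The payout row and its outbox event commit in ONE transaction."""
    with conn:  # sqlite3 context manager wraps the block in a transaction
        conn.execute(
            "INSERT INTO payouts VALUES (?, ?, 1) "
            "ON CONFLICT(id) DO UPDATE SET status = ?, version = version + 1",
            (payout_id, new_status, new_status),
        )
        (version,) = conn.execute(
            "SELECT version FROM payouts WHERE id = ?", (payout_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"payout_id": payout_id,
                         "status": new_status,
                         "version": version}),),
        )

commit_transition("po_98421", "approved")
commit_transition("po_98421", "settled")
# A relay process would read the outbox in seq order and publish each
# payload to the CDC / event stream feeding search, warehouse, and cache.
```

A separate relay (or a CDC tool reading the write-ahead log directly) then moves outbox rows onto the stream; the relay can crash and retry because consumers deduplicate by version.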

Each projection chooses a different trade-off. Search favors inverted indexes and near-real-time refresh. The warehouse favors append-heavy ingestion and wide scans. The cache favors latency over durability. What they share is a lack of final authority. Their job is to improve access, not to redefine the state they represent.

That has two implications that teams often skip. First, every derived store needs an explicit freshness promise. Search might lag by seconds, warehouse models by minutes or hours, cache entries until invalidation or TTL expiry. Second, every derived store needs a rebuild story. If an index mapping changes, PayLedger should be able to reindex from the event log or source snapshot. If a warehouse transformation is wrong, finance should rerun the model from durable inputs instead of hand-editing output tables.
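The rebuild story described above depends on the projector being idempotent, so that replaying the retained log from the beginning is always safe. A minimal sketch, assuming a hypothetical event shape with a per-payout version:

```python
# Idempotent projector with a rebuild path. The event shape and field
# names are assumptions for illustration, not a real PayLedger schema.

events = [  # durable, replayable change stream (in delivery order)
    {"payout_id": "po_98421", "version": 1, "status": "approved"},
    {"payout_id": "po_98421", "version": 2, "status": "processing"},
    {"payout_id": "po_98421", "version": 2, "status": "processing"},  # redelivery
    {"payout_id": "po_98421", "version": 3, "status": "settled"},
]

def apply(index: dict, event: dict) -> None:
    """Apply one event; drop anything at or below the indexed version,
    so redeliveries and replays are harmless."""
    current = index.get(event["payout_id"])
    if current and current["version"] >= event["version"]:
        return  # duplicate or out-of-order delivery
    index[event["payout_id"]] = event

def rebuild(stream) -> dict:
    """A projection is disposable: drop it and replay the retained log."""
    index = {}
    for e in stream:
        apply(index, e)
    return index

search_index = rebuild(events)
# The duplicate delivery of version 2 changes nothing; the final state
# matches the last committed transition.
```

If an index mapping changes, PayLedger runs `rebuild` against a fresh index while the old one keeps serving reads, then cuts over.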

This is where the previous lesson's integration architecture becomes concrete. Polyglot persistence tells you why multiple stores exist. The systems-of-record distinction tells you which of those stores are safe to delete and recompute, and which one must never be treated as disposable.

Concept 3: When source and projection disagree, product behavior must follow authority boundaries

The hardest part is not stating the rule. The hardest part is enforcing it in user-facing workflows. Suppose a support agent searches for payout po_98421 and sees status "processing" in Elasticsearch, but the payout detail page fetched from PostgreSQL shows "settled". If the support tool lets the agent retry the transfer directly from the search result, the product has turned a stale projection into an action surface. That is how "read model lag" becomes a duplicate-payment incident.

PayLedger solves this by separating browse paths from decision paths. Search results are good enough to narrow a case quickly, but any irreversible action such as retrying a settlement, refunding a fee, or escalating a compliance hold must confirm against PostgreSQL before execution. Finance dashboards can rely on warehouse models for trends, but end-of-period close procedures run completeness checks against source exports before numbers become official. Derived systems remain useful, but only within the authority envelope they were designed for.
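The browse-path / decision-path split can be sketched as a guard on the irreversible action. Everything here is a hypothetical stand-in: `fetch_from_source` represents a read against the PostgreSQL system of record, and `retry_settlement` represents the support tool's action handler.

```python
# Hedged sketch of a decision-path guard. fetch_from_source and
# retry_settlement are hypothetical stand-ins for real service calls.

class StaleProjectionError(Exception):
    pass

def fetch_from_source(payout_id: str) -> dict:
    # stand-in for a read against the PostgreSQL system of record
    return {"payout_id": payout_id, "status": "settled", "version": 7}

def retry_settlement(payout_id: str, status_seen_in_search: str) -> str:
    """Search may start the investigation, but the irreversible action
    re-confirms against the system of record before executing."""
    record = fetch_from_source(payout_id)
    if record["status"] != status_seen_in_search:
        # The projection was stale; refuse rather than risk a double payment.
        raise StaleProjectionError(
            f"search showed {status_seen_in_search!r}, "
            f"source says {record['status']!r}"
        )
    if record["status"] != "processing":
        return "no-op: payout is not in a retryable state"
    return "retry submitted"

try:
    retry_settlement("po_98421", status_seen_in_search="processing")
except StaleProjectionError as err:
    print(err)  # the agent sees why the retry was refused, not a silent failure
```

The guard deliberately fails loudly: surfacing the disagreement to the agent is itself useful, because it distinguishes projection lag from genuine source corruption.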

This also changes observability. The team does not just monitor API latency and database CPU. It monitors projection lag, version skew, reconciliation mismatches, and the number of workflows that had to fall back to the source of record because projections were stale. Those signals reveal whether derived data is behaving like a helpful read model or drifting toward an ungoverned second truth.
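Two of those signals, projection lag and version skew, reduce to simple computations over data the pipeline already carries. A minimal sketch, with thresholds and names as illustrative assumptions:

```python
# Illustrative observability helpers; names and inputs are assumptions.

def projection_lag_seconds(source_commit_ts: float,
                           projected_ts: float) -> float:
    """Seconds between a transition committing at the source and the
    projection applying it (for the latest applied event)."""
    return max(0.0, projected_ts - source_commit_ts)

def version_skew(source_versions: dict, projected_versions: dict) -> dict:
    """Per-entity gap between authoritative and projected versions;
    entities missing from the projection count as version 0."""
    return {
        pid: source_versions[pid] - projected_versions.get(pid, 0)
        for pid in source_versions
        if source_versions[pid] > projected_versions.get(pid, 0)
    }

skew = version_skew({"po_1": 5, "po_2": 3}, {"po_1": 5, "po_2": 1})
# skew reports that the projected copy of po_2 is two committed
# transitions behind, which may be routine lag or a stuck projector.
```

Alerting on skew that persists across several measurement windows, rather than on any nonzero value, separates normal asynchrony from a wedged consumer.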

The next lesson, Storage Engines: Log-Structured vs Page-Oriented, goes down one layer and asks why different databases behave so differently once you assign them authoritative or derived roles. That question matters because engine internals shape write amplification, scan efficiency, compaction, and therefore which system is a good home for record-keeping versus projection-heavy workloads.

Troubleshooting

Issue: Support sees one payout status in search and a different status on the detail page.

Why it happens / is confusing: Elasticsearch is a derived near-real-time projection, not the authoritative write path. Refresh delay, consumer lag, or a temporarily failed projector can leave search behind even when PostgreSQL is already correct.

Clarification / Fix: Treat search as a discovery surface. When a workflow is about to trigger a business action, re-read the payout from the system of record and expose projection lag metrics so operators can tell staleness from source corruption.

Issue: Finance totals change after a warehouse backfill, and stakeholders assume the warehouse is unreliable.

Why it happens / is confusing: Warehouse tables often contain derived models rather than immutable source facts. A corrected transformation, late-arriving event, or replay can legitimately change a report even when the source ledger was always correct.

Clarification / Fix: Separate raw landed facts from transformed reporting tables, document the acceptable revision window, and keep a reproducible rebuild path so changes are explainable instead of mysterious.

Issue: Teams start adding "temporary" direct writes into a cache or search index to reduce user-visible lag.

Why it happens / is confusing: The stale read is visible, while the authority boundary is abstract. Under pressure, engineers optimize for local freshness and quietly create an unsupported second mutation path.

Clarification / Fix: Keep mutation rights exclusive to the system of record. If freshness targets are too loose, improve projection latency, invalidation design, or UI messaging rather than bypassing the authoritative commit path.

Advanced Connections

Connection 1: Polyglot Persistence and Data Integration explains why multiple stores exist; this lesson explains how they stay governable

The previous lesson introduced specialized stores and replayable change propagation. This lesson adds the decision rule that keeps that topology safe: only one store gets to own business truth for a given fact, and every projection must be understood as a consumer of that truth rather than a peer.

Connection 2: CQRS and materialized views formalize the same split between command authority and read optimization

Command Query Responsibility Segregation is often over-marketed, but its useful core is exactly the distinction in this lesson. Commands mutate the authoritative model; queries often hit denormalized or asynchronously updated views. Systems like event-sourced ledgers, Elasticsearch-backed support consoles, and warehouse-fed BI stacks all depend on that separation being explicit rather than implied.
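The command/query split described above can be shown in a few lines. This is a deliberately tiny sketch, not a framework: the class names and the legal-transition table are invented for illustration.

```python
# Minimal CQRS-flavoured sketch: commands go through the authoritative
# model; queries read a view built from its log. Names are illustrative.

class PayoutCommandModel:
    """Owns mutation rights; enforces legal state transitions."""
    LEGAL = {"draft": {"approved"},
             "approved": {"processing"},
             "processing": {"settled"}}

    def __init__(self):
        self.state = {}   # payout_id -> current status
        self.log = []     # change events consumed by read models

    def transition(self, payout_id: str, new_status: str) -> None:
        current = self.state.get(payout_id, "draft")
        if new_status not in self.LEGAL.get(current, set()):
            raise ValueError(f"illegal transition {current} -> {new_status}")
        self.state[payout_id] = new_status
        self.log.append((payout_id, new_status))

class PayoutReadModel:
    """Denormalized view built by consuming the log; no mutation rights."""
    def __init__(self):
        self.by_status = {}

    def consume(self, log) -> None:
        for payout_id, status in log:
            self.by_status[payout_id] = status

cmd = PayoutCommandModel()
cmd.transition("po_1", "approved")
cmd.transition("po_1", "processing")
view = PayoutReadModel()
view.consume(cmd.log)
# The view reflects the command model only after consuming the log,
# which is exactly the asynchrony the lesson's freshness contracts govern.
```

Note that only the command model can raise on an illegal transition; the read model has no opinion, which is the whole point.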

Key Insights

  1. Authority is a property of who may decide and mutate a fact - A system of record is the store that can defend the business state under audit, retries, and conflicting copies.
  2. Derived data exists to optimize access, not to redefine truth - Search indexes, caches, and warehouses are valuable precisely because they can lag, replay, and rebuild without becoming the canonical write path.
  3. User workflows must respect the same boundary the storage design does - Irreversible actions should confirm against the authoritative source even when browsing and analytics run on projections.