LESSON
Day 381: Documentation & Sharing - Making Assumptions Auditable
The core idea: A model becomes auditable only when its assumptions, data lineage, validation evidence, and usage boundaries travel together as one reviewable package, so another person can reconstruct both the result and the point where the model stops being safe to trust.
Today's "Aha!" Moment
In 12.md, Harbor Point Securities finally had a validation story for its resilience-bond model. The desk could show how the model behaved on calm sessions, ETF rebalance days, and genuine liquidity shocks. That was enough to make the model credible in principle. The next question from the risk committee was harder: "If we approve this model for live quote adjustments, what exactly are we approving?"
Harbor Point's first answer was not good enough. The lead analyst had a slide deck, a notebook, and a spreadsheet with a few manual overrides for stress labels. The valuation layer was linked to one municipal data snapshot, the liquidity layer had quietly picked up a newer vendor file, and nobody besides the analyst could explain why one May stress replay produced narrower uncertainty bands than the version of the same replay shown to the committee a week later. The model was sophisticated, but the reasoning around it was not auditable.
That is the shift in this lesson. Documentation is not a post-project summary for people who missed the meeting. In production, documentation is part of the control system around a model. It tells reviewers which assumptions matter, which evidence supports them, which decisions are allowed to depend on the model, and which signals should trigger downgrade or re-review. If those links are missing, "sharing the model" usually means sharing conclusions without sharing the conditions that made those conclusions valid.
The common misconception is that auditable documentation is mostly governance theater. In practice, it is what keeps a useful model from becoming an institutional rumor. When assumptions are explicit and reproducible, disagreement becomes inspectable. When they are implicit, the same model is reused in contexts it was never tested for, and the failure shows up later as a trading loss, a confused client memo, or a governance fire drill.
Why This Matters
Harbor Point does not have a single audience for its resilience-bond model. Traders use it to adjust quote width. Risk managers use it to cap inventory. Strategy analysts use it to explain market conditions to clients who may move millions of dollars based on the firm's commentary. Those uses have different failure costs, but they all depend on the same underlying assumptions about liquidity regimes, ETF outflows, dealer balance-sheet constraints, and the reliability of the market data feeding the model.
When those assumptions live only in one analyst's head or in scattered notebooks, every review becomes fragile. A validator may approve the model for one use case while the desk quietly applies it to another. A later replay may fail because the original data snapshot was overwritten. A stress label may be revised without anyone realizing that the validation report is no longer tied to the code and parameters that generated it. What looks like "bad documentation" is really a broken evidence chain.
An auditable package changes the operating posture. Instead of asking who remembers how the model worked last quarter, Harbor Point can inspect a release bundle and answer four concrete questions: What assumptions were made? What evidence supported them? What actions were approved? What should happen when the assumptions stop holding? That extra structure adds friction upfront, but it is the friction that prevents silent scope creep and makes later policy changes defensible.
Learning Objectives
By the end of this session, you will be able to:
- Explain what makes modeling assumptions auditable - Distinguish a readable narrative from a package that another reviewer can challenge and reproduce.
- Design a shareable evidence chain - Connect assumptions, data versions, validation outputs, and usage boundaries into one reviewable release artifact.
- Evaluate documentation trade-offs in production - Decide which documentation is mandatory for safe reuse and which detail can remain supplemental.
Core Concepts Explained
Concept 1: Auditability begins with an assumption register tied to decisions
Harbor Point's resilience-bond model has several moving parts: a fair-value layer for slow credit repricing, a market-state layer for live liquidity conditions, and a stress layer for ETF redemptions and dealer balance-sheet saturation. If the desk documents those layers only in prose, reviewers still cannot see which assumptions actually matter operationally. The first auditable artifact is therefore an assumption register tied to decisions, not a narrative summary.
For Harbor Point, that register might look like this:
assumption                    affects decision                 evidence / failure signal
---------------------------   ------------------------------   --------------------------------------------------------------------------------
ETF outflow elasticity        quote widening and size limits   validated on rebalance days; revisit if redemption sensitivity doubles
dealer inventory saturation   inventory caps during stress     replayed against 2024-2025 shock sessions; downgrade if exits exceed tolerance
rating-watch shock decay      client scenario commentary       checked against post-watch spread paths; manual review if confidence is low
This changes the review from "Here is our model" to "Here are the assumptions that authorize particular actions." The mechanism matters. Each assumption should point to a data source or feature, a validation slice that exercised it, and the decision boundary it influences. If the ETF elasticity estimate changes, Harbor Point immediately knows that quote rules and intraday size limits must be reconsidered. If the rating-watch decay assumption weakens, the client commentary workflow may need a human sign-off again even if trading rules remain unchanged.
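To make that linkage checkable rather than purely narrative, the register can also live in a machine-readable form next to the model code. The sketch below is a minimal illustration, assuming plain Python dictionaries as the storage format; the helper name decisions_to_rereview and the key names are hypothetical, but the assumptions and decisions are the ones from the table above.

# Decision-linked assumption register, mirroring the table above.
# The structure and names are illustrative, not Harbor Point's actual schema.
ASSUMPTION_REGISTER = {
    "etf_outflow_elasticity": {
        "affects_decisions": ["quote_widening", "intraday_size_limits"],
        "evidence": "validated on ETF rebalance days",
        "failure_signal": "redemption sensitivity doubles",
    },
    "dealer_inventory_saturation": {
        "affects_decisions": ["inventory_caps_during_stress"],
        "evidence": "replayed against 2024-2025 shock sessions",
        "failure_signal": "stress exits exceed tolerance",
    },
    "rating_watch_shock_decay": {
        "affects_decisions": ["client_scenario_commentary"],
        "evidence": "checked against post-watch spread paths",
        "failure_signal": "low confidence in post-watch decay estimate",
    },
}

def decisions_to_rereview(changed_assumptions):
    """Return every decision that depends on an assumption that changed."""
    affected = set()
    for name in changed_assumptions:
        affected.update(ASSUMPTION_REGISTER[name]["affects_decisions"])
    return sorted(affected)

# If the ETF elasticity estimate is revised, the register answers the
# "what must be reconsidered?" question mechanically:
print(decisions_to_rereview(["etf_outflow_elasticity"]))
# ['intraday_size_limits', 'quote_widening']

The point is not the particular data structure; it is that "what must be reconsidered?" gets a mechanical answer instead of depending on whoever remembers the model best.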
The production benefit is traceability. A risk manager can challenge one assumption without reopening the entire model. A new analyst can see which assumptions are empirical estimates, which are business choices, and which are conservative guardrails added because the cost of being wrong is high. That separation is crucial because not every assumption should be debated the same way.
The trade-off is more authoring work. Someone has to maintain the register when features, data vendors, or operating rules change. But that work is cheaper than rediscovering, in the middle of a volatile session, that the desk has been relying on an assumption nobody can now locate or defend.
Concept 2: Sharing must be reproducible, not merely readable
Once Harbor Point knows which assumptions matter, it still has to share them in a form that survives time and personnel changes. A polished PDF is readable, but it is not reproducible. An auditable release has to let another reviewer recreate the same outputs from the same inputs, or at least understand exactly why a recreation would differ.
That means packaging the model as a release bundle rather than as a loose collection of artifacts. The bundle should include the code commit, environment or dependency lock, frozen data snapshot, feature-generation configuration, validation report, and the approved operating boundaries. If one of those pieces is missing, a later reviewer can end up arguing about the model when the real disagreement is about which data file or manual label set was used.
Harbor Point could represent the release bundle with a simple manifest like this:
model_release: 2026-03-31
code_commit: 5f3e1c2
market_data_snapshot: muni-liquidity-2026-03-28.parquet
stress_label_version: 4
validated_for:
  - quote adjustments
  - inventory limit recommendations
advisory_only_for:
  - client commentary during active dislocation
fallback_trigger:
  - three stress sessions outside predicted coverage band
The mechanism underneath the manifest is versioned lineage. Vendor data updates must be pinned. Hand-applied overrides must be logged as inputs, not remembered as tribal knowledge. Generated charts in the review packet should be traceable back to a known configuration. Harbor Point does not need perfect reproducibility in the philosophical sense; it needs practical reproducibility good enough that a validator, trader, or auditor can answer, "What exactly produced this recommendation?"
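One way to make that pinning enforceable is to have every replay verify its inputs against the manifest before it runs. The sketch below is illustrative and assumes the manifest above is stored as YAML with an added market_data_snapshot_sha256 field; that field name, the file layout, and the use of PyYAML are assumptions rather than Harbor Point's actual tooling.

import hashlib
import yaml  # assumes PyYAML is available in the replay environment

def file_sha256(path):
    """Hash the pinned snapshot so a replay can prove it used the same file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_release_inputs(manifest_path):
    """Refuse to replay a release whose pinned inputs cannot be confirmed."""
    with open(manifest_path) as f:
        manifest = yaml.safe_load(f)
    snapshot = manifest["market_data_snapshot"]
    expected = manifest["market_data_snapshot_sha256"]  # hypothetical field
    if file_sha256(snapshot) != expected:
        raise RuntimeError(
            f"{snapshot} does not match the pinned hash; this replay "
            "would not be comparable to the approved release."
        )
    return manifest

A check like this turns "which snapshot did you use?" from an argument into a yes-or-no question.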
The trade-off is operational overhead. Reproducible bundles take discipline, storage, and release hygiene. But that cost buys independent challenge. It also prevents a common failure mode where the model appears to drift mysteriously, when in reality the team has been comparing outputs from different snapshots and unlabeled overrides.
Concept 3: Documentation is the interface between model evidence and operating policy
The final step is where most teams under-document: the handoff from evidence to action. Harbor Point's validators may agree that the resilience-bond model is sound on the historical regimes tested in 12.md. That still does not tell traders what they are allowed to do at 11:17 a.m. when spreads gap out, ETF outflows accelerate, and the model's uncertainty band widens. Documentation becomes auditable only when it states the model's operating policy in explicit terms.
For Harbor Point, the release packet should answer questions such as: Which desks may use the model directly? Which outputs are advisory only? Which metrics trigger downgrade to manual limits? Who signs off when a stress label definition changes? Those are not "extra governance notes." They are the interface between the model and the business process it is modifying.
This is why the documentation package should include decision tables, escalation rules, and change history, not just technical explanation. A useful policy section might say that the liquidity layer can automatically recommend quote widening only when the current regime classifier is confident and recent interval coverage is healthy. It might also say that inventory caps can tighten automatically, but client-facing scenario commentary requires human approval during active dislocations. Once those rules are written down, model usage stops being a matter of folklore.
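Those rules can also be expressed as a small, explicit check that sits between a model output and the action it would trigger. The sketch below is a minimal illustration of the policy described in this section; the function name, argument names, and output labels are hypothetical, and the real confidence and coverage cutoffs would come from the validation evidence.

def usage_mode(output_type, regime_confident, coverage_healthy, active_dislocation):
    """Map a model output to 'automatic', 'advisory', or 'manual_review'."""
    if output_type == "quote_widening":
        # Automatic only when the regime classifier is confident and
        # recent interval coverage is healthy.
        return "automatic" if (regime_confident and coverage_healthy) else "advisory"
    if output_type == "inventory_cap_tighten":
        # Tightening inventory caps is approved for automatic use.
        return "automatic"
    if output_type == "client_scenario_commentary":
        # Client-facing commentary needs a human during active dislocations.
        return "manual_review" if active_dislocation else "advisory"
    # Anything the release never authorized defaults to human review.
    return "manual_review"

Writing the policy this way also makes the change history meaningful: a rule change becomes a reviewable diff rather than a shift in desk habit.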
This lesson also prepares the ground for 14.md. Policy design under uncertainty depends on being able to change a rule deliberately when the environment shifts. That is only possible if the current rule is already tied to named assumptions, named evidence, and named triggers for revision. Adaptive control without auditable documentation is just improvisation with better vocabulary.
The trade-off is social as much as technical. Explicit policy documentation makes uncertainty visible, which can feel uncomfortable to teams that want the model to appear decisive. But clarity about limits is what allows fast action during stress: operators know what is authorized, what needs review, and what evidence must change before the policy changes too.
Troubleshooting
Issue: Two reviewers rerun the "same" Harbor Point release and get different stress results.
Why it happens / is confusing: One replay used a newer municipal data vendor snapshot or a different set of manual stress labels, but those inputs were never pinned in the shared package.
Clarification / Fix: Treat data snapshots, overrides, and label definitions as first-class release artifacts. If they are not versioned and referenced in the manifest, the result is not auditable.
Issue: The documentation is thorough, but traders still do not know which model outputs are safe to use live.
Why it happens / is confusing: The package explains the mechanics but never maps outputs to authorized decisions, downgrade triggers, and human-review requirements.
Clarification / Fix: Add an operating-policy section that states approved uses, advisory-only uses, escalation paths, and fallback conditions in explicit language.
Issue: Documentation quality collapses after the first launch.
Why it happens / is confusing: The team treats documentation as a one-time report instead of a required part of each model release and revalidation cycle.
Clarification / Fix: Make the assumption register, manifest, validation summary, and change log required release gates. If any piece is stale, the release should fail review rather than accumulating undocumented drift.
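A lightweight gate in the release pipeline is usually enough to enforce this. The sketch below assumes the four required artifacts live as files alongside the release; the file names and the staleness rule (an artifact older than the release commit counts as stale) are illustrative assumptions.

import os

REQUIRED_ARTIFACTS = [
    "assumption_register.md",
    "release_manifest.yaml",
    "validation_summary.md",
    "change_log.md",
]

def release_gate(release_dir, commit_timestamp):
    """Block the release if any required artifact is missing or stale."""
    problems = []
    for name in REQUIRED_ARTIFACTS:
        path = os.path.join(release_dir, name)
        if not os.path.exists(path):
            problems.append(f"missing: {name}")
        elif os.path.getmtime(path) < commit_timestamp:
            problems.append(f"stale: {name} predates the release commit")
    if problems:
        raise RuntimeError("release blocked: " + "; ".join(problems))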
Advanced Connections
Connection 1: Testing & Validation ↔ Documentation & Sharing
12.md established what Harbor Point had to test before trusting the model. This lesson makes that evidence durable. A validation result only becomes institutionally useful when the challenge set, metrics, and acceptance rules are packaged so another reviewer can inspect the same evidence later and understand which decision rights it justified.
Connection 2: Documentation & Sharing ↔ Policy Design Under Uncertainty
14.md will focus on changing rules as new evidence arrives. Harbor Point can only do that safely if the current quote and inventory policies are already anchored to auditable assumptions. Otherwise every policy revision starts from guesswork about what the model was doing before.
Resources
Optional Deepening Resources
- [DOC] Federal Reserve SR 11-7: Guidance on Model Risk Management
- Focus: Model inventory, assumption documentation, independent validation, and change control for models that influence real financial decisions.
- [PAPER] Model Cards for Model Reporting
- Focus: A structured way to document intended use, limitations, evaluation context, and caveats so others can review model claims intelligently.
- [DOC] NIST AI Risk Management Framework 1.0
- Focus: Governance patterns for documenting context, risks, and lifecycle controls around systems that operate under uncertainty.
Key Insights
- Auditable assumptions are decision-linked assumptions - The important assumptions are the ones that authorize actions, not every sentence in the project wiki.
- Readable documentation is not enough - If another reviewer cannot trace outputs back to versioned data, code, and operating boundaries, the model is still effectively opaque.
- Documentation is part of control - The point of sharing assumptions is not only communication; it is making policy changes, overrides, and downgrades defensible when the world changes.