Parameter Sweeps - Systematic Exploration of Model Space

The core idea: A parameter sweep turns one model run into a decision map by rerunning the model across structured combinations of uncertain inputs and policy levers, so you can see where a recommendation is stable, where it fails, and which assumptions deserve closer scrutiny.


Today's "Aha!" Moment

In 04.md, Harbor City built a hybrid model for Seawall District that links climate scenarios, street-level flooding, and evacuation behavior. That model can now answer a narrow question: what happens for one chosen seawall height, one pump-availability assumption, one storm track, and one household departure pattern? The city council's real question is harder. It is not "what happens in that one case?" but "does our preferred resilience package still work when the uncertain parts move within plausible bounds?"

A parameter sweep answers that question by treating the model like an experiment platform. Harbor City varies the parameters that matter to the decision: storm surge level, drainage blockage, pump availability during the event, household departure delay, and the design choice itself. Each parameter combination becomes one run. The outputs are not generic charts; they are the things the city actually cares about, such as whether West Tunnel stays passable, how many households lose evacuation access, how long the district substation is offline, and how expensive the chosen package is.

The misconception to discard is that a sweep is just "run the simulation many times." Volume alone does not create insight. If the ranges are arbitrary, if the team sweeps parameters because they are easy to change rather than because they are uncertain and decision-relevant, or if the team reports only the best average outcome, the exercise produces noise dressed up as rigor. A good sweep exposes structure in the model space: the threshold where a cheap pump-heavy plan stops being safe, or the interaction where late evacuation matters only when tunnel flooding starts earlier than expected.

Why This Matters

Harbor City is comparing three flood-resilience packages for Seawall District. Under the nominal scenario, the least expensive option looks attractive: a moderate seawall, upgraded pumps, and a commitment to trigger evacuations early. But nominal scenarios are exactly where bad decisions hide. If pump availability falls during the storm, or if residents begin leaving later than the emergency office assumed, the district can cross a sharp boundary where the tram corridor floods before shelter traffic clears. The same plan that looked efficient on paper becomes the one that strands hospital staff.

Without a parameter sweep, the city is choosing from a single screenshot of a dynamic system. That is equivalent to approving a production rollout because one benchmark run looked good on one dataset and one seed. The model may be sophisticated, but the decision is still resting on a narrow slice of possibility space.

A parameter sweep replaces that screenshot with a map. It does not remove uncertainty, and it does not prove the model is correct. What it does is organize uncertainty so the city can distinguish robust options from brittle ones. Instead of saying "option B wins," Harbor City can say "option B wins only when pump uptime stays above 0.85 and evacuation begins within 50 minutes; outside that region, option C is safer." That is a production-grade statement because it makes the recommendation conditional, testable, and governable.

Learning Objectives

By the end of this session, you will be able to:

  1. Explain why parameter sweeps follow naturally from hybrid modeling - Describe why one model run cannot justify a policy choice in a system with uncertain inputs and interacting mechanisms.
  2. Design a useful sweep instead of an arbitrary one - Choose parameters, ranges, sampling strategy, and outputs that match the decision you are trying to defend.
  3. Interpret sweep results for robustness and next-step analysis - Read boundaries, interactions, and failure regions, then identify which assumptions should be calibrated with data next.

Core Concepts Explained

Concept 1: A parameter sweep turns the model into an experiment table

Once Harbor City has a hybrid model, every single run is just one point in a much larger space. One run might assume a 2.6-meter seawall, 90% pump availability, a 2.1-meter surge, and a 40-minute evacuation delay. Another run keeps the same wall but lowers pump availability and delays departures. A parameter sweep is the disciplined process of generating many such points so the city can observe how outcomes change across the model space instead of pretending the first point was representative.

That framing matters because Seawall District has both decision levers and uncertainties. Decision levers are things the city can choose, such as wall height or pump redundancy. Uncertainties are things the city cannot directly control, such as the realized storm surge, how much debris blocks drains, or how long residents take to act on a warning. Mixing those together carelessly makes the sweep hard to interpret, but separating them too rigidly is also a mistake. The real policy question is usually about their interaction: which design choice still holds up when the uncertain world pushes against it?

You can think of the sweep as building an experiment table:

levers:        wall_height, pump_redundancy
uncertainties: surge_level, drain_blockage, departure_delay
run i:         {2.6m, N+1 pumps, 2.1m surge, 30% blockage, 45 min} -> outcomes

The important word is "systematic." Harbor City is not tweaking one knob at random until a chart looks interesting. It is choosing a finite set of parameter combinations that make comparisons meaningful and reproducible. This is also why one-at-a-time exploration is often misleading. If the city changes only pump availability while holding everything else fixed, it may conclude that the plan is robust. But late departure and early tunnel flooding might only become dangerous together. A one-at-a-time sweep misses precisely the interaction that produces the operational failure.

The trade-off is coverage versus cost. Sweeping more parameters at finer resolution reveals more of the model surface, but the run count grows quickly. The city should therefore sweep the parameters with both high uncertainty and high leverage on the decision. Parameters that barely change any decision-relevant outcome do not belong in the first pass.
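
To make the experiment-table idea concrete, here is a minimal Python sketch of a full factorial design. The parameter names and levels are illustrative placeholders, not Harbor City's actual engineering ranges; the point is the combinatorial arithmetic.

from itertools import product

# Illustrative levers and uncertainties; names and levels are invented
# placeholders, not the city's real design values.
levers = {
    "wall_height_m":   [2.2, 2.6, 3.0],
    "pump_redundancy": ["N", "N+1"],
}
uncertainties = {
    "surge_level_m":       [1.8, 2.1, 2.4],
    "drain_blockage_frac": [0.1, 0.3, 0.5],
    "departure_delay_min": [20, 45, 90],
}

params = {**levers, **uncertainties}

# Full factorial: every combination of every level.
# 3 * 2 * 3 * 3 * 3 = 162 configs; at 20 seeds each, 3,240 model runs.
experiment_table = [dict(zip(params, combo)) for combo in product(*params.values())]

print(len(experiment_table))   # 162
print(experiment_table[0])     # one sweep point, i.e., one row of the table

Adding one more three-level parameter triples the table, which is exactly the multiplicative growth the coverage-versus-cost trade-off refers to.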

Concept 2: Good sweeps are designed around plausible ranges, repeated runs, and explicit outputs

Harbor City's next task is not to launch thousands of runs blindly. It is to decide what counts as a plausible parameter range and what evidence justifies that range. Pump availability might vary between 0.75 and 1.00 because backup power and maintenance records set a realistic lower bound. Household departure delay might range from 20 to 90 minutes because recent evacuation drills and incident logs show that some neighborhoods act late. These ranges are part of the model argument. If they are arbitrary, the sweep is arbitrary too.

The sampling design should match the dimensionality of the problem. If Harbor City is exploring three or four parameters at a few discrete levels, a full factorial grid is often fine because every combination is interpretable. If the city starts sweeping many continuous parameters, the factorial explosion becomes expensive fast, and a design such as Latin hypercube sampling or a staged coarse-to-fine sweep becomes more practical. The point is not to worship one method. The point is to spend compute budget where it reveals decision boundaries instead of wasting it on redundant runs.

There is another complication: the evacuation model is stochastic. Two runs with the same parameter values can produce slightly different clearance times because agent decisions and route conflicts include randomness. That means a sweep point is rarely one run. It is usually a bundle of runs across different seeds, followed by summary statistics such as median stranded households, worst decile clearance time, or probability of losing hospital access.

# One sweep point = one parameter config; each config is replicated across seeds.
for config in experiment_design(parameters, sampler="latin_hypercube", samples=200):
    replications = [run_harbor_city_model(config, seed=s) for s in range(20)]
    summary = summarize(replications, metrics=["median_clearance", "p90_flood_depth"])
    record(config, summary)  # persist config, metrics, and seed policy for reproducibility

Notice what this pseudocode makes explicit. The sweep is not only about the inputs; it is also about the outputs and how they are aggregated. Harbor City should record the metrics that correspond to the actual decision: pass/fail constraints for emergency access, cost, outage duration, and evacuation completion. It should also record model version, seed policy, and parameter definitions so the experiment is reproducible. In production modeling, irreproducible sweeps are little better than anecdotes.
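
As a runnable counterpart to that pseudocode, here is one way the sampling step could look with SciPy's quasi-Monte Carlo module. The pump-availability and departure-delay bounds come from the ranges above; the surge bounds and the commented-out run_harbor_city_model wrapper are assumed placeholders.

from scipy.stats import qmc

# Plausible ranges: pump availability and departure delay match the text;
# the surge bounds are an assumed placeholder.
names    = ["pump_availability", "departure_delay_min", "surge_level_m"]
l_bounds = [0.75, 20.0, 1.5]
u_bounds = [1.00, 90.0, 2.8]

sampler = qmc.LatinHypercube(d=len(names), seed=42)   # seeded for reproducibility
unit_points = sampler.random(n=200)                   # 200 points in the unit cube
configs = qmc.scale(unit_points, l_bounds, u_bounds)  # rescale to the real ranges

for row in configs:
    config = dict(zip(names, row))   # one sweep point
    # replications = [run_harbor_city_model(config, seed=s) for s in range(20)]
    # ... then summarize and record as in the pseudocode above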

Concept 3: The value of a sweep is in the boundaries it reveals, not the single winner it names

Suppose Harbor City plots the sweep results for two candidate packages. The lower seawall plus upgraded pumps is cheapest and looks good on average. But the heatmap shows a narrow safe region: it performs acceptably only when drain blockage stays low and departure delay stays under 50 minutes. The taller seawall option costs more, yet its safe region is much wider. It keeps West Tunnel open across far more of the plausible storm and behavior envelope. A single nominal run might have made the cheaper package look superior; the sweep reveals that it is only superior in a fragile corner of the space.

This is why parameter sweeps are fundamentally about robustness, not just optimization. Harbor City is not looking for the parameter combination that makes one policy look best. It is asking how often each policy remains acceptable when uncertainties vary. In many production decisions, "acceptable across a wide region" is more valuable than "best in one narrow region." That is especially true when failure is cliff-like rather than gradual. If one additional drain blockage level suddenly closes the last evacuation route, the city needs to know where that cliff is.
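
One way to read "acceptable across a wide region" off the sweep output is to score each package by the fraction of sweep points where it satisfies the hard constraints. Here is a minimal sketch, assuming the results have been collected into a table; the column names, the toy values, and the 120-minute clearance constraint are invented for illustration.

import pandas as pd

# One row per (policy, sweep point); values are invented toy data.
results = pd.DataFrame({
    "policy":        ["B", "B", "B", "C", "C", "C"],
    "tunnel_open":   [True, False, True, True, True, True],
    "clearance_min": [95, 140, 110, 105, 100, 115],
})

# Acceptable = all hard constraints hold at that sweep point.
acceptable = results["tunnel_open"] & (results["clearance_min"] <= 120)

# Robustness = share of the swept space where the policy stays acceptable.
robustness = acceptable.groupby(results["policy"]).mean()
print(robustness)   # B: 0.67 (fragile corner), C: 1.00 (wide safe region)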

The sweep also tells Harbor City what to do next. If the recommendation flips mainly when pump availability drops or when departure delays lengthen, those are the parameters that deserve better evidence. That is the bridge to 06.md. Parameter sweeps tell you where the model is fragile; calibration tells you which parts of that parameter space are actually supported by observed data. A sweep without calibration can reveal sensitivity, but it cannot tell you which sensitive regions are plausible enough to drive policy.

There is a final trade-off here. Coarse sweeps are cheap and useful for finding the broad shape of the landscape, but they can miss narrow cliffs. Extremely fine sweeps are expensive and can create false precision if the parameter ranges themselves are uncertain. The practical workflow is usually iterative: start broad, identify candidate boundaries, and then refine where the decision truly changes.
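
That iterative workflow can be sketched in a few lines: sweep coarsely, bracket the flip, then spend runs only inside the bracket. The passes() stand-in and its 52-minute cliff are invented; in practice each call would be a full replicated model run.

import numpy as np

def passes(delay_min):
    # Stand-in for a replicated model run plus a pass/fail constraint check;
    # the 52-minute cliff is invented for illustration.
    return delay_min < 52

# Pass 1: coarse sweep over departure delay (20-90 min, 8 points).
coarse = np.linspace(20, 90, 8)
outcomes = [passes(d) for d in coarse]

# Locate the interval where pass flips to fail: it brackets the cliff.
flip = next(i for i in range(len(outcomes) - 1) if outcomes[i] != outcomes[i + 1])

# Pass 2: fine sweep only inside the bracketing interval.
fine = np.linspace(coarse[flip], coarse[flip + 1], 20)
boundary = next(d for d in fine if not passes(d))
print(f"decision boundary near {boundary:.1f} min")   # ~52 min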

Troubleshooting

Issue: The sweep explodes into an unmanageable number of runs before the team has learned anything.

Why it happens / is confusing: Full factorial designs grow multiplicatively. Adding just a few more levels or parameters can turn a tractable experiment into a compute sink.

Clarification / Fix: Separate high-leverage uncertainties from low-impact constants, begin with a coarse design, and refine only near the boundaries where the policy recommendation changes. Use denser sampling only when the first pass shows that the decision is sensitive in that region.

Issue: The same parameter point gives noticeably different answers on repeated runs.

Why it happens / is confusing: Part of the model is stochastic, but the team is treating each sweep point as deterministic and reporting one seed as if it were the truth.

Clarification / Fix: Run multiple replications per parameter point and summarize the distribution, not just one outcome. Record medians, percentiles, and failure probabilities so the sweep reflects both parameter uncertainty and simulation variability.
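
A minimal version of that summary step might look like the following; the 120-minute failure threshold and the clearance times are invented, and the real metric names would come from the sweep's recorded outputs.

import numpy as np

def summarize(replications):
    # Collapse seed-to-seed variability at one sweep point into a
    # distributional summary instead of a single outcome.
    clearance = np.array([r["clearance_min"] for r in replications])
    return {
        "median_clearance": float(np.median(clearance)),
        "p90_clearance":    float(np.percentile(clearance, 90)),
        "p_fail":           float(np.mean(clearance > 120)),  # failure probability
    }

# Twenty hypothetical seeded runs at one parameter point.
runs = [{"clearance_min": c} for c in
        [88, 91, 95, 97, 99, 101, 103, 104, 108, 110,
         112, 115, 118, 121, 124, 126, 131, 133, 140, 148]]
print(summarize(runs))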

Issue: The sweep produces an "optimal" plan that wins only in a razor-thin strip of the chart.

Why it happens / is confusing: The scoring function compresses everything into one average value, which rewards narrow peaks and hides brittle behavior.

Clarification / Fix: Evaluate safe operating regions, failure-rate constraints, or worst-case regret alongside average performance. If a policy wins only inside a tiny plausible band, treat it as fragile even if its mean score is high.
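
Worst-case regret is straightforward to compute once the sweep has scored every policy at every sweep point: a policy's regret in a scenario is how far it falls short of the best policy there, and its worst-case regret is the maximum over scenarios. The scores below are invented toy numbers (lower is better).

import numpy as np

# Rows = policies, columns = swept scenarios; invented scores, lower is better.
scores = {
    "A": np.array([40, 55, 300, 60]),
    "B": np.array([35, 50, 500, 45]),   # wins 3 of 4 scenarios, but has a cliff
    "C": np.array([60, 65,  90, 70]),
}

best_per_scenario = np.min(np.vstack(list(scores.values())), axis=0)
regret = {p: float((s - best_per_scenario).max()) for p, s in scores.items()}
print(regret)   # {'A': 210.0, 'B': 410.0, 'C': 25.0}

Despite winning most individual scenarios, the policy with the cliff has the worst regret, which is exactly the fragility the fix above is meant to surface.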

Advanced Connections

Connection 1: Hybrid Models <-> Parameter Sweeps

The hybrid model in 04.md gives Harbor City a coherent mechanism map across climate forcing, street flooding, and evacuation behavior. Parameter sweeps are what turn that map into a decision tool. They stress the coupled model across the uncertain inputs and interface assumptions that matter for the city's recommendation, revealing whether the hybrid structure is decision-stable or only looks convincing at one nominal setting.

Connection 2: Parameter Sweeps <-> Calibration

Parameter sweeps tell Harbor City which uncertainties and design levers move the outcome enough to change the policy choice. Calibration in 06.md then narrows those high-leverage parameters using observed flood depths, pump outages, and evacuation timing data. The sweep answers "where is the model fragile?" Calibration answers "which parts of that fragile space are realistic?"

Key Insights

  1. A parameter sweep maps behavior, not just outcomes - Its main product is the boundary structure of the model space, including the regions where a decision stays safe or suddenly fails.
  2. Interactions are usually the real reason to sweep - One-at-a-time changes can miss the combinations of uncertainty that actually create operational breakdowns.
  3. Sweeps prepare the model for better evidence - They show which parameters deserve calibration, better measurement, or tighter governance before the model is used to justify a costly decision.