Scientific Workflow

Use this workflow when you want a real benchmark study instead of a single illustrative run.

Before this page: Getting Started if you have not completed a single run yet.

After this page: Study Analysis for interpreting the generated study bundle.

Goal

The SDK now supports a reproducible study engine built around:

- fixed profile sets
- fixed scenario families
- explicit study arms
- shared seeds across all comparisons
- automatic summaries, comparisons, and poster assets

That means you can compare a candidate algorithm against baseline controllers, and across safety conditions and corrupted-data ablations, without hand-stitching result folders afterward.

The Main Commands

The scientific workflow now centers on five commands:

- iints study-protocol
- iints run-study
- iints analyze
- iints compare-study
- iints poster-study

1. Write The Protocol Bundle

Start by freezing the benchmark design:

iints study-protocol \
  --preset default \
  --profile-set clinic_safe_core \
  --output-dir results/study_protocol

This writes:

- STUDY_PROTOCOL.md
- study_design.json
- study_matrix.csv
- algorithms.json
- study_experiment.yaml

The protocol bundle records:

- the research question
- the hypotheses
- the profile set
- the baseline registry
- the study arms
- the scenario matrix
- the seed policy
- the corruption plan
- the recommended follow-up commands
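If you want to check the frozen design programmatically before running anything, study_design.json is the machine-readable copy of that list. The sketch below only prints whatever top-level sections the file contains; the exact key names depend on your SDK version, so treat them as an assumption rather than a contract.

import json
from pathlib import Path

# Load the machine-readable design written by `iints study-protocol`.
design = json.loads(Path("results/study_protocol/study_design.json").read_text())

# Print whatever top-level sections are present (arms, seed policy, corruption
# plan, ...). Key names vary by SDK version, so none are hard-coded here.
for key, value in design.items():
    print(f"{key}: {type(value).__name__}")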

2. Run The Generic Study Engine

Use run-study for the reusable benchmark path:

iints run-study \
  --algo algorithms/example_algorithm.py \
  --preset default \
  --profile-set clinic_safe_core \
  --seeds 1,2,3,4,5 \
  --output-dir results/study_bundle

If you want a single reproducible config file, point run-study at the generated experiment YAML:

iints run-study --experiment results/study_protocol/study_experiment.yaml

This automatically creates:

- protocol/
- scenarios/
- study_clean/
- study_corrupted/
- study_supervisor_off/
- comparisons/

Each run is nested under:

<arm_dir>/<algorithm_id>/<profile_id>/<scenario_slug>_seed_<seed>/

and carries explicit metadata for:

- study_preset
- study_arm
- condition_group
- algorithm_id
- algorithm_role
- profile_id
- scenario_slug
- seed
- supervisor_enabled
- corruption_modes
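Because the layout is fixed, you can enumerate every run without any index file. The sketch below recovers the arm, algorithm, profile, scenario, and seed purely from the documented path template; the names of the metadata files inside each run directory are not assumed here, so open those with whatever your bundle actually contains.

from pathlib import Path

bundle = Path("results/study_bundle")

# Walk <arm_dir>/<algorithm_id>/<profile_id>/<scenario_slug>_seed_<seed>/
for arm in ("study_clean", "study_corrupted", "study_supervisor_off"):
    for run_dir in sorted((bundle / arm).glob("*/*/*_seed_*")):
        algorithm_id, profile_id, run_name = run_dir.parts[-3:]
        scenario_slug, seed = run_name.rsplit("_seed_", 1)
        print(arm, algorithm_id, profile_id, scenario_slug, seed)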

3. Re-Analyze A Study Folder On Demand

If you already have a bundle or want to re-run the analysis with new options:

iints analyze results/study_bundle/study_clean \
  --output-json results/study_bundle/study_clean/study_summary.json \
  --output-markdown results/study_bundle/study_clean/study_summary.md \
  --output-csv results/study_bundle/study_clean/evidence_table.csv \
  --output-evidence-markdown results/study_bundle/study_clean/evidence_table.md
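The evidence table is a plain CSV, so it drops straight into the standard library or pandas. Its column set depends on which metrics your runs produced, so the sketch below only reports what is there instead of assuming specific column names.

import csv
from pathlib import Path

path = Path("results/study_bundle/study_clean/evidence_table.csv")
with path.open(newline="") as handle:
    reader = csv.DictReader(handle)
    print(reader.fieldnames)   # columns depend on the metrics your runs produced
    rows = list(reader)

print(f"{len(rows)} evidence rows")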

4. Compare Two Study Arms

For example, compare clean certified data against corrupted uncertified data:

iints compare-study \
  results/study_bundle/study_clean \
  results/study_bundle/study_corrupted \
  --output-json results/study_bundle/comparisons/clean_vs_corrupted.json \
  --output-markdown results/study_bundle/comparisons/clean_vs_corrupted.md

5. Generate A Poster Summary

iints poster-study \
  results/study_bundle/study_clean/study_summary.json \
  --output-path results/study_bundle/study_clean/study_poster.png

When the richer study fields are present, the poster includes:

- a baseline-vs-candidate comparison panel
- a safety outcomes panel
- a profile heatmap panel

What study-protocol Encodes

The protocol bundle now acts as the authoritative record of the benchmark design.

It defines:

- profile set metadata
- candidate and baseline algorithms
- study arms
- scenario families
- metrics
- seed policy
- corruption operators
- reproducibility checklist

By default, the clinic_safe_core profile set contains:

- clinic_safe_baseline
- clinic_safe_stress_meal
- clinic_safe_hypo_prone
- clinic_safe_hyper_challenge
- clinic_safe_pizza
- clinic_safe_midnight

By default, the baseline registry includes:

- Clinical Baseline
- PID Controller
- Standard Pump
- Correction Bolus

You can disable those defaults or add more comparison algorithms:

iints study-protocol \
  --output-dir results/custom_protocol \
  --no-include-default-baselines \
  --extra-algorithms "My Published Baseline,My Legacy Controller"

The generated study_experiment.yaml is designed to be edited by hand. A minimal example looks like this:

experiment:
  name: meal_stress_benchmark
  preset: default
  profile_set: clinic_safe_core
  seeds: [1, 2, 3, 4, 5]
  time_step: 5
  include_default_baselines: true
study:
  scenarios:
    - baseline_day
    - meal_challenge
algorithm:
  candidate: algorithms/example_algorithm.py
  extra_algorithms:
    - My Legacy Controller
paths:
  output_dir: results/study_bundle
  carelink_metrics: null
  reference_csv: null

What analyze Adds

The study summary remains backward compatible, but now also adds:

- by_algorithm
- by_profile
- by_arm
- by_scenario
- safety_summary
- pairwise_baseline_deltas

If the run outputs contain prediction and uncertainty columns, the summary also adds:

- calibration_summary
- uncertainty_summary

That uncertainty summary now also includes an uncertainty_vs_error block so you can see whether larger predicted uncertainty actually lines up with larger forecast error.

That means the generated study JSON can support:

- cohort-level overview
- subgroup analysis
- candidate-vs-baseline deltas
- safety-first reporting
- uncertainty-aware benchmarking
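All of those views live in one JSON document, so downstream reporting can stay script-driven. The sketch below checks which of the documented sections are present; the nested fields inside each section are version-dependent, so their shape is not assumed.

import json
from pathlib import Path

summary_path = Path("results/study_bundle/study_clean/study_summary.json")
summary = json.loads(summary_path.read_text())

sections = [
    "by_algorithm", "by_profile", "by_arm", "by_scenario",
    "safety_summary", "pairwise_baseline_deltas",
    "calibration_summary", "uncertainty_summary",
]
for section in sections:
    print(f"{section}: {'present' if section in summary else 'missing'}")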

Optional External Plausibility Check

If you have a CareLink workbench or reference metrics export, you can still compare simulated metrics against a real-world plausibility reference:

iints analyze results/study_bundle/study_clean \
  --output-json results/study_bundle/study_clean/study_summary.json \
  --carelink-metrics results/personal_carelink/carelink_metrics.json

This is treated as an external plausibility check, not as a clinical efficacy claim.

Controlled Corruption Modes

iints data corrupt-for-study supports:

- timestamp_shift
- missing_block
- duplicate_rows
- glucose_spikes
- drop_meal_annotations
- unit_scale_error

The study protocol also records those corruption operators in the bundle so the ablation logic stays explicit.
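For intuition only, here is what a missing_block-style corruption amounts to conceptually: deleting a contiguous window of rows, the way a sensor dropout would. This is a toy sketch, not the SDK's operator; for actual studies use iints data corrupt-for-study so the corruption stays recorded in the protocol bundle.

import pandas as pd

def drop_missing_block(frame: pd.DataFrame, start: int, length: int) -> pd.DataFrame:
    """Toy illustration of a missing_block-style corruption: remove a
    contiguous window of rows, as a sensor dropout would."""
    kept = pd.concat([frame.iloc[:start], frame.iloc[start + length:]])
    return kept.reset_index(drop=True)

# Example: remove a 2-hour gap from 5-minute CGM samples (24 rows).
# corrupted = drop_missing_block(cgm_frame, start=120, length=24)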

What Makes This Scientific

A strong SDK study now includes:

- a predefined protocol with explicit hypotheses
- a fixed study matrix
- repeated seeds across conditions
- multiple patient profiles
- candidate-vs-baseline comparisons
- supervisor-on vs supervisor-off comparisons
- certified vs uncertified comparisons
- descriptive statistics with confidence intervals
- safety summaries, not only best-case metrics
- optional calibration and uncertainty reporting
- optional external plausibility checks

Fastest Public Path

If you want the shortest end-to-end public workflow:

iints run-study \
  --algo algorithms/example_algorithm.py \
  --output-dir results/study_bundle

Then inspect:

- results/study_bundle/protocol/STUDY_PROTOCOL.md
- results/study_bundle/study_clean/study_summary.json
- results/study_bundle/study_clean/study_poster.png
- results/study_bundle/comparisons/clean_vs_corrupted.json

That gives you one deterministic package for:

- protocol review
- benchmark evidence
- safety comparison
- poster figures

Where To Go Next

| If you want to... | Continue with |
| --- | --- |
| interpret study outputs | Study Analysis |
| certify data before running studies | MDMP Quickstart |
| explain source evidence | Evidence Base |
| run studies on a Raspberry Pi | Raspberry Pi Digital Patient |
| inspect all study commands | Command Reference |