Local AI Research

Use this page when you want to train local models from IINTS-AF data instead of only running hand-written algorithms.

The SDK now separates three AI roles:

Role	Best model type	Trained on	Purpose
explanation assistant	local LLM such as Ministral via Ollama	reports and certified payloads	explain, summarize, review
glucose predictor	time-series model	real multimodal T1D datasets	forecast future glucose
controller policy	compact numeric policy model	supervised safe-action labels from simulated runs	propose insulin actions in research simulations

Do not confuse them:

- Ministral is useful for language and review.
- A controller needs physiological inputs and numeric validation.
- The deterministic safety supervisor still wraps experimental policy outputs.

For the dedicated glucose-model workflow, start with IINTS Glucose Forecast Model. That page covers the new iints research glucose-model ... commands, long-training setup, and Hugging Face-safe export.

Recommended Data Strategy

Start by generating the dataset acquisition plan:

iints data research-plan --output-dir data_packs/research_dataset_plan

The full source map lives in Diabetes Research Datasets.

Dataset family	Best current role	Why
AZT1D	primary multimodal predictor training	real AID-system data with detailed bolus variables
HUPA-UCM	multimodal predictor training and subgroup analysis	CGM, insulin, carbs, steps, calories, heart rate, sleep
OhioT1DM	external held-out benchmark	widely used T1D forecasting benchmark with CGM, insulin, meals, exercise, and life events
DCLP3/iDCL and Loop	closed-loop / AID external validation	useful for benchmark-style comparisons after schema conversion
T1DEXI / T1DexiP	exercise-aware stress testing	captures exercise context that ordinary meal-only scenarios miss
MetaboNet / Glucose-ML	dataset-selection references	useful for cross-dataset benchmark design and source discovery
Jetson / simulator teacher runs	controller-policy imitation data	gives exact safe-action labels under known scenarios

The scientific split is intentional:

- real datasets teach glucose dynamics
- teacher-labeled simulator runs teach experimental action policies

Keeping the two apart is more defensible than assuming one small dataset can safely solve both problems.

1. Build A Better Real-Data Predictor Blend

Prepare the source datasets first:

iints research prepare-azt1d
iints research prepare-hupa
iints research prepare-ohio

If you have the full request-gated OhioT1DM XML release locally, prepare it as train/test/all files and keep the raw data outside git:

export OHIO_T1DM_ROOT="/path/to/OhioT1DM-volledig"

PYTHONPATH=src python3 research/prepare_ohio_t1dm.py \
  --input "$OHIO_T1DM_ROOT" \
  --splits train \
  --output data_packs/public/ohio_t1dm_full/processed/ohio_train.csv \
  --report data_packs/public/ohio_t1dm_full/processed/ohio_train_quality_report.json

PYTHONPATH=src python3 research/prepare_ohio_t1dm.py \
  --input "$OHIO_T1DM_ROOT" \
  --splits test \
  --output data_packs/public/ohio_t1dm_full/processed/ohio_test.csv \
  --report data_packs/public/ohio_t1dm_full/processed/ohio_test_quality_report.json

Recommended leakage-safe Ohio workflow:

PYTHONPATH=src python3 research/train_predictor.py \
  --data data_packs/public/ohio_t1dm_full/processed/ohio_train.csv \
  --config research/configs/predictor_ohio_dual_guard_v2.yaml \
  --out models/ohio_t1dm_full

PYTHONPATH=src python3 research/evaluate_predictor.py \
  --data data_packs/public/ohio_t1dm_full/processed/ohio_test.csv \
  --model models/ohio_t1dm_full/predictor.pt \
  --reference-data data_packs/public/ohio_t1dm_full/processed/ohio_train.csv \
  --subgroup-column subject_id \
  --subgroup-column dataset_year \
  --out results/ohio_t1dm_full_eval.json

Use ohio_train.csv for fitting, ohio_test.csv for held-out evaluation, and ohio_all.csv only for descriptive cohort summaries or final re-training after the benchmark protocol is frozen.
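Before fitting, it can help to confirm that the two prepared files do not share rows. The following is a minimal sketch, not part of the SDK: it assumes each prepared CSV has a `subject_id` column (the name used by `--subgroup-column` above) and a `timestamp` column, which is a hypothetical name that may differ in your files.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a prepared CSV into a list of dict rows."""
    with Path(csv_path).open(newline="") as handle:
        return list(csv.DictReader(handle))


def overlapping_keys(train_rows, test_rows, key_columns=("subject_id", "timestamp")):
    """Return the row keys present in both splits.

    The key column names are assumptions; an empty set means no
    row-level overlap was found between the two splits.
    """
    train_keys = {tuple(row[col] for col in key_columns) for row in train_rows}
    test_keys = {tuple(row[col] for col in key_columns) for row in test_rows}
    return train_keys & test_keys


# Tiny in-memory demonstration with made-up rows.
train_demo = [
    {"subject_id": "s1", "timestamp": "2020-01-01T00:00"},
    {"subject_id": "s1", "timestamp": "2020-01-01T00:05"},
]
test_demo = [
    {"subject_id": "s1", "timestamp": "2020-01-01T00:05"},  # duplicated row
    {"subject_id": "s1", "timestamp": "2020-01-01T00:10"},
]
demo_overlap = overlapping_keys(train_demo, test_demo)
print(sorted(demo_overlap))
```

On real files you would call `overlapping_keys(load_rows(train_path), load_rows(test_path))` and treat any non-empty result as a reason to stop and inspect the preparation step.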

Then blend the real training sources while preserving source-aware subject IDs:

iints research blend-datasets \
  --source azt1d=data_packs/public/azt1d/processed/azt1d_merged.csv \
  --source hupa=data_packs/public/hupa_ucm/processed/hupa_ucm_merged.csv \
  --output data_packs/processed/predictor_blend.csv \
  --manifest data_packs/processed/predictor_blend_manifest.json
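To illustrate why source-aware subject IDs matter, here is a small sketch of the idea. It is not the implementation behind `blend-datasets`: the `source:subject` ID format, the `source` column, and the function names are all assumptions chosen for illustration.

```python
def source_aware_id(source, subject_id):
    """Prefix a subject ID with its dataset name (hypothetical format)."""
    return f"{source}:{subject_id}"


def blend_rows(sources):
    """Merge rows from several datasets without letting subject IDs collide.

    `sources` maps a dataset name to a list of dict rows that each carry
    a `subject_id` key. The input rows are copied, not modified.
    """
    blended = []
    for name, rows in sources.items():
        for row in rows:
            merged = dict(row)
            merged["source"] = name
            merged["subject_id"] = source_aware_id(name, row["subject_id"])
            blended.append(merged)
    return blended


# Two datasets that both contain a subject called "1".
demo_blend = blend_rows(
    {
        "azt1d": [{"subject_id": "1", "glucose": 110}],
        "hupa": [{"subject_id": "1", "glucose": 145}],
    }
)
demo_subjects = {row["subject_id"] for row in demo_blend}
print(sorted(demo_subjects))
```

Without the prefix, the two unrelated people in the demonstration would be treated as one subject, which would quietly break any subject-level split or subgroup analysis.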

Use OhioT1DM separately as an external benchmark rather than silently mixing every dataset together:

PYTHONPATH=src python3 research/train_predictor.py \
  --data data_packs/processed/predictor_blend.csv \
  --config research/configs/predictor_multimodal_dual_guard.yaml \
  --out models/predictor_blend

PYTHONPATH=src python3 research/evaluate_predictor.py \
  --data data_packs/public/ohio_t1dm/processed/ohio_t1dm_merged.csv \
  --model models/predictor_blend/predictor.pt \
  --external-data ohio=data_packs/public/ohio_t1dm/processed/ohio_t1dm_merged.csv \
  --reference-data data_packs/processed/predictor_blend.csv \
  --out results/predictor_blend_external_eval.json

2. Forecast One Run Before Training A Controller

Glucose prediction now has a dedicated evidence command. Use it on a run folder or CSV before you start making controller claims:

iints research forecast-run \
  --input results/jetson_research_day \
  --output-dir results/jetson_research_day_forecast

With a trained predictor checkpoint:

iints research forecast-run \
  --input results/jetson_research_day \
  --predictor models/predictor_blend/predictor.pt \
  --output-dir results/jetson_research_day_forecast_ai

For hidden-biology stress testing, you can explicitly simulate an insulin-antibody-like delay in the forecast evidence:

iints research forecast-run \
  --input results/jetson_research_day \
  --hidden-biology insulin-antibody \
  --output-dir results/jetson_research_day_forecast_antibody

This does not diagnose antibody problems and it is not a normal assumption for most people with type 1 diabetes. It is a research-only stress test for the idea that some relevant biological variables are hidden from the algorithm.

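Each forecast-run invocation writes a bundle like the following to its --output-dir (shown here for the first command):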

results/jetson_research_day_forecast/
  forecast_predictions.csv
  forecast_report.json
  forecast_report.md
  forecast_manifest.json

The forecast bundle compares:

a last-value baseline
a transparent physiology-aware baseline using glucose trend, IOB, COB, ISF, ICR, activity and stress features
the neural predictor when a checkpoint is provided
optional hidden-biology feature overrides such as insulin-antibody-like binding/release

The key rule is simple: the AI predictor must beat transparent baselines and must be checked for missed hypoglycemia, uncertainty, calibration and risk-level behavior before it is used in a research controller experiment.
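The rule above can be sketched as a small check. This is an illustration only, not the logic of `forecast-run`: the function names, the 10% missed-hypoglycemia limit, and the sample values are assumptions. The 70 mg/dL cutoff is the commonly used level 1 hypoglycemia threshold.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between targets and forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def last_value_forecast(glucose, horizon_steps):
    """Persistence baseline: the forecast for t+h is the reading at t.

    The result is aligned with the targets glucose[horizon_steps:].
    """
    return glucose[:-horizon_steps]


def missed_hypo_rate(actual, predicted, threshold=70.0):
    """Share of true hypoglycemic targets that were forecast at or above the threshold."""
    hypo_pairs = [(a, p) for a, p in zip(actual, predicted) if a < threshold]
    if not hypo_pairs:
        return 0.0
    return sum(1 for _, p in hypo_pairs if p >= threshold) / len(hypo_pairs)


def passes_gate(actual, model_pred, baseline_pred, max_missed_hypo_rate=0.1):
    """Hypothetical gate: beat the baseline on MAE and keep missed hypos low."""
    better = mean_absolute_error(actual, model_pred) < mean_absolute_error(actual, baseline_pred)
    return better and missed_hypo_rate(actual, model_pred) <= max_missed_hypo_rate


# Made-up trace in mg/dL, forecast two steps ahead.
glucose = [120, 110, 100, 90, 80, 68, 65]
targets = glucose[2:]
baseline_pred = last_value_forecast(glucose, 2)
model_pred = [102, 91, 79, 69, 66]  # hypothetical model output

baseline_mae = mean_absolute_error(targets, baseline_pred)
model_mae = mean_absolute_error(targets, model_pred)
print(baseline_mae, model_mae, passes_gate(targets, model_pred, baseline_pred))
```

A real gate would also cover uncertainty, calibration, and risk-level behavior, which this sketch leaves out.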

3. Build Controller Training Data

The Jetson research runner now writes:

research/
  predictor_training.csv
  controller_teacher_dataset.csv
  training_manifest.json

For a true 24-hour research acquisition:

iints jetson endurance start \
  --algo algorithms/example_algorithm.py \
  --duration 1d \
  --profile normal \
  --wall-clock \
  --output-dir results/jetson_research_day

You can combine several safe supervised runs into one controller dataset:

iints research build-control-dataset \
  --run day1=results/jetson_research_day \
  --run stress=results/jetson_stress_day \
  --output data_packs/processed/controller_teacher_dataset.csv \
  --manifest data_packs/processed/controller_teacher_manifest.json

4. Train A Local Controller

The first controller learner is intentionally simple and auditable:

iints research train-controller \
  --data data_packs/processed/controller_teacher_dataset.csv \
  --output models/controller_imitation.json \
  --metrics-output models/controller_imitation_metrics.json

Use it in Python research code with:

from iints.core.algorithms.imitation_controller import ExperimentalImitationController

algorithm = ExperimentalImitationController(
    settings={"model_path": "models/controller_imitation.json"}
)

This controller is not presented as clinically validated. It is a baseline that proves the full local-AI research loop:

collect or simulate data
build a supervised dataset
train a local policy
run it behind the supervisor
compare it against rule-based baselines
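The "run it behind the supervisor" step can be pictured with a toy wrapper. This is a conceptual sketch only and not the SDK's safety supervisor: `SupervisorLimits`, `supervise`, and every numeric limit below are hypothetical values chosen for illustration, with no clinical meaning.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SupervisorLimits:
    """Hypothetical limits for a simulated research run (not clinical guidance)."""

    max_bolus_units: float = 2.0
    suspend_below_mgdl: float = 80.0
    max_iob_units: float = 5.0


DEFAULT_LIMITS = SupervisorLimits()


def supervise(proposed_units, glucose_mgdl, iob_units, limits=DEFAULT_LIMITS):
    """Deterministically vet a policy's proposed dose.

    Returns (approved_units, reason) so every intervention can be counted.
    """
    if math.isnan(proposed_units) or proposed_units < 0:
        return 0.0, "invalid_proposal"
    if glucose_mgdl < limits.suspend_below_mgdl:
        return 0.0, "low_glucose_suspend"
    # Never let insulin on board exceed the configured ceiling.
    headroom = max(0.0, limits.max_iob_units - iob_units)
    approved = min(proposed_units, limits.max_bolus_units, headroom)
    reason = "approved" if approved == proposed_units else "clamped"
    return approved, reason


print(supervise(1.0, 120.0, 0.0))
print(supervise(3.0, 120.0, 0.0))
print(supervise(1.0, 70.0, 0.0))
```

Returning a reason alongside the dose is one way to make supervisor interventions countable when comparing a learned policy against rule-based baselines.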

When you need a stronger local model after that auditable baseline, train the PyTorch policy:

iints research train-neural-controller \
  --data data_packs/processed/controller_teacher_dataset.csv \
  --output models/controller_neural.pt \
  --metrics-output models/controller_neural_metrics.json

Then require held-out closed-loop evidence before treating it as a serious research candidate:

iints research evaluate-controller \
  --model models/controller_neural.pt \
  --model-kind neural \
  --output-dir results/controller_neural_eval

That report compares the learned controller with ClinicalBaselineAlgorithm on unseen presets such as hypo_prone_night, hyper_challenge, pizza_paradox, and midnight_crash.
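This page does not list the metrics in that report. As a rough illustration of how two controllers might be compared on one scenario, the sketch below computes time in, below, and above the widely used 70 to 180 mg/dL target range. The function names and glucose traces are made up for the example.

```python
def fraction_where(readings, predicate):
    """Fraction of readings that satisfy the predicate."""
    if not readings:
        raise ValueError("readings must not be empty")
    return sum(1 for value in readings if predicate(value)) / len(readings)


def glucose_summary(readings_mgdl, low=70.0, high=180.0):
    """Time in, below, and above range as fractions of all readings."""
    return {
        "time_in_range": fraction_where(readings_mgdl, lambda g: low <= g <= high),
        "time_below_range": fraction_where(readings_mgdl, lambda g: g < low),
        "time_above_range": fraction_where(readings_mgdl, lambda g: g > high),
    }


# Made-up traces for one held-out scenario.
learned_trace = [65, 90, 150, 190]
baseline_trace = [85, 95, 140, 175]

learned_summary = glucose_summary(learned_trace)
baseline_summary = glucose_summary(baseline_trace)
print(learned_summary)
print(baseline_summary)
```

In this invented example the learned controller spends less time in range and more time low than the baseline, which is the kind of result that should block promotion.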

5. One-Step Jetson Research Finalization

After a completed endurance run you can close the whole post-run loop in one command:

iints jetson endurance finalize-research \
  --output-dir results/jetson_research_day

It will:

train the auditable linear imitation baseline
train the stronger PyTorch controller
train the glucose predictor when the exported dataset is large enough
run held-out closed-loop evaluation against the clinical baseline
write research/RESEARCH_PIPELINE_REPORT.md

If you want the same work to happen automatically at the end of the endurance command, add:

--finalize-research

to iints jetson endurance start.

6. Multi-Run Local AI Lab

For actual AI research, one run should not be the whole story. The SDK now has a higher-level lab command that combines multiple completed Jetson or simulator bundles into one training workspace:

iints research train-local-ai \
  --run day1=results/jetson_research_day \
  --run day2=results/jetson_research_day_2 \
  --output-dir results/local_ai_lab

It writes:

results/local_ai_lab/
  datasets/
    predictor_training.csv
    predictor_dataset_manifest.json
    controller_teacher_dataset.csv
    controller_dataset_manifest.json
    LOCAL_AI_DATASET_CARD.json
  models/
    linear_controller.json
    neural_controller.pt
    predictor/
  evaluation/
    CONTROL_EVALUATION_REPORT.md
  LOCAL_AI_RESEARCH_SUMMARY.json
  LOCAL_AI_RESEARCH_REPORT.md
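A quick way to confirm that a lab run produced the layout above is to check for each listed artifact. The helper below is a hypothetical convenience script, not an SDK command. The relative paths are copied from the listing, and some may be legitimately absent if optional stages were skipped.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED_ARTIFACTS = [
    "datasets/predictor_training.csv",
    "datasets/predictor_dataset_manifest.json",
    "datasets/controller_teacher_dataset.csv",
    "datasets/controller_dataset_manifest.json",
    "datasets/LOCAL_AI_DATASET_CARD.json",
    "models/linear_controller.json",
    "models/neural_controller.pt",
    "models/predictor",
    "evaluation/CONTROL_EVALUATION_REPORT.md",
    "LOCAL_AI_RESEARCH_SUMMARY.json",
    "LOCAL_AI_RESEARCH_REPORT.md",
]


def missing_artifacts(lab_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected relative paths that do not exist under lab_dir."""
    root = Path(lab_dir)
    return [relative for relative in expected if not (root / relative).exists()]


if __name__ == "__main__":
    for relative in missing_artifacts("results/local_ai_lab"):
        print(f"missing: {relative}")
```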

This is the clearest workflow when you want to use long Jetson runs as training data:

collect one or more 24h/7d research bundles
merge predictor rows and controller-teacher rows with source labels
train an auditable linear local controller
optionally train the PyTorch neural controller
optionally train the glucose predictor
evaluate learned controllers on held-out scenarios

For a first smoke test without heavy PyTorch/predictor work:

iints research train-local-ai \
  --run day1=results/jetson_research_day \
  --output-dir results/local_ai_lab_smoke \
  --skip-predictor \
  --skip-neural \
  --skip-evaluation

The important separation is:

predictor_training.csv trains a glucose forecasting model
controller_teacher_dataset.csv trains a research controller policy
LOCAL_AI_DATASET_CARD.json records lineage, row counts, sources, and research-only limits

The older command name iints research local-ai-lab remains available, but train-local-ai is the clearest command for new users.

7. Publish The Research Evidence

After training, attach the run folders and local AI lab to one evidence bundle:

iints evidence build \
  --run day1=results/jetson_research_day \
  --run day2=results/jetson_research_day_2 \
  --local-ai-dir results/local_ai_lab \
  --output-dir results/local_ai_evidence

This writes:

README.md with run-level glucose metrics
MODEL_CARD.md with research-only use, non-use, and safety gate status
evidence_summary.json for machine-readable review
run_index.csv for quick comparison tables

8. What Good Research Looks Like

Before making any strong claim, require all of the following:

subject/source-aware splits
external-data evaluation for predictors
calibration and hypo-detection analysis
held-out scenario evaluation for controllers
comparison against deterministic baselines
safety-supervisor intervention counts
exact run manifests and dataset manifests
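The first item, subject/source-aware splits, can be sketched as follows. This is one possible approach rather than the SDK's splitting logic: it assumes rows carry a `subject_id` column and assigns each subject to a split by hashing the ID, so all rows from one person land on the same side.

```python
import hashlib


def assign_split(subject_id, test_fraction=0.2):
    """Deterministically map a subject to 'train' or 'test' by hashing its ID."""
    digest = hashlib.sha256(subject_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_rows(rows, subject_column="subject_id", test_fraction=0.2):
    """Split rows so that no subject appears in both train and test."""
    train, test = [], []
    for row in rows:
        if assign_split(row[subject_column], test_fraction) == "test":
            test.append(row)
        else:
            train.append(row)
    return train, test


# Made-up rows: 25 subjects with two readings each.
demo_rows = [
    {"subject_id": f"azt1d:{subject}", "glucose": 100 + reading}
    for subject in range(25)
    for reading in range(2)
]
demo_train, demo_test = split_rows(demo_rows)
train_subjects = {row["subject_id"] for row in demo_train}
test_subjects = {row["subject_id"] for row in demo_test}
print(len(train_subjects), len(test_subjects))
```

Hashing keeps the assignment stable when new rows arrive for a known subject. If the IDs already include a source prefix, subjects from different datasets stay distinct.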

The SDK now gives you both the model layer and the first evidence layer:

auditable linear imitation baseline
stronger PyTorch neural controller
held-out closed-loop controller evaluation
automatic post-run Jetson research finalization

The next scientific step after that is not "make the model bigger"; it is stricter promotion gates, more held-out patients, and external real-data validation for every predictor claim.