Use this plan to turn the current export/prep automation into a measurable RL improvement loop for agent behavior.
Current system already has:
- myflow export to Harbor snapshots (`assistant_sft.jsonl`, `train_events.jsonl`, `summary.json`)
- deterministic Harbor split prep (`train`/`val`/`test`/`canary` + `manifest.json`)
- infra timer wiring for recurring export/prepare jobs
- Maple telemetry hooks for export visibility
Goal: convert this into a closed loop where training updates are driven by observed failures/regressions and promoted only through hard gates.
Primary outcomes:
- Better action selection in real workflows (fewer wrong tool/actions).
- Fewer production regressions (canary deltas trend positive).
- Faster convergence per run (more useful data per training cycle).
- Higher reliability under ambiguous/long-horizon tasks.
- Enforce snapshot integrity checks in Harbor ingest.
- Fail job if `assistant_sft.jsonl` is empty or split counts are invalid.
- Persist run metadata keyed by snapshot timestamp and git SHA of training config.
Done definition:
- every snapshot has a valid manifest + non-empty train split
- every training run can be traced back to one exact snapshot + config
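The integrity checks above could be sketched roughly as follows. This is a minimal sketch, not Harbor's actual ingest code: the `validate_snapshot` helper and the manifest layout (`{"splits": {"train": <count>, ...}}`) are assumptions made for illustration.

```python
import json
from pathlib import Path

# Split names taken from the train/val/test/canary list in this plan.
SPLITS = ("train", "val", "test", "canary")


def count_jsonl_rows(path: Path) -> int:
    """Count non-blank lines in a JSONL file; a missing file counts as 0."""
    if not path.is_file():
        return 0
    with path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def validate_snapshot(snapshot_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the snapshot passes."""
    problems = []
    if count_jsonl_rows(snapshot_dir / "assistant_sft.jsonl") == 0:
        problems.append("assistant_sft.jsonl is missing or empty")

    manifest_path = snapshot_dir / "manifest.json"
    if not manifest_path.is_file():
        problems.append("manifest.json is missing")
        return problems

    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Hypothetical manifest layout: {"splits": {"train": 120, "val": 15, ...}}
    counts = manifest.get("splits", {})
    for name in SPLITS:
        count = counts.get(name)
        if not isinstance(count, int) or count < 0:
            problems.append(f"split '{name}' has invalid count: {count!r}")
    if isinstance(counts.get("train"), int) and counts["train"] == 0:
        problems.append("train split is empty")
    return problems
```

A job wrapper would exit non-zero whenever the returned list is non-empty, which is what blocks training on a bad snapshot.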
- Define reward schema from `train_events.jsonl` (success, retries, rollback, human override, time-to-fix).
- Map each signal to normalized reward components in Harbor.
- Store per-sample reward breakdown for auditability.
Done definition:
- reward function is versioned (`reward_schema_version`)
- each trained sample has explainable reward components
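One way the signal-to-reward mapping might look is sketched below. The event field names (`retries`, `time_to_fix_s`, and so on), the weights, and the normalization caps are all hypothetical placeholders; the real values would come from the `train_events.jsonl` schema and from tuning.

```python
REWARD_SCHEMA_VERSION = "v0-example"  # hypothetical version label

# Hypothetical weights; they sum to 1.0 so the total stays in [0, 1].
WEIGHTS = {
    "success": 0.4,
    "retries": 0.15,
    "rollback": 0.15,
    "human_override": 0.15,
    "time_to_fix": 0.15,
}
MAX_RETRIES = 5             # assumed cap used for normalization
MAX_TIME_TO_FIX_S = 3600.0  # assumed cap used for normalization


def clamp01(x: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def reward_breakdown(event: dict) -> dict:
    """Map one event record to components in [0, 1], where higher is better."""
    components = {
        "success": 1.0 if event.get("success") else 0.0,
        "retries": 1.0 - clamp01(event.get("retries", 0) / MAX_RETRIES),
        "rollback": 0.0 if event.get("rollback") else 1.0,
        "human_override": 0.0 if event.get("human_override") else 1.0,
        "time_to_fix": 1.0
        - clamp01(event.get("time_to_fix_s", 0.0) / MAX_TIME_TO_FIX_S),
    }
    total = sum(WEIGHTS[name] * value for name, value in components.items())
    # The per-component values are kept alongside the total for auditability.
    return {
        "reward_schema_version": REWARD_SCHEMA_VERSION,
        "components": components,
        "total": total,
    }
```

Storing the whole returned dict per sample, rather than only `total`, is what keeps each reward explainable after the fact.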
- Train candidate adapters on latest prepared snapshot.
- Evaluate on fixed holdout + canary split from same manifest.
- Add strict promotion gate: holdout pass + canary pass + no action-collapse.
Done definition:
- promotion is blocked automatically on gate failure
- gate outputs are attached to snapshot and run IDs
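The three-part gate could be expressed along these lines. This is a sketch under stated assumptions: "no action-collapse" is approximated here by comparing the entropy of the candidate's action distribution against the baseline's, and the pass thresholds (non-negative holdout delta, non-negative lower CI bound on canary, entropy ratio of 0.5) are illustrative rather than agreed values.

```python
import math
from collections import Counter


def action_entropy(actions: list[str]) -> float:
    """Shannon entropy (in nats) of the empirical action distribution."""
    n = len(actions)
    if n == 0:
        return 0.0
    counts = Counter(actions)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def promotion_gate(
    holdout_delta: float,
    canary_ci95_low: float,
    candidate_actions: list[str],
    baseline_actions: list[str],
    min_entropy_ratio: float = 0.5,  # hypothetical collapse threshold
) -> dict:
    """Evaluate all three checks; promote only if every one passes."""
    baseline_h = action_entropy(baseline_actions)
    candidate_h = action_entropy(candidate_actions)
    checks = {
        "holdout_pass": holdout_delta >= 0.0,
        "canary_pass": canary_ci95_low >= 0.0,
        "no_action_collapse": candidate_h >= min_entropy_ratio * baseline_h,
    }
    return {"checks": checks, "promote": all(checks.values())}
```

Returning the individual checks, not just the final boolean, gives a record that can be attached to the snapshot and run IDs when a promotion is blocked.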
- Mine failed canary/production cases into a hardcase set.
- Re-inject hardcases with higher sampling weight in next cycle.
- Track “failure class recurrence” across runs.
Done definition:
- recurring failure classes trend downward across 3+ cycles
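A rough sketch of weighted re-injection and recurrence tracking follows. The helper names, the default weight of 3.0, and the definition of recurrence (the share of the previous cycle's failure classes that reappear in the current cycle) are assumptions for illustration.

```python
import random


def build_sampling_pool(base_ids, hardcase_ids, hardcase_weight=3.0):
    """Merge base and hardcase sample IDs, up-weighting the hardcases."""
    hard = set(hardcase_ids)
    # dict.fromkeys de-duplicates while preserving first-seen order.
    ids = list(dict.fromkeys(list(base_ids) + list(hardcase_ids)))
    weights = [hardcase_weight if sample_id in hard else 1.0 for sample_id in ids]
    return ids, weights


def sample_batch(ids, weights, k, seed=0):
    """Draw k sample IDs with replacement, proportional to their weights."""
    rng = random.Random(seed)  # seeded so a cycle's batches are reproducible
    return rng.choices(ids, weights=weights, k=k)


def recurrence_rate(previous_classes, current_classes):
    """Share of the previous cycle's failure classes that appear again."""
    previous = set(previous_classes)
    if not previous:
        return 0.0
    return len(previous & set(current_classes)) / len(previous)
```

Logging `recurrence_rate` once per cycle yields the series needed to check for a downward trend over three or more cycles.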
- `canary_reward_delta_mean`
- `canary_reward_delta_ci95_low/high`
- `action_error_rate`
- `fallback_or_override_rate`
- `time_to_resolution_p50/p95`
- `hardcase_recurrence_rate`
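As an illustration of how the canary delta metrics might be computed, the sketch below assumes paired per-sample rewards (candidate vs. baseline on the same canary samples) and a normal-approximation 95% interval. Both are assumptions; a bootstrap interval may be a better fit for small canary sets.

```python
import math
import statistics


def canary_delta_summary(candidate_rewards, baseline_rewards):
    """Mean paired delta (candidate - baseline) with a normal-approx 95% CI."""
    if len(candidate_rewards) != len(baseline_rewards):
        raise ValueError("paired comparison needs equal-length reward lists")
    deltas = [c - b for c, b in zip(candidate_rewards, baseline_rewards)]
    if len(deltas) < 2:
        raise ValueError("need at least two paired samples")
    mean = statistics.fmean(deltas)
    # 1.96 is the two-sided 95% quantile of the standard normal distribution.
    half_width = 1.96 * statistics.stdev(deltas) / math.sqrt(len(deltas))
    return {
        "canary_reward_delta_mean": mean,
        "canary_reward_delta_ci95_low": mean - half_width,
        "canary_reward_delta_ci95_high": mean + half_width,
    }
```

Gating on the lower bound rather than the mean makes promotion conservative when the canary set is small or noisy.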
# 1) Export latest data from myflow to Harbor
cd ~/code/myflow
f harbor-export-data-maple
# 2) Prepare deterministic splits
cd ~/repos/laude-institute/harbor
python3 scripts/prepare_myflow_dataset.py --snapshot latest
# 3) Train/eval candidate in Harbor (task names TBD in harbor)
# 4) Promote only if holdout + canary gates pass

- Add Harbor task: `myflow-validate-snapshot` (manifest + split sanity checks).
- Add Harbor task: `myflow-eval-canary` (fixed JSON report schema for promotion gate).
- Add Harbor task: `myflow-mine-hardcases` (from failed canary/prod traces).
- Add one weekly dashboard cut from Maple + Harbor manifests for trend review.
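For the fixed JSON report schema that `myflow-eval-canary` would emit, one possible shape is sketched below. The field set is hypothetical (it reuses metric names from this plan plus assumed `snapshot_id` and `run_id` keys); the real schema remains to be defined in Harbor.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CanaryReport:
    """Hypothetical fixed report schema consumed by the promotion gate."""
    snapshot_id: str
    run_id: str
    canary_reward_delta_mean: float
    canary_reward_delta_ci95_low: float
    canary_reward_delta_ci95_high: float
    action_error_rate: float
    gate_passed: bool


def dump_report(report: CanaryReport) -> str:
    """Serialize with sorted keys so reports diff cleanly across runs."""
    return json.dumps(asdict(report), sort_keys=True)


def load_report(raw: str) -> CanaryReport:
    """Parse a report; missing or unknown keys raise TypeError."""
    return CanaryReport(**json.loads(raw))
```

Rejecting reports with missing or unexpected keys keeps the gate from silently passing on a malformed evaluation output.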