
Engine: warn on prose-shaped condition strings + accept procedure-pathway regimens #588

Merged
romeo111 merged 3 commits into master from claude/angry-darwin-31b5d0 on May 18, 2026

@romeo111 (Owner)

Summary

Three small, orthogonal changes from an analysis-and-improvement session on claude/angry-darwin-31b5d0. Each commit stands alone and ships behind tests; routing semantics are unchanged.

1. feat(engine): warn on prose-shaped condition: strings (806361ef)

Of the 443 condition: strings in algorithm decision trees, 376 (85%) are written as English prose ("ECOG PS 0-2", "BRCA1 or BRCA2 pathogenic"). _eval_clause only resolves flat finding keys, so these silently evaluate to False. In 45 of 152 algorithms (30%), step 1 is entirely prose, so the tree falls through to default_indication for every patient.
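A minimal sketch of why prose clauses fail silently, assuming the evaluator uses a condition string verbatim as a key into a findings dict and that step 1 routes if any clause holds. `eval_clause`, `route_step1`, and the arm names are hypothetical, not the engine's real API:

```python
from typing import Any, Mapping, Sequence


def eval_clause(condition: str, findings: Mapping[str, Any]) -> bool:
    """Hypothetical stand-in for a flat-key evaluator like _eval_clause.

    The condition string is used verbatim as a key into the patient's findings,
    so a prose string such as "ECOG PS 0-2" never matches and yields False.
    """
    return bool(findings.get(condition, False))


def route_step1(conditions: Sequence[str], indication: str,
                default_indication: str, findings: Mapping[str, Any]) -> str:
    """Return `indication` if any step-1 clause holds (assumed semantics), else fall through."""
    if any(eval_clause(c, findings) for c in conditions):
        return indication
    return default_indication


patient = {"BIO-HER2": True}

# Flat ID-shaped key: resolves against the findings and routes.
print(route_step1(["BIO-HER2"], "her2_arm", "default_indication", patient))  # her2_arm
# Entirely prose step 1: no finding key is spelled like these strings,
# so every patient falls through.
print(route_step1(["ECOG PS 0-2", "BRCA1 or BRCA2 pathogenic"],
                  "parp_arm", "default_indication", patient))  # default_indication
```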

This PR adds a one-time-per-unique-string logging.WARNING when a condition: looks like prose AND the lookup missed. Flat ID-shape keys (BIO-HER2, hcv_status, ECOG_PS) are NOT flagged, so existing finding-key clauses keep working silently.

Detector heuristic: comparison operators, " or " / " and " connectives, parens/commas, and word boundaries of the form "space + lowercase" or "ALLCAPS space ALLCAPS". Per-string dedup uses a module-level set with a test-only reset hook.

6 new tests in tests/test_prose_condition_warning.py — operator clause warns, boolean-connective clause warns, ALL-CAPS flat key does NOT warn, real finding lookup does NOT warn, dedup fires once per unique string, structured threshold/value clauses unaffected.

2. docs(reviews): state-audit doc + tighten fallthrough count (in 806361e + 8cf8a1b)

docs/reviews/openonco-state-audit-2026-05-17.md covers:

  • KB validator green (3118 entities)
  • Pre-existing pytest failures (13: 12 tasktorrent bash-script tests on Windows + 1 regimen-phases failure, see item 3 below)
  • 3 stale roadmap items already done on master with SHAs
  • Prose-condition scope (376/443 prose, 45/152 step-1 fallthrough — full sweep over all 152 algorithms, not a sample)
  • Branch sprawl (1127 refs — flagged, NOT touched)
  • Recommendations 1-5, ordered
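The 45/152 figure comes from sweeping every algorithm that has a decision_tree rather than a sample. A minimal sketch of that count, assuming each algorithm loads as a dict whose hypothetical `decision_tree` list starts with a step carrying a `conditions` list, and treating any string clause that is not a single ID-shaped token as prose (a simplification of the detector heuristic, not the audit's actual script):

```python
import re
from typing import Any, Iterable, Mapping

# Assumption: a string clause is "flat" if it is one ID-shaped token
# (letters, digits, '_' or '-'), e.g. BIO-HER2, hcv_status, ECOG_PS.
_FLAT_KEY = re.compile(r"[A-Za-z0-9_-]+")


def is_prose(clause: Any) -> bool:
    # Structured clauses (e.g. dicts with threshold/value) never count as prose.
    return isinstance(clause, str) and _FLAT_KEY.fullmatch(clause) is None


def step1_all_prose(algorithm: Mapping[str, Any]) -> bool:
    step1 = algorithm["decision_tree"][0]
    conditions = step1.get("conditions", [])
    return bool(conditions) and all(is_prose(c) for c in conditions)


def fallthrough_count(algorithms: Iterable[Mapping[str, Any]]) -> tuple[int, int]:
    """Return (algorithms whose step 1 is all prose, algorithms with a decision_tree)."""
    with_tree = [a for a in algorithms if a.get("decision_tree")]
    return sum(step1_all_prose(a) for a in with_tree), len(with_tree)
```

Passing the whole list rather than a slice such as `algos[:30]` is the sample-versus-sweep distinction behind the revised figure.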

The "structured condition AST" path (recommendation 5) is explicitly out of scope here — Big-P3 workstream gated by clinical co-lead review per CHARTER §6.1.

3. test(regimens): accept procedure-pathway shape (2c9003ad)

test_no_regression_all_244_legacy_yamls_load was failing on reg_allohct_jmml.yaml: 3 explicit phases, each with components: [], and top-level components: []. The existing invariant required phase_drugs > 0 on every explicit-phase regimen.

This is the procedure-pathway shape: the "treatment" is the stem-cell product (or surgical procedure), documented via phase purpose strings, not DRUG-* refs. The other 4 allohct regimens currently hack around the invariant by parking a placeholder drug inside conditioning — that's not what the regimen actually is.

Relaxed: when explicit phases have 0 drug components, accept iff top-level components is also 0. If top-level has drugs but phases don't, that's still the "author moved structure but left drugs at top level" bug and the test still fails for it.
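A minimal sketch of the relaxed rule, assuming each regimen YAML loads as a dict with a top-level `components` list and an optional `phases` list whose entries carry their own `components`. `phase_shape_ok` is a hypothetical name, and the sketch covers only the phase-drug rule, not the rest of the regression test:

```python
from typing import Any, Mapping


def phase_shape_ok(regimen: Mapping[str, Any]) -> bool:
    """Relaxed phase-drug rule for explicit-phase regimens (hypothetical helper)."""
    phases = regimen.get("phases") or []
    if not phases:
        return True  # no explicit phases: this rule does not apply
    phase_drugs = sum(len(p.get("components") or []) for p in phases)
    if phase_drugs > 0:
        return True  # the original invariant already passes
    # Zero phase drugs: accept only the procedure-pathway shape (no drugs anywhere).
    # Drugs left at top level means structure moved but drugs did not: still a bug.
    return len(regimen.get("components") or []) == 0


# Shape of reg_allohct_jmml.yaml as described: 3 phases, no components anywhere.
jmml_like = {"components": [], "phases": [{"components": []} for _ in range(3)]}
print(phase_shape_ok(jmml_like))  # True
print(phase_shape_ok({"components": ["DRUG-CYTARABINE"],
                      "phases": [{"components": []}]}))  # False
```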

Engine downstream doesn't depend on phase_drugs > 0 (render.py:1694 reference is to MonitoringSchedule.phases, a different concept). No clinical content edits.

What's NOT in this PR (deliberately, per CLAUDE.md scope rules)

  • No clinical content edits. CHARTER §6.1 needs two Clinical Co-Lead signoffs; out of scope for autonomous work.
  • No branch GC. 1127 branches, blast radius too large — flagged in audit, left for a separate workstream.
  • No structured-condition AST migration. Multi-week, big-P3, needs spec alignment.
  • No tasktorrent-Windows fix. 12 of 13 pre-existing failures are bash-script tests on Windows — separate workstream.

Test plan

  • pytest tests/test_prose_condition_warning.py — 6/6 pass
  • pytest tests/test_regimen_phases.py — 14/14 pass (previously 13/14)
  • Adjacent engine sweep (909 tests across test_engine, test_redflag_*, test_actionability_*, test_algorithm_regimen_routing_contracts, test_bcc_engine, test_burkitt_engine) — all pass
  • KB validator strict — 3118 entities load, all references resolve
  • No edits under knowledge_base/hosted/content/ (clinical content)
  • No git add -A / --no-verify — explicit pathspecs only

🤖 Generated with Claude Code

romeo111 and others added 3 commits May 17, 2026 23:59
Commit 806361ef: feat(engine): warn on prose-shaped `condition:` strings

Algorithm decision trees contain 376 of 443 `condition:` strings (85%)
written as English prose ("ECOG PS 0-2", "BRCA1 or BRCA2 pathogenic").
`_eval_clause` only resolves flat finding keys, so these silently return
False — and in ~27% of audited algorithms, step-1 is entirely prose and
the tree falls through to `default_indication` on every patient.

Routing semantics unchanged. Added a one-time per-unique-string WARNING
when a `condition:` looks like prose AND the lookup missed. Flat ID-shape
keys (BIO-HER2, hcv_status, ECOG_PS) are NOT flagged, so existing
finding-key clauses keep working silently.

Detector heuristic: comparison operators, ` or ` / ` and ` connectives,
parens/commas, "space + lowercase" or "ALLCAPS space ALLCAPS" word
boundaries. Per-string dedup via module-level set with a test-only
reset hook.

Audit doc `docs/reviews/openonco-state-audit-2026-05-17.md` documents
scope, methodology, roadmap staleness (3 items already done on master),
and the orthogonal "structured condition AST" path forward (out of scope
here — Big-P3 workstream gated by clinical co-lead review per CHARTER
§6.1).

Tests: 6 new in test_prose_condition_warning.py — operator clause warns,
boolean-connective clause warns, ALL-CAPS flat key does NOT warn, real
finding lookup does NOT warn, dedup fires once per unique string,
structured threshold/value clauses unaffected. 909 adjacent engine tests
still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 8cf8a1b: tighten fallthrough count

Initial figure was 8/30 (27%), from a sample of the first 30 algorithm files.
Full sweep over all 152 algorithms with a decision_tree shows 45/152
(30%) where step-1 evaluates entirely prose clauses and the tree
falls through to default_indication on every patient.

Also noted the warning's scope: condition: clauses only; a finding:
clause with a prose value is a deliberate authoring shape and is not
flagged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 2c9003ad: test(regimens): accept procedure-pathway shape

`test_no_regression_all_244_legacy_yamls_load` failed on
`reg_allohct_jmml.yaml`: 3 explicit phases, all with `components: []`,
top-level `components: []`. Existing invariant required >=1 phase drug
component on every explicit-phase regimen.

This is the procedure-pathway shape: the "treatment" is the stem-cell
product (or surgical procedure), documented via phase purpose strings,
not DRUG-* refs. The other 4 allohct regimens currently hack around the
invariant by parking a placeholder drug (e.g. DRUG-CYTARABINE) inside
conditioning — that's not what the regimen actually is.

Relaxed: when explicit phases have 0 drug components, accept iff
top-level components is also 0. If top-level has drugs but phases don't,
that's still the "author moved structure but left drugs at top level"
bug and the test still fails for it.

Engine downstream doesn't depend on phase_drugs > 0 (grep for `.phases`
under `knowledge_base/engine/`: the render.py reference is to
MonitoringSchedule.phases, a different concept). No clinical content
edits. 14/14 phase tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
romeo111 merged commit 76d60df into master on May 18, 2026
2 checks passed