fix(grader): apply classified fallback for after_step constraint#1339
fix(grader): apply classified fallback for after_step constraint#1339kuishou68 wants to merge 1 commit intoaffaan-m:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughFixed the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes a bug in
Confidence Score: 5/5Safe to merge — the fix is correct, minimal, and consistent with the existing before_step pattern. The change is a three-line fix that directly mirrors an already-validated pattern in the same function. No logic regressions were found. The only finding is P2 (missing regression test), which does not affect correctness. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_check_temporal_order(step, event, resolved, classified)"] --> B{after_step set?}
B -- No --> E{before_step set?}
B -- Yes --> C["after_events = resolved.get(after_step)"]
C --> D{after_events is None?}
D -- Yes --> D2["after_events = classified.get(after_step, [])"]
D -- No --> F
D2 --> F{not after_events?}
F -- Yes --> G["return 'after_step not yet detected'"]
F -- No --> H["latest_after = max(timestamps)"]
H --> I{event.timestamp <= latest_after?}
I -- Yes --> J["return ordering violation"]
I -- No --> E
E -- No --> K["return None (pass)"]
E -- Yes --> L["before_events = resolved.get(before_step)"]
L --> M{before_events is None?}
M -- Yes --> N["before_events = classified.get(before_step, [])"]
M -- No --> O
N --> O{before_events non-empty?}
O -- No --> K
O -- Yes --> P["earliest_before = min(timestamps)"]
P --> Q{event.timestamp >= earliest_before?}
Q -- Yes --> R["return ordering violation"]
Q -- No --> K
style D2 fill:#90EE90,stroke:#228B22
style G fill:#FFB6C1
Reviews (1): Last reviewed commit: "fix(grader): apply classified fallback f..." | Re-trigger Greptile |
| if step.detector.after_step is not None: | ||
| after_events = resolved.get(step.detector.after_step, []) | ||
| after_events = resolved.get(step.detector.after_step) | ||
| if after_events is None: | ||
| after_events = classified.get(step.detector.after_step, []) | ||
| if not after_events: | ||
| return f"after_step '{step.detector.after_step}' not yet detected" |
There was a problem hiding this comment.
Missing regression test for fixed scenario
No test covers the specific case this PR fixes: an after_step that references a step defined later in spec.steps. Without a test, the bug could silently regress. Consider adding a case to tests/test_grader.py where the classification mock returns events for a forward-referenced step (e.g., step B has after_step: step_C while step_C appears after B in the spec) and asserts that grading still succeeds with the correct temporal order.
Closes #1338
Problem
In
_check_temporal_order()(skills/skill-comply/scripts/grader.py), theafter_stepconstraint checks only theresolveddict (steps already matched in the current grading pass):resolvedis populated sequentially as each step inspec.stepsis processed. If anafter_stepreference points to a step listed later in the spec,resolvedwill not contain it yet, so the check always fails — even when the trace events are correctly ordered temporally.By contrast, the
before_stepconstraint already applies the correct fix by falling back to theclassifieddict (all LLM-classified events):Fix
Apply the same
classifiedfallback toafter_step:Impact
Without this fix, compliance specs where an
after_stepconstraint references a step appearing later inspec.stepswill always produce incorrect grading failures, regardless of the actual temporal ordering in the observation trace.Summary by cubic
Fixes
after_stephandling in_check_temporal_order()so it falls back toclassifiedevents whenresolvedhas none, matching the existingbefore_stepbehavior. This prevents false grading failures when anafter_stepreferences a step that appears later in the spec.Written for commit e6cf69f. Summary will update on new commits.
Summary by CodeRabbit