
fix(grader): apply classified fallback for after_step constraint #1339

Open
kuishou68 wants to merge 1 commit into affaan-m:main from kuishou68:fix/issue-1338-after-step-fallback

Conversation

@kuishou68

@kuishou68 kuishou68 commented Apr 10, 2026

Closes #1338

Problem

In _check_temporal_order() (skills/skill-comply/scripts/grader.py), the after_step constraint checks only the resolved dict (steps already matched in the current grading pass):

# BEFORE (buggy)
after_events = resolved.get(step.detector.after_step, [])
if not after_events:
    return f"after_step '{step.detector.after_step}' not yet detected"

resolved is populated sequentially as each step in spec.steps is processed. If an after_step reference points to a step listed later in the spec, resolved will not contain it yet, so the check always fails — even when the trace events are correctly ordered temporally.
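
The forward-reference failure can be reproduced in isolation. The following is a minimal sketch, assuming a simplified model in which each step is a plain dict and `resolved` is filled in spec order. The step ids (`run_tests`, `write_code`) and the helper `grade_in_spec_order` are hypothetical and are not taken from grader.py.

```python
# Hypothetical, simplified model of the grading loop; not the real grader.py.
spec_steps = [
    # "run_tests" must happen after "write_code", but is listed first in the spec.
    {"id": "run_tests", "after_step": "write_code"},
    {"id": "write_code", "after_step": None},
]

# Events classified from the trace: step id -> list of timestamps.
# The trace itself is correctly ordered (write_code at 10.0, run_tests at 20.0).
classified = {"write_code": [10.0], "run_tests": [20.0]}


def grade_in_spec_order(steps, classified):
    """Mimic the buggy lookup: consult only `resolved`, which is filled sequentially."""
    resolved = {}
    failures = []
    for step in steps:
        ref = step["after_step"]
        if ref is not None:
            after_events = resolved.get(ref, [])  # buggy: ignores `classified`
            if not after_events:
                failures.append(f"after_step '{ref}' not yet detected")
                continue
        resolved[step["id"]] = classified.get(step["id"], [])
    return failures


failures = grade_in_spec_order(spec_steps, classified)
print(failures)
```

In this model the check for `run_tests` fails even though the trace is correctly ordered, because `write_code` has not been processed when `run_tests` is checked.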

By contrast, the before_step constraint already applies the correct fix by falling back to the classified dict (all LLM-classified events):

before_events = resolved.get(step.detector.before_step)
if before_events is None:
    before_events = classified.get(step.detector.before_step, [])

Fix

Apply the same classified fallback to after_step:

# AFTER (fixed)
after_events = resolved.get(step.detector.after_step)
if after_events is None:
    after_events = classified.get(step.detector.after_step, [])
if not after_events:
    return f"after_step '{step.detector.after_step}' not yet detected"
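
For context, here is a rough, self-contained sketch of how the whole check might look with the fallback applied to both constraints. It assumes the timestamp comparisons described in the review flowchart in this thread (latest `after_step` event, earliest `before_step` event). The `Detector`, `Step`, and `Event` classes, the `lookup_events` helper, and the violation messages are hypothetical stand-ins, not the types or strings used in grader.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detector:  # hypothetical stand-in for the spec's detector
    after_step: Optional[str] = None
    before_step: Optional[str] = None


@dataclass(frozen=True)
class Step:  # hypothetical stand-in for a spec step
    id: str
    detector: Detector


@dataclass(frozen=True)
class Event:  # hypothetical stand-in for a trace event
    timestamp: float


def lookup_events(step_id, resolved, classified):
    """Prefer `resolved`; fall back to `classified` only when the key is absent."""
    events = resolved.get(step_id)
    if events is None:
        events = classified.get(step_id, [])
    return events


def check_temporal_order(step, event, resolved, classified):
    """Return an error string on violation, or None when the ordering holds."""
    det = step.detector
    if det.after_step is not None:
        after_events = lookup_events(det.after_step, resolved, classified)
        if not after_events:
            return f"after_step '{det.after_step}' not yet detected"
        latest_after = max(e.timestamp for e in after_events)
        if event.timestamp <= latest_after:
            return f"event must occur after '{det.after_step}'"  # hypothetical message
    if det.before_step is not None:
        before_events = lookup_events(det.before_step, resolved, classified)
        if before_events:
            earliest_before = min(e.timestamp for e in before_events)
            if event.timestamp >= earliest_before:
                return f"event must occur before '{det.before_step}'"  # hypothetical message
    return None


# Forward reference: "write_code" is in `classified` but not yet in `resolved`.
step = Step("run_tests", Detector(after_step="write_code"))
classified = {"write_code": [Event(10.0)]}
ok = check_temporal_order(step, Event(20.0), {}, classified)
bad = check_temporal_order(step, Event(5.0), {}, classified)
missing = check_temporal_order(step, Event(20.0), {}, {})
```

Note that the fallback triggers only when the key is absent from `resolved`. A step that is present in `resolved` with an empty list still yields the "not yet detected" message, which matches the `is None` test in the fix.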

Impact

Without this fix, compliance specs where an after_step constraint references a step appearing later in spec.steps will always produce incorrect grading failures, regardless of the actual temporal ordering in the observation trace.


Summary by cubic

Fixes after_step handling in _check_temporal_order() so it falls back to classified events when resolved has none, matching the existing before_step behavior. This prevents false grading failures when an after_step references a step that appears later in the spec.

Written for commit e6cf69f. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes
    • Improved temporal constraint verification by implementing a fallback mechanism for event ordering validation. The system can now proceed with temporal checks in more scenarios, particularly when initial resolution methods don't provide necessary timestamp data, ensuring more comprehensive constraint validation.

@coderabbitai
Contributor

coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 28616f68-d0a8-412e-9753-935d10478c4f

📥 Commits

Reviewing files that changed from the base of the PR and between 181bc26 and e6cf69f.

📒 Files selected for processing (1)
  • skills/skill-comply/scripts/grader.py

📝 Walkthrough

Walkthrough

Fixed the _check_temporal_order function in the grader to properly handle after_step constraints by falling back to the classified dict when the step isn't yet in the resolved dict, matching the existing behavior for before_step constraints.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Temporal Order Constraint Fix: skills/skill-comply/scripts/grader.py | Modified _check_temporal_order to retrieve after_events from classified dict when resolved.get(step.detector.after_step) returns None, enabling temporal checks to use LLM-classified event timestamps even when deterministic resolution hasn't populated prior-step matches. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A temporal tangle has been untied,
Where after_step checks had nowhere to hide,
Now classified comes forth when resolved falls short,
Sequential wisdom of a consistent sort! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(grader): apply classified fallback for after_step constraint' accurately summarizes the main change: applying a fallback mechanism to the after_step constraint checking logic in the grader. |
| Linked Issues check | ✅ Passed | The code changes fully address the requirements from issue #1338 by removing the default empty list fallback, attempting to derive after_events from classified when resolved returns None, and preserving error behavior when no events are found. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the after_step constraint handling in _check_temporal_order() as specified in issue #1338, with no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@greptile-apps
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR fixes a bug in _check_temporal_order() where the after_step constraint only checked the sequentially-built resolved dict, causing guaranteed false failures when an after_step reference points to a step defined later in the spec. The fix mirrors the existing before_step pattern by falling back to the classified dict (all LLM-classified events) when the referenced step is not yet in resolved.

  • No regression test is added for the forward-reference after_step scenario that triggered the bug.

Confidence Score: 5/5

Safe to merge — the fix is correct, minimal, and consistent with the existing before_step pattern.

The change is a three-line fix that directly mirrors an already-validated pattern in the same function. No logic regressions were found. The only finding is P2 (missing regression test), which does not affect correctness.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| skills/skill-comply/scripts/grader.py | Applies classified fallback to after_step check, matching the existing before_step pattern; fix is correct and minimal; no test added for the forward-reference scenario. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_check_temporal_order(step, event, resolved, classified)"] --> B{after_step set?}
    B -- No --> E{before_step set?}
    B -- Yes --> C["after_events = resolved.get(after_step)"]
    C --> D{after_events is None?}
    D -- Yes --> D2["after_events = classified.get(after_step, [])"]
    D -- No --> F
    D2 --> F{not after_events?}
    F -- Yes --> G["return 'after_step not yet detected'"]
    F -- No --> H["latest_after = max(timestamps)"]
    H --> I{event.timestamp <= latest_after?}
    I -- Yes --> J["return ordering violation"]
    I -- No --> E
    E -- No --> K["return None (pass)"]
    E -- Yes --> L["before_events = resolved.get(before_step)"]
    L --> M{before_events is None?}
    M -- Yes --> N["before_events = classified.get(before_step, [])"]
    M -- No --> O
    N --> O{before_events non-empty?}
    O -- No --> K
    O -- Yes --> P["earliest_before = min(timestamps)"]
    P --> Q{event.timestamp >= earliest_before?}
    Q -- Yes --> R["return ordering violation"]
    Q -- No --> K

    style D2 fill:#90EE90,stroke:#228B22
    style G fill:#FFB6C1

Reviews (1): Last reviewed commit: "fix(grader): apply classified fallback f..." | Re-trigger Greptile

Comment on lines 35 to 40
     if step.detector.after_step is not None:
-        after_events = resolved.get(step.detector.after_step, [])
+        after_events = resolved.get(step.detector.after_step)
+        if after_events is None:
+            after_events = classified.get(step.detector.after_step, [])
         if not after_events:
             return f"after_step '{step.detector.after_step}' not yet detected"

P2 Missing regression test for fixed scenario

No test covers the specific case this PR fixes: an after_step that references a step defined later in spec.steps. Without a test, the bug could silently regress. Consider adding a case to tests/test_grader.py where the classification mock returns events for a forward-referenced step (e.g., step B has after_step: step_C while step_C appears after B in the spec) and asserts that grading still succeeds with the correct temporal order.
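
A regression test along these lines might look like the sketch below. It is only an illustration: a real test in tests/test_grader.py would import `_check_temporal_order` from grader.py and use the project's own fixtures and classification mock, none of which are shown in this PR. Here a hypothetical stand-in, `after_step_error`, replicates just the fixed lookup so the sketch runs on its own, and `SimpleNamespace` objects stand in for the real step and event types.

```python
from types import SimpleNamespace


def after_step_error(step, event, resolved, classified):
    """Hypothetical stand-in for the after_step branch of the fixed check."""
    ref = step.detector.after_step
    after_events = resolved.get(ref)
    if after_events is None:
        after_events = classified.get(ref, [])
    if not after_events:
        return f"after_step '{ref}' not yet detected"
    if event.timestamp <= max(e.timestamp for e in after_events):
        return f"event precedes after_step '{ref}'"  # hypothetical message
    return None


def make_forward_reference_case():
    # step_B is listed before step_C in the spec, yet must occur after it.
    step_b = SimpleNamespace(id="step_B", detector=SimpleNamespace(after_step="step_C"))
    classified = {"step_C": [SimpleNamespace(timestamp=1.0)]}
    resolved = {}  # step_C has not been processed yet
    return step_b, resolved, classified


def test_forward_reference_passes_when_trace_is_ordered():
    step_b, resolved, classified = make_forward_reference_case()
    event_b = SimpleNamespace(timestamp=2.0)
    assert after_step_error(step_b, event_b, resolved, classified) is None


def test_forward_reference_still_reports_real_violation():
    step_b, resolved, classified = make_forward_reference_case()
    event_b = SimpleNamespace(timestamp=0.5)  # occurs before step_C
    assert after_step_error(step_b, event_b, resolved, classified) is not None
```

The second test guards against over-correcting: the fallback should remove the false "not yet detected" failure without hiding genuine ordering violations.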

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 1 file



Development

Successfully merging this pull request may close these issues.

fix(grader): after_step constraint always fails when referencing a later spec step

1 participant