[codex] Add lightweight watchlist trigger evals by dd3ok · Pull Request #25 · dd3ok/WATCHLIST.md

dd3ok · 2026-06-15T00:32:20Z

Summary

Add a lightweight deterministic trigger corpus in evals/trigger_cases.json with 20 balanced trigger/no-trigger cases.
Wire the existing semantic checker to validate trigger corpus size, schema, decision balance, required reasons, reason polarity, and non-empty string IDs.
Keep runtime smoke documentation compact and manual-only by explicitly forbidding transcripts, screenshots, raw logs, and long runtime output.
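
The corpus-level checks listed above (size, schema, decision balance, required reasons, reason polarity) can be sketched roughly as follows. This is an illustration only, not the code in evals/check_semantic_cases.py: the field names (`id`, `prompt`, `decision`, `reasons`), the decision labels, and both reason vocabularies are assumptions made for the example. ID validation is left out here.

```python
from collections import Counter

# Hypothetical field names and reason vocabularies; the real corpus may differ.
REQUIRED_KEYS = {"id", "prompt", "decision", "reasons"}
TRIGGER_REASONS = {"deferred_work", "explicit_followup"}
NO_TRIGGER_REASONS = {"completed_work", "casual_mention"}


def validate_trigger_corpus(cases, expected_size=20):
    """Return a list of error strings; an empty list means the corpus passed."""
    errors = []
    if len(cases) != expected_size:
        errors.append(f"expected {expected_size} cases, found {len(cases)}")

    decisions = Counter()
    for index, case in enumerate(cases):
        label = f"case[{index}]"
        if not isinstance(case, dict):
            errors.append(f"{label}: not an object")
            continue
        missing = REQUIRED_KEYS - set(case)
        if missing:
            errors.append(f"{label}: missing keys {sorted(missing)}")
            continue
        decision = case["decision"]
        if decision not in ("trigger", "no_trigger"):
            errors.append(f"{label}: unknown decision {decision!r}")
            continue
        decisions[decision] += 1
        reasons = case["reasons"]
        if not isinstance(reasons, list) or not reasons:
            errors.append(f"{label}: at least one reason is required")
            continue
        # Polarity: a trigger case may only cite trigger reasons, and vice versa.
        allowed = TRIGGER_REASONS if decision == "trigger" else NO_TRIGGER_REASONS
        wrong = [r for r in reasons if not isinstance(r, str) or r not in allowed]
        if wrong:
            errors.append(f"{label}: reasons {wrong} do not match decision {decision!r}")

    if decisions["trigger"] != decisions["no_trigger"]:
        errors.append(
            f"unbalanced decisions: {decisions['trigger']} trigger "
            f"vs {decisions['no_trigger']} no_trigger"
        )
    return errors
```

Collecting every error instead of stopping at the first one keeps a single checker run useful when several cases are malformed at once.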

Runtime weight

No changes under .agents/skills/watchlist-md/.
No runtime bundle files, references, scripts, Python files, transcripts, or smoke logs were added.
Runtime smoke remains repo-only under docs/.
Trigger eval remains repo-only under evals/ and deterministic; it performs no LLM, runtime, browser, GitHub API, or network calls.

Validation

PYTHONDONTWRITEBYTECODE=1 python -m unittest discover -s evals -p 'test_*.py' - 95 tests passed
python evals/check_semantic_cases.py - 30 semantic cases; 20 trigger cases
python evals/check_skill_package.py
python evals/check_policy_markers.py
python evals/check_release_metadata.py
python evals/check_watchlist.py examples/WATCHLIST.example.md --strict-format --strict-safety --require-archive-section
python evals/check_watchlist.py .agents/skills/watchlist-md/assets/WATCHLIST.template.md --strict-format --strict-safety --require-archive-section
python tools/validate_watchlist.py examples/WATCHLIST.example.md --strict-format --strict-safety --require-archive-section
python tools/validate_watchlist.py .agents/skills/watchlist-md/assets/WATCHLIST.template.md --strict-format --strict-safety --require-archive-section
runtime bundle scan: NO_RUNTIME_PYTHON_OR_TOOLING_MATCHES
staged scope scan: NO_STAGED_RUNTIME_OR_ANCHOR_CHANGES
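
The PR does not say how the two scans were run. One possible approach, sketched under assumptions, is to take a list of repository-relative paths (for example from `git diff --cached --name-only`) and flag any tooling-like file that sits inside the runtime bundle. The suffix list and both function names below are hypothetical; only the bundle path and the sentinel string come from the description above.

```python
from pathlib import PurePosixPath

BUNDLE_ROOT = PurePosixPath(".agents/skills/watchlist-md")
# Hypothetical set of suffixes treated as "Python or tooling".
TOOLING_SUFFIXES = {".py", ".sh", ".log"}


def runtime_tooling_matches(paths):
    """Return the paths inside the runtime bundle whose suffix marks them as tooling."""
    matches = []
    for raw in paths:
        path = PurePosixPath(raw)
        # A file is inside the bundle if the bundle root is one of its ancestors.
        if BUNDLE_ROOT in path.parents and path.suffix in TOOLING_SUFFIXES:
            matches.append(raw)
    return matches


def scan_report(paths):
    """Return the sentinel string when clean, otherwise the offending paths."""
    matches = runtime_tooling_matches(paths)
    if not matches:
        return "NO_RUNTIME_PYTHON_OR_TOOLING_MATCHES"
    return "\n".join(matches)
```

Working on a plain list of paths rather than walking the filesystem keeps the check deterministic and free of I/O.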

Review follow-up

Added explicit trigger case id validation before using the value as case_id.
Added invalid-id regression coverage for non-string IDs and leading/trailing whitespace.
Runtime-weight review remains unchanged: no .agents/skills/watchlist-md diff and no runtime bundle additions.
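
A minimal sketch of the kind of id check described in this follow-up, assuming each trigger case is a JSON object with an `id` key. The function name, return shape, and error wording are illustrative and are not taken from evals/check_semantic_cases.py.

```python
def validate_case_id(case, index):
    """Return (case_id, error); exactly one of the two is None."""
    fallback = f"trigger_cases[{index}]"  # used in messages because the id is not trusted yet
    raw = case.get("id") if isinstance(case, dict) else None
    if not isinstance(raw, str):
        return None, f"{fallback}: id must be a string, got {type(raw).__name__}"
    if not raw:
        return None, f"{fallback}: id must not be empty"
    if raw != raw.strip():
        return None, f"{fallback}: id has leading or trailing whitespace"
    return raw, None
```

Validating the id before using it as `case_id` means later error messages never interpolate a non-string or padded value, which is the confusion the review comment pointed at.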

gemini-code-assist

Code Review

This pull request introduces a new trigger evaluation corpus (trigger_cases.json) along with validation logic in check_semantic_cases.py and corresponding unit tests in test_check_watchlist.py. Additionally, the manual smoke check documentation in runtime-smoke.md is updated to forbid storing raw logs or transcripts. Feedback on the changes suggests explicitly validating that the id field in each trigger case is a non-empty string before using it, preventing potential type errors or confusing validation messages.

Add lightweight watchlist trigger evals

b4fcb79

gemini-code-assist Bot reviewed Jun 15, 2026

Comment thread evals/check_semantic_cases.py Outdated

Validate trigger case ids

41760de

dd3ok marked this pull request as ready for review June 15, 2026 10:36

dd3ok merged commit 41ecad6 into main Jun 15, 2026
4 checks passed

dd3ok merged 2 commits into main from codex/watchlist-runtime-smoke-trigger-eval
