Harden + test the verification core and live-dashboard control surface by flo7up · Pull Request #1 · flo7up/swe-rl-forge-lite

flo7up · 2026-06-13T09:18:22Z


Fixes the two must-fix issues surfaced in review plus four supporting correctness/security gaps. All paths covered by new tests (65 pass).

- task_builder: write gold.patch as LF bytes (write_gold_patch) instead of text mode. On Windows, write_text rewrote \n -> \r\n, making `git apply --check` reject valid patches and silently marking real fixes as non-applying. Proven end-to-end: CRLF patch is rejected, LF applies.
- dashboard_live: restrict state-changing control endpoints to loopback Host + Origin (defeats CSRF and DNS-rebinding) and stop reflecting `Access-Control-Allow-Origin: *`; reflect a local Origin only. The dev workflow (5173 -> 8765) still works.
- docker_runner / reward.py: run the untrusted test command with `--network none` and bounded memory/cpu/pids plus no-new-privileges.
- docker_runner / package_task: neutralize an upstream `.dockerignore` during the build and strip it from the packaged snapshot so the verifier and reward image test the same tree the agent sees.
- reward_runner: bound the reward subprocess with a task-derived timeout (was timeout=None, could hang forever).
- task_builder: document that the "deterministic rerun" attests post-patch idempotence, not pre-patch baseline flakiness.
- README: scope the "no arbitrary shell" claim to the host, fix the Docker install-order description, correct the reward JSON example, document the run-time network isolation and the real default explore query.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
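For illustration, the LF-bytes write and the loopback Host/Origin check described above could look roughly like the sketch below. This is not the code from this PR: `write_lf_bytes`, `is_loopback_request`, and `LOOPBACK_HOSTS` are hypothetical names, and the policy of allowing requests that carry no Origin header is an assumption.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit

# Assumption: these are the only names treated as "local".
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def write_lf_bytes(path: Path, patch_text: str) -> None:
    """Write patch text with LF line endings on every platform."""
    normalized = patch_text.replace("\r\n", "\n")
    # write_bytes skips text-mode newline translation (\n -> \r\n on Windows).
    path.write_bytes(normalized.encode("utf-8"))


def _hostname(value: str) -> Optional[str]:
    """Extract the host part from a Host header or an Origin URL."""
    # Host looks like "localhost:8765"; Origin looks like "http://localhost:5173".
    if "//" not in value:
        value = "//" + value
    try:
        return urlsplit(value).hostname
    except ValueError:
        return None


def is_loopback_request(host_header: Optional[str],
                        origin_header: Optional[str]) -> bool:
    """Allow a state-changing request only if Host (and Origin, when sent) is loopback."""
    if not host_header or _hostname(host_header) not in LOOPBACK_HOSTS:
        return False  # a DNS-rebinding request arrives with the attacker's hostname
    if origin_header is None:
        return True  # assumption: non-browser clients (e.g. curl) send no Origin
    return _hostname(origin_header) in LOOPBACK_HOSTS  # rejects cross-site pages
```

A handler would call `is_loopback_request` before acting on a control request, and echo the request's Origin in `Access-Control-Allow-Origin` only when the check passes.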

verify_task() and run_tests_in_docker() are the project's declared "only source of truth" but were effectively untested (every prior test stubbed verify_task; the only docker test asserted a Dockerfile string). A regression there silently flips reward labels with no failing test.

- tests/test_verify_task.py: inject fakes for git/patch/docker collaborators and assert the derived verification booleans + recommended status across the branch matrix: usable happy path, missing base commit, patch-does-not-apply, infrastructure failure (exit 127) zeroing test signals, non-deterministic rerun -> needs_review, and docker build failure -> invalid.
- tests/test_docker_runner.py: drive run_tests_in_docker with a fake subprocess.run to cover missing-docker, missing-repo, build failure, build timeout, successful run, failing run (product signal), and run timeout; also asserts the run command carries the network/no-new-privileges isolation flags and force-removes the container on timeout.

78 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
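The fake-subprocess approach can be sketched as follows. This is an illustrative stand-in, not the repository's run_tests_in_docker: `build_run_command`, `run_isolated`, and `FakeRun` are hypothetical names, and the specific resource limits (2g, 2 CPUs, 512 pids) are assumed values.

```python
import subprocess
from typing import Dict, List, Optional, Sequence


def build_run_command(image: str, name: str, test_cmd: Sequence[str]) -> List[str]:
    """Assemble a `docker run` invocation with network and resource isolation."""
    return [
        "docker", "run", "--rm", "--name", name,
        "--network", "none",                     # no network for untrusted tests
        "--memory", "2g",                        # assumed limit
        "--cpus", "2",                           # assumed limit
        "--pids-limit", "512",                   # assumed limit
        "--security-opt", "no-new-privileges",
        image, *test_cmd,
    ]


def run_isolated(image: str, name: str, test_cmd: Sequence[str],
                 timeout_s: float, run=subprocess.run) -> Dict[str, Optional[object]]:
    """Run the tests; on timeout, force-remove the container and report it."""
    cmd = build_run_command(image, name, test_cmd)
    try:
        proc = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The client timed out, but the container may still be running.
        run(["docker", "rm", "-f", name], capture_output=True, text=True, timeout=30)
        return {"timed_out": True, "exit_code": None}
    return {"timed_out": False, "exit_code": proc.returncode}


class FakeRun:
    """Test double for subprocess.run that records every command it receives."""

    def __init__(self, raise_timeout: bool = False, returncode: int = 0):
        self.calls: List[List[str]] = []
        self.raise_timeout = raise_timeout
        self.returncode = returncode

    def __call__(self, cmd, **kwargs):
        self.calls.append(list(cmd))
        if self.raise_timeout and cmd[1] == "run":
            raise subprocess.TimeoutExpired(cmd, kwargs.get("timeout"))
        return subprocess.CompletedProcess(cmd, self.returncode, stdout="", stderr="")
```

Passing the runner as a parameter keeps the test free of a real Docker daemon: a test constructs `FakeRun(raise_timeout=True)`, calls `run_isolated(..., run=fake)`, and inspects `fake.calls` for the isolation flags and the final `docker rm -f`.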

t and others added 2 commits June 13, 2026 10:54

flo7up wants to merge 2 commits into main from fix/verifier-trust-batch

flo7up commented Jun 13, 2026
