
feat(runner): library-first batch rollout runner (agentix.runner) + CLI + example #60

Merged
FatPigeorz merged 2 commits into Agentix-Project:master from Meirtz:feat/agentix-runner
May 29, 2026
Conversation

@Meirtz Meirtz commented May 29, 2026

What

New standalone uv-workspace member plugins/runner exposing agentix.runner — a generic, library-first batch rollout runner.

run_rollouts(...) runs an agent over a dataset of instances. Each instance gets an agent phase (setup → solve) in its own sandbox, followed by a scoring phase in a fresh sandbox, so scoring starts from a clean task image. It returns one typed Rollout per instance, in input order, with concurrency bounded by n_concurrent. Per-instance failures are isolated as Rollout.error and never abort the batch.
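
To make the ordering and failure-isolation guarantees concrete, here is a minimal sketch of one way to get them with asyncio. It is not the runner's actual code: `run_bounded`, the `Rollout` fields, and the `rollout_one` callable signature are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Sequence


@dataclass
class Rollout:
    # Hypothetical fields; the real Rollout dataclass may look different.
    instance_id: str
    result: Optional[dict] = None
    error: Optional[str] = None


async def run_bounded(
    instances: Sequence[str],
    rollout_one: Callable[[str], Awaitable[dict]],
    n_concurrent: int,
) -> list:
    """Run rollout_one over instances with at most n_concurrent in flight."""
    sem = asyncio.Semaphore(n_concurrent)

    async def guarded(inst: str) -> Rollout:
        async with sem:
            try:
                return Rollout(inst, result=await rollout_one(inst))
            except Exception as exc:
                # A failing instance becomes a Rollout with .error set
                # instead of propagating and cancelling the batch.
                return Rollout(inst, error=f"{type(exc).__name__}: {exc}")

    # asyncio.gather returns results in argument order, not completion order.
    return list(await asyncio.gather(*(guarded(i) for i in instances)))


async def _demo_rollout(inst: str) -> dict:
    if inst == "bad":
        raise RuntimeError("sandbox failed")
    await asyncio.sleep(0)
    return {"resolved": True}


rollouts = asyncio.run(run_bounded(["a", "bad", "c"], _demo_rollout, n_concurrent=2))
```

In this sketch the middle instance fails, yet the other two still complete and the output list keeps the input order.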

Built only on the stable surface — provider.session(config) for sandboxes and sandbox.remote(fn, ...) for in-sandbox calls (+ bundle). It carries no benchmark- or agent-specific logic: datasets and agents plug in through two small Protocols.

  • Dataset: instances(), image(inst), setup(sandbox, inst) -> bool, score(sandbox, inst, patch) -> dict
  • Agent: solve(sandbox, inst, *, model) -> AgentResult
  • Provider: anything with session(config) -> async context manager (i.e. a SandboxProvider)
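
As a rough illustration of how these three roles could be written as structural types, the sketch below uses `typing.Protocol`. The method names and return annotations follow the list above; everything else is an assumption, including which methods are `async`, the `Any` parameter types, the `patch` field on `AgentResult`, and the `ToyDataset` adapter.

```python
from dataclasses import dataclass
from typing import Any, AsyncContextManager, Iterable, Protocol, runtime_checkable


@dataclass
class AgentResult:
    # Hypothetical field: the PR names AgentResult but does not list its fields.
    patch: str = ""


@runtime_checkable
class Dataset(Protocol):
    def instances(self) -> Iterable[Any]: ...
    def image(self, inst: Any) -> str: ...
    async def setup(self, sandbox: Any, inst: Any) -> bool: ...
    async def score(self, sandbox: Any, inst: Any, patch: str) -> dict: ...


@runtime_checkable
class Agent(Protocol):
    async def solve(self, sandbox: Any, inst: Any, *, model: str) -> AgentResult: ...


@runtime_checkable
class Provider(Protocol):
    def session(self, config: Any) -> AsyncContextManager[Any]: ...


class ToyDataset:
    """Hypothetical adapter; it satisfies Dataset without inheriting from it."""

    def instances(self):
        return ["task-1"]

    def image(self, inst):
        return "python:3.12"

    async def setup(self, sandbox, inst):
        return True

    async def score(self, sandbox, inst, patch):
        return {"resolved": bool(patch)}


is_dataset = isinstance(ToyDataset(), Dataset)
```

Because the check is structural, an adapter only needs the right method names. Note that `runtime_checkable` verifies that the methods exist, not that their signatures match.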

A thin agentix-run CLI wraps the same function. The library is the real interface — an RL/eval loop calls run_rollouts(...) directly.
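
The first commit message says the CLI resolves `module:attr` dataset/agent adapters. A common way to implement that kind of lookup is sketched below; `resolve_spec` is a hypothetical helper, not a function from `agentix.runner`, and the demo resolves a standard-library attribute rather than a real adapter.

```python
import importlib
from typing import Any


def resolve_spec(spec: str) -> Any:
    """Resolve a 'module:attr' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Allow dotted attribute paths such as 'pkg.mod:Class.factory'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# Demo with a standard-library target.
dumps = resolve_spec("json:dumps")
encoded = dumps({"a": 1})
```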

Also includes: examples/run-swe-rollouts

A runnable example that expresses the eval-cc-swe flow through the runner: a SweDataset + ClaudeCodeAgent adapter plus one run_rollouts(...) call replace the hand-written per-instance orchestration (--ground-truth reuses the same scoring path via a GroundTruthAgent). It mirrors eval-cc-swe's exact remote-call wiring, so it builds/runs the same way; eval-cc-swe/runner.py's bespoke loop can later be reduced to this.

Gate

  • 9 unit tests for the runner via an in-process fake provider (no Docker).
  • ruff check, pyright (0 errors, full include set), pytest — all green.
  • Wired into [tool.uv.workspace] members and [tool.pyright] include/extraPaths.
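
For readers wondering what an in-process fake provider might look like, here is a minimal sketch. It assumes only the surface named above, `session(config)` as an async context manager and `sandbox.remote(fn, ...)`. The class names, the awaitable `remote`, and the `{"image": ...}` config shape are illustrative assumptions, not the test suite's actual fixtures.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSandbox:
    """Runs 'remote' calls in-process and records which functions were called."""

    def __init__(self, image: str) -> None:
        self.image = image
        self.calls = []

    async def remote(self, fn, *args, **kwargs):
        self.calls.append(fn.__name__)
        return fn(*args, **kwargs)


class FakeProvider:
    """Hands out a new FakeSandbox per session; no Docker involved."""

    def __init__(self) -> None:
        self.opened = 0
        self.closed = 0

    @asynccontextmanager
    async def session(self, config):
        self.opened += 1
        try:
            yield FakeSandbox(config["image"])
        finally:
            self.closed += 1


def read_patch() -> str:
    return "diff --git a/x b/x"


async def demo():
    provider = FakeProvider()
    # Agent phase and scoring phase each open their own session.
    async with provider.session({"image": "task:latest"}) as agent_box:
        patch = await agent_box.remote(read_patch)
    async with provider.session({"image": "task:latest"}) as score_box:
        fresh = score_box.calls == []  # the scoring sandbox carries no agent state
    return provider, patch, fresh


provider, patch, fresh = asyncio.run(demo())
```

Counting opened and closed sessions lets a test assert that the scoring phase used a fresh sandbox and that every sandbox was torn down.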

Scope note

Deliberately steers clear of the agentix/gateway (#41) and agentix/orchestrator (#2) areas — this is the batch-rollout layer that composes client.remote, complementary to the gateway's session/training bridge.

Meirtz and others added 2 commits May 30, 2026 01:41
…hin CLI

New uv-workspace member `plugins/runner` exposing `agentix.runner`:

- `run_rollouts(...)` / `rollout_one(...)` run an agent over a dataset of
  instances, each in its own sandbox; the agent phase and the scoring phase
  each get a fresh sandbox. Built only on the stable surface —
  `provider.session(config)` for sandboxes and `sandbox.remote(fn, ...)`
  for in-sandbox calls (+ `bundle`).
- Generic `Dataset` / `Agent` / `Provider` Protocols + `AgentResult` /
  `Rollout` dataclasses; the runner has no benchmark- or agent-specific
  logic. Per-instance failures surface as `Rollout.error`, never abort the
  batch; results return in input order, bounded by `n_concurrent`.
- Thin `agentix-run` CLI (library is the real interface; RL/eval calls
  `run_rollouts` directly) that resolves `module:attr` dataset/agent
  adapters and a provider backend.
- 9 unit tests via an in-process fake provider (no docker). ruff + pyright
  clean; wired into `[tool.uv.workspace]` members and `[tool.pyright]`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`examples/run-swe-rollouts` is the `eval-cc-swe` flow expressed through the
reusable runner: a `SweDataset` + `ClaudeCodeAgent` adapter plus one
`run_rollouts(...)` call replace the hand-written per-instance
orchestration. `--ground-truth` swaps in a `GroundTruthAgent` that submits
each row's gold patch, reusing the identical scoring path.

Patch extraction goes through `agentix.bash.run`; the real provider call
stays host-side via the bridge gateway. Additive (a new standalone example,
not a workspace member); ruff-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Meirtz Meirtz changed the title from "feat(runner): library-first batch rollout runner (agentix.runner) + thin CLI" to "feat(runner): library-first batch rollout runner (agentix.runner) + CLI + example" May 29, 2026
@FatPigeorz FatPigeorz merged commit 46cc916 into Agentix-Project:master May 29, 2026
5 checks passed
Meirtz added a commit that referenced this pull request May 30, 2026
Documents the batch rollout runner merged in #60: `run_rollouts(...)`, the
`Dataset`/`Agent` adapter Protocols, the `Rollout` result, and the
`agentix-run` CLI, with a pointer to the `examples/run-swe-rollouts` worked
example. Sits in the How-to group beside integrate-agent / integrate-dataset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>