feat(runner): library-first batch rollout runner (agentix.runner) + CLI + example by Meirtz · Pull Request #60 · Agentix-Project/Agentix

Meirtz · 2026-05-29T17:54:19Z

What

New standalone uv-workspace member plugins/runner exposing agentix.runner — a generic, library-first batch rollout runner.

run_rollouts(...) runs an agent over a dataset of instances, isolating each instance in its own sandboxes: the agent phase (setup → solve) runs in one sandbox, then scoring runs in a fresh one, so scoring starts from a clean task image. It returns one typed Rollout per instance, in input order, with concurrency bounded by n_concurrent. Per-instance failures are captured as Rollout.error and never abort the batch.
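The control flow described above can be sketched roughly as follows. This is an illustration, not the code in plugins/runner: `RolloutSketch`, `rollout_one_sketch`, and `run_rollouts_sketch` are hypothetical names, and the sketch assumes that the adapter methods are coroutines, that the session config is just the image, and that the agent result exposes a `patch` attribute.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RolloutSketch:
    """Hypothetical stand-in for the runner's Rollout; real fields may differ."""
    instance: Any
    score: Optional[dict] = None
    error: Optional[str] = None


async def rollout_one_sketch(provider: Any, dataset: Any, agent: Any,
                             inst: Any, model: str) -> RolloutSketch:
    try:
        # Agent phase: setup, then solve, inside one sandbox.
        async with provider.session(dataset.image(inst)) as sandbox:
            if not await dataset.setup(sandbox, inst):
                return RolloutSketch(inst, error="setup failed")
            result = await agent.solve(sandbox, inst, model=model)
        # Scoring phase: a second, fresh sandbox, so scoring sees a clean image.
        async with provider.session(dataset.image(inst)) as sandbox:
            score = await dataset.score(sandbox, inst, result.patch)
        return RolloutSketch(inst, score=score)
    except Exception as exc:
        # A failure is recorded on this rollout only; the batch keeps going.
        return RolloutSketch(inst, error=f"{type(exc).__name__}: {exc}")


async def run_rollouts_sketch(provider: Any, dataset: Any, agent: Any, *,
                              model: str, n_concurrent: int = 4) -> list[RolloutSketch]:
    sem = asyncio.Semaphore(n_concurrent)  # caps in-flight rollouts

    async def bounded(inst: Any) -> RolloutSketch:
        async with sem:
            return await rollout_one_sketch(provider, dataset, agent, inst, model)

    # asyncio.gather returns results in argument order, i.e. input order.
    return list(await asyncio.gather(*(bounded(i) for i in dataset.instances())))
```

Catching the exception inside the per-instance coroutine, rather than around `gather`, is what keeps one bad instance from cancelling its siblings.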

Built only on the stable surface — provider.session(config) for sandboxes and sandbox.remote(fn, ...) for in-sandbox calls (+ bundle). It carries no benchmark- or agent-specific logic: datasets and agents plug in through two small Protocols.

- Dataset: instances(), image(inst), setup(sandbox, inst) -> bool, score(sandbox, inst, patch) -> dict
- Agent: solve(sandbox, inst, *, model) -> AgentResult
- Provider: anything with session(config) -> async context manager (i.e. a SandboxProvider)
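As a rough illustration of the three shapes listed above, they could be written with `typing.Protocol` along these lines. The method names and return types follow the list; everything else is an assumption made for the sketch: that `setup`, `score`, and `solve` are coroutines, that `image` returns a string, and that `AgentResult` carries a `patch` field. The actual definitions in agentix.runner may differ.

```python
from dataclasses import dataclass
from typing import Any, AsyncContextManager, Iterable, Protocol, runtime_checkable


@dataclass
class AgentResult:
    patch: str = ""  # assumed field; the PR text does not list AgentResult's fields


@runtime_checkable
class Dataset(Protocol):
    def instances(self) -> Iterable[Any]: ...
    def image(self, inst: Any) -> str: ...
    async def setup(self, sandbox: Any, inst: Any) -> bool: ...
    async def score(self, sandbox: Any, inst: Any, patch: str) -> dict: ...


@runtime_checkable
class Agent(Protocol):
    async def solve(self, sandbox: Any, inst: Any, *, model: str) -> AgentResult: ...


@runtime_checkable
class Provider(Protocol):
    def session(self, config: Any) -> AsyncContextManager[Any]: ...


class EchoAgent:
    """Toy adapter: satisfies Agent structurally, with no inheritance needed."""

    async def solve(self, sandbox: Any, inst: Any, *, model: str) -> AgentResult:
        return AgentResult(patch=f"patch-for-{inst}")
```

Because Protocols are structural, an adapter such as `EchoAgent` only needs matching method names; `isinstance(EchoAgent(), Agent)` checks for the presence of `solve`, not its signature.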

A thin agentix-run CLI wraps the same function. The library is the real interface — an RL/eval loop calls run_rollouts(...) directly.

Also includes: `examples/run-swe-rollouts`

A runnable example that expresses the eval-cc-swe flow through the runner: a SweDataset + ClaudeCodeAgent adapter plus one run_rollouts(...) call replace the hand-written per-instance orchestration (--ground-truth reuses the same scoring path via a GroundTruthAgent). It mirrors eval-cc-swe's exact remote-call wiring, so it builds/runs the same way; eval-cc-swe/runner.py's bespoke loop can later be reduced to this.
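To make the --ground-truth idea concrete, here is a minimal sketch of an agent that skips solving and hands back the instance's gold patch, so the normal scoring path runs unchanged. It is not the example's actual code: the `AgentResult` shape, the `patch_key` parameter, and the `"patch"` column name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class AgentResult:
    patch: str  # assumed field; the real AgentResult may carry more


class GroundTruthAgent:
    """Sketch of an agent that returns each row's gold patch without solving."""

    def __init__(self, patch_key: str = "patch") -> None:
        # Column holding the gold patch; the name is an assumption.
        self.patch_key = patch_key

    async def solve(self, sandbox: Any, inst: Mapping[str, Any], *, model: str) -> AgentResult:
        # The sandbox and model are ignored: nothing is executed or generated.
        return AgentResult(patch=inst[self.patch_key])
```

Since it matches the same `solve(sandbox, inst, *, model)` signature as a real agent, it can be swapped in without touching the dataset or scoring code.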

Gate

9 unit tests for the runner via an in-process fake provider (no Docker).
ruff check, pyright (0 errors, full include set), pytest — all green.
Wired into [tool.uv.workspace] members and [tool.pyright] include/extraPaths.
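For context on the fake-provider approach used by the unit tests, a Docker-free provider can be as small as the sketch below. `FakeProvider` and `FakeSandbox` are hypothetical names, not the test suite's actual classes, and the sketch assumes `sandbox.remote(fn, ...)` is awaitable and simply runs `fn` in-process.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable


class FakeSandbox:
    """In-process sandbox: 'remote' calls run locally and are recorded."""

    def __init__(self, config: Any) -> None:
        self.config = config
        self.calls: list[str] = []

    async def remote(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        self.calls.append(fn.__name__)
        return fn(*args, **kwargs)


class FakeProvider:
    """Counts sessions opened and closed, so tests can assert on cleanup."""

    def __init__(self) -> None:
        self.opened = 0
        self.closed = 0

    @asynccontextmanager
    async def session(self, config: Any) -> AsyncIterator[FakeSandbox]:
        self.opened += 1
        try:
            yield FakeSandbox(config)
        finally:
            self.closed += 1  # runs even if the body raises
```

A test can then open `async with FakeProvider().session(cfg) as sandbox:` and assert on `sandbox.calls` or on the provider's open/close counters.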

Scope note

Deliberately steers clear of the agentix/gateway (#41) and agentix/orchestrator (#2) areas — this is the batch-rollout layer that composes client.remote, complementary to the gateway's session/training bridge.

…hin CLI

New uv-workspace member `plugins/runner` exposing `agentix.runner`:

- `run_rollouts(...)` / `rollout_one(...)` run an agent over a dataset of instances, each in its own sandbox; the agent phase and the scoring phase each get a fresh sandbox. Built only on the stable surface: `provider.session(config)` for sandboxes and `sandbox.remote(fn, ...)` for in-sandbox calls (+ `bundle`).
- Generic `Dataset` / `Agent` / `Provider` Protocols + `AgentResult` / `Rollout` dataclasses; the runner has no benchmark- or agent-specific logic. Per-instance failures surface as `Rollout.error`, never abort the batch; results return in input order, bounded by `n_concurrent`.
- Thin `agentix-run` CLI (library is the real interface; RL/eval calls `run_rollouts` directly) that resolves `module:attr` dataset/agent adapters and a provider backend.
- 9 unit tests via an in-process fake provider (no docker). ruff + pyright clean; wired into `[tool.uv.workspace]` members and `[tool.pyright]`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
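The `module:attr` convention mentioned in this commit message is a common way to name a Python object on a command line. A minimal resolver might look like the sketch below; `resolve` is a hypothetical helper and not necessarily how `agentix-run` implements it.

```python
import importlib
from typing import Any


def resolve(spec: str) -> Any:
    """Turn 'package.module:attr' (or ':attr.sub') into the named object."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Walk dotted attributes so nested names such as 'path.join' also work.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj
```

For example, `resolve("json:dumps")` returns the standard library's `json.dumps` function.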

`examples/run-swe-rollouts` is the `eval-cc-swe` flow expressed through the reusable runner: a `SweDataset` + `ClaudeCodeAgent` adapter plus one `run_rollouts(...)` call replace the hand-written per-instance orchestration. `--ground-truth` swaps in a `GroundTruthAgent` that submits each row's gold patch, reusing the identical scoring path. Patch extraction goes through `agentix.bash.run`; the real provider call stays host-side via the bridge gateway. Additive (a new standalone example, not a workspace member); ruff-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Documents the batch rollout runner merged in #60: `run_rollouts(...)`, the `Dataset`/`Agent` adapter Protocols, the `Rollout` result, and the `agentix-run` CLI, with a pointer to the `examples/run-swe-rollouts` worked example. Sits in the How-to group beside integrate-agent / integrate-dataset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Meirtz and others added 2 commits May 30, 2026 01:41

Meirtz changed the title ~~feat(runner): library-first batch rollout runner (agentix.runner) + thin CLI~~ feat(runner): library-first batch rollout runner (agentix.runner) + CLI + example May 29, 2026

FatPigeorz merged commit 46cc916 into Agentix-Project:master May 29, 2026
5 checks passed

This was referenced May 29, 2026

docs: add "Run rollouts" how-to for agentix.runner #61

Merged

feat(examples): Agentix TUI — tabbed control room (rollouts + catalog) #63

Closed

FatPigeorz merged 2 commits into Agentix-Project:master from Meirtz:feat/agentix-runner
