refactor(mini-swe-agent) #39

Open
FatPigeorz wants to merge 1 commit into master from refactor/mini-swe-agent-harbor

@FatPigeorz
Collaborator

Summary

Task 5 of the 7-task batch. Expands the mini-swe-agent integration from a 25-line passthrough to a runner that captures the agent's native v2 trajectory, converts it into a structured Trajectory (the same ATIF v1.2 shape harbor uses in MiniSweAgent.populate_context_post_run), and aggregates token usage + cost for downstream eval / RL consumers.

New return shape

result = await client.remote(
    mini_swe.run,
    task=task,
    workdir=workdir,
    agent=agent,
    trajectory_path="/tmp/mini-swe-agent.trajectory.json",
)
# result: MiniSweAgentResult(
#     exit_status=..., submission=..., workdir=...,
#     raw_trajectory={...},                       # mini-swe-agent v2 native
#     trajectory=Trajectory(...),                 # structured ATIF-v1.2
#     usage={"n_input_tokens": ..., "n_output_tokens": ...,
#            "n_cache_tokens": ..., "cost_usd": ...},
# )
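
As a rough illustration of how a caller might consume this shape, here is a minimal sketch. `ResultSketch` and `summarize` are hypothetical stand-ins written for this example, not the PR's actual `MiniSweAgentResult`; the field names follow the comment above and the field types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for MiniSweAgentResult (types are assumed)."""

    exit_status: str
    submission: str
    workdir: str
    raw_trajectory: Optional[dict[str, Any]] = None  # native agent output
    trajectory: Optional[Any] = None                 # structured trajectory
    usage: dict[str, Any] = field(default_factory=dict)


def summarize(result: ResultSketch) -> str:
    """Format the exit status and usage totals on one line."""
    usage = result.usage
    cost = usage.get("cost_usd") or 0.0  # tolerate a missing or None cost
    return (
        f"exit={result.exit_status} "
        f"in={usage.get('n_input_tokens', 0)} "
        f"out={usage.get('n_output_tokens', 0)} "
        f"cost=${cost:.4f}"
    )
```

Reading `usage` through `.get(...)` keeps the consumer working even when no trajectory was captured and the totals are absent.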

Module layout

agentix.agents.mini_swe_agent/
├── trajectory.py   Trajectory + Step + ToolCall + Observation + Metrics + FinalMetrics
│                   + from_mini_swe_agent(raw, session_id=...)
│                   + aggregate_usage(raw)
└── runner.py       run(task, workdir, agent, trajectory_path=None, session_id=None)
                    -> MiniSweAgentResult

The structured trajectory rides back to the host inside client.remote(...)'s pickled return value — no shared filesystem assumed. Tool calls are lifted into structured ToolCalls, tool results attach as observations on the preceding agent step, and cost is apportioned by completion-token share (matching harbor's converter).
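
The two conversion rules in the paragraph above (tool results become observations on the preceding agent step, and cost is split by completion-token share) can be sketched as follows. This is not the PR's converter: the function name `convert_sketch`, the flat `completion_tokens` message field, and the step dictionary layout are all assumptions made for illustration.

```python
from typing import Any, Optional


def convert_sketch(messages: list[dict[str, Any]], total_cost: float) -> list[dict[str, Any]]:
    """Turn chat messages into steps; every field name here is illustrative."""
    steps: list[dict[str, Any]] = []
    last_agent: Optional[dict[str, Any]] = None

    for msg in messages:
        role = msg.get("role")
        if role == "tool":
            # A tool result becomes an observation on the preceding agent step.
            # Assumption: a tool message with no earlier agent step is skipped.
            if last_agent is not None:
                last_agent["observations"].append(msg.get("content", ""))
            continue
        step: dict[str, Any] = {
            "step_id": len(steps) + 1,
            "source": "agent" if role == "assistant" else role,
            "message": msg.get("content") or "",
        }
        if role == "assistant":
            step["observations"] = []
            step["completion_tokens"] = int(msg.get("completion_tokens", 0))
            last_agent = step
        steps.append(step)

    # Split the run's total cost across agent steps by completion-token share.
    agent_steps = [s for s in steps if s["source"] == "agent"]
    total_tokens = sum(s["completion_tokens"] for s in agent_steps)
    for s in agent_steps:
        share = s["completion_tokens"] / total_tokens if total_tokens else 0.0
        s["cost_usd"] = total_cost * share
    return steps
```

With two agent steps that produced 1 and 3 completion tokens and a total cost of 1.0, the sketch assigns 0.25 and 0.75 respectively.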

Example update: examples/run-mini-swe-agent/main.py prints mini_usage and mini_trajectory_steps lines alongside the existing mini_exit_status / mini_submission.

Test plan

  • 8 new tests in plugins/agents/mini-swe-agent/tests/test_mini_swe_agent.py cover the structured result shape, trajectory load + parse, inline trajectory passthrough, exception propagation, aggregate-vs-final-metrics consistency, dict / json round trip, and the string-arguments fallback for tool calls.
  • Full repo suite: 184 passing.
  • pyright clean.
  • ruff clean.
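
For readers unfamiliar with the "dict / json round trip" case, the property under test might look like the sketch below. `StepSketch` and `strip_none` are hypothetical names for this example; the real `Step` dataclass and its serialization helpers may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


def strip_none(value: Any) -> Any:
    """Recursively drop dict entries whose value is None."""
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_none(v) for v in value]
    return value


@dataclass
class StepSketch:
    """Hypothetical, much-reduced stand-in for a trajectory step."""

    step_id: int
    source: str
    message: str
    reasoning: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        return strip_none(asdict(self))

    def to_json(self) -> str:
        return json.dumps(self.to_dict())


step = StepSketch(step_id=1, source="agent", message="ls")
# Unset optional fields are omitted, and JSON decoding restores the same dict.
assert "reasoning" not in step.to_dict()
assert json.loads(step.to_json()) == step.to_dict()
```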

Made with Cursor

Expands the mini-swe-agent integration from a 25-line passthrough to
a runner that captures the agent's native v2 trajectory, converts it
into a structured `Trajectory` (the same ATIF v1.2 shape harbor uses
in `MiniSweAgent.populate_context_post_run`), and aggregates token
usage + cost for downstream eval / RL consumers.

New modules:

* `agentix.agents.mini_swe_agent.trajectory` —
  `Trajectory` / `Step` / `ToolCall` / `Observation` /
  `Metrics` / `FinalMetrics` / `AgentInfo` dataclasses; a
  `from_mini_swe_agent(raw, session_id=...)` converter that handles
  system / user / assistant / tool roles, lifts assistant
  `tool_calls` into structured `ToolCall`s, attaches tool results as
  observations on the preceding agent step, and apportions cost by
  completion-token share.
  Plus a cheap `aggregate_usage(...)` for callers that only need
  totals (matches harbor's `populate_context_post_run` shape:
  `n_input_tokens`, `n_output_tokens`, `n_cache_tokens`,
  `cost_usd`).

* `agentix.agents.mini_swe_agent.runner` — `run(task, workdir,
  agent, trajectory_path=None, session_id=None)` now returns a
  `MiniSweAgentResult(exit_status, submission, workdir,
  raw_trajectory, trajectory, usage)`. When `trajectory_path` points
  at the file mini-swe-agent's CLI flag `--output` writes, the
  structured trajectory rides back inside `client.remote(...)`'s
  pickled return value — no shared filesystem assumed.
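
A totals-only aggregation along the lines of `aggregate_usage(...)` might look like this sketch. Only the four output keys come from the description above; the raw layout it reads (a `messages` list whose entries carry a `usage` dict, plus a top-level `cost`) is an assumption and not the actual mini-swe-agent v2 schema.

```python
from typing import Any


def aggregate_usage_sketch(raw: dict[str, Any]) -> dict[str, Any]:
    """Sum token counts over messages without building a full trajectory."""
    n_input = n_output = n_cache = 0
    for msg in raw.get("messages", []):
        usage = msg.get("usage") or {}  # assumed per-message usage record
        n_input += int(usage.get("prompt_tokens", 0))
        n_output += int(usage.get("completion_tokens", 0))
        n_cache += int(usage.get("cached_tokens", 0))
    return {
        "n_input_tokens": n_input,
        "n_output_tokens": n_output,
        "n_cache_tokens": n_cache,
        "cost_usd": raw.get("cost"),  # assumed location of the run's total cost
    }
```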

Example update: `examples/run-mini-swe-agent/main.py` prints
`mini_usage` (input/output/cached/cost) and `mini_trajectory_steps`
in addition to the existing `mini_exit_status` / `mini_submission`
lines, so a user sees the new shape end-to-end.

Tests:

* 8 tests in `plugins/agents/mini-swe-agent/tests/`:
    - structured `MiniSweAgentResult` shape
    - trajectory load + parse from a file path
    - inline trajectory passthrough
    - exception propagation
    - `aggregate_usage` ↔ `final_metrics` consistency
    - `to_dict` strips None / `to_json` round trip
    - string `tool_calls.function.arguments` fallback to `command`
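
One plausible reading of the string-arguments fallback is sketched below. The helper name `parse_arguments_sketch` and the "try JSON first" order are assumptions; the PR only states that string arguments fall back to `command`.

```python
import json
from typing import Any


def parse_arguments_sketch(arguments: Any) -> dict[str, Any]:
    """Normalize tool-call arguments to a dict."""
    if isinstance(arguments, dict):
        return arguments
    if isinstance(arguments, str):
        try:
            parsed = json.loads(arguments)
        except json.JSONDecodeError:
            # Not JSON: treat the raw string as the shell command.
            return {"command": arguments}
        if isinstance(parsed, dict):
            return parsed
        # Valid JSON but not an object (e.g. a bare number): same fallback.
        return {"command": arguments}
    return {}  # assumption: anything else yields empty arguments
```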

Full repo suite: 184 passing. pyright clean. ruff clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
@FatPigeorz FatPigeorz changed the title refactor(mini-swe-agent): harbor-style trajectory + post-run metrics refactor(mini-swe-agent) May 23, 2026