refactor(mini-swe-agent)#39
Open
FatPigeorz wants to merge 1 commit into
Open
Conversation
Expands the mini-swe-agent integration from a 25-line passthrough to
a runner that captures the agent's native v2 trajectory, converts it
into a structured `Trajectory` (the same ATIF v1.2 shape harbor uses
in `MiniSweAgent.populate_context_post_run`), and aggregates token
usage + cost for downstream eval / RL consumers.
New modules:
* `agentix.agents.mini_swe_agent.trajectory` —
`Trajectory` / `Step` / `ToolCall` / `Observation` /
`Metrics` / `FinalMetrics` / `AgentInfo` dataclasses; a
`from_mini_swe_agent(raw, session_id=...)` converter that handles
system / user / assistant / tool roles, lifts assistant
`tool_calls` into structured `ToolCall`s, attaches tool results as
observations on the preceding agent step, and apportions cost by
completion-token share.
Plus a cheap `aggregate_usage(...)` for callers that only need
totals (matches harbor's `populate_context_post_run` shape:
`n_input_tokens`, `n_output_tokens`, `n_cache_tokens`,
`cost_usd`).
* `agentix.agents.mini_swe_agent.runner` — `run(task, workdir,
agent, trajectory_path=None, session_id=None)` now returns a
`MiniSweAgentResult(exit_status, submission, workdir,
raw_trajectory, trajectory, usage)`. When `trajectory_path` points
at the file mini-swe-agent's CLI flag `--output` writes, the
structured trajectory rides back inside `client.remote(...)`'s
pickled return value — no shared filesystem assumed.
Example update: `examples/run-mini-swe-agent/main.py` prints
`mini_usage` (input/output/cached/cost) and `mini_trajectory_steps`
in addition to the existing `mini_exit_status` / `mini_submission`
lines, so a user sees the new shape end-to-end.
Tests:
* 8 tests in `plugins/agents/mini-swe-agent/tests/`:
- structured `MiniSweAgentResult` shape
- trajectory load + parse from a file path
- inline trajectory passthrough
- exception propagation
- `aggregate_usage` ↔ `final_metrics` consistency
- `to_dict` strips None / `to_json` round trip
- string `tool_calls.function.arguments` fallback to `command`
Full repo suite: 184 passing. pyright clean. ruff clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Task 5 of the 7-task batch. Expands the mini-swe-agent integration from a 25-line passthrough to a runner that captures the agent's native v2 trajectory, converts it into a structured
Trajectory(the same ATIF v1.2 shape harbor uses inMiniSweAgent.populate_context_post_run), and aggregates token usage + cost for downstream eval / RL consumers.New return shape
Module layout
The structured trajectory rides back to the host inside
client.remote(...)'s pickled return value — no shared filesystem assumed. Tool calls are lifted into structuredToolCalls, tool results attach as observations on the preceding agent step, and cost is apportioned by completion-token share (matching harbor's converter).Example update:
examples/run-mini-swe-agent/main.pyprintsmini_usageandmini_trajectory_stepslines alongside the existingmini_exit_status/mini_submission.Test plan
plugins/agents/mini-swe-agent/tests/test_mini_swe_agent.pycover the structured result shape, trajectory load + parse, inline trajectory passthrough, exception propagation, aggregate-vs-final-metrics consistency, dict / json round trip, and the string-arguments fallback for tool calls.Made with Cursor