feat: trajectory completion + test suite (Phase 1) #1
Open
yitianlian wants to merge 1 commit into
- Add Trajectory.save()/load() for JSONL serialization
- Add Trajectory.validate_trajectory() for integrity checks
- Add Claude Code stream-json parser (agentix/parsers/claude_code.py) converting CLI output to ATIF v1.4 trajectories
- Update claude-code runner to return RunResult with trajectory using --output-format stream-json --verbose --bare
- Fix lint issues across codebase (ruff auto-fixes)
- Add 47 tests covering models, trajectory, executor, server, and parser
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 9, 2026
FatPigeorz added a commit that referenced this pull request May 15, 2026
Foundation for the multi-axis plugin system. One generic registry tool
replaces ad-hoc if/elif dispatch across the framework's extension axes.
agentix/_plugin.py — Registry[T]
- Generic wrapper over `importlib.metadata.entry_points(group=...)`.
- Lazy load on first `get()` / `all()`; conflicts raise
`PluginConflictError` with both dist+version labels (never silent
last-wins); per-entry load failures cached on the registry so one
broken plugin doesn't poison the rest.
- In-process `register(name, factory)` shim for tests and dynamic use
— same registry object handles both paths so consumers don't branch
on "is it a real plugin?".
- `PluginSource(dist_name, dist_version)` records provenance for
`agentix plugins` output and conflict diagnostics.
Deployment → Protocol + plugin registry (proof of pattern)
- `agentix/deployment/base.py:Deployment` is now a `@runtime_checkable`
Protocol — three methods, no inheritance edge. Backends are
structurally compatible classes.
- `Sandbox`'s `.session()` method moved to a free function
`agentix.deployment.session(deployment, config)` so the Protocol
stays at three methods.
- `_deployments: Registry[type[Deployment]]` keyed on
`agentix.deployment`; `register_deployment` / `load_deployment` /
`deployments()` accessors.
- Builtin `local` / `daytona` / `e2b` registered in the framework's
own pyproject.toml under `[project.entry-points."agentix.deployment"]`.
Downstream backends add their own entry the same way; `pip install
agentix-deployment-fly` plus a 4-line pyproject snippet makes
`agentix deploy fly` work with zero framework changes.
- DaytonaDeployment / E2BDeployment now take no constructor args; they
read `DAYTONA_API_KEY` / `E2B_API_KEY` / `E2B_TEMPLATE_ID` from env
inside `__init__`. The plugin registry instantiates every backend
uniformly via `cls()`.
- `agentix/cli/deploy.py:_make_deployment` — drops the if/elif and
the `--api-key` / `--template-id` flags. Backend name is no longer
`choices=` constrained; the registry's KeyError surfaces unknown
names with the available list.
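A rough sketch of the structural-typing idea in this section follows. The commit message says only that the Protocol has three methods, so the method names `create`, `run`, and `destroy`, the `FakeDeployment` class, and the `FAKE_API_KEY` variable are all hypothetical stand-ins rather than the framework's real surface.

```python
import os
from typing import Protocol, runtime_checkable


@runtime_checkable
class Deployment(Protocol):
    # Hypothetical method names; the source only states "three methods".
    def create(self, image: str) -> str: ...
    def run(self, sandbox_id: str, command: str) -> str: ...
    def destroy(self, sandbox_id: str) -> None: ...


class FakeDeployment:
    """Does not inherit from Deployment; it only matches its methods."""

    def __init__(self) -> None:
        # Configuration comes from the environment, so the class can be
        # instantiated with no arguments. FAKE_API_KEY is a made-up name.
        self.api_key = os.environ.get("FAKE_API_KEY", "")

    def create(self, image: str) -> str:
        return f"sandbox-for-{image}"

    def run(self, sandbox_id: str, command: str) -> str:
        return f"{sandbox_id}: ran {command}"

    def destroy(self, sandbox_id: str) -> None:
        return None


# Every backend is built the same way, with a bare cls() call.
backends = {"fake": FakeDeployment}
instance = backends["fake"]()

# isinstance() against a runtime_checkable Protocol checks only that the
# methods exist, not their signatures or return types.
print(isinstance(instance, Deployment))
```

Because the check is structural, a downstream backend does not need to import the framework's base class to be accepted; it needs only the matching method names.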
agentix plugins subcommand
- New `agentix/cli/plugins.py` walks every framework axis group
(`agentix.closure`, `agentix.deployment`, `agentix.trace_sink`,
`agentix.spec_resolver`, `agentix.wire_pattern`, `agentix.cli`) and
prints one line per registered plugin: name, target, dist@version,
load status. `--group` filters; `--verbose` prints full tracebacks
for load failures.
- Wired into the CLI dispatcher; `agentix plugins` is itself an
agentix.cli entry-point candidate (dogfoods the same pattern).
End-to-end check:
$ agentix plugins
agentix.closure
files → agentix.files:Files [agentix-files@0.1.0] ok
bash → agentix.bash:Bash [agentix-bash@0.1.0] ok
agentix.deployment
daytona → agentix.deployment.daytona:DaytonaDeployment [agentix@0.1.0] ok
e2b → agentix.deployment.e2b:E2BDeployment [agentix@0.1.0] ok
local → agentix.deployment.docker:DockerDeployment [agentix@0.1.0] ok
$ agentix deploy nonexistent --image foo:bar
error: no plugin 'nonexistent' in group 'agentix.deployment'; available: …
39 tests pass. Stage 1 (tasks #1–#3) of the 10-task plugin-system plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FatPigeorz added a commit that referenced this pull request May 15, 2026
Stage 5+6 of the plugin-system plan. Locks in the API surface across all six extension axes with explicit test coverage and a public authors-guide doc.
tests/test_plugin_registry.py: Registry[T] mechanics
- happy-path lookup via entry points + via in-process register()
- in-process register() overrides entry-point entries (test convenience)
- duplicate-name conflict raises PluginConflictError with both dist@version labels (no silent last-wins)
- one plugin's loader failing doesn't poison the rest; the error is cached and re-raised on `get(name)`
- KeyError for unknown name lists available names
- register() invalidates cache; reset() clears extras
tests/test_plugin_axes.py: per-axis semantics
- deployment (select-one): register + load round-trip, structural Protocol check via isinstance, unknown-name KeyError
- trace sinks (fan-out): every registered sink receives each event; one sink raising doesn't block siblings; emit() with no sinks is a no-op (closure-author safety)
- spec resolvers (chain): priority desc wins; None falls through; no-claim raises SystemExit with a clear message
- wire patterns (ordered merge): in-process register_pattern beats built-ins; fallback to UnaryPattern when nothing else matches
- `agentix plugins` lists every known framework group
docs/plugins.md: author guide
- One section per axis with: Protocol contract / pyproject snippet / working example.
- Highlights the install→use UX: `pip install your-plugin` + zero framework patches → `agentix plugins` shows it green → the plugin works.
- Documents the in-process `register_*` helpers as the test path, separate from the production entry-point surface.
59 tests pass + 1 skipped throughout. With this, tasks #1–#10 of the plugin-system plan are all closed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 1: M1 (Trajectory Completion) + M3 (Test Suite).
M1: Trajectory Completion (agent runners can now produce ATIF training data)
- `agentix/trajectory.py`: Full ATIF v1.4 model with `save()`/`load()` (JSONL) and `validate_trajectory()` integrity checks
- `agentix/parsers/claude_code.py`: Converts Claude Code CLI `--output-format stream-json` output into ATIF v1.4 trajectories: tool calls, observations, token/cost tracking, synthetic message filtering, malformed JSON handling
- `agents/claude-code/runner.py`: Returns `AgentOutput` with `atif_trajectory` field for full ATIF data
- `atif_trajectory: Trajectory | None` field added to `AgentOutput` alongside upstream's lightweight `trajectory: list[Step]`
M3: Test Suite (45 tests, all passing)
- `test_executor.py`: exec, timeout, cwd, env, output capping, path guards
- `test_models.py`: model validation including max_output, dataset_closure
- `test_trajectory.py`: add_step metrics, serialization roundtrip, validation
- `test_server.py`: FastAPI endpoints /health, /exec, /upload, /download
- `test_parser_claude_code.py`: stream-json parsing, edge cases
Test plan
- `uv run ruff check`: all checks passed
- `uv run pytest tests/ -v`: 45 passed, 0 failed
🤖 Generated with Claude Code
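The JSONL roundtrip and malformed-line handling mentioned in the summary might look roughly like the sketch below. The `Step` fields and the `save_steps` / `load_steps` helpers are hypothetical; they are not the ATIF v1.4 schema or the `Trajectory.save()` / `load()` methods from this PR.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Step:
    # Hypothetical fields, not the real ATIF v1.4 step schema.
    step_id: int
    source: str
    message: str


def save_steps(steps: list[Step], path: str) -> None:
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for step in steps:
            fh.write(json.dumps(asdict(step)) + "\n")


def load_steps(path: str) -> tuple[list[Step], int]:
    """Read steps back; count, rather than fail on, malformed lines."""
    steps: list[Step] = []
    skipped = 0
    with Path(path).open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                steps.append(Step(**json.loads(line)))
            except (json.JSONDecodeError, TypeError):
                # Invalid JSON, or valid JSON with the wrong shape.
                skipped += 1
    return steps, skipped
```

Returning the skipped count alongside the parsed steps lets a caller decide whether a partially corrupt file is acceptable, instead of silently dropping data.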