
feat: trajectory completion + test suite (Phase 1) #1

Open
yitianlian wants to merge 1 commit into Agentix-Project:master from yitianlian:feat/m1-trajectory-completion

@yitianlian

Summary

Phase 1: M1 (Trajectory Completion) + M3 (Test Suite).

M1: Trajectory Completion — agent runners can now produce ATIF training data

  • ATIF Trajectory module (agentix/trajectory.py): Full ATIF v1.4 model with save()/load() (JSONL) and validate_trajectory() integrity checks
  • Claude Code stream-json parser (agentix/parsers/claude_code.py): Converts Claude Code CLI --output-format stream-json output into ATIF v1.4 trajectories — tool calls, observations, token/cost tracking, synthetic message filtering, malformed JSON handling
  • Claude Code runner (agents/claude-code/runner.py): Returns AgentOutput with atif_trajectory field for full ATIF data
  • Protocol extension: Added atif_trajectory: Trajectory | None field to AgentOutput alongside upstream's lightweight trajectory: list[Step]
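To make the parser bullet concrete, here is a minimal sketch of the line-by-line approach. It is not the code in agentix/parsers/claude_code.py. It assumes the Claude Code CLI's stream-json event shape (`assistant` events carrying `text` / `tool_use` content blocks, `user` events carrying `tool_result` blocks, and a final `result` event with `total_cost_usd`). `Step`, `_blocks`, and `parse_stream_json` are hypothetical names, and ATIF fields, token accounting, and synthetic-message filtering are deliberately left out.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class Step:
    # Hypothetical stand-in for an ATIF step, reduced to three fields.
    source: str                          # "agent" or "tool"
    message: str = ""
    tool_calls: list[dict[str, Any]] = field(default_factory=list)


def _blocks(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Content blocks of event["message"], or [] when missing or not a list."""
    msg = event.get("message")
    content = msg.get("content") if isinstance(msg, dict) else None
    if not isinstance(content, list):
        return []
    return [b for b in content if isinstance(b, dict)]


def parse_stream_json(lines: Iterable[str]) -> tuple[list[Step], float | None, int]:
    """Return (steps, total cost in USD if reported, number of skipped lines)."""
    steps: list[Step] = []
    cost: float | None = None
    skipped = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            skipped += 1  # malformed line: count it and keep parsing
            continue
        if not isinstance(event, dict):
            skipped += 1
            continue
        kind = event.get("type")
        if kind == "assistant":
            blocks = _blocks(event)
            text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
            calls = [
                {"id": b.get("id"), "name": b.get("name"), "input": b.get("input")}
                for b in blocks
                if b.get("type") == "tool_use"
            ]
            steps.append(Step("agent", text, calls))
        elif kind == "user":
            for b in _blocks(event):
                if b.get("type") == "tool_result":
                    steps.append(Step("tool", str(b.get("content", ""))))
        elif kind == "result":
            cost = event.get("total_cost_usd")
        # Other event types (e.g. the "system" init event) are ignored here.
    return steps, cost, skipped
```

Counting skipped lines instead of raising keeps one truncated or interleaved line from discarding an otherwise usable trajectory.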

M3: Test Suite (45 tests, all passing)

  • test_executor.py — exec, timeout, cwd, env, output capping, path guards
  • test_models.py — model validation including max_output, dataset_closure
  • test_trajectory.py — add_step metrics, serialization roundtrip, validation
  • test_server.py — FastAPI endpoints /health, /exec, /upload, /download
  • test_parser_claude_code.py — stream-json parsing, edge cases

Test plan

  • uv run ruff check — all checks passed
  • uv run pytest tests/ -v — 45 passed, 0 failed

🤖 Generated with Claude Code

- Add Trajectory.save()/load() for JSONL serialization
- Add Trajectory.validate_trajectory() for integrity checks
- Add Claude Code stream-json parser (agentix/parsers/claude_code.py)
  converting CLI output to ATIF v1.4 trajectories
- Update claude-code runner to return RunResult with trajectory
  using --output-format stream-json --verbose --bare
- Fix lint issues across the codebase (ruff auto-fixes)
- Add 47 tests covering models, trajectory, executor, server, and parser

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
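A minimal sketch of what JSONL `save()` / `load()` plus an integrity check can look like. The on-disk layout (one header line, then one step per line), the field names, and the two checks (sequential `step_id`, and observations must reference an earlier tool call) are assumptions for illustration only, not the ATIF v1.4 schema or the actual `validate_trajectory()` rules.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Step:
    # Hypothetical minimal fields; a real ATIF step carries more.
    step_id: int
    source: str                                   # e.g. "agent" or "tool"
    message: str = ""
    tool_call_ids: list[str] = field(default_factory=list)
    observation_for: str | None = None            # tool call this step answers


@dataclass
class Trajectory:
    session_id: str
    steps: list[Step] = field(default_factory=list)

    def save(self, path: Path) -> None:
        # Assumed layout: one header line, then one JSON object per step.
        with path.open("w", encoding="utf-8") as f:
            f.write(json.dumps({"session_id": self.session_id}) + "\n")
            for step in self.steps:
                f.write(json.dumps(asdict(step)) + "\n")

    @classmethod
    def load(cls, path: Path) -> Trajectory:
        with path.open(encoding="utf-8") as f:
            header = json.loads(f.readline())
            steps = [Step(**json.loads(line)) for line in f if line.strip()]
        return cls(header["session_id"], steps)

    def validate_trajectory(self) -> list[str]:
        """Return a list of problems; an empty list means every check passed."""
        errors: list[str] = []
        seen_calls: set[str] = set()
        for expected, step in enumerate(self.steps, start=1):
            if step.step_id != expected:
                errors.append(f"step {step.step_id}: expected step_id {expected}")
            # An observation may only answer a call made by an earlier step.
            if step.observation_for is not None and step.observation_for not in seen_calls:
                errors.append(
                    f"step {step.step_id}: observation for unknown call {step.observation_for!r}"
                )
            seen_calls.update(step.tool_call_ids)
        return errors
```

Returning a list of problems rather than raising on the first one lets a test report every defect in a malformed trajectory at once.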
FatPigeorz added a commit that referenced this pull request May 15, 2026
Foundation for the multi-axis plugin system. One generic registry tool
replaces ad-hoc if/elif dispatch across the framework's extension axes.

agentix/_plugin.py — Registry[T]
- Generic wrapper over `importlib.metadata.entry_points(group=...)`.
- Lazy load on first `get()` / `all()`; conflicts raise
  `PluginConflictError` with both dist+version labels (never silent
  last-wins); per-entry load failures cached on the registry so one
  broken plugin doesn't poison the rest.
- In-process `register(name, factory)` shim for tests and dynamic use
  — same registry object handles both paths so consumers don't branch
  on "is it a real plugin?".
- `PluginSource(dist_name, dist_version)` records provenance for
  `agentix plugins` output and conflict diagnostics.

Deployment → Protocol + plugin registry (proof of pattern)
- `agentix/deployment/base.py:Deployment` is now a `@runtime_checkable`
  Protocol — three methods, no inheritance edge. Backends are
  structurally compatible classes.
- `Sandbox`'s `.session()` method moved to a free function
  `agentix.deployment.session(deployment, config)` so the Protocol
  stays at three methods.
- `_deployments: Registry[type[Deployment]]` keyed on
  `agentix.deployment`; `register_deployment` / `load_deployment` /
  `deployments()` accessors.
- Builtin `local` / `daytona` / `e2b` registered in the framework's
  own pyproject.toml under `[project.entry-points."agentix.deployment"]`.
  Downstream backends add their own entry the same way; `pip install
  agentix-deployment-fly` plus a 4-line pyproject snippet makes
  `agentix deploy fly` work with zero framework changes.
- DaytonaDeployment / E2BDeployment now take no constructor args; they
  read `DAYTONA_API_KEY` / `E2B_API_KEY` / `E2B_TEMPLATE_ID` from env
  inside `__init__`. The plugin registry instantiates every backend
  uniformly via `cls()`.
- `agentix/cli/deploy.py:_make_deployment` — drops the if/elif and
  the `--api-key` / `--template-id` flags. Backend name is no longer
  `choices=` constrained; the registry's KeyError surfaces unknown
  names with the available list.
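A sketch of the structural-Protocol pattern described above. The commit does not name the three Protocol methods, so `start` / `exec` / `stop` are hypothetical, as are `FakeCloudDeployment`, `FAKECLOUD_API_KEY`, the `config["image"]` key, and `make_deployment`; a plain dict stands in for the entry-point registry.

```python
import os
from contextlib import contextmanager
from typing import Any, Iterator, Protocol, runtime_checkable


@runtime_checkable
class Deployment(Protocol):
    # Hypothetical method names: the source only says "three methods".
    def start(self, image: str) -> None: ...
    def exec(self, command: str) -> str: ...
    def stop(self) -> None: ...


class FakeCloudDeployment:
    """Hypothetical backend: no base class, it only matches structurally."""

    def __init__(self) -> None:
        # Credentials come from the environment, so the registry can call cls().
        self.api_key = os.environ.get("FAKECLOUD_API_KEY", "")
        self.image = ""
        self.running = False

    def start(self, image: str) -> None:
        self.image = image
        self.running = True

    def exec(self, command: str) -> str:
        return f"ran {command!r}"

    def stop(self) -> None:
        self.running = False


@contextmanager
def session(deployment: Deployment, config: dict[str, Any]) -> Iterator[Deployment]:
    """Free function rather than a method, so the Protocol stays at three methods."""
    deployment.start(config["image"])
    try:
        yield deployment
    finally:
        deployment.stop()


# Plain dict standing in for the entry-point-backed Registry[type[Deployment]].
BACKENDS: dict[str, type] = {"fakecloud": FakeCloudDeployment}


def make_deployment(name: str) -> Deployment:
    """Look up a backend by name and construct it with no arguments."""
    try:
        cls = BACKENDS[name]
    except KeyError:
        available = ", ".join(sorted(BACKENDS))
        raise KeyError(
            f"no plugin {name!r} in group 'agentix.deployment'; available: {available}"
        ) from None
    return cls()
```

Since `isinstance` on a `@runtime_checkable` Protocol only checks that the methods exist, a backend package never has to import a base class from the framework.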

agentix plugins subcommand
- New `agentix/cli/plugins.py` walks every framework axis group
  (`agentix.closure`, `agentix.deployment`, `agentix.trace_sink`,
  `agentix.spec_resolver`, `agentix.wire_pattern`, `agentix.cli`) and
  prints one line per registered plugin: name, target, dist@version,
  load status. `--group` filters; `--verbose` prints full tracebacks
  for load failures.
- Wired into the CLI dispatcher; `agentix plugins` is itself an
  agentix.cli entry-point candidate (dogfoods the same pattern).

End-to-end check:

  $ agentix plugins
  agentix.closure
    files     → agentix.files:Files                       [agentix-files@0.1.0] ok
    bash      → agentix.bash:Bash                         [agentix-bash@0.1.0] ok
  agentix.deployment
    daytona   → agentix.deployment.daytona:DaytonaDeployment  [agentix@0.1.0] ok
    e2b       → agentix.deployment.e2b:E2BDeployment      [agentix@0.1.0] ok
    local     → agentix.deployment.docker:DockerDeployment [agentix@0.1.0] ok

  $ agentix deploy nonexistent --image foo:bar
  error: no plugin 'nonexistent' in group 'agentix.deployment'; available: …

39 tests pass. Stage 1 (tasks #1 through #3) of the 10-task plugin-system plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FatPigeorz added a commit that referenced this pull request May 15, 2026
Stage 5+6 of the plugin-system plan. Locks in the API surface across
all six extension axes with explicit test coverage and a public
authors-guide doc.

tests/test_plugin_registry.py — Registry[T] mechanics
- happy-path lookup via entry points + via in-process register()
- in-process register() overrides entry-point entries (test convenience)
- duplicate-name conflict raises PluginConflictError with both
  dist@version labels (no silent last-wins)
- one plugin's loader failing doesn't poison the rest; the error is
  cached and re-raised on `get(name)`
- KeyError for unknown name lists available names
- register() invalidates cache; reset() clears extras

tests/test_plugin_axes.py — per-axis semantics
- deployment (select-one): register + load round-trip, structural
  Protocol check via isinstance, unknown-name KeyError
- trace sinks (fan-out): every registered sink receives each event;
  one sink raising doesn't block siblings; emit() with no sinks is
  a no-op (closure-author safety)
- spec resolvers (chain): priority desc wins; None falls through;
  no-claim raises SystemExit with a clear message
- wire patterns (ordered merge): in-process register_pattern beats
  built-ins; fallback to UnaryPattern when nothing else matches
- `agentix plugins` lists every known framework group
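The fan-out and chain semantics tested above can be sketched in a few lines. `emit`, `Resolver`, `resolve_spec`, and the integer `priority` field are hypothetical names chosen for illustration; the real trace-sink and spec-resolver Protocols may look different.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Any, Callable

log = logging.getLogger("agentix.sketch")

Sink = Callable[[dict[str, Any]], None]


def emit(sinks: list[Sink], event: dict[str, Any]) -> None:
    """Fan-out: every sink sees the event; a failing sink is logged, not fatal."""
    for sink in sinks:  # with no sinks the loop never runs, so emit() is a no-op
        try:
            sink(event)
        except Exception:
            log.exception("trace sink %r failed", sink)


@dataclass
class Resolver:
    name: str
    priority: int                                  # higher runs first
    resolve: Callable[[str], str | None]           # None means "not mine"


def resolve_spec(resolvers: list[Resolver], spec: str) -> str:
    """Chain: try resolvers in descending priority; None falls through."""
    ordered = sorted(resolvers, key=lambda r: r.priority, reverse=True)
    for r in ordered:
        result = r.resolve(spec)
        if result is not None:
            return result
    tried = ", ".join(r.name for r in ordered) or "none"
    raise SystemExit(f"no spec resolver claimed {spec!r} (tried: {tried})")
```

Fan-out swallows per-sink errors because tracing should never break a run, whereas the chain exits loudly because an unresolved spec leaves nothing sensible to run.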

docs/plugins.md — author guide
- One section per axis with: Protocol contract / pyproject snippet /
  working example.
- Highlights the install→use UX: `pip install your-plugin` +
  zero framework patches → `agentix plugins` shows it green →
  the plugin works.
- Documents the in-process `register_*` helpers as the test path,
  separate from the production entry-point surface.

59 tests pass + 1 skipped throughout. With this, tasks #1 through #10 of the
plugin-system plan are all closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>