feat(examples): Agentix TUI — tabbed control room (rollouts + catalog) by Meirtz · Pull Request #63 · Agentix-Project/Agentix

Meirtz · 2026-05-29T19:38:07Z

What

A modern, tabbed Textual control room for Agentix (examples/eval-tui). Started as a rollout dashboard; now a multi-tab app (AgentixTUI) that surfaces each Agentix area, iterated against an explicit design rubric (DESIGN.md).

Tabs (all implemented)

Overview — landing dashboard: branded banner + live ecosystem stat cards (installed packages, providers, Docker readiness).
Rollouts — live batch-rollout dashboard over agentix.runner: per-instance phase grid (pending → setup → agent → scoring → PASS/FAIL/skip/error), summary bar (progress / resolved / failed / running / throughput), event log + drill-down detail. Phase transitions come from wrapping the dataset/agent adapters, so agentix.runner is unchanged.
Catalog — installed agentix* distributions + agentix.provider and agentix.nix entry points, with a live filter. Pure introspection — no Docker.
Sandboxes — async readiness probes for the deployment backends (docker / podman / apptainer / daytona / e2b).
Build — an agentix build command planner: type a project path → live agentix build … command + nix-closure introspection.
Observability — split live /trace + /log event stream.
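
The Rollouts entry says phase transitions are observed by wrapping the dataset/agent adapters rather than by changing `agentix.runner`. A minimal sketch of that wrapping idea follows. The names `PhaseRecorder` and `ObservedAgent` and the callable agent signature are assumptions for illustration only, not the contents of `_adapters.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PhaseRecorder:
    """Collects (instance_id, phase) transitions; a dashboard would render these."""

    events: list[tuple[str, str]] = field(default_factory=list)

    def emit(self, instance_id: str, phase: str) -> None:
        self.events.append((instance_id, phase))


class ObservedAgent:
    """Hypothetical wrapper: delegates to the real agent and reports phases around the call."""

    def __init__(self, inner: Callable[[str], Any], recorder: PhaseRecorder) -> None:
        self._inner = inner
        self._recorder = recorder

    def __call__(self, instance_id: str) -> Any:
        self._recorder.emit(instance_id, "agent")
        try:
            result = self._inner(instance_id)
        except Exception:
            # Report the failure, then let the runner handle it as it normally would.
            self._recorder.emit(instance_id, "error")
            raise
        self._recorder.emit(instance_id, "scoring")
        return result


recorder = PhaseRecorder()
agent = ObservedAgent(lambda instance_id: f"patch-{instance_id}", recorder)
agent("demo-1")
```

Because the wrapper has the same call shape as the object it wraps, the runner can be handed the wrapped adapter without knowing a dashboard is listening.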

Idle-safe: launches with no run attached (bare agentix-eval-tui) so the Catalog is browsable without infra.

Verification

✅ ruff check
✅ 10 headless run_test() pilots — tabbed demo run to completion, idle path, catalog discovery + filter, overview counts, sandbox backend list, observability stream, rollout drill-down, build command, tab keybinding — all no-Docker
✅ Standalone example with its own lock → Textual stays out of the core-dev workspace (zero lock impact)

See examples/eval-tui/DESIGN.md for the rubrics + iteration log.

`examples/eval-tui` is a modern, reactive Textual dashboard over `agentix.runner`: a per-instance grid (pending -> setup -> agent -> scoring -> PASS/FAIL/skip/error), a live summary bar (done / resolved / failed / running + throughput), and an event log. In-flight phases are observed by wrapping the dataset/agent adapters (`_adapters.py`), so `agentix.runner` is unchanged.

- `--demo N` runs a synthetic, no-Docker batch (reproducible from a seed), so it can be tried instantly. Real runs resolve `module:attr` dataset/agent + a provider, exactly like `agentix-run`.
- Standalone example (own lock): its TUI deps stay out of the core-dev venv.
- Verified headlessly: ruff clean + a Textual `run_test()` pilot test that drives the demo to completion (no Docker).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
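
Real runs are described as resolving a `module:attr` reference for the dataset and agent. The sketch below shows one common way to resolve such a string with the standard library. The helper name `resolve` is hypothetical, and this is not necessarily how `agentix-run` implements it.

```python
import importlib
from typing import Any


def resolve(spec: str) -> Any:
    """Resolve a 'module:attr' string (attr may be dotted) to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Walk dotted attributes, e.g. 'pkg.mod:Class.factory'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


dumps = resolve("json:dumps")
```

Failing early on a malformed spec keeps the error close to the command line instead of surfacing later as an import problem.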

Restructure the single rollout dashboard into a multi-tab Textual app (AgentixTUI) that surfaces each Agentix area:

- Rollouts: the live dashboard, refactored into a reusable view widget.
- Catalog: installed `agentix*` distributions + `agentix.provider` and `agentix.nix` entry points (pure introspection, no Docker).
- Sandboxes / Build / Observability: signposted placeholders for the follow-up PRs that flesh them out.

Adds DESIGN.md (the rubrics this iterates against), an idle state so the app is useful with no run attached, and pilot tests for the tabbed app, the idle path, and catalog discovery. ruff + headless run_test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Meirtz · 2026-05-29T20:30:47Z

Expanded from the single rollout dashboard into a tabbed control room (Rollouts + Catalog now; Sandboxes/Build/Observability signposted for follow-ups), iterated against a design-rubric doc (DESIGN.md). ruff + 3 headless pilot tests green.

…l pane

Highlight a row in the Rollouts grid to see that instance's full detail (verdict, duration, agent exit, patch size, score breakdown, error) in a side panel, alongside the live event log. The rendered detail text is also exposed on the view for headless assertions. ruff + 4 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Meirtz · 2026-05-29T20:49:26Z

Added a drill-down detail pane to the Rollouts tab: highlight an instance → verdict / duration / agent exit / patch size / score breakdown / error in a side panel. 4 headless pilot tests green.

A branded landing tab: a warm-gradient "AGENTIX" banner, live ecosystem stat cards (packages / providers / nix-closures from the same introspection the Catalog uses), a Docker-readiness indicator, and quick hints. Registers a branded Textual theme (best-effort; falls back to the default if the running Textual version's theme API differs). Pure introspection: renders with or without Docker. Adds an Aesthetics rubric (DESIGN.md) and a pilot test. ruff + 5 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Meirtz · 2026-05-29T21:03:11Z

Added an Overview dashboard as the landing tab — a warm-gradient AGENTIX banner, live ecosystem stat cards (packages / providers / nix-closures), a Docker-readiness indicator, and a branded Textual theme. New Aesthetics rubric in DESIGN.md. 5 headless pilot tests green.

Replaces the Sandboxes placeholder with a live readiness view: each known backend (docker / podman / apptainer / daytona / e2b) is probed for usability here (binary on PATH, daemon reachable via a real `<bin> info` subprocess in a worker, or SDK + API key present), plus a short note on the session + remote-invoke model. Degrades gracefully when nothing is installed. ruff + 6 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
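
The three readiness checks named above (binary on PATH, `<bin> info` succeeding, SDK + API key present) can be sketched with the standard library alone. The function names, status strings, and the SDK module / environment-variable names passed in are assumptions for illustration. The PR runs its probes in a Textual worker, which this sketch omits.

```python
import importlib.util
import os
import shutil
import subprocess


def probe_cli(binary: str, timeout: float = 5.0) -> str:
    """Check a CLI backend: is it on PATH, and does `<binary> info` succeed?"""
    path = shutil.which(binary)
    if path is None:
        return "missing: not on PATH"
    try:
        proc = subprocess.run([path, "info"], capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "found, but `info` did not complete"
    return "ready" if proc.returncode == 0 else "found, daemon unreachable"


def probe_sdk(module: str, env_var: str) -> str:
    """Check a hosted backend: is the SDK importable and is an API key set?"""
    if importlib.util.find_spec(module) is None:
        return "missing: SDK not installed"
    if not os.environ.get(env_var):
        return "SDK installed, API key not set"
    return "ready"


# Module and variable names below are hypothetical placeholders.
status = {
    "docker": probe_cli("docker"),
    "podman": probe_cli("podman"),
    "e2b": probe_sdk("e2b", "E2B_API_KEY"),
}
```

Every branch returns a status string instead of raising, which matches the "degrades gracefully when nothing is installed" behavior described above.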

Meirtz · 2026-05-29T21:13:05Z

Added the Sandboxes view: each provider backend (docker/podman/apptainer/daytona/e2b) is probed for real usability here (binary on PATH, daemon reachable, SDK + key). 6 pilot tests green. Build + Observability tabs next.

Replaces the Observability placeholder with a split live feed of the two Agentix side channels: /trace (OTel-style spans) on the left, /log (bridged stdlib logging) on the right. With no run attached it plays a short synthetic demo so the shape is visible; real streams arrive from running sandboxes. ruff + 7 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
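
The /log pane is described as bridged stdlib logging. One generic way to bridge `logging` records into a UI buffer is a custom `logging.Handler`. The sketch below assumes a plain list as the sink and a hypothetical logger name; how Agentix actually transports /log records from a sandbox is not shown in this PR.

```python
import logging


class ListHandler(logging.Handler):
    """Appends each formatted record to a list that a log pane could render."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("agentix.sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(handler)

logger.info("sandbox %s started", "demo-1")
```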

Meirtz · 2026-05-29T21:16:19Z

Added the Observability view: a split live feed of /trace spans + /log records (synthetic demo when no run is attached). The TUI now has 5 live tabs — Overview · Rollouts · Catalog · Sandboxes · Observability — plus a Build placeholder. 7 pilot tests green.

Replaces the Build placeholder with an interactive planner: a project-path input that live-constructs the `agentix build … --platform … --output …` command, the build model (uv owns Python, Nix owns binaries), and the `agentix.nix` closures that would be staged (real entry-point introspection). Adds number keybindings (1–6) to jump between tabs. The control room now has six live tabs: Overview · Rollouts · Catalog · Sandboxes · Build · Observability. ruff + 9 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
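
A planner that rebuilds a command string on every keystroke can be sketched as a pure function from inputs to a shell-quoted string. Only the `--platform` and `--output` flags appear in this PR. The positional project path, the argument order, and the example values are assumptions, and the real `agentix build` invocation may differ.

```python
import shlex
from typing import Optional


def build_command(
    project: str,
    platform: Optional[str] = None,
    output: Optional[str] = None,
) -> str:
    """Render a display-only command line; nothing is executed here."""
    argv = ["agentix", "build", project]  # positional project path is an assumption
    if platform:
        argv += ["--platform", platform]
    if output:
        argv += ["--output", output]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(argv)


preview = build_command("./my project", platform="linux/amd64", output="out.tar")
```

Keeping the function free of side effects makes it cheap to call from an input-changed handler and easy to assert on in a headless test.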

Meirtz · 2026-05-29T21:36:10Z

Added the Build planner (live agentix build command from a project path + the nix-closures that would be staged) and tab keybindings (1–6). All six tabs are now live — Overview · Rollouts · Catalog · Sandboxes · Build · Observability. 9 headless pilot tests green; rubric Coverage is now full.

Meirtz · 2026-05-29T21:42:22Z

Added a live filter to the Catalog tab (narrows by name/kind/detail as you type). 10 headless pilot tests now green across all six tabs.

The Catalog tab gets a filter input that narrows the distributions/entry-points table by name / kind / detail as you type (title shows matched/total). ruff + 10 pilot tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
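
The filter behavior described here (case-insensitive match on name / kind / detail, with a matched/total title) can be sketched as a small pure function. The `(name, kind, detail)` row shape, the function name, and the sample rows are illustrative assumptions.

```python
Row = tuple[str, str, str]  # (name, kind, detail)


def filter_rows(rows: list[Row], query: str) -> tuple[list[Row], str]:
    """Return rows where any field contains the query, plus a 'matched/total' title."""
    needle = query.strip().casefold()
    if not needle:
        matched = list(rows)
    else:
        matched = [
            row for row in rows
            if any(needle in cell.casefold() for cell in row)
        ]
    return matched, f"{len(matched)}/{len(rows)}"


rows = [
    ("agentix-core", "distribution", "1.0"),
    ("docker", "agentix.provider", "pkg.mod:Docker"),
    ("git", "agentix.nix", "pkg.nix:git"),
]
matched, title = filter_rows(rows, "NIX")
```

An empty query returns every row, so clearing the input restores the full table.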

Meirtz · 2026-05-30T12:08:20Z

Superseded by #68: that PR was stacked on this branch, so merging it brought the entire eval-tui base into master along with the JSON-export feature. This PR's diff vs master is now only deletions (it would remove #68's export), so it's redundant. Closing — the TUI base is already on master. The remaining polish (theme switcher #66, help overlay #67) rebases on top.

Meirtz and others added 2 commits May 30, 2026 03:37

Meirtz changed the title ~~feat(examples): live Textual TUI dashboard for batch rollouts~~ feat(examples): Agentix TUI — tabbed control room (rollouts + catalog) May 29, 2026

This was referenced May 29, 2026

feat(examples): TUI theme switcher #66

Merged

feat(examples): TUI help overlay (?) #67

Merged

feat(examples): export rollouts to JSON (s) #68

Merged

Meirtz force-pushed the feat/eval-tui branch from 5d9d9a4 to 7075757 Compare May 30, 2026 10:15

Meirtz closed this May 30, 2026

Meirtz wants to merge 8 commits into Agentix-Project:master from Meirtz:feat/eval-tui
