Agentix-Project · Meirtz · May 29, 2026
diff --git a/examples/eval-tui/DESIGN.md b/examples/eval-tui/DESIGN.md
@@ -0,0 +1,51 @@
+# Agentix TUI — design & rubrics
+
+A modern, reactive [Textual](https://textual.textualize.io/) control room for
+Agentix. The goal is a single TUI that surfaces **every core Agentix surface** —
+not just batch rollouts — built on the stable `client.remote` + `bundle` APIs
+(plus `provider.session`) and degrading gracefully when no Docker/runtime is
+present.
+
+## Rubrics (v1 — scored 0–5, target ≥4, revisable)
+
+| # | Dimension | "Advanced" looks like |
+|---|-----------|------------------------|
+| 1 | **Coverage** | Rollouts · plugin **Catalog** · **Sandboxes**/providers + remote-invoke · **Build**/bundle · **Observability** (traces + logs) |
+| 2 | **Reactivity** | Fully async, live updates, bounded concurrency, never blocks the UI |
+| 3 | **Navigation / IA** | Discoverable multi-area nav (tabs), command palette, help |
+| 4 | **Visual design** | Cohesive theme, semantic color, responsive layout, dark/light |
+| 5 | **Interaction** | Keybindings, mouse, search/filter, drill-down detail |
+| 6 | **Robustness** | Graceful with no Docker (demo/empty states), error surfaces, cancellation |
+| 7 | **Feedback** | Progress, throughput, status, notifications |
+| 8 | **Code quality** | Typed, ruff-clean, modular, documented |
+| 9 | **Verifiability** | Headless `run_test` pilots per screen; demo mode without infra |
+| 10 | **Polish / UX** | Help screen, sensible defaults, onboarding |
+
+## Architecture
+
+```text
+AgentixTUI(App)                      # shell: Header + TabbedContent + Footer, theme, palette
+├── Rollouts   (views/rollouts.py)   # live batch-rollout dashboard over agentix.runner
+├── Catalog    (views/catalog.py)    # installed agentix dists + entry points (no Docker)
+├── Sandboxes  (views/…)             # providers + live sessions + remote-invoke   [planned]
+├── Build      (views/…)             # trigger & stream `agentix build`            [planned]
+└── Observability (views/…)          # live /trace spans + /log streams            [planned]
+```
+
+Each area is a self-contained view widget with its own demo/empty state, so the
+app is useful (and testable headlessly) with no runtime attached.
+
+## Rubric addendum (v2)
+
+| # | Dimension | "Advanced" looks like |
+|---|-----------|------------------------|
+| 11 | **Aesthetics** | A polished landing dashboard: branded gradient banner, ecosystem stat cards, and a cohesive theme that together make a strong first impression |
+
+## Iteration log
+
+- **PR-A** — app shell (TabbedContent nav) + **Catalog** view (real entry-point /
+  distribution introspection) + this rubric doc + theming. Coverage 1→2, IA 1→4, Visual 3→4.
+- **drill-down** — Rollouts instance detail pane (verdict/duration/score/error). Interaction 3→4.
+- **Overview dashboard** — branded gradient banner + live ecosystem stat cards +
+  environment readiness as the landing tab; branded Textual theme. Aesthetics →4, Polish →4.
+- **next** — Sandboxes, Build, Observability views; command palette; search/filter.
diff --git a/examples/eval-tui/README.md b/examples/eval-tui/README.md
@@ -0,0 +1,54 @@
+# eval-tui
+
+A modern [Textual](https://textual.textualize.io/) **control room** for Agentix —
+a tabbed TUI that surfaces each Agentix area in one place. See
+[`DESIGN.md`](DESIGN.md) for the rubrics it's iterated against.
+
+```text
+┌─ Agentix · agent ↔ environment control room ───────────────────────────────┐
+│  Rollouts │ Catalog │ Sandboxes │ Build │ Observability                      │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ [████████████············] 18/40 done    ✓ 11   ✗ 7   ⟳ 4 running   62.3/min │
+│ Instance              Status      Time  Result │ ▶ starting 40 rollouts        │
+│ demo__task-000        ✓ PASS      1.2s  resolved│ ✓ PASS demo__task-000 · 1.2s │
+│ demo__task-001        ⟳ scoring   …          … │ …                            │
+└─────────────────────────────────────────────────────────────────────────────┘
+ q Quit
+```
+
+## Tabs
+
+- **Rollouts** — live batch-rollout dashboard over
+  [`agentix.runner`](../../plugins/runner): per-instance phase grid (`pending →
+  setup → agent → scoring → PASS/FAIL/skip/error`), summary bar (progress /
+  resolved / failed / running / throughput), and an event log. Phase
+  transitions are observed by wrapping the dataset/agent adapters
+  (`_adapters.py`), so `agentix.runner` is unchanged.
+- **Catalog** — the installed Agentix ecosystem: every `agentix*` distribution
+  plus `agentix.provider` (backends) and `agentix.nix` (agents/datasets shipping
+  a Nix closure) entry points. Pure introspection — no Docker.
+- **Sandboxes · Build · Observability** — signposted; landing in follow-up PRs.
+
+## Run
+
+```bash
+cd examples/eval-tui
+uv sync
+
+# No-Docker synthetic demo:
+uv run agentix-eval-tui --demo 40 --n-concurrent 6
+
+# Real run — adapters resolved like `agentix-run`:
+uv run agentix-eval-tui --dataset my_pkg:dataset --agent my_pkg:agent \
+    --provider docker --bundle eval:0.1.0 --model claude-3-5-sonnet-latest
+
+# Bare launch — just browse the Catalog (no run):
+uv run agentix-eval-tui
+```
+
+## Test
+
+```bash
+uv sync --extra dev
+uv run pytest        # headless Textual run_test pilots — no Docker
+```
diff --git a/examples/eval-tui/eval_tui/__init__.py b/examples/eval-tui/eval_tui/__init__.py
@@ -0,0 +1,8 @@
+"""Agentix TUI — a modern Textual control room for Agentix."""
+
+from __future__ import annotations
+
+from .app import AgentixTUI
+from .models import RunSpec
+
+__all__ = ["AgentixTUI", "RunSpec"]
diff --git a/examples/eval-tui/eval_tui/__main__.py b/examples/eval-tui/eval_tui/__main__.py
@@ -0,0 +1,6 @@
+from __future__ import annotations
+
+from .cli import main
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/examples/eval-tui/eval_tui/_adapters.py b/examples/eval-tui/eval_tui/_adapters.py
@@ -0,0 +1,49 @@
+"""Phase-tracing wrappers around a runner `Dataset` / `Agent`.
+
+The runner exposes a per-instance `on_result` callback but no in-flight phase
+hook. To drive a live UI we wrap the dataset/agent so the dashboard learns
+when each instance enters `setup`, `agent`, and `score` — without changing
+`agentix.runner` itself. Each wrapper simply emits a phase event, then
+delegates to the wrapped object.
+"""
+
+from __future__ import annotations
+
+from collections.abc import Callable
+from typing import Any
+
+OnPhase = Callable[[str, str], None]
+
+
+def instance_id(instance: dict[str, Any]) -> str:
+    return str(instance.get("instance_id") or instance.get("id") or "?")
+
+
+class TracingDataset:
+    def __init__(self, inner: Any, on_phase: OnPhase) -> None:
+        self._inner = inner
+        self._on_phase = on_phase
+
+    def instances(self) -> Any:
+        return self._inner.instances()
+
+    def image(self, instance: dict[str, Any]) -> str:
+        return self._inner.image(instance)
+
+    async def setup(self, sandbox: Any, instance: dict[str, Any]) -> bool:
+        self._on_phase(instance_id(instance), "setup")
+        return await self._inner.setup(sandbox, instance)
+
+    async def score(self, sandbox: Any, instance: dict[str, Any], patch: str) -> dict[str, Any]:
+        self._on_phase(instance_id(instance), "score")
+        return await self._inner.score(sandbox, instance, patch)
+
+
+class TracingAgent:
+    def __init__(self, inner: Any, on_phase: OnPhase) -> None:
+        self._inner = inner
+        self._on_phase = on_phase
+
+    async def solve(self, sandbox: Any, instance: dict[str, Any], *, model: str | None) -> Any:
+        self._on_phase(instance_id(instance), "agent")
+        return await self._inner.solve(sandbox, instance, model=model)
diff --git a/examples/eval-tui/eval_tui/app.py b/examples/eval-tui/eval_tui/app.py
@@ -0,0 +1,128 @@
+"""Agentix TUI — a modern Textual control room for Agentix.
+
+A tabbed shell that surfaces each Agentix area as its own view: an Overview
+landing dashboard, live **Rollouts** over `agentix.runner`, a plugin
+**Catalog**, **Sandboxes** readiness, a **Build** planner, and live
+**Observability**. See `DESIGN.md` for the rubrics this iterates against.
+
+Run a no-Docker demo with `agentix-eval-tui --demo 40`, point it at real
+adapters like `agentix-run`, or launch it bare to browse the Catalog.
+"""
+
+from __future__ import annotations
+
+from textual.app import App, ComposeResult
+from textual.widgets import Footer, Header, TabbedContent, TabPane
+
+from .models import RunSpec
+from .views import (
+    BuildView,
+    CatalogView,
+    ObservabilityView,
+    OverviewView,
+    RolloutsView,
+    SandboxesView,
+)
+
+
+class AgentixTUI(App):
+    """Tabbed control room for Agentix."""
+
+    TITLE = "Agentix"
+    SUB_TITLE = "agent ↔ environment control room"
+
+    CSS = """
+    TabbedContent { height: 1fr; }
+    TabPane { padding: 0; }
+
+    #rollouts-summary {
+        height: 3;
+        padding: 0 2;
+        content-align: left middle;
+        border: round $primary;
+        background: $panel;
+    }
+    #rollouts-body { height: 1fr; }
+    #rollouts-table { width: 3fr; height: 1fr; border: round $primary; }
+    #rollouts-side { width: 2fr; height: 1fr; }
+    #rollouts-detail { height: 2fr; border: round $primary; padding: 0 1; }
+    #rollouts-log { height: 3fr; border: round $primary; padding: 0 1; }
+
+    #catalog-title { height: 1; padding: 0 1; }
+    #catalog-filter { margin: 0 1; }
+    #catalog-table { height: 1fr; border: round $primary; }
+
+    #ov-banner { height: auto; padding: 1 2; content-align: center middle; text-align: center; }
+    #ov-cards { height: 7; padding: 0 1; }
+    .ov-card {
+        width: 1fr;
+        height: 5;
+        border: round $primary;
+        padding: 1 1;
+        margin: 0 1;
+        content-align: center middle;
+        text-align: center;
+    }
+    #ov-hints { height: auto; padding: 1 2; }
+
+    #sb-title { height: 1; padding: 0 1; }
+    #sb-table { height: 1fr; border: round $primary; }
+    #sb-explainer { height: auto; padding: 1 1; }
+
+    #obs-title { height: 1; padding: 0 1; }
+    #obs-body { height: 1fr; }
+    #obs-trace { width: 1fr; height: 1fr; border: round $primary; padding: 0 1; }
+    #obs-log { width: 1fr; height: 1fr; border: round $primary; padding: 0 1; }
+
+    #build-title { height: 1; padding: 0 1; }
+    #build-path { margin: 0 1; }
+    #build-cmd { height: auto; padding: 1 2; }
+    #build-info { height: 1fr; padding: 0 2; }
+    """
+
+    BINDINGS = [
+        ("1", "show_tab('overview')", "Overview"),
+        ("2", "show_tab('rollouts')", "Rollouts"),
+        ("3", "show_tab('catalog')", "Catalog"),
+        ("4", "show_tab('sandboxes')", "Sandboxes"),
+        ("5", "show_tab('build')", "Build"),
+        ("6", "show_tab('observability')", "Obs"),
+        ("q", "quit", "Quit"),
+    ]
+
+    def __init__(self, *, rollout_spec: RunSpec | None = None) -> None:
+        super().__init__()
+        self._spec = rollout_spec
+
+    def on_mount(self) -> None:
+        # Best-effort branded theme; falls back to the default if the running
+        # Textual version's theme API differs.
+        try:
+            from textual.theme import Theme
+
+            self.register_theme(
+                Theme(name="agentix", primary="#cc785c", secondary="#a45a45", accent="#e08a6d", dark=True)
+            )
+            self.theme = "agentix"
+        except Exception:
+            pass
+
+    def compose(self) -> ComposeResult:
+        yield Header(show_clock=True)
+        with TabbedContent(initial="overview"):
+            with TabPane("Overview", id="overview"):
+                yield OverviewView()
+            with TabPane("Rollouts", id="rollouts"):
+                yield RolloutsView(self._spec)
+            with TabPane("Catalog", id="catalog"):
+                yield CatalogView()
+            with TabPane("Sandboxes", id="sandboxes"):
+                yield SandboxesView()
+            with TabPane("Build", id="build"):
+                yield BuildView()
+            with TabPane("Observability", id="observability"):
+                yield ObservabilityView()
+        yield Footer()
+
+    def action_show_tab(self, tab: str) -> None:
+        self.query_one(TabbedContent).active = tab
diff --git a/examples/eval-tui/eval_tui/cli.py b/examples/eval-tui/eval_tui/cli.py
@@ -0,0 +1,93 @@
+"""CLI for the Agentix TUI.
+
+- `agentix-eval-tui --demo 40` — synthetic, no-Docker rollouts.
+- `agentix-eval-tui --dataset m:d --agent m:a --bundle eval:0.1.0` — real run,
+  adapters resolved like `agentix-run`.
+- `agentix-eval-tui` — no run; browse the Catalog (and the planned tabs).
+"""
+
+from __future__ import annotations
+
+import argparse
+import importlib
+import sys
+from typing import Any
+
+from .app import AgentixTUI
+from .models import RunSpec
+
+
+def _load(path: str) -> Any:
+    module_name, sep, attr = path.partition(":")
+    if not module_name or not sep or not attr:
+        raise SystemExit(f"expected 'module:attr', got {path!r}")
+    obj = getattr(importlib.import_module(module_name), attr)
+    return obj() if isinstance(obj, type) else obj
+
+
+def _load_provider(name_or_path: str) -> Any:
+    if ":" in name_or_path:
+        return _load(name_or_path)
+    module = importlib.import_module(f"agentix.provider.{name_or_path}")
+    classes = [
+        value
+        for key, value in vars(module).items()
+        if isinstance(value, type) and key.endswith("Provider") and value.__module__ == module.__name__
+    ]
+    if len(classes) != 1:
+        raise SystemExit(f"could not find a single *Provider class in agentix.provider.{name_or_path}")
+    return classes[0]()
+
+
+def _parse_args(argv: list[str]) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(prog="agentix-eval-tui", description="Modern TUI control room for Agentix.")
+    parser.add_argument("--demo", type=int, metavar="N", default=None, help="Run N synthetic instances (no Docker).")
+    parser.add_argument("--dataset", help="Dataset adapter as 'module:attr'.")
+    parser.add_argument("--agent", help="Agent adapter as 'module:attr'.")
+    parser.add_argument("--provider", default="docker", help="Provider backend name or 'module:attr'.")
+    parser.add_argument("--bundle", help="Agentix bundle reference (from `agentix build`).")
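+    parser.add_argument("--model", default=None, help="Model name passed to the agent's solve().")
+    parser.add_argument("--n-concurrent", type=int, default=4, help="Maximum number of concurrent rollouts.")
+    parser.add_argument("--limit", type=int, default=None, help="Run only the first N dataset instances (not applied with --demo).")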
+    return parser.parse_args(argv)
+
+
+def _build_spec(args: argparse.Namespace) -> RunSpec | None:
+    if args.demo is not None:
+        from .demo import DemoAgent, DemoDataset, DemoProvider
+
+        dataset = DemoDataset(args.demo)
+        return RunSpec(
+            dataset=dataset,
+            agent=DemoAgent(),
+            provider=DemoProvider(),
+            bundle="demo",
+            instances=dataset.instances(),
+            n_concurrent=args.n_concurrent,
+        )
+
+    given = [bool(args.dataset), bool(args.agent), bool(args.bundle)]
+    if not any(given):
+        return None  # bare launch: browse the Catalog / planned tabs
+    if not all(given):
+        raise SystemExit("--dataset, --agent and --bundle must be given together (or use --demo N)")
+
+    dataset = _load(args.dataset)
+    instances = list(dataset.instances())
+    if args.limit is not None:
+        instances = instances[: args.limit]
+    return RunSpec(
+        dataset=dataset,
+        agent=_load(args.agent),
+        provider=_load_provider(args.provider),
+        bundle=args.bundle,
+        model=args.model,
+        instances=instances,
+        n_concurrent=args.n_concurrent,
+    )
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = _parse_args(sys.argv[1:] if argv is None else argv)
+    AgentixTUI(rollout_spec=_build_spec(args)).run()
+    return 0
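Reviewer note on `_load`: as written, a mistyped module or attribute surfaces as a raw `ImportError` or `AttributeError` traceback, while a malformed spec exits cleanly via `SystemExit`. One possible hardening, shown here as a sketch and not as what the PR does, is to route all three failure modes through `SystemExit`. The name `load_adapter` is hypothetical; the class-versus-instance rule is copied from `_load`.

```python
import importlib
from typing import Any


def load_adapter(path: str) -> Any:
    """Resolve 'module:attr'; instantiate it if it is a class, else return it as-is."""
    module_name, sep, attr = path.partition(":")
    if not module_name or not sep or not attr:
        raise SystemExit(f"expected 'module:attr', got {path!r}")
    try:
        obj = getattr(importlib.import_module(module_name), attr)
    except (ImportError, AttributeError) as exc:
        # Turn lookup failures into the same clean CLI exit as a malformed spec.
        raise SystemExit(f"could not resolve {path!r}: {exc}") from exc
    return obj() if isinstance(obj, type) else obj
```

With this variant, `load_adapter("collections:OrderedDict")` returns a fresh `OrderedDict` instance (a class is called with no arguments), `load_adapter("math:pi")` returns the float unchanged, and a bad module or attribute name exits with a one-line message.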