AgentSys: Design doc #294

@JingwenGu0829


AgentSys — Backend Parallel Agent System Design Document

Architecture reference for the multi-agent backend powering the Research Agent
application. Covers the 3-level agent hierarchy, chat integration, LLM
orchestration, memory system, and inter-process communication.


Image note: the diagram above was generated by Gemini, so some details may not match the actual design. It serves only as a rough overview of the broader picture; refer to the design doc below for every detail in the system.

0. Holistic View

The Research Agent backend is a 3-level multi-process agent system built on
top of a FastAPI server and an external LLM gateway (OpenCode). A user chats in
the browser; every message flows through an LLM-powered Level 1 agent that
decides — based on its operating mode — whether to answer directly, spawn
autonomous research experiments, or manage running children.

       ┌─────────────────────────────────────────────────────────────────────┐
       │                     DIMENSION 1: AGENTS                            │
       │                                                                    │
       │   Browser ──HTTP──→ FastAPI Server ──OpenCode──→ LLM Gateway      │
       │                          │                                         │
       │                    SessionAgent (L1)                               │
       │                   [LLM brain, in-process]                          │
       │                   [3 modes: wild/auto/agent]                       │
       │                          │                                         │
       │            ┌─────────────┼─────────────┐                           │
       │            │ Runtime IPC │             │ Runtime IPC               │
       │            ▼             ▼             ▼                           │
       │     ResearchAgent   ResearchAgent   ExecutorAgent ─┐              │
       │         (L2)            (L2)           (L3)        │              │
       │          │               │                         │              │
       │          │ spawn_child   │ spawn_child             │              │
       │          ▼               ▼                         │              │
       │    ┌─────────────────────────────────┐             │              │
       │    │       SCHEDULER                 │◄────────────┘              │
       │    │  ┌─────────┐  ┌────────────┐   │                            │
       │    │  │ FIFO    │  │ GPU        │   │ ← all executor spawns      │
       │    │  │ Queue   │  │ Tracker    │   │   routed here              │
       │    │  └────┬────┘  └─────┬──────┘   │                            │
       │    │       └──────┬──────┘          │                            │
       │    │         Policy Engine           │                            │
       │    │         (tick ~1s)              │                            │
       │    └──────────────┬─────────────────┘                            │
       │                   │ approve & spawn                               │
       │                   ▼                                               │
       │           ExecutorAgent (L3)  ×N                                  │
       │          [subprocess, per task/run]                                │
       │          [GPU-gated, queued]                                       │
       └─────────────────────────────────────────────────────────────────────┘

       ┌─────────────────────────────────────────────────────────────────────┐
       │                     DIMENSION 2: MEMORY                            │
       │                                                                    │
       │    Agent  →  MemoryView  →  FileStore                              │
       │   (scoped)   (auto-fill)   (flat KV on disk)                      │
       │                                                                    │
       │    Hierarchical scope queries:                                     │
       │    project > session > sweep > run                                 │
       └─────────────────────────────────────────────────────────────────────┘

       ┌─────────────────────────────────────────────────────────────────────┐
       │                  DIMENSION 3: COMMUNICATION                        │
       │                                                                    │
       │   CRITICAL steer  →  kill + respawn                                │
       │   PRIORITY steer  →  buffered, polled                              │
       │   MESSAGE          →  append to target memory                      │
       │                                                                    │
       │   All cross-process via IPC queues                                 │
       └─────────────────────────────────────────────────────────────────────┘

1. End-to-End Flow

1.1 The Big Picture

 User (browser)            FastAPI Server              OpenCode (LLM)         OS Subprocesses
 ──────────────            ──────────────              ──────────────         ───────────────

  POST /chat ───────────→ Resolve mode
  {message, mode}          Build L1 prompt
                           (skill template + context)
                                │
                           ChatStreamRuntime ──────→ POST /session (create)
                                │                    POST /session/{id}/prompt_async
                                │                         │
                                │                    GET /global/event (SSE stream)
                                │                    ← text deltas, tool updates
                                │
                           ← SSE (NDJSON) ─────────→ User sees streaming response
                                │
                           Post-process:
                           parse XML action tags
                                │
                           ┌────┴──────────────────────────┐
                           │ <spawn_research>               │
                           │   → Runtime.spawn(L2)  ───────┼──→ ResearchAgent (L2 proc)
                           │ <steer_child>                  │     ├── Phase 0: Plan
                           │   → Runtime.steer(L2)          │     ├── Phase 1+: Iterate
                           │ <stop_child>                   │     │   ├── Design (OpenCode LLM)
                           │   → Runtime.stop(L2)           │     │   ├── spawn_child(L3)
                           │ <spawn_command>                │     │   │         │
                           │   → spawn_child(L3)            │     │   │    ┌────▼──────────┐
                           │         │                      │     │   │    │  SCHEDULER    │
                           │    ┌────▼──────────┐           │     │   │    │  Queue + GPU  │
                           │    │  SCHEDULER    │           │     │   │    │  gate → spawn │
                           │    │  Queue + GPU  │           │     │   │    └────┬──────────┘
                           │    │  gate → spawn │           │     │   │         │
                           │    └────┬──────────┘           │     │   │    ExecutorAgent (L3)
                           │         │                      │     │   └── Collect results
                           │    ExecutorAgent (L3)          │     └── Reflection
                           │ (direct tool use)              │
                           │   → OpenCode handles           │
                           └────────────────────────────────┘

  GET /sessions/{id}/stream ─→ SSE replay + live events
  GET /agents/events ────────→ EventRelay SSE (agent lifecycle)

  [L2/L3 completion] ───────→ Synthetic chat turn
                               (L1 auto-responds with results)

1.2 Detailed Chat Request Flow

Step 1 — User sends message:

POST /chat
{
  "session_id": "abc123",
  "message": "Optimize the learning rate schedule",
  "mode": "auto"          // "wild" | "auto" | "agent"
}

Step 2 — Build L1 prompt:

  • Resolve mode → select skill template (l1_wild_session, l1_auto_session,
    or l1_agent_session)
  • Get or create per-session SessionAgent instance
  • Call agent.build_wild_prompt() which renders the skill template with:
    • {{children_status}} — active/completed L2 experiments
    • {{memories}} — reflection bank from past sessions
    • {{experiment_context}} — runs/sweeps/alerts summary
    • {{workdir}} — project root directory
    • {{conversation_history}} — last 10 chat messages
  • Append [USER] {message} to rendered prompt

Step 3 — Create ChatStreamRuntime:

  • In-memory object that tracks the streaming response
  • Generates unique run_id, initializes event list, subscriber queues
  • Stored in active_chat_streams[session_id]

Step 4 — Background worker sends prompt to OpenCode:

  • Get or create OpenCode session: POST {OPENCODE_URL}/session?directory={workdir}
  • Send prompt: POST {OPENCODE_URL}/session/{id}/prompt_async
    with {model: {providerID, modelID}, parts: [{type: "text", text: prompt}]}
  • Stream SSE: GET {OPENCODE_URL}/global/event
    • message.part.delta → incremental text
    • message.part.updated → tool call state
    • session.status: idle → done

Step 5 — Stream to frontend:

  • Each SSE event is wrapped with a sequence number and broadcast to subscribers
  • Frontend receives NDJSON stream: text deltas, thinking deltas, tool updates
  • Periodic snapshot persisted to chat_data.json (debounced every 0.75s)
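The sequence-numbered NDJSON wrapping in Step 5 can be sketched as follows. This is an illustrative sketch, not the actual server code; the function name and event shapes are assumptions.

```python
import json

def wrap_events(events: list[dict]) -> str:
    """Illustrative sketch: wrap raw SSE events with a monotonically
    increasing sequence number and emit one NDJSON line per event,
    as the broadcast step above describes."""
    lines = []
    for seq, event in enumerate(events):
        # The sequence number lets a reconnecting client detect gaps.
        lines.append(json.dumps({"seq": seq, **event}))
    return "\n".join(lines) + "\n"
```

A client can replay the stream by sorting on `seq` and discarding duplicates after a reconnect.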

Step 6 — Post-process L1 response (agent/wild/auto modes only):

  • Call agent.handle_llm_response(full_text) which parses XML action tags:
    • <spawn_research> → agent.start_experiment() → spawns L2 ResearchAgent
    • <spawn_command> → agent.start_run() → spawns L3 ExecutorAgent
    • <steer_child> → runtime.steer() → injects context into running L2
    • <stop_child> → runtime.stop() → terminates running agent
  • Strip XML tags from text before saving assistant message

Step 7 — Finalize:

  • Save assistant message (content + thinking + tool parts) to session
  • Auto-name session from OpenCode title on first turn
  • Send session_status: idle terminal event
  • 120-second retention window for client reconnect

1.3 Synthetic Chat Turns (Reactive)

When an L2/L3 child finishes, the system automatically generates a follow-up:

L1.on_monitor() detects child completion
  → reads child results (run.log, iteration_log.md)
  → calls _on_child_complete_callback(session_id, system_message)
  → trigger_synthetic_chat_turn()
      → build_wild_prompt(system_message, ...)
      → _start_chat_worker() → OpenCode → SSE stream
      → L1 summarizes results for user

The frontend sees a continuous SSE stream — the user gets automatic result
summaries without refreshing or polling.


2. Level 1 — SessionAgent (LLM-Enabled Orchestrator)

2.1 Overview

The L1 SessionAgent is the user-facing gateway. Unlike a traditional
orchestrator that just routes HTTP requests, the L1 has an LLM brain:
every user message goes through OpenCode, and the LLM's structured response
drives what happens next.

┌──────────────────────────────────────────────────────────────────────┐
│  Level 1 (L1) — SessionAgent         [in-process, one per chat     │
│  role = "orchestrator"                 session, LLM-powered]        │
│  allowed_children = {orchestrator, executor, sidecar}               │
│  monitor_interval = 5.0s                                            │
│                                                                      │
│  The L1 does NOT solve problems itself. It:                         │
│    1. Receives user messages via the chat system                    │
│    2. Gets its prompt rendered with context (mode-specific)         │
│    3. OpenCode generates a response (streamed to user)              │
│    4. Post-processing parses XML action tags from the response      │
│    5. Actions are executed: spawn L2, steer L2, stop L2             │
│    6. Monitors children, triggers synthetic turns on completion     │
│                                                                      │
│  Three operating modes (selected per chat message):                 │
│    WILD  — must spawn research for every request                    │
│    AUTO  — LLM decides: direct tools vs spawn research              │
│    AGENT — direct tool use only, no research spawning               │
│                                                                      │
│  Key methods:                                                       │
│    build_wild_prompt(message, session_id, skill_manager, skill_id)  │
│    handle_llm_response(full_text) → list[action]                    │
│    start_experiment(goal, config) → agent_id                        │
│    start_run(command, name, workdir) → agent_id                     │
│    on_monitor() → detect child completion, trigger callbacks        │
│                                                                      │
│  Instance management:                                               │
│    _session_agents: dict[chat_session_id → SessionAgent]            │
│    _get_or_create_session_agent(session_id) → SessionAgent          │
│    One L1 per chat session, created on demand                       │
└──────────────────────────────────────────────────────────────────────┘

2.2 The Three Modes

Aspect                WILD              AUTO              AGENT
Skill template        l1_wild_session   l1_auto_session   l1_agent_session
Spawn research        MANDATORY         LLM decides       NEVER
Direct tool use       No                Yes               Yes
Steer/stop children   Yes               Yes               No
Best for              Full automation   General use       Quick Q&A, debugging

WILD mode — the LLM MUST output <spawn_research> for every request. No
exceptions. The agent is purely a delegation gateway. Use when you want
autonomous research for all tasks.

AUTO mode — the LLM freely chooses between using tools directly (bash, file
read/write, code editing via OpenCode) and spawning research experiments. The
prompt gives heuristics: spawn research if the task needs iteration, try
multiple approaches, or would take more than ~3 tool calls. Use for balanced
autonomy.

AGENT mode — the LLM uses tools directly to answer questions and complete
tasks. No research spawning. The agent handles everything in the conversation
turn. Use for interactive debugging and development.

2.3 XML Action Tags

The L1's response is parsed for structured action blocks:

<!-- Spawn a Level 2 ResearchAgent -->
<spawn_research>
goal: Investigate learning rate sensitivity with ablation studies
max_iterations: 15
</spawn_research>

<!-- Spawn a standalone Level 3 ExecutorAgent -->
<spawn_command>
goal: Run the baseline training script
command: python train.py --lr 0.001 --epochs 50
workdir: /path/to/project
</spawn_command>

<!-- Redirect an active experiment -->
<steer_child>
experiment_id: orchestrator-abc123
context: Focus on reward shaping instead of policy gradient
</steer_child>

<!-- Stop an experiment -->
<stop_child>
experiment_id: orchestrator-abc123
</stop_child>

Parsing detail: The parser uses _last_match() (last regex occurrence) to
avoid matching template examples in the prompt itself — only the LLM's actual
output is parsed.

2.4 Prompt Construction (build_wild_prompt)

┌─────────────────────────────────────────────────────────┐
│  Skill Template (SKILL.md)                              │
│  ┌───────────────────────────────────────────────────┐  │
│  │  # Session Agent                                  │  │
│  │  ...role instructions...                          │  │
│  │  ## Working Directory                             │  │
│  │  {{workdir}}                                      │  │
│  │  ## Active Experiments                            │  │
│  │  {{children_status}}                              │  │
│  │  {{experiment_context}}                           │  │
│  │  ## Memory Bank                                   │  │
│  │  {{memories}}                                     │  │
│  │  ## Recent Conversation                           │  │
│  │  {{conversation_history}}                         │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  + [USER] {user_message}                                │
└─────────────────────────────────────────────────────────┘

Context variables filled dynamically:

  • children_status — markdown list of active/completed L2 experiments
  • memories — last 20 REFLECTION entries (cross-session memory bank)
  • experiment_context — active runs count, finished/failed counts, alerts
  • workdir — absolute path to project root
  • conversation_history — last 10 messages (role + content, truncated)
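The `{{variable}}` substitution above is plain string templating. A minimal sketch, assuming simple literal replacement (the real build_wild_prompt may use a richer template engine):

```python
def render_skill_template(template: str, context: dict) -> str:
    """Minimal sketch of the {{variable}} fill described above:
    replace each {{key}} placeholder with its context value."""
    out = template
    for key, value in context.items():
        out = out.replace("{{" + key + "}}", str(value))
    return out
```

Placeholders with no matching context key are left intact, which makes missing variables visible in the rendered prompt rather than silently dropped.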

2.5 L1 Tools Available via OpenCode

The LLM running inside OpenCode has access to OpenCode's built-in tools:
bash, read, glob, grep, edit, write, task, webfetch,
websearch, codesearch, skill, and others.

Additionally, MCP tools registered in opencode.json provide:

  • quick_bash — shell execution with timeout (never hangs)
  • list_directory — instant directory listing (pure Python)
  • create_run / start_run — run management via the server API

Important: MCP tools require the MCP server process to start successfully.
The opencode.json must point to the correct Python interpreter (the project
venv at .ra-venv/bin/python, not the system Python).


3. Level 2 — ResearchAgent (Autonomous Research Loop)

3.1 Overview

The L2 ResearchAgent runs as an OS subprocess spawned by the L1 via
Runtime. It operates a full autonomous research loop: plan → iterate → reflect.
Each L2 instance handles one research experiment.

┌──────────────────────────────────────────────────────────────────────┐
│  Level 2 (L2) — ResearchAgent         [subprocess, one per         │
│  role = "orchestrator"                  experiment]                  │
│  allowed_children = {executor}                                      │
│  monitor_interval = 5.0s                                            │
│                                                                      │
│  Configuration (ResearchAgentConfig):                               │
│    opencode_url, model_provider, model_id, workdir,                 │
│    server_url, skills_dir, max_iterations, wait_seconds,            │
│    autonomy_level, evo_sweep_enabled                                │
│                                                                      │
│  Research Loop:                                                     │
│    Phase 0: Plan (one-shot LLM → task decomposition)               │
│    Phase 1+: Iterate (design → spawn L3s → collect results)        │
│    Reflection: learn + decide continue/stop                         │
│                                                                      │
│  LLM Integration:                                                   │
│    Direct HTTP to OpenCode (own sessions, own SSE streams)          │
│    Prompt templates from PromptSkillManager (loaded from disk)      │
│                                                                      │
│  L2 = BRAIN: designs experiments, analyzes results, reflects        │
│  L3 = HANDS: executes experiments (subprocess commands)             │
└──────────────────────────────────────────────────────────────────────┘

3.2 The Research Loop

┌─────────────────────────────────────────────────────────────────┐
│                    RESEARCH LOOP (L2)                            │
│                                                                 │
│  ┌──────────────────┐                                          │
│  │  Phase 0: PLAN   │  One-shot LLM call                      │
│  │  Iteration = 0   │  → Parse <plan>, write tasks.md          │
│  │                  │  → git commit                            │
│  │                  │  → PLAN entry to MemoryView              │
│  └────────┬─────────┘                                          │
│           │                                                    │
│           ↓                                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Phase 1+: ITERATE (until DONE or max_iterations)        │  │
│  │                                                          │  │
│  │  1. check_pause() + consume_steer()                      │  │
│  │  2. Snapshot files (git ls-files)                         │  │
│  │  3. Build iteration prompt (memories, steer, plan, ctx)   │  │
│  │  4. Send to OpenCode LLM                                 │  │
│  │  5. Parse response:                                      │  │
│  │     - <promise>DONE|WAITING|...</promise>                 │  │
│  │     - <plan>updated tasks</plan>                         │  │
│  │     - <summary>what happened</summary>                   │  │
│  │     - <experiment> specs (optional)                       │  │
│  │  6. If <experiment> tags:                                │  │
│  │     → spawn L3 ExecutorAgents per spec                   │  │
│  │     → poll 5s intervals, read run.log                    │  │
│  │     → format results for next iteration                  │  │
│  │  7. Diff files, count errors, track progress             │  │
│  │  8. git commit, append iteration_log.md                  │  │
│  │  9. Write entries to MemoryView, emit SSE events         │  │
│  │                                                          │  │
│  │  If promise == "DONE" → enter Reflection                 │  │
│  │  If promise == "WAITING" → sleep(wait_seconds)           │  │
│  │  Else → sleep(2s), continue                              │  │
│  └──────────┬───────────────────────────────────────────────┘  │
│             │                                                  │
│             ↓ (on DONE)                                        │
│  ┌──────────────────┐                                          │
│  │  REFLECTION      │  One-shot LLM call                      │
│  │                  │  → Parse <reflection>, <continue>,      │
│  │                  │    <memories> with [tag] prefix          │
│  │                  │  → Store as REFLECTION entries           │
│  │                  │  → If continue=yes: back to Phase 1     │
│  │                  │  → If continue=no: session done          │
│  └──────────────────┘                                          │
└─────────────────────────────────────────────────────────────────┘
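Step 5 of the iteration loop parses several tags out of the LLM response. A hedged sketch of that parsing, with helper names that are assumptions:

```python
import re

def parse_iteration_response(text: str) -> dict:
    """Illustrative sketch of step 5 above: extract the <promise>,
    <plan>, and <summary> bodies from the iteration response.
    Takes the last occurrence of each tag, mirroring the L1 parser."""
    def last(tag: str) -> str:
        found = re.findall(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return found[-1].strip() if found else ""
    return {
        "promise": last("promise"),   # DONE | WAITING | ...
        "plan": last("plan"),         # updated tasks.md content
        "summary": last("summary"),   # what happened this iteration
    }
```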

3.3 OpenCode Interaction (L2)

The L2 communicates with the LLM via direct HTTP to OpenCode — separate from
the L1's OpenCode sessions:

_run_opencode_prompt(prompt)
  ├── 1. Create session:  POST /session?directory={workdir}
  ├── 2. Connect SSE:     GET /global/event (before prompt, no missed events)
  ├── 3. Send prompt:     POST /session/{id}/prompt_async
  │     payload = {model: {providerID, modelID}, parts: [{type: "text", text: ...}]}
  ├── 4. Stream events:
  │     message.part.delta   → incremental text
  │     message.part.updated → full text snapshots
  │     session.status: idle → done
  └── 5. Return (full_text, opencode_session_id)
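The SSE stream in step 4 is line-framed: payloads arrive on `data:` lines. A minimal parsing sketch, assuming JSON payloads (event routing between delta/updated types is omitted):

```python
import json

def parse_sse_lines(raw: str) -> list[dict]:
    """Sketch: extract JSON payloads from an SSE stream as described
    in step 4 above. Only 'data:' lines carry payloads; blank lines
    delimit events."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events
```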

Timeouts:
  - Time to first token: 180s
  - Total response: 600s
  - SSE line idle: 120s

3.4 Prompt Templates

L2 prompts are rendered from SKILL.md templates via PromptSkillManager:

skills/prompts/
  wild_v2_planning/SKILL.md     → Phase 0 (planning)
  wild_v2_iteration/SKILL.md    → Phase 1+ (iteration)
  wild_v2_reflection/SKILL.md   → Reflection

Each template has {{variable}} placeholders filled by PromptContext:

@dataclass
class PromptContext:
    goal: str
    iteration: int
    max_iterations: int
    session_id: str
    workdir: str
    tasks_path: str           # .agents/wild/{session_id}/tasks.md
    log_path: str             # .agents/wild/{session_id}/iteration_log.md
    steer_context: str        # user steering (if any)
    history: list[dict]       # all iteration records
    no_progress_streak: int   # consecutive no-change iterations
    short_iteration_count: int
    autonomy_level: str       # "cautious" | "balanced" | "full"
    memories_text: str        # REFLECTION entries from MemoryView
    evo_sweep_enabled: bool

3.5 Session State (ResearchSession)

@dataclass
class ResearchSession:
    session_id: str
    goal: str
    status: str = "running"       # running | paused | done | failed
    iteration: int = 0
    max_iterations: int = 25
    plan: str = ""
    history: list = field(default_factory=list)   # iteration records
    started_at: float = 0.0
    finished_at: float | None = None
    steer_context: str = ""
    opencode_sessions: list = field(default_factory=list)  # track all OC session IDs used
    reflection: str = ""
    no_progress_streak: int = 0
    short_iteration_count: int = 0

Persisted to .agents/wild/{session_id}/state.json on each iteration.
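The per-iteration persistence can be sketched with `dataclasses.asdict`. This is an assumption about the mechanism, not the actual implementation; the path layout matches the source.

```python
import dataclasses
import json
import pathlib

def persist_state(session, agents_dir: str) -> None:
    """Sketch of the per-iteration persistence described above,
    assuming the session object is a dataclass. Writes to
    {agents_dir}/wild/{session_id}/state.json."""
    path = pathlib.Path(agents_dir) / "wild" / session.session_id / "state.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dataclasses.asdict(session), indent=2))
```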


4. Level 3 — ExecutorAgent (Task Execution)

┌──────────────────────────────────────────────────────────────────────┐
│  Level 3 (L3) — ExecutorAgent          [subprocess, leaf node,      │
│  role = "executor"                       one per task/run]           │
│  allowed_children = frozenset()  (LEAF — cannot spawn)              │
│  monitor_interval = 3.0s (auto-enabled for subprocess backend)      │
│                                                                      │
│  Backends:                                                          │
│    "subprocess" — asyncio.create_subprocess_shell(command)           │
│    "callback"   — await config["callback"](goal)                    │
│    "mock"       — simulated delay + canned result (tests)           │
│                                                                      │
│  Subprocess Features:                                               │
│    - GPU detection via gpuwrap_detect.py                            │
│    - Retry on GPU conflict (CUDA busy / OOM patterns)               │
│    - Stdout streaming to run.log                                    │
│    - WANDB env var injection                                        │
│    - Exit code → RESULT entry                                       │
│                                                                      │
│  Watchdog (on_monitor, every 3s):                                   │
│    _tail_metrics() — reads agent_metrics.jsonl from run dir         │
│    _check_anomalies():                                              │
│      - NaN/Inf in loss → CRITICAL alert                             │
│      - High loss (>8.0) → WARNING alert                             │
│      - Loss spike (>3x rolling avg) → WARNING alert                │
│    Writes ALERT entries to FileStore on anomaly detection           │
│                                                                      │
│  Output files:                                                      │
│    .agents/runs/{agent_id}/command.txt   — shell command            │
│    .agents/runs/{agent_id}/run.log       — stdout/stderr stream     │
│    .agents/runs/{agent_id}/job.done      — exit code marker         │
│    .agents/runs/{agent_id}/agent_metrics.jsonl — training metrics   │
└──────────────────────────────────────────────────────────────────────┘

5. Scheduler — GPU-Aware Executor Queue

5.1 Overview

The Scheduler sits between spawn requests and actual process creation for
executor (L3) agents only. L1→L2 spawning bypasses it entirely.

                  spawn_child(ExecutorAgent, goal, gpu_required=2)
                      │
                      ▼
              ┌───────────────────┐
              │    Scheduler      │
              │                   │
              │  FIFO Queue       │  ← file-persisted (.json per request)
              │  GPU Tracker      │  ← nvidia-smi at init, tracks allocations
              │  Policy Engine    │  ← pluggable (default: FIFOPolicy)
              │                   │
              │  Tick loop (~1s): │
              │    policy.select_ │
              │    next() →       │
              │    Runtime.spawn() │
              └────────┬──────────┘
                       │
                       ▼
              Runtime.spawn(ExecutorAgent, ...)
                       │
                       ▼
                  OS Process (L3)

5.2 Key Types

class RequestStatus(str, Enum):
    QUEUED   = "queued"     # waiting for resources
    RUNNING  = "running"    # agent spawned
    DONE     = "done"       # agent finished successfully
    FAILED   = "failed"     # agent failed or request cancelled
    REJECTED = "rejected"   # impossible to satisfy (e.g. gpu_required > total)

@dataclass
class ScheduleRequest:
    request_id: str         # "sched-{uuid8}"
    agent_cls_path: str     # "agentsys.agents.executor.ExecutorAgent"
    goal: str
    parent_id: str | None
    config: dict | None
    gpu_required: int = 0
    created_at: float = 0.0
    status: RequestStatus = RequestStatus.QUEUED
    agent_id: str | None = None         # set when spawned
    rejection_reason: str | None = None

@dataclass
class ScheduleTicket:
    request_id: str         # returned to caller for status queries

5.3 GPU Tracking

class GPUTracker:
    total_gpus: int         # detect_total_gpus() via nvidia-smi at init
    _allocated: dict[str, int]  # request_id → gpu count

    can_allocate(n) → bool  # n <= total - sum(allocated)
    allocate(request_id, n) # reserve GPUs
    release(request_id)     # free GPUs (on agent DONE/FAILED)

GPU detection runs nvidia-smi --list-gpus at scheduler init. Falls back to
0 on systems without GPUs — all gpu_required=0 requests pass through
immediately.
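A sketch of that detection and fallback behavior (the function name matches the source; the exact invocation details are assumptions):

```python
import subprocess

def detect_total_gpus() -> int:
    """Sketch of the GPU detection described above: count the lines
    of `nvidia-smi --list-gpus`, falling back to 0 when the tool is
    missing or fails (CPU-only hosts)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
        return len([ln for ln in out.splitlines() if ln.strip()])
    except (OSError, subprocess.SubprocessError):
        return 0
```

With `total_gpus == 0`, `can_allocate(0)` is still true, so `gpu_required=0` requests flow through unimpeded.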

5.4 Scheduling Policy

class SchedulingPolicy(ABC):
    def select_next(queue, running, gpu_tracker) → request_id | None

class FIFOPolicy(SchedulingPolicy):
    # Returns first queued request where gpu_required <= gpu_tracker.available
    # Skips requests that need more GPUs than currently free

The policy abstraction enables future algorithms (priority queues, fair
scheduling, deadline-aware) without changing the Scheduler core.
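FIFOPolicy's selection rule fits in a few lines. A sketch with a simplified queue representation (tuples instead of ScheduleRequest objects, which is an assumption for illustration):

```python
def fifo_select_next(queued: list[tuple], available_gpus: int):
    """Sketch of FIFOPolicy.select_next as described above.
    `queued` holds (request_id, gpu_required, created_at) tuples.
    Returns the id of the OLDEST request whose GPU demand fits,
    skipping any that need more GPUs than are currently free."""
    for request_id, gpu_required, _created in sorted(queued, key=lambda r: r[2]):
        if gpu_required <= available_gpus:
            return request_id
    return None
```

Note the skip behavior: a large request at the head of the queue does not block smaller requests behind it, at the cost of possible starvation for the large request (one motivation for pluggable policies).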

5.5 Scheduler Lifecycle

_init_agentsys()                    lifespan startup
  │                                   │
  ├── Scheduler(runtime, store_dir)   ├── scheduler.start()
  │     ├── detect_total_gpus()       │     └── create_task(_tick_loop)
  │     ├── mkdir _scheduler/         │
  │     └── _load_from_disk()         │
  │           └── orphaned RUNNING    │
  │              requests → FAILED    │
  │                                   │
  └── runtime._scheduler = scheduler  │
                                      │
                              lifespan shutdown
                                │
                                └── scheduler.shutdown()
                                      └── cancel tick task

5.6 Request Flow

1. Agent.spawn_child(ExecutorAgent, goal, gpu_required=2)
     │
     ├── scheduler.submit(cls, goal, parent_id, config, gpu_required)
     │     ├── If gpu_required > total_gpus → REJECTED immediately
     │     ├── Create ScheduleRequest (status=QUEUED)
     │     ├── Persist to {store_dir}/{request_id}.json
     │     └── Return ScheduleTicket
     │
     └── Return ScheduledChildHandle(ticket, scheduler)

2. Tick loop (every ~1s):
     ├── Collect QUEUED requests, sort by created_at
     ├── policy.select_next(queue, running, gpu_tracker)
     │     └── FIFOPolicy: first request where gpu fits
     ├── If selected:
     │     ├── Runtime.spawn(cls, goal, parent_id, config)
     │     ├── gpu_tracker.allocate(request_id, gpu_required)
     │     ├── request.status = RUNNING, request.agent_id = child.id
     │     └── Persist updated request
     └── If nothing fits: no-op (try again next tick)

3. Agent completes (DONE/FAILED event in Runtime):
     ├── Runtime._handle_event notifies scheduler.on_agent_done()
     ├── gpu_tracker.release(request_id)
     ├── request.status = DONE or FAILED
     └── Persist — freed GPU enables next queued request

5.7 ScheduledChildHandle

Duck-type compatible with ChildHandle. Returned when an executor spawn goes
through the scheduler.

class ScheduledChildHandle:
    .id       → request_id (while queued) or agent_id (once spawned)
    .status   → IDLE (queued), RUNNING, DONE, FAILED (maps to AgentStatus)
    .wait()   → polls scheduler, then agent status
    .cancel() → scheduler.cancel() if queued, runtime.stop() if running

5.8 Persistence

Each request is atomically persisted as {store_dir}/{request_id}.json:

{
  "request_id": "sched-a1b2c3d4",
  "agent_cls_path": "agentsys.agents.executor.ExecutorAgent",
  "goal": "train model v2",
  "parent_id": "orchestrator-abc12345",
  "config": {"command": "python train.py", "workdir": "/exp"},
  "gpu_required": 2,
  "created_at": 1740000000.0,
  "status": "queued",
  "agent_id": null,
  "rejection_reason": null
}

On restart, _load_from_disk() restores all requests. Requests that were
RUNNING when the scheduler crashed are marked FAILED (orphaned processes).
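
The restore-and-orphan logic can be sketched as follows (function name and exact field handling are illustrative, assuming one JSON file per request as shown above):

```python
import json
from pathlib import Path


def load_from_disk(store_dir: Path) -> dict[str, dict]:
    """Restore persisted requests; RUNNING requests become FAILED.

    A request that was RUNNING when the process died refers to a worker
    that no longer exists, so it is marked failed and re-persisted.
    """
    requests = {}
    for path in sorted(store_dir.glob("*.json")):
        req = json.loads(path.read_text())
        if req["status"] == "running":
            req["status"] = "failed"
            req["rejection_reason"] = "orphaned by scheduler restart"
            path.write_text(json.dumps(req))
        requests[req["request_id"]] = req
    return requests
```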

5.9 Cleanup

  • cleanup_parent(parent_id) — cancels all QUEUED requests for a dead parent.
    Called automatically by Runtime.stop().
  • cancel(request_id) — cancels a single QUEUED request.

5.10 API Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /scheduler/queue | GET | List requests (filter by parent_id, status) |
| /scheduler/queue/{request_id} | GET | Get single request |
| /scheduler/stats | GET | GPU availability + queue summary |

5.11 What Does NOT Go Through the Scheduler

  • L1→L2 spawning — cls.role == "orchestrator" bypasses the scheduler.
  • In-process agents — L1 SessionAgent registered via register_local().
  • Sidecar agents — cls.role == "sidecar" bypasses the scheduler.

Only cls.role == "executor" triggers the scheduler path, and only when
runtime._scheduler is set.


6. Runtime & IPC

6.1 Runtime — The Multiprocess Supervisor

Runtime sits between every parent-child pair. It is not an agent — it
manages OS processes, IPC queues, and the shared FileStore.

                        Runtime (server process)
                       ┌──────────────────────┐
                       │  FileStore           │ ← shared filesystem
                       │  _processes {}       │ ← OS Process objects
                       │  _cmd_queues {}      │ ← supervisor → worker
                       │  _event_queues {}    │ ← worker → supervisor
                       │  _agent_meta {}      │ ← in-memory cache
                       │  _local_agents {}    │ ← in-process (L1 only)
                       │  _event_listener     │ ← async polling task
                       └──────────┬───────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              │                   │                   │
         ┌────┴────┐        ┌────┴────┐        ┌────┴────┐
         │ Worker  │        │ Worker  │        │ Worker  │
         │ Process │        │ Process │        │ Process │
         │ (L2)    │        │ (L3)    │        │ (L3)    │
         └─────────┘        └─────────┘        └─────────┘

Multiprocessing model: Uses multiprocessing.get_context("spawn") to avoid
fork-related deadlocks. Each worker gets its own asyncio event loop, its own
FileStore instance (same disk path), and a RuntimeProxy for IPC.

spawn() — Creating a Child Process

Runtime.spawn(cls, goal, parent_id, config, session, sweep, run)
  ├── 1. Validate hierarchy (parent.allowed_child_roles includes cls.role)
  ├── 2. Generate agent_id = f"{cls.role}-{uuid4().hex[:8]}"
  ├── 3. Build scope (inherited from parent)
  ├── 4. Create IPC queues (cmd_queue, event_queue)
  ├── 5. Write initial .meta.json to FileStore
  ├── 6. Create OS Process(target=run_worker, args=(...))
  ├── 7. process.start()
  ├── 8. Start event listener (if not running)
  └── 9. Return AgentInfo (lightweight metadata)

register_local() — In-Process Agent (L1 only)

Runtime.register_local(agent, goal, config)
  ├── Wire agent._runtime = self (direct reference, not proxy)
  ├── No subprocess created
  └── Caller must await agent.start() separately

6.2 Worker Process (worker.py)

run_worker(cls_path, agent_id, goal, config, scope_dict, ...)
  └── asyncio.run(_async_worker(...))
        ├── Import agent class: path_to_cls(cls_path)
        ├── Instantiate: agent = cls()
        ├── Create FileStore + MemoryView
        ├── Wire agent: id, goal, config, scope, memory, _runtime=RuntimeProxy
        ├── Write "spawned" CONTEXT entry
        ├── Start command listener (polls cmd_queue)
        ├── Send STARTED event
        ├── await agent.start() → await agent._task
        ├── Send DONE or FAILED event
        └── Write final .meta.json

6.3 RuntimeProxy — Agent-Side IPC

Workers communicate with the supervisor through RuntimeProxy:

Agent code                RuntimeProxy              Supervisor Runtime
──────────                ────────────              ────────────────
spawn_child(cls, goal)
  └──→ proxy.spawn(...)
         ├── event_queue.put(SPAWN_REQUEST)  ──→  _handle_event()
         ├── await future  (blocks)                  ├── Runtime.spawn()
         │                                           └── cmd_queue.put(SPAWN_RESPONSE)
         └── ← future.set_result(payload)    ←──  _listen_commands()
         return AgentInfo

stop(agent_id)
  └──→ proxy.stop(...)
         └── event_queue.put(STOP_REQUEST)   ──→  Runtime.stop(agent_id)
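
The request/response bridge above — an event out, a future awaited, a command back fulfilling the future — can be sketched like this (class and method names are hypothetical; the real proxy lives in proxy.py):

```python
import asyncio
import uuid


class RuntimeProxySketch:
    """Minimal spawn round-trip: SPAWN_REQUEST out, await a future,
    SPAWN_RESPONSE fulfills it."""

    def __init__(self, event_queue):
        self._event_queue = event_queue               # worker → supervisor
        self._pending: dict[str, asyncio.Future] = {}

    async def spawn(self, cls_path: str, goal: str) -> dict:
        request_id = uuid.uuid4().hex[:8]
        fut = asyncio.get_running_loop().create_future()
        self._pending[request_id] = fut
        self._event_queue.put(("SPAWN_REQUEST", {
            "request_id": request_id, "cls_path": cls_path, "goal": goal,
        }))
        return await fut                              # blocks until the response

    def on_command(self, cmd_type: str, payload: dict) -> None:
        """Called by the command listener when a command arrives."""
        if cmd_type == "SPAWN_RESPONSE":
            self._pending.pop(payload["request_id"]).set_result(payload)
```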

6.4 ChildHandle — Cross-Process Child Reference

When an agent spawns a child, it gets a ChildHandle:

class ChildHandle:
    id: str
    _task: asyncio.Task | None       # in-process mode (L1 → L2)
    _agent: Agent | None             # in-process reference
    _store: FileStore | None         # multiprocess mode (reads .meta.json)
    _runtime_proxy: RuntimeProxy     # multiprocess IPC

    @property
    def status(self) -> AgentStatus:
        # In-process:   direct agent.status
        # Multiprocess: read .meta.json

    async def wait(self, timeout=None):
        # In-process:   await asyncio.shield(task)
        # Multiprocess: poll .meta.json until DONE/FAILED

    async def cancel(self):
        # runtime.stop(id) or runtime_proxy.stop(id)

6.5 IPC Message Types

Supervisor → Worker (Command via cmd_queue):

| CmdType | Payload | Effect |
|---|---|---|
| STOP | {} | agent.stop() |
| PAUSE | {} | agent.pause() |
| RESUME | {} | agent.resume() |
| STEER | {context, urgency} | agent._inject_steer() |
| SPAWN_RESPONSE | {request_id, child_id, ...} | Fulfills pending spawn future |

Worker → Supervisor (Event via event_queue):

| EventType | Payload | Effect |
|---|---|---|
| STARTED | {} | Status → RUNNING |
| DONE | {} | Status → DONE |
| FAILED | {error} | Status → FAILED |
| SPAWN_REQUEST | {request_id, cls_path, goal, ...} | Runtime.spawn() |
| STOP_REQUEST | {target_id} | Runtime.stop() |
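
A command message can be modeled as a small enum + dataclass pair (field names here are illustrative; the real CmdType/EventType/Command/Event definitions live in ipc.py):

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class CmdType(str, Enum):
    STOP = "stop"
    PAUSE = "pause"
    RESUME = "resume"
    STEER = "steer"
    SPAWN_RESPONSE = "spawn_response"


@dataclass
class Command:
    """One supervisor → worker message on cmd_queue."""
    type: CmdType
    payload: dict = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)
```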

6.6 Agent ABC — Lifecycle

     ┌─────────────────────────────────────────────┐
     │   IDLE ──start()──→ RUNNING ──→ DONE        │
     │                       │  ↑        ↑         │
     │                 pause()│  │resume()│stop()   │
     │                       ↓  │        │         │
     │                     PAUSED ───────┘         │
     │                       │                     │
     │                  exception                  │
     │                       ↓                     │
     │                    FAILED                   │
     └─────────────────────────────────────────────┘

start()
  └── status = RUNNING → on_start() → create_task(_run_loop())

_run_loop()
  ├── start _watchdog_loop() if monitor_interval > 0
  ├── await run()              ← ABSTRACT (subclass implements)
  ├── status = DONE (or FAILED on exception)
  └── await on_stop()

_watchdog_loop()
  └── while RUNNING/PAUSED: sleep → check_pause() → on_monitor()
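
The lifecycle above reduces to a runnable skeleton (a sketch only: pause/resume and the watchdog are omitted, and hook names are taken from the diagram):

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum


class AgentStatus(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class AgentSketch(ABC):
    """Minimal IDLE → RUNNING → DONE/FAILED lifecycle."""

    def __init__(self):
        self.status = AgentStatus.IDLE
        self._task: asyncio.Task | None = None

    async def start(self):
        self.status = AgentStatus.RUNNING
        await self.on_start()
        self._task = asyncio.create_task(self._run_loop())

    async def _run_loop(self):
        try:
            await self.run()                  # subclass-provided body
            self.status = AgentStatus.DONE
        except Exception:
            self.status = AgentStatus.FAILED  # any uncaught error → FAILED
        finally:
            await self.on_stop()

    async def on_start(self): ...
    async def on_stop(self): ...

    @abstractmethod
    async def run(self): ...
```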

7. Memory System

7.1 Overview

┌───────────────┐     ┌───────────────┐     ┌──────────────────────┐
│    Agent      │     │  MemoryView   │     │     FileStore        │
│               │     │               │     │                      │
│ self.memory   │────→│ .write()      │────→│ atomic JSON write    │
│   .write()    │     │ .read_self()  │     │ to disk              │
│   .query()    │     │ .assemble_    │     │                      │
│               │     │   context()   │     │ .query() scans dirs  │
│               │     │               │     │ with 2-phase filter  │
│               │     │ auto-fills:   │     │                      │
│               │     │  agent_id     │     │ Dir structure:       │
│               │     │  project      │     │  {root}/{project}/   │
│               │     │  session      │     │    {agent_id}/       │
│               │     │  sweep        │     │      .meta.json      │
│               │     │  run          │     │      {entry}.json    │
│               │     │  role         │     │                      │
└───────────────┘     └───────────────┘     └──────────────────────┘

7.2 Entry — The Atomic Unit

@dataclass
class Entry:
    key: str                # auto-generated 12-char hex
    agent_id: str           # who produced this
    target_id: str | None   # recipient (MESSAGE type only)
    type: EntryType         # PLAN, CONTEXT, METRICS, ALERT, RESULT, REFLECTION, MESSAGE, RAW_FILE
    project: str            # always set
    session: str | None     # L1 scope
    sweep: str | None       # L2 scope
    run: str | None         # L3 scope
    role: str               # producer's role
    tags: list[str]         # free-form labels
    data: dict              # arbitrary JSON payload
    created_at: float       # unix timestamp

| EntryType | Written By | Purpose |
|---|---|---|
| PLAN | L2 | Task decomposition |
| CONTEXT | Any | Lifecycle events, status |
| METRICS | L3 | Training metrics (jsonl rows) |
| ALERT | L3 | Anomaly detection (NaN, loss spike) |
| RESULT | L3 | Task output (exit code, run dir) |
| REFLECTION | L2 | Lessons learned (persists across sessions) |
| MESSAGE | Any → Any | Inter-agent communication |
| RAW_FILE | External | Log files, scripts, artifacts |

7.3 FileStore — Flat File KV on Disk

{store_root}/{project}/{agent_id}/
  .meta.json
  {timestamp_ms}_{entry_type}_{12char_key}.json

  • Atomic writes: write to .tmp, then os.rename() (no partial reads)
  • 2-phase query: Phase 1 scans filenames (fast type/time filter), Phase 2
    reads JSON for scope/tag filtering
  • Filename encoding embeds type for fast filtering without JSON parse

7.4 Hierarchical Scope & Context Assembly

Level 0: project    (always present)
Level 1: session    (research session)
Level 2: sweep      (parameter sweep)
Level 3: run        (individual training run)

Scope inheritance on spawn:
  SessionAgent  scope = {project: "mlexp"}
    └── ResearchAgent  scope = {project: "mlexp", session: "sess-42"}
          └── ExecutorAgent  scope = {project: "mlexp", session: "sess-42", run: "run-7"}

Context assembly (MemoryView.assemble_context(token_budget)) builds
prompt context by widening scope from narrow to broad:

Step 1: "My Entries" — agent's own entries (chronological)
Step 2: Widen by scope:
  ├── Sweep level:   sibling runs in same sweep
  ├── Session level: other sweeps in same session
  └── Project level: cross-session learnings (memory bank)
Step 3: Truncate to token budget
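
The narrow-to-broad widening with a budget cutoff can be sketched like this (a simplification: entries are plain strings and token cost is approximated by word count):

```python
def assemble_context(own: list[str], siblings: list[str],
                     session: list[str], project: list[str],
                     token_budget: int) -> str:
    """Fill the budget narrow-to-broad: own entries first, then sibling
    runs, then session-level, then project-level learnings."""
    pieces, used = [], 0
    for block in (own, siblings, session, project):
        for text in block:
            cost = len(text.split())        # crude token estimate
            if used + cost > token_budget:
                return "\n".join(pieces)    # budget hit: stop widening
            pieces.append(text)
            used += cost
    return "\n".join(pieces)
```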

8. Communication Protocol

8.1 Three Tiers of Steer

| Tier | Mechanism | Delivery | Effect |
|---|---|---|---|
| CRITICAL | Runtime.steer(id, ctx, CRITICAL) | Kill + respawn | Agent restarts with [STEER: ctx] appended to goal |
| PRIORITY | Runtime.steer(id, ctx, PRIORITY) | IPC queue → buffer | Agent reads at next consume_steer() |
| MESSAGE | agent.msg(target, data) | FileStore entry | Target reads from inbox on next context assembly |

8.2 Event Relay — Runtime to Frontend

Agent._emit_event("iteration", {session_id, iteration, summary})
  │
  ├── In-process: runtime.emit_event() → EventRelay.emit()
  │     └── SSE listeners (GET /agents/events)
  │
  └── Subprocess: event_queue → runtime._handle_event()
        └── EventRelay.emit() → SSE listeners

EventRelay maintains a ring buffer of the last 1000 events. SSE clients can
reconnect and replay from a timestamp via recent(n, since).
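
The ring-buffer-with-replay behavior can be sketched with a bounded deque (a sketch of EventRelay's buffer semantics, not its actual code):

```python
import time
from collections import deque


class EventRingBuffer:
    """Bounded buffer of recent events; reconnecting SSE clients
    replay everything newer than their last-seen timestamp."""

    def __init__(self, capacity: int = 1000):
        self._buffer = deque(maxlen=capacity)  # oldest events fall off

    def emit(self, event: dict) -> None:
        event.setdefault("ts", time.time())
        self._buffer.append(event)

    def recent(self, n: int = 100, since: float = 0.0) -> list[dict]:
        return [e for e in self._buffer if e["ts"] > since][-n:]
```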

8.3 Stop Cascade

When any agent is stopped, Runtime cascades to all descendants (leaves first):

stop(orchestrator-xyz)
  ├── Collect descendants (BFS)
  ├── Reverse order (leaves first)
  └── For each: send STOP → join(5s) → terminate() if alive
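
The ordering step can be sketched as BFS plus a reversal (a sketch; the child map here stands in for Runtime's parent/child metadata):

```python
def stop_order(children: dict[str, list[str]], root: str) -> list[str]:
    """BFS from root, then reverse, so leaves stop before their parents."""
    order, frontier = [], [root]
    while frontier:
        node = frontier.pop(0)
        order.append(node)
        frontier.extend(children.get(node, []))
    return list(reversed(order))
```

Stopping leaves first means no child ever outlives the parent that supervises it.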

9. Chat System Integration

9.1 ChatStreamRuntime

The in-memory object tracking a live streaming response:

class ChatStreamRuntime:
    session_id: str
    run_id: str               # unique per response
    status: str               # "running" | "completed" | "failed" | "stopped"
    events: list[dict]        # sequenced event log
    next_sequence: int
    subscribers: set[Queue]   # live SSE client queues
    full_text: str            # accumulated LLM text
    full_thinking: str        # accumulated reasoning
    parts_accumulator: StreamPartsAccumulator
    started_at: float
    updated_at: float

9.2 StreamPartsAccumulator

Collects ordered message parts from the OpenCode SSE stream:

  • Text parts: accumulated from part_delta events
  • Thinking parts: accumulated from reasoning deltas
  • Tool parts: keyed by ID, state updates amend existing entry
  • snapshot() returns immutable ordered list for persistence
  • finalize() flushes buffers and returns final parts list
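
The accumulator's "deltas append, updates amend in place, order preserved" behavior can be sketched like this (method and field names are illustrative, not the actual StreamPartsAccumulator API):

```python
class PartsAccumulator:
    """Ordered message parts keyed by part id."""

    def __init__(self):
        self._order: list[str] = []
        self._parts: dict[str, dict] = {}

    def on_delta(self, part_id: str, ptype: str, delta: str) -> None:
        """Text/thinking deltas accumulate onto one part."""
        part = self._parts.get(part_id)
        if part is None:
            part = {"id": part_id, "type": ptype, "text": ""}
            self._parts[part_id] = part
            self._order.append(part_id)
        part["text"] += delta

    def on_update(self, part_id: str, **state) -> None:
        """Tool state updates amend the existing entry in place."""
        part = self._parts.setdefault(part_id, {"id": part_id})
        if part_id not in self._order:
            self._order.append(part_id)
        part.update(state)

    def snapshot(self) -> list[dict]:
        """Immutable ordered copy for persistence."""
        return [dict(self._parts[pid]) for pid in self._order]
```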

9.3 SSE Event Format (NDJSON to frontend)

{"seq": 1, "type": "provenance", "skill_id": "l1_auto_session", ...}
{"seq": 2, "type": "part_delta", "id": "prt_1", "ptype": "text", "delta": "I'll "}
{"seq": 3, "type": "part_delta", "id": "prt_1", "ptype": "text", "delta": "investigate..."}
{"seq": 4, "type": "part_update", "id": "prt_2", "ptype": "tool", "name": "bash", ...}
{"seq": 5, "type": "l1_action", "action_type": "spawn", "goal": "...", "agent_id": "..."}
{"seq": 6, "type": "session_status", "status": "idle"}

9.4 Persistence

Chat state is saved to {WORKDIR}/.agents/chat_data.json:

{
  "chat_sessions": {
    "session_id": {
      "title": "Optimize learning rate",
      "created_at": 1771745054.69,
      "messages": [
        {"role": "user", "content": "...", "timestamp": ...},
        {"role": "assistant", "content": "...", "thinking": "...", "parts": [...]}
      ],
      "opencode_session_id": "ses_abc123",
      "model_provider": "opencode",
      "model_id": "minimax-m2.5-free",
      "active_stream": { ... }   // snapshot during streaming, removed on complete
    }
  }
}

10. MCP Tools

10.1 Research Agent MCP Server

Registered in opencode.json as a local MCP server, providing tools the LLM
can call during its response:

"research-agent": {
  "type": "local",
  "command": ["./.ra-venv/bin/python", "./server/tools/research_agent_mcp.py"],
  "enabled": true
}

Tools:

| Tool | Purpose | Key Params |
|---|---|---|
| quick_bash | Shell command with timeout (never hangs) | command, workdir, timeout_seconds (max 120) |
| list_directory | Instant directory listing (pure Python) | path, include_hidden |
| create_run | Create a run via server API | name, command, workdir, launch_policy |
| start_run | Start an existing run | run_id |
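
The quick_bash contract — always return, never hang — can be approximated with subprocess.run and a hard timeout (an illustrative sketch of the tool's behavior, not the server's actual implementation; the return shape is assumed):

```python
import subprocess


def quick_bash(command: str, workdir: str = ".", timeout_seconds: int = 30) -> dict:
    """Run a shell command with a hard cap so the tool cannot hang."""
    timeout_seconds = min(timeout_seconds, 120)  # mirror the documented max
    try:
        proc = subprocess.run(command, shell=True, cwd=workdir,
                              capture_output=True, text=True,
                              timeout=timeout_seconds)
        return {"exit_code": proc.returncode,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"exit_code": -1, "stdout": "",
                "stderr": f"timed out after {timeout_seconds}s"}
```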

10.2 SLURM MCP Server

For HPC cluster environments:

"slurm-cli": {
  "type": "local",
  "command": ["./.ra-venv/bin/python", "./server/tools/slurm_mcp_cli.py"],
  "enabled": true
}

Provides wrappers for squeue, sinfo, sbatch, scancel, sacct, scontrol.

10.3 Context7 (Remote)

"context7": {
  "type": "remote",
  "url": "https://mcp.context7.com/mcp",
  "enabled": true
}

11. Server Bootstrap

11.1 Startup Sequence

python server.py --workdir ../../sandbox-wild-test --port 10000
  │
  ├── Parse arguments (workdir, port, host, tmux-session)
  ├── init_paths(workdir)
  │     → WORKDIR, DATA_DIR, CHAT_DATA_FILE, etc.
  │     → mkdir .agents/, .agents/runs/
  │
  ├── _init_agentsys()
  │     ├── Create FileStore at .agents/store
  │     ├── Create Runtime (multiprocess supervisor)
  │     ├── Bridge Runtime events → EventRelay
  │     └── Initialize route modules with runtime refs
  │
  ├── Load persisted state
  │     ├── chat_data.json → chat_sessions
  │     ├── jobs.json → runs, sweeps
  │     ├── alerts.json → alerts
  │     ├── plans.json → plans
  │     ├── journey_state.json → journey
  │     └── settings.json → cluster config
  │
  └── uvicorn.run(app, host, port)

11.2 Key Configuration

| Variable | Default | Source |
|---|---|---|
| MODEL_PROVIDER | "opencode" | env MODEL_PROVIDER |
| MODEL_ID | "minimax-m2.5-free" | env MODEL_ID |
| OPENCODE_URL | "http://127.0.0.1:4096" | env OPENCODE_URL |
| OPENCODE_USERNAME | "opencode" | env OPENCODE_SERVER_USERNAME |
| USER_AUTH_TOKEN | None | env RESEARCH_AGENT_USER_AUTH_TOKEN |
| TMUX_SESSION_NAME | "research-agent" | env RESEARCH_AGENT_TMUX_SESSION |
| SERVER_CALLBACK_URL | "http://127.0.0.1:10000" | computed |

12. File Inventory

Core Framework (agentsys/)

| File | Purpose |
|---|---|
| types.py | AgentStatus, EntryType, SteerUrgency, Entry, Steer, Scope, AgentInfo |
| agent.py | Agent ABC, ChildHandle, lifecycle, steer, spawn_child |
| runtime.py | Multiprocess supervisor, spawn, stop, steer routing, event listener |
| worker.py | Subprocess entry point, wires agent + proxy + store |
| proxy.py | RuntimeProxy — agent-side IPC (spawn requests, command listener) |
| ipc.py | CmdType, EventType, Command, Event, cls_to_path, path_to_cls |
| filestore.py | Flat file KV store, atomic writes, 2-phase query |
| memory.py | MemoryView — scoped interface, context assembly |

Agent Implementations (agentsys/agents/)

| File | Level | Purpose |
|---|---|---|
| session_agent.py | L1 | LLM-enabled orchestrator, per-session, build_wild_prompt, handle_llm_response |
| research_agent.py | L2 | Autonomous research loop (plan → iterate → reflect), OpenCode integration |
| executor.py | L3 | Subprocess runner, GPU handling, metrics tailing, anomaly detection |

Chat System (chat/)

| File | Purpose |
|---|---|
| routes.py | Chat CRUD, POST /chat, _chat_worker, synthetic turns, mode routing |
| streaming.py | OpenCode SSE parsing, ChatStreamRuntime, StreamPartsAccumulator, event broadcasting |

Agent Routes & Support (agent/)

| File | Purpose |
|---|---|
| wild_routes.py | Per-session L1 registry, wild mode endpoints (start/stop/steer/status) |
| v2_prompts.py | PromptContext, XML tag parsers (spawn_research, steer_child, etc.) |
| core/event_relay.py | EventRelay: Runtime events → SSE stream |
| runtime_routes.py | SSE endpoint (GET /agents/events), agent management |

Prompt Templates (skills/prompts/)

| Directory | Purpose |
|---|---|
| l1_wild_session/ | L1 prompt: forced research mode |
| l1_auto_session/ | L1 prompt: agent-decides mode |
| l1_agent_session/ | L1 prompt: direct tool use mode |
| wild_v2_planning/ | L2 prompt: Phase 0 planning |
| wild_v2_iteration/ | L2 prompt: Phase 1+ iteration |
| wild_v2_reflection/ | L2 prompt: reflection |

MCP Tools (tools/)

| File | Purpose |
|---|---|
| research_agent_mcp.py | FastMCP server: quick_bash, list_directory, create_run, start_run |
| slurm_mcp_cli.py | FastMCP server: SLURM job management wrappers |

Configuration (core/)

| File | Purpose |
|---|---|
| config.py | Paths, auth, model helpers, OpenCode config loading |
| state.py | Global state dicts, JSON persistence (chat, runs, alerts, etc.) |

Data Files (.agents/)

{workdir}/.agents/
  chat_data.json              # chat sessions + messages
  jobs.json                   # runs + sweeps
  alerts.json                 # active alerts
  settings.json               # cluster config
  plans.json                  # plans
  journey_state.json          # journey events
  store/{project}/            # FileStore root
    {agent_id}/
      .meta.json
      {timestamp}_{type}_{key}.json
  wild/{session_id}/          # L2 research session state
    state.json
    history.json
    tasks.md
    iteration_log.md
  runs/{agent_id}/            # L3 executor output
    command.txt
    run.log
    job.done
    agent_metrics.jsonl
