AgentSys: Design doc #294

@JingwenGu0829


AgentSys — Backend Parallel Agent System Design Document

Architecture reference for the multi-agent backend powering the Research Agent
application. Covers the 3-level agent hierarchy, chat integration, LLM
orchestration, memory system, and inter-process communication.


Image note: the diagram above was generated by Gemini, so some details may not match the actual design. It serves only as a rough overview of the broader picture; refer to the design doc below for every detail in the system.

0. Holistic View

The Research Agent backend is a 3-level multi-process agent system built on
top of a FastAPI server and an external LLM gateway (OpenCode). A user chats in
the browser; every message flows through an LLM-powered Level 1 agent that
decides — based on its operating mode — whether to answer directly, spawn
autonomous research experiments, or manage running children.

       ┌─────────────────────────────────────────────────────────────────────┐
       │                     DIMENSION 1: AGENTS                            │
       │                                                                    │
       │   Browser ──HTTP──→ FastAPI Server ──OpenCode──→ LLM Gateway      │
       │                          │                                         │
       │                    SessionAgent (L1)                               │
       │                   [LLM brain, in-process]                          │
       │                   [3 modes: wild/auto/agent]                       │
       │                          │                                         │
       │            ┌─────────────┼─────────────┐                           │
       │            │ Runtime IPC │             │ Runtime IPC               │
       │            ▼             ▼             ▼                           │
       │     ResearchAgent   ResearchAgent   ExecutorAgent ─┐              │
       │         (L2)            (L2)           (L3)        │              │
       │          │               │                         │              │
       │          │ spawn_child   │ spawn_child             │              │
       │          ▼               ▼                         │              │
       │    ┌─────────────────────────────────┐             │              │
       │    │       SCHEDULER                 │◄────────────┘              │
       │    │  ┌─────────┐  ┌────────────┐   │                            │
       │    │  │ FIFO    │  │ GPU        │   │ ← all executor spawns      │
       │    │  │ Queue   │  │ Tracker    │   │   routed here              │
       │    │  └────┬────┘  └─────┬──────┘   │                            │
       │    │       └──────┬──────┘          │                            │
       │    │         Policy Engine           │                            │
       │    │         (tick ~1s)              │                            │
       │    └──────────────┬─────────────────┘                            │
       │                   │ approve & spawn                               │
       │                   ▼                                               │
       │           ExecutorAgent (L3)  ×N                                  │
       │          [subprocess, per task/run]                                │
       │          [GPU-gated, queued]                                       │
       └─────────────────────────────────────────────────────────────────────┘

       ┌─────────────────────────────────────────────────────────────────────┐
       │                     DIMENSION 2: MEMORY                            │
       │                                                                    │
       │    Agent  →  MemoryView  →  FileStore                              │
       │   (scoped)   (auto-fill)   (flat KV on disk)                      │
       │                                                                    │
       │    Hierarchical scope queries:                                     │
       │    project > session > sweep > run                                 │
       └─────────────────────────────────────────────────────────────────────┘

       ┌─────────────────────────────────────────────────────────────────────┐
       │                  DIMENSION 3: COMMUNICATION                        │
       │                                                                    │
       │   CRITICAL steer  →  kill + respawn                                │
       │   PRIORITY steer  →  buffered, polled                              │
       │   MESSAGE          →  append to target memory                      │
       │                                                                    │
       │   All cross-process via IPC queues                                 │
       └─────────────────────────────────────────────────────────────────────┘

1. End-to-End Flow

1.1 The Big Picture

 User (browser)            FastAPI Server              OpenCode (LLM)         OS Subprocesses
 ──────────────            ──────────────              ──────────────         ───────────────

  POST /chat ───────────→ Resolve mode
  {message, mode}          Build L1 prompt
                           (skill template + context)
                                │
                           ChatStreamRuntime ──────→ POST /session (create)
                                │                    POST /session/{id}/prompt_async
                                │                         │
                                │                    GET /global/event (SSE stream)
                                │                    ← text deltas, tool updates
                                │
                           ← SSE (NDJSON) ─────────→ User sees streaming response
                                │
                           Post-process:
                           parse XML action tags
                                │
                           ┌────┴──────────────────────────┐
                           │ <spawn_research>               │
                           │   → Runtime.spawn(L2)  ───────┼──→ ResearchAgent (L2 proc)
                           │ <steer_child>                  │     ├── Phase 0: Plan
                           │   → Runtime.steer(L2)          │     ├── Phase 1+: Iterate
                           │ <stop_child>                   │     │   ├── Design (OpenCode LLM)
                           │   → Runtime.stop(L2)           │     │   ├── spawn_child(L3)
                           │ <spawn_command>                │     │   │         │
                           │   → spawn_child(L3)            │     │   │    ┌────▼──────────┐
                           │         │                      │     │   │    │  SCHEDULER    │
                           │    ┌────▼──────────┐           │     │   │    │  Queue + GPU  │
                           │    │  SCHEDULER    │           │     │   │    │  gate → spawn │
                           │    │  Queue + GPU  │           │     │   │    └────┬──────────┘
                           │    │  gate → spawn │           │     │   │         │
                           │    └────┬──────────┘           │     │   │    ExecutorAgent (L3)
                           │         │                      │     │   └── Collect results
                           │    ExecutorAgent (L3)          │     └── Reflection
                           │ (direct tool use)              │
                           │   → OpenCode handles           │
                           └────────────────────────────────┘

  GET /sessions/{id}/stream ─→ SSE replay + live events
  GET /agents/events ────────→ EventRelay SSE (agent lifecycle)

  [L2/L3 completion] ───────→ Synthetic chat turn
                               (L1 auto-responds with results)

1.2 Detailed Chat Request Flow

Step 1 — User sends message:

POST /chat
{
  "session_id": "abc123",
  "message": "Optimize the learning rate schedule",
  "mode": "auto"          // "wild" | "auto" | "agent"
}

Step 2 — Build L1 prompt:

  • Resolve mode → select skill template (l1_wild_session, l1_auto_session,
    or l1_agent_session)
  • Get or create per-session SessionAgent instance
  • Call agent.build_wild_prompt() which renders the skill template with:
    • {{children_status}} — active/completed L2 experiments
    • {{memories}} — reflection bank from past sessions
    • {{experiment_context}} — runs/sweeps/alerts summary
    • {{workdir}} — project root directory
    • {{conversation_history}} — last 10 chat messages
  • Append [USER] {message} to rendered prompt

Step 3 — Create ChatStreamRuntime:

  • In-memory object that tracks the streaming response
  • Generates unique run_id, initializes event list, subscriber queues
  • Stored in active_chat_streams[session_id]

Step 4 — Background worker sends prompt to OpenCode:

  • Get or create OpenCode session: POST {OPENCODE_URL}/session?directory={workdir}
  • Send prompt: POST {OPENCODE_URL}/session/{id}/prompt_async
    with {model: {providerID, modelID}, parts: [{type: "text", text: prompt}]}
  • Stream SSE: GET {OPENCODE_URL}/global/event
    • message.part.delta → incremental text
    • message.part.updated → tool call state
    • session.status: idle → done

Step 5 — Stream to frontend:

  • Each SSE event is wrapped with a sequence number and broadcast to subscribers
  • Frontend receives NDJSON stream: text deltas, thinking deltas, tool updates
  • Periodic snapshot persisted to chat_data.json (debounced every 0.75s)
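The sequence-numbered NDJSON wrapping in Step 5 can be sketched as follows. This is an illustrative sketch, not the actual server code; the function name and event shapes are assumptions.

```python
import json

def wrap_events(events: list[dict]) -> str:
    """Illustrative sketch: wrap raw SSE events with a monotonically
    increasing sequence number and emit one NDJSON line per event,
    as the broadcast step above describes."""
    lines = []
    for seq, event in enumerate(events):
        # The sequence number lets a reconnecting client detect gaps.
        lines.append(json.dumps({"seq": seq, **event}))
    return "\n".join(lines) + "\n"
```

A client can replay the stream by sorting on `seq` and discarding duplicates after a reconnect.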

Step 6 — Post-process L1 response (agent/wild/auto modes only):

  • Call agent.handle_llm_response(full_text) which parses XML action tags:
    • <spawn_research> → agent.start_experiment() → spawns L2 ResearchAgent
    • <spawn_command> → agent.start_run() → spawns L3 ExecutorAgent
    • <steer_child> → runtime.steer() → injects context into running L2
    • <stop_child> → runtime.stop() → terminates running agent
  • Strip XML tags from text before saving assistant message

Step 7 — Finalize:

  • Save assistant message (content + thinking + tool parts) to session
  • Auto-name session from OpenCode title on first turn
  • Send session_status: idle terminal event
  • 120-second retention window for client reconnect

1.3 Synthetic Chat Turns (Reactive)

When an L2/L3 child finishes, the system automatically generates a follow-up:

L1.on_monitor() detects child completion
  → reads child results (run.log, iteration_log.md)
  → calls _on_child_complete_callback(session_id, system_message)
  → trigger_synthetic_chat_turn()
      → build_wild_prompt(system_message, ...)
      → _start_chat_worker() → OpenCode → SSE stream
      → L1 summarizes results for user

The frontend sees a continuous SSE stream — the user gets automatic result
summaries without refreshing or polling.


2. Level 1 — SessionAgent (LLM-Enabled Orchestrator)

2.1 Overview

The L1 SessionAgent is the user-facing gateway. Unlike a traditional
orchestrator that just routes HTTP requests, the L1 has an LLM brain:
every user message goes through OpenCode, and the LLM's structured response
drives what happens next.

┌──────────────────────────────────────────────────────────────────────┐
│  Level 1 (L1) — SessionAgent         [in-process, one per chat     │
│  role = "orchestrator"                 session, LLM-powered]        │
│  allowed_children = {orchestrator, executor, sidecar}               │
│  monitor_interval = 5.0s                                            │
│                                                                      │
│  The L1 does NOT solve problems itself. It:                         │
│    1. Receives user messages via the chat system                    │
│    2. Gets its prompt rendered with context (mode-specific)         │
│    3. OpenCode generates a response (streamed to user)              │
│    4. Post-processing parses XML action tags from the response      │
│    5. Actions are executed: spawn L2, steer L2, stop L2             │
│    6. Monitors children, triggers synthetic turns on completion     │
│                                                                      │
│  Three operating modes (selected per chat message):                 │
│    WILD  — must spawn research for every request                    │
│    AUTO  — LLM decides: direct tools vs spawn research              │
│    AGENT — direct tool use only, no research spawning               │
│                                                                      │
│  Key methods:                                                       │
│    build_wild_prompt(message, session_id, skill_manager, skill_id)  │
│    handle_llm_response(full_text) → list[action]                    │
│    start_experiment(goal, config) → agent_id                        │
│    start_run(command, name, workdir) → agent_id                     │
│    on_monitor() → detect child completion, trigger callbacks        │
│                                                                      │
│  Instance management:                                               │
│    _session_agents: dict[chat_session_id → SessionAgent]            │
│    _get_or_create_session_agent(session_id) → SessionAgent          │
│    One L1 per chat session, created on demand                       │
└──────────────────────────────────────────────────────────────────────┘

2.2 The Three Modes

Aspect                WILD              AUTO              AGENT
Skill template        l1_wild_session   l1_auto_session   l1_agent_session
Spawn research        MANDATORY         LLM decides       NEVER
Direct tool use       No                Yes               Yes
Steer/stop children   Yes               Yes               No
Best for              Full automation   General use       Quick Q&A, debugging

WILD mode — the LLM MUST output <spawn_research> for every request. No
exceptions. The agent is purely a delegation gateway. Use when you want
autonomous research for all tasks.

AUTO mode — the LLM freely chooses between using tools directly (bash, file
read/write, code editing via OpenCode) and spawning research experiments. The
prompt gives heuristics: spawn research if the task needs iteration, try
multiple approaches, or would take more than ~3 tool calls. Use for balanced
autonomy.

AGENT mode — the LLM uses tools directly to answer questions and complete
tasks. No research spawning. The agent handles everything in the conversation
turn. Use for interactive debugging and development.

2.3 XML Action Tags

The L1's response is parsed for structured action blocks:

<!-- Spawn a Level 2 ResearchAgent -->
<spawn_research>
goal: Investigate learning rate sensitivity with ablation studies
max_iterations: 15
</spawn_research>

<!-- Spawn a standalone Level 3 ExecutorAgent -->
<spawn_command>
goal: Run the baseline training script
command: python train.py --lr 0.001 --epochs 50
workdir: /path/to/project
</spawn_command>

<!-- Redirect an active experiment -->
<steer_child>
experiment_id: orchestrator-abc123
context: Focus on reward shaping instead of policy gradient
</steer_child>

<!-- Stop an experiment -->
<stop_child>
experiment_id: orchestrator-abc123
</stop_child>

Parsing detail: The parser uses _last_match() (last regex occurrence) to
avoid matching template examples in the prompt itself — only the LLM's actual
output is parsed.

2.4 Prompt Construction (build_wild_prompt)

┌─────────────────────────────────────────────────────────┐
│  Skill Template (SKILL.md)                              │
│  ┌───────────────────────────────────────────────────┐  │
│  │  # Session Agent                                  │  │
│  │  ...role instructions...                          │  │
│  │  ## Working Directory                             │  │
│  │  {{workdir}}                                      │  │
│  │  ## Active Experiments                            │  │
│  │  {{children_status}}                              │  │
│  │  {{experiment_context}}                           │  │
│  │  ## Memory Bank                                   │  │
│  │  {{memories}}                                     │  │
│  │  ## Recent Conversation                           │  │
│  │  {{conversation_history}}                         │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  + [USER] {user_message}                                │
└─────────────────────────────────────────────────────────┘

Context variables filled dynamically:

  • children_status — markdown list of active/completed L2 experiments
  • memories — last 20 REFLECTION entries (cross-session memory bank)
  • experiment_context — active runs count, finished/failed counts, alerts
  • workdir — absolute path to project root
  • conversation_history — last 10 messages (role + content, truncated)
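The `{{variable}}` substitution above is plain string templating. A minimal sketch, assuming simple literal replacement (the real build_wild_prompt may use a richer template engine):

```python
def render_skill_template(template: str, context: dict) -> str:
    """Minimal sketch of the {{variable}} fill described above:
    replace each {{key}} placeholder with its context value."""
    out = template
    for key, value in context.items():
        out = out.replace("{{" + key + "}}", str(value))
    return out
```

Placeholders with no matching context key are left intact, which makes missing variables visible in the rendered prompt rather than silently dropped.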

2.5 L1 Tools Available via OpenCode

The LLM running inside OpenCode has access to OpenCode's built-in tools:
bash, read, glob, grep, edit, write, task, webfetch,
websearch, codesearch, skill, and others.

Additionally, MCP tools registered in opencode.json provide:

  • quick_bash — shell execution with timeout (never hangs)
  • list_directory — instant directory listing (pure Python)
  • create_run / start_run — run management via the server API

Important: MCP tools require the MCP server process to start successfully.
The opencode.json must point to the correct Python interpreter (the project
venv at .ra-venv/bin/python, not the system Python).


3. Level 2 — ResearchAgent (Autonomous Research Loop)

3.1 Overview

The L2 ResearchAgent runs as an OS subprocess spawned by the L1 via
Runtime. It operates a full autonomous research loop: plan → iterate → reflect.
Each L2 instance handles one research experiment.

┌──────────────────────────────────────────────────────────────────────┐
│  Level 2 (L2) — ResearchAgent         [subprocess, one per         │
│  role = "orchestrator"                  experiment]                  │
│  allowed_children = {executor}                                      │
│  monitor_interval = 5.0s                                            │
│                                                                      │
│  Configuration (ResearchAgentConfig):                               │
│    opencode_url, model_provider, model_id, workdir,                 │
│    server_url, skills_dir, max_iterations, wait_seconds,            │
│    autonomy_level, evo_sweep_enabled                                │
│                                                                      │
│  Research Loop:                                                     │
│    Phase 0: Plan (one-shot LLM → task decomposition)               │
│    Phase 1+: Iterate (design → spawn L3s → collect results)        │
│    Reflection: learn + decide continue/stop                         │
│                                                                      │
│  LLM Integration:                                                   │
│    Direct HTTP to OpenCode (own sessions, own SSE streams)          │
│    Prompt templates from PromptSkillManager (loaded from disk)      │
│                                                                      │
│  L2 = BRAIN: designs experiments, analyzes results, reflects        │
│  L3 = HANDS: executes experiments (subprocess commands)             │
└──────────────────────────────────────────────────────────────────────┘

3.2 The Research Loop

┌─────────────────────────────────────────────────────────────────┐
│                    RESEARCH LOOP (L2)                            │
│                                                                 │
│  ┌──────────────────┐                                          │
│  │  Phase 0: PLAN   │  One-shot LLM call                      │
│  │  Iteration = 0   │  → Parse <plan>, write tasks.md          │
│  │                  │  → git commit                            │
│  │                  │  → PLAN entry to MemoryView              │
│  └────────┬─────────┘                                          │
│           │                                                    │
│           ↓                                                    │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Phase 1+: ITERATE (until DONE or max_iterations)        │  │
│  │                                                          │  │
│  │  1. check_pause() + consume_steer()                      │  │
│  │  2. Snapshot files (git ls-files)                         │  │
│  │  3. Build iteration prompt (memories, steer, plan, ctx)   │  │
│  │  4. Send to OpenCode LLM                                 │  │
│  │  5. Parse response:                                      │  │
│  │     - <promise>DONE|WAITING|...</promise>                 │  │
│  │     - <plan>updated tasks</plan>                         │  │
│  │     - <summary>what happened</summary>                   │  │
│  │     - <experiment> specs (optional)                       │  │
│  │  6. If <experiment> tags:                                │  │
│  │     → spawn L3 ExecutorAgents per spec                   │  │
│  │     → poll 5s intervals, read run.log                    │  │
│  │     → format results for next iteration                  │  │
│  │  7. Diff files, count errors, track progress             │  │
│  │  8. git commit, append iteration_log.md                  │  │
│  │  9. Write entries to MemoryView, emit SSE events         │  │
│  │                                                          │  │
│  │  If promise == "DONE" → enter Reflection                 │  │
│  │  If promise == "WAITING" → sleep(wait_seconds)           │  │
│  │  Else → sleep(2s), continue                              │  │
│  └──────────┬───────────────────────────────────────────────┘  │
│             │                                                  │
│             ↓ (on DONE)                                        │
│  ┌──────────────────┐                                          │
│  │  REFLECTION      │  One-shot LLM call                      │
│  │                  │  → Parse <reflection>, <continue>,      │
│  │                  │    <memories> with [tag] prefix          │
│  │                  │  → Store as REFLECTION entries           │
│  │                  │  → If continue=yes: back to Phase 1     │
│  │                  │  → If continue=no: session done          │
│  └──────────────────┘                                          │
└─────────────────────────────────────────────────────────────────┘
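Step 5 of the iteration loop parses several tags out of the LLM response. A hedged sketch of that parsing, with helper names that are assumptions:

```python
import re

def parse_iteration_response(text: str) -> dict:
    """Illustrative sketch of step 5 above: extract the <promise>,
    <plan>, and <summary> bodies from the iteration response.
    Takes the last occurrence of each tag, mirroring the L1 parser."""
    def last(tag: str) -> str:
        found = re.findall(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return found[-1].strip() if found else ""
    return {
        "promise": last("promise"),   # DONE | WAITING | ...
        "plan": last("plan"),         # updated tasks.md content
        "summary": last("summary"),   # what happened this iteration
    }
```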

3.3 OpenCode Interaction (L2)

The L2 communicates with the LLM via direct HTTP to OpenCode — separate from
the L1's OpenCode sessions:

_run_opencode_prompt(prompt)
  ├── 1. Create session:  POST /session?directory={workdir}
  ├── 2. Connect SSE:     GET /global/event (before prompt, no missed events)
  ├── 3. Send prompt:     POST /session/{id}/prompt_async
  │     payload = {model: {providerID, modelID}, parts: [{type: "text", text: ...}]}
  ├── 4. Stream events:
  │     message.part.delta   → incremental text
  │     message.part.updated → full text snapshots
  │     session.status: idle → done
  └── 5. Return (full_text, opencode_session_id)
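The SSE stream in step 4 is line-framed: payloads arrive on `data:` lines. A minimal parsing sketch, assuming JSON payloads (event routing between delta/updated types is omitted):

```python
import json

def parse_sse_lines(raw: str) -> list[dict]:
    """Sketch: extract JSON payloads from an SSE stream as described
    in step 4 above. Only 'data:' lines carry payloads; blank lines
    delimit events."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events
```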

Timeouts:
  - Time to first token: 180s
  - Total response: 600s
  - SSE line idle: 120s

3.4 Prompt Templates

L2 prompts are rendered from SKILL.md templates via PromptSkillManager:

skills/prompts/
  wild_v2_planning/SKILL.md     → Phase 0 (planning)
  wild_v2_iteration/SKILL.md    → Phase 1+ (iteration)
  wild_v2_reflection/SKILL.md   → Reflection

Each template has {{variable}} placeholders filled by PromptContext:

@dataclass
class PromptContext:
    goal: str
    iteration: int
    max_iterations: int
    session_id: str
    workdir: str
    tasks_path: str           # .agents/wild/{session_id}/tasks.md
    log_path: str             # .agents/wild/{session_id}/iteration_log.md
    steer_context: str        # user steering (if any)
    history: list[dict]       # all iteration records
    no_progress_streak: int   # consecutive no-change iterations
    short_iteration_count: int
    autonomy_level: str       # "cautious" | "balanced" | "full"
    memories_text: str        # REFLECTION entries from MemoryView
    evo_sweep_enabled: bool

3.5 Session State (ResearchSession)

@dataclass
class ResearchSession:
    session_id: str
    goal: str
    status: str = "running"       # running | paused | done | failed
    iteration: int = 0
    max_iterations: int = 25
    plan: str = ""
    history: list = field(default_factory=list)   # iteration records
    started_at: float = 0.0
    finished_at: float | None = None
    steer_context: str = ""
    opencode_sessions: list = field(default_factory=list)  # track all OC session IDs used
    reflection: str = ""
    no_progress_streak: int = 0
    short_iteration_count: int = 0

Persisted to .agents/wild/{session_id}/state.json on each iteration.
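The per-iteration persistence can be sketched with `dataclasses.asdict`. This is an assumption about the mechanism, not the actual implementation; the path layout matches the source.

```python
import dataclasses
import json
import pathlib

def persist_state(session, agents_dir: str) -> None:
    """Sketch of the per-iteration persistence described above,
    assuming the session object is a dataclass. Writes to
    {agents_dir}/wild/{session_id}/state.json."""
    path = pathlib.Path(agents_dir) / "wild" / session.session_id / "state.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(dataclasses.asdict(session), indent=2))
```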


4. Level 3 — ExecutorAgent (Task Execution)

┌──────────────────────────────────────────────────────────────────────┐
│  Level 3 (L3) — ExecutorAgent          [subprocess, leaf node,      │
│  role = "executor"                       one per task/run]           │
│  allowed_children = frozenset()  (LEAF — cannot spawn)              │
│  monitor_interval = 3.0s (auto-enabled for subprocess backend)      │
│                                                                      │
│  Backends:                                                          │
│    "subprocess" — asyncio.create_subprocess_shell(command)           │
│    "callback"   — await config["callback"](goal)                    │
│    "mock"       — simulated delay + canned result (tests)           │
│                                                                      │
│  Subprocess Features:                                               │
│    - GPU detection via gpuwrap_detect.py                            │
│    - Retry on GPU conflict (CUDA busy / OOM patterns)               │
│    - Stdout streaming to run.log                                    │
│    - WANDB env var injection                                        │
│    - Exit code → RESULT entry                                       │
│                                                                      │
│  Watchdog (on_monitor, every 3s):                                   │
│    _tail_metrics() — reads agent_metrics.jsonl from run dir         │
│    _check_anomalies():                                              │
│      - NaN/Inf in loss → CRITICAL alert                             │
│      - High loss (>8.0) → WARNING alert                             │
│      - Loss spike (>3x rolling avg) → WARNING alert                │
│    Writes ALERT entries to FileStore on anomaly detection           │
│                                                                      │
│  Output files:                                                      │
│    .agents/runs/{agent_id}/command.txt   — shell command            │
│    .agents/runs/{agent_id}/run.log       — stdout/stderr stream     │
│    .agents/runs/{agent_id}/job.done      — exit code marker         │
│    .agents/runs/{agent_id}/agent_metrics.jsonl — training metrics   │
└──────────────────────────────────────────────────────────────────────┘

5. Scheduler — GPU-Aware Executor Queue

5.1 Overview

The Scheduler sits between spawn requests and actual process creation for
executor (L3) agents only. L1→L2 spawning bypasses it entirely.

                  spawn_child(ExecutorAgent, goal, gpu_required=2)
                      │
                      ▼
              ┌───────────────────┐
              │    Scheduler      │
              │                   │
              │  FIFO Queue       │  ← file-persisted (.json per request)
              │  GPU Tracker      │  ← nvidia-smi at init, tracks allocations
              │  Policy Engine    │  ← pluggable (default: FIFOPolicy)
              │                   │
              │  Tick loop (~1s): │
              │    policy.select_ │
              │    next() →       │
              │    Runtime.spawn() │
              └────────┬──────────┘
                       │
                       ▼
              Runtime.spawn(ExecutorAgent, ...)
                       │
                       ▼
                  OS Process (L3)

5.2 Key Types

class RequestStatus(str, Enum):
    QUEUED   = "queued"     # waiting for resources
    RUNNING  = "running"    # agent spawned
    DONE     = "done"       # agent finished successfully
    FAILED   = "failed"     # agent failed or request cancelled
    REJECTED = "rejected"   # impossible to satisfy (e.g. gpu_required > total)

@dataclass
class ScheduleRequest:
    request_id: str         # "sched-{uuid8}"
    agent_cls_path: str     # "agentsys.agents.executor.ExecutorAgent"
    goal: str
    parent_id: str | None
    config: dict | None
    gpu_required: int = 0
    created_at: float = 0.0
    status: RequestStatus = RequestStatus.QUEUED
    agent_id: str | None = None         # set when spawned
    rejection_reason: str | None = None

@dataclass
class ScheduleTicket:
    request_id: str         # returned to caller for status queries

5.3 GPU Tracking

class GPUTracker:
    total_gpus: int         # detect_total_gpus() via nvidia-smi at init
    _allocated: dict[str, int]  # request_id → gpu count

    can_allocate(n) → bool  # n <= total - sum(allocated)
    allocate(request_id, n) # reserve GPUs
    release(request_id)     # free GPUs (on agent DONE/FAILED)

GPU detection runs nvidia-smi --list-gpus at scheduler init. Falls back to
0 on systems without GPUs — all gpu_required=0 requests pass through
immediately.
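A sketch of that detection and fallback behavior (the function name matches the source; the exact invocation details are assumptions):

```python
import subprocess

def detect_total_gpus() -> int:
    """Sketch of the GPU detection described above: count the lines
    of `nvidia-smi --list-gpus`, falling back to 0 when the tool is
    missing or fails (CPU-only hosts)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
        return len([ln for ln in out.splitlines() if ln.strip()])
    except (OSError, subprocess.SubprocessError):
        return 0
```

With `total_gpus == 0`, `can_allocate(0)` is still true, so `gpu_required=0` requests flow through unimpeded.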

5.4 Scheduling Policy

class SchedulingPolicy(ABC):
    def select_next(queue, running, gpu_tracker) → request_id | None

class FIFOPolicy(SchedulingPolicy):
    # Returns first queued request where gpu_required <= gpu_tracker.available
    # Skips requests that need more GPUs than currently free

The policy abstraction enables future algorithms (priority queues, fair
scheduling, deadline-aware) without changing the Scheduler core.
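FIFOPolicy's selection rule fits in a few lines. A sketch with a simplified queue representation (tuples instead of ScheduleRequest objects, which is an assumption for illustration):

```python
def fifo_select_next(queued: list[tuple], available_gpus: int):
    """Sketch of FIFOPolicy.select_next as described above.
    `queued` holds (request_id, gpu_required, created_at) tuples.
    Returns the id of the OLDEST request whose GPU demand fits,
    skipping any that need more GPUs than are currently free."""
    for request_id, gpu_required, _created in sorted(queued, key=lambda r: r[2]):
        if gpu_required <= available_gpus:
            return request_id
    return None
```

Note the skip behavior: a large request at the head of the queue does not block smaller requests behind it, at the cost of possible starvation for the large request (one motivation for pluggable policies).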

5.5 Scheduler Lifecycle

_init_agentsys()                    lifespan startup
  │                                   │
  ├── Scheduler(runtime, store_dir)   ├── scheduler.start()
  │     ├── detect_total_gpus()       │     └── create_task(_tick_loop)
  │     ├── mkdir _scheduler/         │
  │     └── _load_from_disk()         │
  │           └── orphaned RUNNING    │
  │              requests → FAILED    │
  │                                   │
  └── runtime._scheduler = scheduler  │
                                      │
                              lifespan shutdown
                                │
                                └── scheduler.shutdown()
                                      └── cancel tick task

5.6 Request Flow

1. Agent.spawn_child(ExecutorAgent, goal, gpu_required=2)
     │
     ├── scheduler.submit(cls, goal, parent_id, config, gpu_required)
     │     ├── If gpu_required > total_gpus → REJECTED immediately
     │     ├── Create ScheduleRequest (status=QUEUED)
     │     ├── Persist to {store_dir}/{request_id}.json
     │     └── Return ScheduleTicket
     │
     └── Return ScheduledChildHandle(ticket, scheduler)

2. Tick loop (every ~1s):
     ├── Collect QUEUED requests, sort by created_at
     ├── policy.select_next(queue, running, gpu_tracker)
     │     └── FIFOPolicy: first request where gpu fits
     ├── If selected:
     │     ├── Runtime.spawn(cls, goal, parent_id, config)
     │     ├── gpu_tracker.allocate(request_id, gpu_required)
     │     ├── request.status = RUNNING, request.agent_id = child.id
     │     └── Persist updated request
     └── If nothing fits: no-op (try again next tick)

3. Agent completes (DONE/FAILED event in Runtime):
     ├── Runtime._handle_event notifies scheduler.on_agent_done()
     ├── gpu_tracker.release(request_id)
     ├── request.status = DONE or FAILED
     └── Persist — freed GPU enables next queued request

5.7 ScheduledChildHandle

Duck-type compatible with ChildHandle. Returned when an executor spawn goes
through the scheduler.

class ScheduledChildHandle:
    .id       → request_id (while queued) or agent_id (once spawned)
    .status   → IDLE (queued), RUNNING, DONE, FAILED (maps to AgentStatus)
    .wait()   → polls scheduler, then agent status
    .cancel() → scheduler.cancel() if queued, runtime.stop() if running

5.8 Persistence

Each request is atomically persisted as {store_dir}/{request_id}.json:

{
  "request_id": "sched-a1b2c3d4",
  "agent_cls_path": "agentsys.agents.executor.ExecutorAgent",
  "goal": "train model v2",
  "parent_id": "orchestrator-abc12345",
  "config": {"command": "python train.py", "workdir": "/exp"},
  "gpu_required": 2,
  "created_at": 1740000000.0,
  "status": "queued",
  "agent_id": null,
  "rejection_reason": null
}

On restart, _load_from_disk() restores all requests. Requests that were
RUNNING when the scheduler crashed are marked FAILED (orphaned processes).
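
The restore-and-orphan logic can be sketched as follows (function name and exact field handling are illustrative, assuming one JSON file per request as shown above):

```python
import json
from pathlib import Path


def load_from_disk(store_dir: Path) -> dict[str, dict]:
    """Restore persisted requests; RUNNING requests become FAILED.

    A request that was RUNNING when the process died refers to a worker
    that no longer exists, so it is marked failed and re-persisted.
    """
    requests = {}
    for path in sorted(store_dir.glob("*.json")):
        req = json.loads(path.read_text())
        if req["status"] == "running":
            req["status"] = "failed"
            req["rejection_reason"] = "orphaned by scheduler restart"
            path.write_text(json.dumps(req))
        requests[req["request_id"]] = req
    return requests
```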

5.9 Cleanup

  • cleanup_parent(parent_id) — cancels all QUEUED requests for a dead parent.
    Called automatically by Runtime.stop().
  • cancel(request_id) — cancels a single QUEUED request.

5.10 API Endpoints

| Endpoint | Method | Purpose |
|---|---|---|
| /scheduler/queue | GET | List requests (filter by parent_id, status) |
| /scheduler/queue/{request_id} | GET | Get single request |
| /scheduler/stats | GET | GPU availability + queue summary |

5.11 What Does NOT Go Through the Scheduler

  • L1→L2 spawning — cls.role == "orchestrator" bypasses the scheduler.
  • In-process agents — L1 SessionAgent registered via register_local().
  • Sidecar agents — cls.role == "sidecar" bypasses the scheduler.

Only cls.role == "executor" triggers the scheduler path, and only when
runtime._scheduler is set.


6. Runtime & IPC

6.1 Runtime — The Multiprocess Supervisor

Runtime sits between every parent-child pair. It is not an agent — it
manages OS processes, IPC queues, and the shared FileStore.

                        Runtime (server process)
                       ┌──────────────────────┐
                       │  FileStore           │ ← shared filesystem
                       │  _processes {}       │ ← OS Process objects
                       │  _cmd_queues {}      │ ← supervisor → worker
                       │  _event_queues {}    │ ← worker → supervisor
                       │  _agent_meta {}      │ ← in-memory cache
                       │  _local_agents {}    │ ← in-process (L1 only)
                       │  _event_listener     │ ← async polling task
                       └──────────┬───────────┘
                                  │
              ┌───────────────────┼───────────────────┐
              │                   │                   │
         ┌────┴────┐        ┌────┴────┐        ┌────┴────┐
         │ Worker  │        │ Worker  │        │ Worker  │
         │ Process │        │ Process │        │ Process │
         │ (L2)    │        │ (L3)    │        │ (L3)    │
         └─────────┘        └─────────┘        └─────────┘

Multiprocessing model: Uses multiprocessing.get_context("spawn") to avoid
fork-related deadlocks. Each worker gets its own asyncio event loop, its own
FileStore instance (same disk path), and a RuntimeProxy for IPC.

spawn() — Creating a Child Process

Runtime.spawn(cls, goal, parent_id, config, session, sweep, run)
  ├── 1. Validate hierarchy (parent.allowed_child_roles includes cls.role)
  ├── 2. Generate agent_id = f"{cls.role}-{uuid4().hex[:8]}"
  ├── 3. Build scope (inherited from parent)
  ├── 4. Create IPC queues (cmd_queue, event_queue)
  ├── 5. Write initial .meta.json to FileStore
  ├── 6. Create OS Process(target=run_worker, args=(...))
  ├── 7. process.start()
  ├── 8. Start event listener (if not running)
  └── 9. Return AgentInfo (lightweight metadata)

register_local() — In-Process Agent (L1 only)

Runtime.register_local(agent, goal, config)
  ├── Wire agent._runtime = self (direct reference, not proxy)
  ├── No subprocess created
  └── Caller must await agent.start() separately

6.2 Worker Process (worker.py)

run_worker(cls_path, agent_id, goal, config, scope_dict, ...)
  └── asyncio.run(_async_worker(...))
        ├── Import agent class: path_to_cls(cls_path)
        ├── Instantiate: agent = cls()
        ├── Create FileStore + MemoryView
        ├── Wire agent: id, goal, config, scope, memory, _runtime=RuntimeProxy
        ├── Write "spawned" CONTEXT entry
        ├── Start command listener (polls cmd_queue)
        ├── Send STARTED event
        ├── await agent.start() → await agent._task
        ├── Send DONE or FAILED event
        └── Write final .meta.json

6.3 RuntimeProxy — Agent-Side IPC

Workers communicate with the supervisor through RuntimeProxy:

Agent code                RuntimeProxy              Supervisor Runtime
──────────                ────────────              ────────────────
spawn_child(cls, goal)
  └──→ proxy.spawn(...)
         ├── event_queue.put(SPAWN_REQUEST)  ──→  _handle_event()
         ├── await future  (blocks)                  ├── Runtime.spawn()
         │                                           └── cmd_queue.put(SPAWN_RESPONSE)
         └── ← future.set_result(payload)    ←──  _listen_commands()
         return AgentInfo

stop(agent_id)
  └──→ proxy.stop(...)
         └── event_queue.put(STOP_REQUEST)   ──→  Runtime.stop(agent_id)
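
The request/response bridge above — an event out, a future awaited, a command back fulfilling the future — can be sketched like this (class and method names are hypothetical; the real proxy lives in proxy.py):

```python
import asyncio
import uuid


class RuntimeProxySketch:
    """Minimal spawn round-trip: SPAWN_REQUEST out, await a future,
    SPAWN_RESPONSE fulfills it."""

    def __init__(self, event_queue):
        self._event_queue = event_queue               # worker → supervisor
        self._pending: dict[str, asyncio.Future] = {}

    async def spawn(self, cls_path: str, goal: str) -> dict:
        request_id = uuid.uuid4().hex[:8]
        fut = asyncio.get_running_loop().create_future()
        self._pending[request_id] = fut
        self._event_queue.put(("SPAWN_REQUEST", {
            "request_id": request_id, "cls_path": cls_path, "goal": goal,
        }))
        return await fut                              # blocks until the response

    def on_command(self, cmd_type: str, payload: dict) -> None:
        """Called by the command listener when a command arrives."""
        if cmd_type == "SPAWN_RESPONSE":
            self._pending.pop(payload["request_id"]).set_result(payload)
```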

6.4 ChildHandle — Cross-Process Child Reference

When an agent spawns a child, it gets a ChildHandle:

class ChildHandle:
    id: str
    _task: asyncio.Task | None       # in-process mode (L1 → L2)
    _agent: Agent | None             # in-process reference
    _store: FileStore | None         # multiprocess mode (reads .meta.json)
    _runtime_proxy: RuntimeProxy     # multiprocess IPC

    @property
    def status(self) -> AgentStatus:
        # In-process:   direct agent.status
        # Multiprocess: read .meta.json

    async def wait(self, timeout=None):
        # In-process:   await asyncio.shield(task)
        # Multiprocess: poll .meta.json until DONE/FAILED

    async def cancel(self):
        # runtime.stop(id) or runtime_proxy.stop(id)

6.5 IPC Message Types

Supervisor → Worker (Command via cmd_queue):

| CmdType | Payload | Effect |
|---|---|---|
| STOP | {} | agent.stop() |
| PAUSE | {} | agent.pause() |
| RESUME | {} | agent.resume() |
| STEER | {context, urgency} | agent._inject_steer() |
| SPAWN_RESPONSE | {request_id, child_id, ...} | Fulfills pending spawn future |

Worker → Supervisor (Event via event_queue):

| EventType | Payload | Effect |
|---|---|---|
| STARTED | {} | Status → RUNNING |
| DONE | {} | Status → DONE |
| FAILED | {error} | Status → FAILED |
| SPAWN_REQUEST | {request_id, cls_path, goal, ...} | Runtime.spawn() |
| STOP_REQUEST | {target_id} | Runtime.stop() |
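
A command message can be modeled as a small enum + dataclass pair (field names here are illustrative; the real CmdType/EventType/Command/Event definitions live in ipc.py):

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class CmdType(str, Enum):
    STOP = "stop"
    PAUSE = "pause"
    RESUME = "resume"
    STEER = "steer"
    SPAWN_RESPONSE = "spawn_response"


@dataclass
class Command:
    """One supervisor → worker message on cmd_queue."""
    type: CmdType
    payload: dict = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)
```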

6.6 Agent ABC — Lifecycle

     ┌─────────────────────────────────────────────┐
     │   IDLE ──start()──→ RUNNING ──→ DONE        │
     │                       │  ↑        ↑         │
     │                 pause()│  │resume()│stop()   │
     │                       ↓  │        │         │
     │                     PAUSED ───────┘         │
     │                       │                     │
     │                  exception                  │
     │                       ↓                     │
     │                    FAILED                   │
     └─────────────────────────────────────────────┘

start()
  └── status = RUNNING → on_start() → create_task(_run_loop())

_run_loop()
  ├── start _watchdog_loop() if monitor_interval > 0
  ├── await run()              ← ABSTRACT (subclass implements)
  ├── status = DONE (or FAILED on exception)
  └── await on_stop()

_watchdog_loop()
  └── while RUNNING/PAUSED: sleep → check_pause() → on_monitor()
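
The lifecycle above reduces to a runnable skeleton (a sketch only: pause/resume and the watchdog are omitted, and hook names are taken from the diagram):

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum


class AgentStatus(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class AgentSketch(ABC):
    """Minimal IDLE → RUNNING → DONE/FAILED lifecycle."""

    def __init__(self):
        self.status = AgentStatus.IDLE
        self._task: asyncio.Task | None = None

    async def start(self):
        self.status = AgentStatus.RUNNING
        await self.on_start()
        self._task = asyncio.create_task(self._run_loop())

    async def _run_loop(self):
        try:
            await self.run()                  # subclass-provided body
            self.status = AgentStatus.DONE
        except Exception:
            self.status = AgentStatus.FAILED  # any uncaught error → FAILED
        finally:
            await self.on_stop()

    async def on_start(self): ...
    async def on_stop(self): ...

    @abstractmethod
    async def run(self): ...
```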

7. Memory System

7.1 Overview

┌───────────────┐     ┌───────────────┐     ┌──────────────────────┐
│    Agent      │     │  MemoryView   │     │     FileStore        │
│               │     │               │     │                      │
│ self.memory   │────→│ .write()      │────→│ atomic JSON write    │
│   .write()    │     │ .read_self()  │     │ to disk              │
│   .query()    │     │ .assemble_    │     │                      │
│               │     │   context()   │     │ .query() scans dirs  │
│               │     │               │     │ with 2-phase filter  │
│               │     │ auto-fills:   │     │                      │
│               │     │  agent_id     │     │ Dir structure:       │
│               │     │  project      │     │  {root}/{project}/   │
│               │     │  session      │     │    {agent_id}/       │
│               │     │  sweep        │     │      .meta.json      │
│               │     │  run          │     │      {entry}.json    │
│               │     │  role         │     │                      │
└───────────────┘     └───────────────┘     └──────────────────────┘

7.2 Entry — The Atomic Unit

@dataclass
class Entry:
    key: str                # auto-generated 12-char hex
    agent_id: str           # who produced this
    target_id: str | None   # recipient (MESSAGE type only)
    type: EntryType         # PLAN, CONTEXT, METRICS, ALERT, RESULT, REFLECTION, MESSAGE, RAW_FILE
    project: str            # always set
    session: str | None     # L1 scope
    sweep: str | None       # L2 scope
    run: str | None         # L3 scope
    role: str               # producer's role
    tags: list[str]         # free-form labels
    data: dict              # arbitrary JSON payload
    created_at: float       # unix timestamp

| EntryType | Written By | Purpose |
|---|---|---|
| PLAN | L2 | Task decomposition |
| CONTEXT | Any | Lifecycle events, status |
| METRICS | L3 | Training metrics (jsonl rows) |
| ALERT | L3 | Anomaly detection (NaN, loss spike) |
| RESULT | L3 | Task output (exit code, run dir) |
| REFLECTION | L2 | Lessons learned (persists across sessions) |
| MESSAGE | Any → Any | Inter-agent communication |
| RAW_FILE | External | Log files, scripts, artifacts |

7.3 FileStore — Flat File KV on Disk

{store_root}/{project}/{agent_id}/
  .meta.json
  {timestamp_ms}_{entry_type}_{12char_key}.json

  • Atomic writes: write to .tmp, then os.rename() (no partial reads)
  • 2-phase query: Phase 1 scans filenames (fast type/time filter), Phase 2
    reads JSON for scope/tag filtering
  • Filename encoding embeds type for fast filtering without JSON parse

7.4 Hierarchical Scope & Context Assembly

Level 0: project    (always present)
Level 1: session    (research session)
Level 2: sweep      (parameter sweep)
Level 3: run        (individual training run)

Scope inheritance on spawn:
  SessionAgent  scope = {project: "mlexp"}
    └── ResearchAgent  scope = {project: "mlexp", session: "sess-42"}
          └── ExecutorAgent  scope = {project: "mlexp", session: "sess-42", run: "run-7"}

Context assembly (MemoryView.assemble_context(token_budget)) builds
prompt context by widening scope from narrow to broad:

Step 1: "My Entries" — agent's own entries (chronological)
Step 2: Widen by scope:
  ├── Sweep level:   sibling runs in same sweep
  ├── Session level: other sweeps in same session
  └── Project level: cross-session learnings (memory bank)
Step 3: Truncate to token budget
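
The narrow-to-broad widening with a budget cutoff can be sketched like this (a simplification: entries are plain strings and token cost is approximated by word count):

```python
def assemble_context(own: list[str], siblings: list[str],
                     session: list[str], project: list[str],
                     token_budget: int) -> str:
    """Fill the budget narrow-to-broad: own entries first, then sibling
    runs, then session-level, then project-level learnings."""
    pieces, used = [], 0
    for block in (own, siblings, session, project):
        for text in block:
            cost = len(text.split())        # crude token estimate
            if used + cost > token_budget:
                return "\n".join(pieces)    # budget hit: stop widening
            pieces.append(text)
            used += cost
    return "\n".join(pieces)
```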

8. Communication Protocol

8.1 Three Tiers of Steer

| Tier | Mechanism | Delivery | Effect |
|---|---|---|---|
| CRITICAL | Runtime.steer(id, ctx, CRITICAL) | Kill + respawn | Agent restarts with [STEER: ctx] appended to goal |
| PRIORITY | Runtime.steer(id, ctx, PRIORITY) | IPC queue → buffer | Agent reads at next consume_steer() |
| MESSAGE | agent.msg(target, data) | FileStore entry | Target reads from inbox on next context assembly |

8.2 Event Relay — Runtime to Frontend

Agent._emit_event("iteration", {session_id, iteration, summary})
  │
  ├── In-process: runtime.emit_event() → EventRelay.emit()
  │     └── SSE listeners (GET /agents/events)
  │
  └── Subprocess: event_queue → runtime._handle_event()
        └── EventRelay.emit() → SSE listeners

EventRelay maintains a ring buffer of the last 1000 events. SSE clients can
reconnect and replay from a timestamp via recent(n, since).
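
The ring-buffer-with-replay behavior can be sketched with a bounded deque (a sketch of EventRelay's buffer semantics, not its actual code):

```python
import time
from collections import deque


class EventRingBuffer:
    """Bounded buffer of recent events; reconnecting SSE clients
    replay everything newer than their last-seen timestamp."""

    def __init__(self, capacity: int = 1000):
        self._buffer = deque(maxlen=capacity)  # oldest events fall off

    def emit(self, event: dict) -> None:
        event.setdefault("ts", time.time())
        self._buffer.append(event)

    def recent(self, n: int = 100, since: float = 0.0) -> list[dict]:
        return [e for e in self._buffer if e["ts"] > since][-n:]
```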

8.3 Stop Cascade

When any agent is stopped, Runtime cascades to all descendants (leaves first):

stop(orchestrator-xyz)
  ├── Collect descendants (BFS)
  ├── Reverse order (leaves first)
  └── For each: send STOP → join(5s) → terminate() if alive
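
The ordering step can be sketched as BFS plus a reversal (a sketch; the child map here stands in for Runtime's parent/child metadata):

```python
def stop_order(children: dict[str, list[str]], root: str) -> list[str]:
    """BFS from root, then reverse, so leaves stop before their parents."""
    order, frontier = [], [root]
    while frontier:
        node = frontier.pop(0)
        order.append(node)
        frontier.extend(children.get(node, []))
    return list(reversed(order))
```

Stopping leaves first means no child ever outlives the parent that supervises it.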

9. Chat System Integration

9.1 ChatStreamRuntime

The in-memory object tracking a live streaming response:

class ChatStreamRuntime:
    session_id: str
    run_id: str               # unique per response
    status: str               # "running" | "completed" | "failed" | "stopped"
    events: list[dict]        # sequenced event log
    next_sequence: int
    subscribers: set[Queue]   # live SSE client queues
    full_text: str            # accumulated LLM text
    full_thinking: str        # accumulated reasoning
    parts_accumulator: StreamPartsAccumulator
    started_at: float
    updated_at: float

9.2 StreamPartsAccumulator

Collects ordered message parts from the OpenCode SSE stream:

  • Text parts: accumulated from part_delta events
  • Thinking parts: accumulated from reasoning deltas
  • Tool parts: keyed by ID, state updates amend existing entry
  • snapshot() returns immutable ordered list for persistence
  • finalize() flushes buffers and returns final parts list
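
The accumulator's "deltas append, updates amend in place, order preserved" behavior can be sketched like this (method and field names are illustrative, not the actual StreamPartsAccumulator API):

```python
class PartsAccumulator:
    """Ordered message parts keyed by part id."""

    def __init__(self):
        self._order: list[str] = []
        self._parts: dict[str, dict] = {}

    def on_delta(self, part_id: str, ptype: str, delta: str) -> None:
        """Text/thinking deltas accumulate onto one part."""
        part = self._parts.get(part_id)
        if part is None:
            part = {"id": part_id, "type": ptype, "text": ""}
            self._parts[part_id] = part
            self._order.append(part_id)
        part["text"] += delta

    def on_update(self, part_id: str, **state) -> None:
        """Tool state updates amend the existing entry in place."""
        part = self._parts.setdefault(part_id, {"id": part_id})
        if part_id not in self._order:
            self._order.append(part_id)
        part.update(state)

    def snapshot(self) -> list[dict]:
        """Immutable ordered copy for persistence."""
        return [dict(self._parts[pid]) for pid in self._order]
```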

9.3 SSE Event Format (NDJSON to frontend)

{"seq": 1, "type": "provenance", "skill_id": "l1_auto_session", ...}
{"seq": 2, "type": "part_delta", "id": "prt_1", "ptype": "text", "delta": "I'll "}
{"seq": 3, "type": "part_delta", "id": "prt_1", "ptype": "text", "delta": "investigate..."}
{"seq": 4, "type": "part_update", "id": "prt_2", "ptype": "tool", "name": "bash", ...}
{"seq": 5, "type": "l1_action", "action_type": "spawn", "goal": "...", "agent_id": "..."}
{"seq": 6, "type": "session_status", "status": "idle"}

9.4 Persistence

Chat state is saved to {WORKDIR}/.agents/chat_data.json:

{
  "chat_sessions": {
    "session_id": {
      "title": "Optimize learning rate",
      "created_at": 1771745054.69,
      "messages": [
        {"role": "user", "content": "...", "timestamp": ...},
        {"role": "assistant", "content": "...", "thinking": "...", "parts": [...]}
      ],
      "opencode_session_id": "ses_abc123",
      "model_provider": "opencode",
      "model_id": "minimax-m2.5-free",
      "active_stream": { ... }   // snapshot during streaming, removed on complete
    }
  }
}

10. MCP Tools

10.1 Research Agent MCP Server

Registered in opencode.json as a local MCP server, providing tools the LLM
can call during its response:

"research-agent": {
  "type": "local",
  "command": ["./.ra-venv/bin/python", "./server/tools/research_agent_mcp.py"],
  "enabled": true
}

Tools:

| Tool | Purpose | Key Params |
|---|---|---|
| quick_bash | Shell command with timeout (never hangs) | command, workdir, timeout_seconds (max 120) |
| list_directory | Instant directory listing (pure Python) | path, include_hidden |
| create_run | Create a run via server API | name, command, workdir, launch_policy |
| start_run | Start an existing run | run_id |
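
The quick_bash contract — always return, never hang — can be approximated with subprocess.run and a hard timeout (an illustrative sketch of the tool's behavior, not the server's actual implementation; the return shape is assumed):

```python
import subprocess


def quick_bash(command: str, workdir: str = ".", timeout_seconds: int = 30) -> dict:
    """Run a shell command with a hard cap so the tool cannot hang."""
    timeout_seconds = min(timeout_seconds, 120)  # mirror the documented max
    try:
        proc = subprocess.run(command, shell=True, cwd=workdir,
                              capture_output=True, text=True,
                              timeout=timeout_seconds)
        return {"exit_code": proc.returncode,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"exit_code": -1, "stdout": "",
                "stderr": f"timed out after {timeout_seconds}s"}
```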

10.2 SLURM MCP Server

For HPC cluster environments:

"slurm-cli": {
  "type": "local",
  "command": ["./.ra-venv/bin/python", "./server/tools/slurm_mcp_cli.py"],
  "enabled": true
}

Provides wrappers for squeue, sinfo, sbatch, scancel, sacct, scontrol.

10.3 Context7 (Remote)

"context7": {
  "type": "remote",
  "url": "https://mcp.context7.com/mcp",
  "enabled": true
}

11. Server Bootstrap

11.1 Startup Sequence

python server.py --workdir ../../sandbox-wild-test --port 10000
  │
  ├── Parse arguments (workdir, port, host, tmux-session)
  ├── init_paths(workdir)
  │     → WORKDIR, DATA_DIR, CHAT_DATA_FILE, etc.
  │     → mkdir .agents/, .agents/runs/
  │
  ├── _init_agentsys()
  │     ├── Create FileStore at .agents/store
  │     ├── Create Runtime (multiprocess supervisor)
  │     ├── Bridge Runtime events → EventRelay
  │     └── Initialize route modules with runtime refs
  │
  ├── Load persisted state
  │     ├── chat_data.json → chat_sessions
  │     ├── jobs.json → runs, sweeps
  │     ├── alerts.json → alerts
  │     ├── plans.json → plans
  │     ├── journey_state.json → journey
  │     └── settings.json → cluster config
  │
  └── uvicorn.run(app, host, port)

11.2 Key Configuration

| Variable | Default | Source |
|---|---|---|
| MODEL_PROVIDER | "opencode" | env MODEL_PROVIDER |
| MODEL_ID | "minimax-m2.5-free" | env MODEL_ID |
| OPENCODE_URL | "http://127.0.0.1:4096" | env OPENCODE_URL |
| OPENCODE_USERNAME | "opencode" | env OPENCODE_SERVER_USERNAME |
| USER_AUTH_TOKEN | None | env RESEARCH_AGENT_USER_AUTH_TOKEN |
| TMUX_SESSION_NAME | "research-agent" | env RESEARCH_AGENT_TMUX_SESSION |
| SERVER_CALLBACK_URL | "http://127.0.0.1:10000" | computed |

12. File Inventory

Core Framework (agentsys/)

| File | Purpose |
|---|---|
| types.py | AgentStatus, EntryType, SteerUrgency, Entry, Steer, Scope, AgentInfo |
| agent.py | Agent ABC, ChildHandle, lifecycle, steer, spawn_child |
| runtime.py | Multiprocess supervisor, spawn, stop, steer routing, event listener |
| worker.py | Subprocess entry point, wires agent + proxy + store |
| proxy.py | RuntimeProxy — agent-side IPC (spawn requests, command listener) |
| ipc.py | CmdType, EventType, Command, Event, cls_to_path, path_to_cls |
| filestore.py | Flat file KV store, atomic writes, 2-phase query |
| memory.py | MemoryView — scoped interface, context assembly |

Agent Implementations (agentsys/agents/)

| File | Level | Purpose |
|---|---|---|
| session_agent.py | L1 | LLM-enabled orchestrator, per-session, build_wild_prompt, handle_llm_response |
| research_agent.py | L2 | Autonomous research loop (plan → iterate → reflect), OpenCode integration |
| executor.py | L3 | Subprocess runner, GPU handling, metrics tailing, anomaly detection |

Chat System (chat/)

| File | Purpose |
|---|---|
| routes.py | Chat CRUD, POST /chat, _chat_worker, synthetic turns, mode routing |
| streaming.py | OpenCode SSE parsing, ChatStreamRuntime, StreamPartsAccumulator, event broadcasting |

Agent Routes & Support (agent/)

| File | Purpose |
|---|---|
| wild_routes.py | Per-session L1 registry, wild mode endpoints (start/stop/steer/status) |
| v2_prompts.py | PromptContext, XML tag parsers (spawn_research, steer_child, etc.) |
| core/event_relay.py | EventRelay: Runtime events → SSE stream |
| runtime_routes.py | SSE endpoint (GET /agents/events), agent management |

Prompt Templates (skills/prompts/)

| Directory | Purpose |
|---|---|
| l1_wild_session/ | L1 prompt: forced research mode |
| l1_auto_session/ | L1 prompt: agent-decides mode |
| l1_agent_session/ | L1 prompt: direct tool use mode |
| wild_v2_planning/ | L2 prompt: Phase 0 planning |
| wild_v2_iteration/ | L2 prompt: Phase 1+ iteration |
| wild_v2_reflection/ | L2 prompt: reflection |

MCP Tools (tools/)

| File | Purpose |
|---|---|
| research_agent_mcp.py | FastMCP server: quick_bash, list_directory, create_run, start_run |
| slurm_mcp_cli.py | FastMCP server: SLURM job management wrappers |

Configuration (core/)

| File | Purpose |
|---|---|
| config.py | Paths, auth, model helpers, OpenCode config loading |
| state.py | Global state dicts, JSON persistence (chat, runs, alerts, etc.) |

Data Files (.agents/)

{workdir}/.agents/
  chat_data.json              # chat sessions + messages
  jobs.json                   # runs + sweeps
  alerts.json                 # active alerts
  settings.json               # cluster config
  plans.json                  # plans
  journey_state.json          # journey events
  store/{project}/            # FileStore root
    {agent_id}/
      .meta.json
      {timestamp}_{type}_{key}.json
  wild/{session_id}/          # L2 research session state
    state.json
    history.json
    tasks.md
    iteration_log.md
  runs/{agent_id}/            # L3 executor output
    command.txt
    run.log
    job.done
    agent_metrics.jsonl
