AgentSys — Backend Parallel Agent System Design Document
Architecture reference for the multi-agent backend powering the Research Agent
application. Covers the 3-level agent hierarchy, chat integration, LLM
orchestration, memory system, and inter-process communication.

Note: The overview diagram is Gemini-generated, so some details may diverge from the actual design; it serves only as a rough picture of the broader system. Refer to the design doc below for every detail.
0. Holistic View
The Research Agent backend is a 3-level multi-process agent system built on
top of a FastAPI server and an external LLM gateway (OpenCode). A user chats in
the browser; every message flows through an LLM-powered Level 1 agent that
decides — based on its operating mode — whether to answer directly, spawn
autonomous research experiments, or manage running children.
┌─────────────────────────────────────────────────────────────────────┐
│ DIMENSION 1: AGENTS │
│ │
│ Browser ──HTTP──→ FastAPI Server ──OpenCode──→ LLM Gateway │
│ │ │
│ SessionAgent (L1) │
│ [LLM brain, in-process] │
│ [3 modes: wild/auto/agent] │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ │ Runtime IPC │ │ Runtime IPC │
│ ▼ ▼ ▼ │
│ ResearchAgent ResearchAgent ExecutorAgent ─┐ │
│ (L2) (L2) (L3) │ │
│ │ │ │ │
│ │ spawn_child │ spawn_child │ │
│ ▼ ▼ │ │
│ ┌─────────────────────────────────┐ │ │
│ │ SCHEDULER │◄────────────┘ │
│ │ ┌─────────┐ ┌────────────┐ │ │
│ │ │ FIFO │ │ GPU │ │ ← all executor spawns │
│ │ │ Queue │ │ Tracker │ │ routed here │
│ │ └────┬────┘ └─────┬──────┘ │ │
│ │ └──────┬──────┘ │ │
│ │ Policy Engine │ │
│ │ (tick ~1s) │ │
│ └──────────────┬─────────────────┘ │
│ │ approve & spawn │
│ ▼ │
│ ExecutorAgent (L3) ×N │
│ [subprocess, per task/run] │
│ [GPU-gated, queued] │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ DIMENSION 2: MEMORY │
│ │
│ Agent → MemoryView → FileStore │
│ (scoped) (auto-fill) (flat KV on disk) │
│ │
│ Hierarchical scope queries: │
│ project > session > sweep > run │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ DIMENSION 3: COMMUNICATION │
│ │
│ CRITICAL steer → kill + respawn │
│ PRIORITY steer → buffered, polled │
│ MESSAGE → append to target memory │
│ │
│ All cross-process via IPC queues │
└─────────────────────────────────────────────────────────────────────┘
1. End-to-End Flow
1.1 The Big Picture
User (browser) FastAPI Server OpenCode (LLM) OS Subprocesses
────────────── ────────────── ────────────── ───────────────
POST /chat ───────────→ Resolve mode
{message, mode} Build L1 prompt
(skill template + context)
│
ChatStreamRuntime ──────→ POST /session (create)
│ POST /session/{id}/prompt_async
│ │
│ GET /global/event (SSE stream)
│ ← text deltas, tool updates
│
← SSE (NDJSON) ─────────→ User sees streaming response
│
Post-process:
parse XML action tags
│
┌────┴──────────────────────────┐
│ <spawn_research> │
│ → Runtime.spawn(L2) ───────┼──→ ResearchAgent (L2 proc)
│ <steer_child> │ ├── Phase 0: Plan
│ → Runtime.steer(L2) │ ├── Phase 1+: Iterate
│ <stop_child> │ │ ├── Design (OpenCode LLM)
│ → Runtime.stop(L2) │ │ ├── spawn_child(L3)
│ <spawn_command> │ │ │ │
│ → spawn_child(L3) │ │ │ ┌────▼──────────┐
│ │ │ │ │ │ SCHEDULER │
│ ┌────▼──────────┐ │ │ │ │ Queue + GPU │
│ │ SCHEDULER │ │ │ │ │ gate → spawn │
│ │ Queue + GPU │ │ │ │ └────┬──────────┘
│ │ gate → spawn │ │ │ │ │
│ └────┬──────────┘ │ │ │ ExecutorAgent (L3)
│ │ │ │ └── Collect results
│ ExecutorAgent (L3) │ └── Reflection
│ (direct tool use) │
│ → OpenCode handles │
└────────────────────────────────┘
GET /sessions/{id}/stream ─→ SSE replay + live events
GET /agents/events ────────→ EventRelay SSE (agent lifecycle)
[L2/L3 completion] ───────→ Synthetic chat turn
(L1 auto-responds with results)
1.2 Detailed Chat Request Flow
Step 1 — User sends message:
POST /chat
{
"session_id": "abc123",
"message": "Optimize the learning rate schedule",
"mode": "auto" // "wild" | "auto" | "agent"
}
Step 2 — Build L1 prompt:
- Resolve mode → select skill template (l1_wild_session, l1_auto_session, or l1_agent_session)
- Get or create the per-session SessionAgent instance
- Call agent.build_wild_prompt(), which renders the skill template with:
  {{children_status}} — active/completed L2 experiments
  {{memories}} — reflection bank from past sessions
  {{experiment_context}} — runs/sweeps/alerts summary
  {{workdir}} — project root directory
  {{conversation_history}} — last 10 chat messages
- Append [USER] {message} to the rendered prompt
Step 3 — Create ChatStreamRuntime:
- In-memory object that tracks the streaming response
- Generates a unique run_id, initializes the event list and subscriber queues
- Stored in active_chat_streams[session_id]
Step 4 — Background worker sends prompt to OpenCode:
- Get or create OpenCode session: POST {OPENCODE_URL}/session?directory={workdir}
- Send prompt: POST {OPENCODE_URL}/session/{id}/prompt_async
  with {model: {providerID, modelID}, parts: [{type: "text", text: prompt}]}
- Stream SSE: GET {OPENCODE_URL}/global/event
  message.part.delta → incremental text
  message.part.updated → tool call state
  session.status: idle → done
Step 5 — Stream to frontend:
- Each SSE event is wrapped with a sequence number and broadcast to subscribers
- Frontend receives an NDJSON stream: text deltas, thinking deltas, tool updates
- Periodic snapshot persisted to chat_data.json (debounced, every 0.75s)
Step 6 — Post-process L1 response (agent/wild/auto modes only):
- Call agent.handle_llm_response(full_text), which parses XML action tags:
  <spawn_research> → agent.start_experiment() → spawns L2 ResearchAgent
  <spawn_command> → agent.start_run() → spawns L3 ExecutorAgent
  <steer_child> → runtime.steer() → injects context into running L2
  <stop_child> → runtime.stop() → terminates running agent
- Strip XML tags from the text before saving the assistant message
Step 7 — Finalize:
- Save assistant message (content + thinking + tool parts) to the session
- Auto-name the session from the OpenCode title on the first turn
- Send the session_status: idle terminal event
- 120-second retention window for client reconnect
1.3 Synthetic Chat Turns (Reactive)
When an L2/L3 child finishes, the system automatically generates a follow-up:
L1.on_monitor() detects child completion
→ reads child results (run.log, iteration_log.md)
→ calls _on_child_complete_callback(session_id, system_message)
→ trigger_synthetic_chat_turn()
→ build_wild_prompt(system_message, ...)
→ _start_chat_worker() → OpenCode → SSE stream
→ L1 summarizes results for user
The frontend sees a continuous SSE stream — the user gets automatic result
summaries without refreshing or polling.
2. Level 1 — SessionAgent (LLM-Enabled Orchestrator)
2.1 Overview
The L1 SessionAgent is the user-facing gateway. Unlike a traditional
orchestrator that just routes HTTP requests, the L1 has an LLM brain —
every user message goes through OpenCode, and the LLM's structured response
drives what happens next.
┌──────────────────────────────────────────────────────────────────────┐
│ Level 1 (L1) — SessionAgent [in-process, one per chat │
│ role = "orchestrator" session, LLM-powered] │
│ allowed_children = {orchestrator, executor, sidecar} │
│ monitor_interval = 5.0s │
│ │
│ The L1 does NOT solve problems itself. It: │
│ 1. Receives user messages via the chat system │
│ 2. Gets its prompt rendered with context (mode-specific) │
│ 3. OpenCode generates a response (streamed to user) │
│ 4. Post-processing parses XML action tags from the response │
│ 5. Actions are executed: spawn L2, steer L2, stop L2 │
│ 6. Monitors children, triggers synthetic turns on completion │
│ │
│ Three operating modes (selected per chat message): │
│ WILD — must spawn research for every request │
│ AUTO — LLM decides: direct tools vs spawn research │
│ AGENT — direct tool use only, no research spawning │
│ │
│ Key methods: │
│ build_wild_prompt(message, session_id, skill_manager, skill_id) │
│ handle_llm_response(full_text) → list[action] │
│ start_experiment(goal, config) → agent_id │
│ start_run(command, name, workdir) → agent_id │
│ on_monitor() → detect child completion, trigger callbacks │
│ │
│ Instance management: │
│ _session_agents: dict[chat_session_id → SessionAgent] │
│ _get_or_create_session_agent(session_id) → SessionAgent │
│ One L1 per chat session, created on demand │
└──────────────────────────────────────────────────────────────────────┘
2.2 The Three Modes
| Aspect              | WILD            | AUTO            | AGENT                |
|---------------------|-----------------|-----------------|----------------------|
| Skill template      | l1_wild_session | l1_auto_session | l1_agent_session     |
| Spawn research      | MANDATORY       | LLM decides     | NEVER                |
| Direct tool use     | No              | Yes             | Yes                  |
| Steer/stop children | Yes             | Yes             | No                   |
| Best for            | Full automation | General use     | Quick Q&A, debugging |
WILD mode — the LLM MUST output <spawn_research> for every request. No
exceptions. The agent is purely a delegation gateway. Use when you want
autonomous research for all tasks.
AUTO mode — the LLM freely chooses between using tools directly (bash, file
read/write, code editing via OpenCode) and spawning research experiments. The
prompt gives heuristics: spawn research if the task needs iteration, try
multiple approaches, or would take more than ~3 tool calls. Use for balanced
autonomy.
AGENT mode — the LLM uses tools directly to answer questions and complete
tasks. No research spawning. The agent handles everything in the conversation
turn. Use for interactive debugging and development.
2.3 XML Action Tags
The L1's response is parsed for structured action blocks:
<!-- Spawn a Level 2 ResearchAgent -->
<spawn_research>
goal: Investigate learning rate sensitivity with ablation studies
max_iterations: 15
</spawn_research>
<!-- Spawn a standalone Level 3 ExecutorAgent -->
<spawn_command>
goal: Run the baseline training script
command: python train.py --lr 0.001 --epochs 50
workdir: /path/to/project
</spawn_command>
<!-- Redirect an active experiment -->
<steer_child>
experiment_id: orchestrator-abc123
context: Focus on reward shaping instead of policy gradient
</steer_child>
<!-- Stop an experiment -->
<stop_child>
experiment_id: orchestrator-abc123
</stop_child>
Parsing detail: The parser uses _last_match() (last regex occurrence) to
avoid matching template examples in the prompt itself — only the LLM's actual
output is parsed.
2.4 Prompt Construction (build_wild_prompt)
┌─────────────────────────────────────────────────────────┐
│ Skill Template (SKILL.md) │
│ ┌───────────────────────────────────────────────────┐ │
│ │ # Session Agent │ │
│ │ ...role instructions... │ │
│ │ ## Working Directory │ │
│ │ {{workdir}} │ │
│ │ ## Active Experiments │ │
│ │ {{children_status}} │ │
│ │ {{experiment_context}} │ │
│ │ ## Memory Bank │ │
│ │ {{memories}} │ │
│ │ ## Recent Conversation │ │
│ │ {{conversation_history}} │ │
│ └───────────────────────────────────────────────────┘ │
│ │
│ + [USER] {user_message} │
└─────────────────────────────────────────────────────────┘
Context variables filled dynamically:
children_status — markdown list of active/completed L2 experiments
memories — last 20 REFLECTION entries (cross-session memory bank)
experiment_context — active runs count, finished/failed counts, alerts
workdir — absolute path to project root
conversation_history — last 10 messages (role + content, truncated)
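The substitution step can be sketched as plain double-brace replacement (the `render_skill` helper is a hypothetical name; it assumes flat placeholders with no nesting or escaping):

```python
import re

def render_skill(template: str, context: dict[str, str]) -> str:
    """Fill {{variable}} placeholders; unknown variables render empty."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: context.get(m.group(1), ""), template)

prompt = render_skill(
    "## Working Directory\n{{workdir}}\n## Memory Bank\n{{memories}}",
    {"workdir": "/home/user/mlexp", "memories": "- prefer cosine LR decay"},
)
```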
2.5 L1 Tools Available via OpenCode
The LLM running inside OpenCode has access to OpenCode's built-in tools:
bash, read, glob, grep, edit, write, task, webfetch,
websearch, codesearch, skill, and others.
Additionally, MCP tools registered in opencode.json provide:
quick_bash — shell execution with timeout (never hangs)
list_directory — instant directory listing (pure Python)
create_run / start_run — run management via the server API
Important: MCP tools require the MCP server process to start successfully.
The opencode.json must point to the correct Python interpreter (the project
venv at .ra-venv/bin/python, not the system Python).
3. Level 2 — ResearchAgent (Autonomous Research Loop)
3.1 Overview
The L2 ResearchAgent runs as an OS subprocess spawned by the L1 via
Runtime. It operates a full autonomous research loop: plan → iterate → reflect.
Each L2 instance handles one research experiment.
┌──────────────────────────────────────────────────────────────────────┐
│ Level 2 (L2) — ResearchAgent [subprocess, one per │
│ role = "orchestrator" experiment] │
│ allowed_children = {executor} │
│ monitor_interval = 5.0s │
│ │
│ Configuration (ResearchAgentConfig): │
│ opencode_url, model_provider, model_id, workdir, │
│ server_url, skills_dir, max_iterations, wait_seconds, │
│ autonomy_level, evo_sweep_enabled │
│ │
│ Research Loop: │
│ Phase 0: Plan (one-shot LLM → task decomposition) │
│ Phase 1+: Iterate (design → spawn L3s → collect results) │
│ Reflection: learn + decide continue/stop │
│ │
│ LLM Integration: │
│ Direct HTTP to OpenCode (own sessions, own SSE streams) │
│ Prompt templates from PromptSkillManager (loaded from disk) │
│ │
│ L2 = BRAIN: designs experiments, analyzes results, reflects │
│ L3 = HANDS: executes experiments (subprocess commands) │
└──────────────────────────────────────────────────────────────────────┘
3.2 The Research Loop
┌─────────────────────────────────────────────────────────────────┐
│ RESEARCH LOOP (L2) │
│ │
│ ┌──────────────────┐ │
│ │ Phase 0: PLAN │ One-shot LLM call │
│ │ Iteration = 0 │ → Parse <plan>, write tasks.md │
│ │ │ → git commit │
│ │ │ → PLAN entry to MemoryView │
│ └────────┬─────────┘ │
│ │ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Phase 1+: ITERATE (until DONE or max_iterations) │ │
│ │ │ │
│ │ 1. check_pause() + consume_steer() │ │
│ │ 2. Snapshot files (git ls-files) │ │
│ │ 3. Build iteration prompt (memories, steer, plan, ctx) │ │
│ │ 4. Send to OpenCode LLM │ │
│ │ 5. Parse response: │ │
│ │ - <promise>DONE|WAITING|...</promise> │ │
│ │ - <plan>updated tasks</plan> │ │
│ │ - <summary>what happened</summary> │ │
│ │ - <experiment> specs (optional) │ │
│ │ 6. If <experiment> tags: │ │
│ │ → spawn L3 ExecutorAgents per spec │ │
│ │ → poll 5s intervals, read run.log │ │
│ │ → format results for next iteration │ │
│ │ 7. Diff files, count errors, track progress │ │
│ │ 8. git commit, append iteration_log.md │ │
│ │ 9. Write entries to MemoryView, emit SSE events │ │
│ │ │ │
│ │ If promise == "DONE" → enter Reflection │ │
│ │ If promise == "WAITING" → sleep(wait_seconds) │ │
│ │ Else → sleep(2s), continue │ │
│ └──────────┬───────────────────────────────────────────────┘ │
│ │ │
│ ↓ (on DONE) │
│ ┌──────────────────┐ │
│ │ REFLECTION │ One-shot LLM call │
│ │ │ → Parse <reflection>, <continue>, │
│ │ │ <memories> with [tag] prefix │
│ │ │ → Store as REFLECTION entries │
│ │ │ → If continue=yes: back to Phase 1 │
│ │ │ → If continue=no: session done │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
3.3 OpenCode Interaction (L2)
The L2 communicates with the LLM via direct HTTP to OpenCode — separate from
the L1's OpenCode sessions:
_run_opencode_prompt(prompt)
├── 1. Create session: POST /session?directory={workdir}
├── 2. Connect SSE: GET /global/event (before prompt, no missed events)
├── 3. Send prompt: POST /session/{id}/prompt_async
│ payload = {model: {providerID, modelID}, parts: [{type: "text", text: ...}]}
├── 4. Stream events:
│ message.part.delta → incremental text
│ message.part.updated → full text snapshots
│ session.status: idle → done
└── 5. Return (full_text, opencode_session_id)
Timeouts:
- Time to first token: 180s
- Total response: 600s
- SSE line idle: 120s
3.4 Prompt Templates
L2 prompts are rendered from SKILL.md templates via PromptSkillManager:
skills/prompts/
wild_v2_planning/SKILL.md → Phase 0 (planning)
wild_v2_iteration/SKILL.md → Phase 1+ (iteration)
wild_v2_reflection/SKILL.md → Reflection
Each template has {{variable}} placeholders filled by PromptContext:
@dataclass
class PromptContext:
goal: str
iteration: int
max_iterations: int
session_id: str
workdir: str
tasks_path: str # .agents/wild/{session_id}/tasks.md
log_path: str # .agents/wild/{session_id}/iteration_log.md
steer_context: str # user steering (if any)
history: list[dict] # all iteration records
no_progress_streak: int # consecutive no-change iterations
short_iteration_count: int
autonomy_level: str # "cautious" | "balanced" | "full"
memories_text: str # REFLECTION entries from MemoryView
evo_sweep_enabled: bool
3.5 Session State (ResearchSession)
@dataclass
class ResearchSession:
    session_id: str
    goal: str
    status: str = "running"  # running | paused | done | failed
    iteration: int = 0
    max_iterations: int = 25
    plan: str = ""
    history: list = field(default_factory=list)  # iteration records
    started_at: float = 0.0
    finished_at: float | None = None
    steer_context: str = ""
    opencode_sessions: list = field(default_factory=list)  # track all OC session IDs used
    reflection: str = ""
    no_progress_streak: int = 0
    short_iteration_count: int = 0
Persisted to .agents/wild/{session_id}/state.json on each iteration.
4. Level 3 — ExecutorAgent (Task Execution)
┌──────────────────────────────────────────────────────────────────────┐
│ Level 3 (L3) — ExecutorAgent [subprocess, leaf node, │
│ role = "executor" one per task/run] │
│ allowed_children = frozenset() (LEAF — cannot spawn) │
│ monitor_interval = 3.0s (auto-enabled for subprocess backend) │
│ │
│ Backends: │
│ "subprocess" — asyncio.create_subprocess_shell(command) │
│ "callback" — await config["callback"](goal) │
│ "mock" — simulated delay + canned result (tests) │
│ │
│ Subprocess Features: │
│ - GPU detection via gpuwrap_detect.py │
│ - Retry on GPU conflict (CUDA busy / OOM patterns) │
│ - Stdout streaming to run.log │
│ - WANDB env var injection │
│ - Exit code → RESULT entry │
│ │
│ Watchdog (on_monitor, every 3s): │
│ _tail_metrics() — reads agent_metrics.jsonl from run dir │
│ _check_anomalies(): │
│ - NaN/Inf in loss → CRITICAL alert │
│ - High loss (>8.0) → WARNING alert │
│ - Loss spike (>3x rolling avg) → WARNING alert │
│ Writes ALERT entries to FileStore on anomaly detection │
│ │
│ Output files: │
│ .agents/runs/{agent_id}/command.txt — shell command │
│ .agents/runs/{agent_id}/run.log — stdout/stderr stream │
│ .agents/runs/{agent_id}/job.done — exit code marker │
│ .agents/runs/{agent_id}/agent_metrics.jsonl — training metrics │
└──────────────────────────────────────────────────────────────────────┘
5. Scheduler — GPU-Aware Executor Queue
5.1 Overview
The Scheduler sits between spawn requests and actual process creation for
executor (L3) agents only. L1→L2 spawning bypasses it entirely.
spawn_child(ExecutorAgent, goal, gpu_required=2)
│
▼
┌───────────────────┐
│ Scheduler │
│ │
│ FIFO Queue │ ← file-persisted (.json per request)
│ GPU Tracker │ ← nvidia-smi at init, tracks allocations
│ Policy Engine │ ← pluggable (default: FIFOPolicy)
│ │
│ Tick loop (~1s): │
│ policy.select_ │
│ next() → │
│ Runtime.spawn() │
└────────┬──────────┘
│
▼
Runtime.spawn(ExecutorAgent, ...)
│
▼
OS Process (L3)
5.2 Key Types
class RequestStatus(str, Enum):
QUEUED = "queued" # waiting for resources
RUNNING = "running" # agent spawned
DONE = "done" # agent finished successfully
FAILED = "failed" # agent failed or request cancelled
REJECTED = "rejected" # impossible to satisfy (e.g. gpu_required > total)
@dataclass
class ScheduleRequest:
    request_id: str  # "sched-{uuid8}"
    agent_cls_path: str  # "agentsys.agents.executor.ExecutorAgent"
    goal: str
    parent_id: str | None
    config: dict | None
    created_at: float
    status: RequestStatus
    gpu_required: int = 0
    agent_id: str | None = None  # set when spawned
    rejection_reason: str | None = None
@dataclass
class ScheduleTicket:
request_id: str # returned to caller for status queries
5.3 GPU Tracking
class GPUTracker:
total_gpus: int # detect_total_gpus() via nvidia-smi at init
_allocated: dict[str, int] # request_id → gpu count
can_allocate(n) → bool # n <= total - sum(allocated)
allocate(request_id, n) # reserve GPUs
release(request_id) # free GPUs (on agent DONE/FAILED)
GPU detection runs nvidia-smi --list-gpus at scheduler init. Falls back to
0 on systems without GPUs — all gpu_required=0 requests pass through
immediately.
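A plausible shape for this detect-with-fallback step (the real `detect_total_gpus` may parse differently; this is a sketch under that assumption):

```python
import subprocess

def detect_total_gpus() -> int:
    """Count GPUs by parsing `nvidia-smi --list-gpus` output; return 0
    when nvidia-smi is absent or errors out (CPU-only host), so that
    gpu_required=0 requests still pass through immediately."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired, OSError):
        return 0
    return sum(1 for line in result.stdout.splitlines() if line.strip())
```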
5.4 Scheduling Policy
class SchedulingPolicy(ABC):
def select_next(queue, running, gpu_tracker) → request_id | None
class FIFOPolicy(SchedulingPolicy):
# Returns first queued request where gpu_required <= gpu_tracker.available
# Skips requests that need more GPUs than currently free
The policy abstraction enables future algorithms (priority queues, fair
scheduling, deadline-aware) without changing the Scheduler core.
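A minimal FIFO sketch of `select_next` (simplified signature: it takes a free-GPU count rather than the full GPUTracker, and a stand-in request type):

```python
from dataclasses import dataclass

@dataclass
class QueuedRequest:  # minimal stand-in for ScheduleRequest
    request_id: str
    gpu_required: int
    created_at: float

class FIFOPolicy:
    """Pick the oldest queued request whose GPU demand fits; requests
    needing more GPUs than are currently free are skipped over."""

    def select_next(self, queue, running, gpus_available: int):
        for req in sorted(queue, key=lambda r: r.created_at):
            if req.gpu_required <= gpus_available:
                return req.request_id
        return None
```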
5.5 Scheduler Lifecycle
_init_agentsys() lifespan startup
│ │
├── Scheduler(runtime, store_dir) ├── scheduler.start()
│ ├── detect_total_gpus() │ └── create_task(_tick_loop)
│ ├── mkdir _scheduler/ │
│ └── _load_from_disk() │
│ └── orphaned RUNNING │
│ requests → FAILED │
│ │
└── runtime._scheduler = scheduler │
│
lifespan shutdown
│
└── scheduler.shutdown()
└── cancel tick task
5.6 Request Flow
1. Agent.spawn_child(ExecutorAgent, goal, gpu_required=2)
│
├── scheduler.submit(cls, goal, parent_id, config, gpu_required)
│ ├── If gpu_required > total_gpus → REJECTED immediately
│ ├── Create ScheduleRequest (status=QUEUED)
│ ├── Persist to {store_dir}/{request_id}.json
│ └── Return ScheduleTicket
│
└── Return ScheduledChildHandle(ticket, scheduler)
2. Tick loop (every ~1s):
├── Collect QUEUED requests, sort by created_at
├── policy.select_next(queue, running, gpu_tracker)
│ └── FIFOPolicy: first request where gpu fits
├── If selected:
│ ├── Runtime.spawn(cls, goal, parent_id, config)
│ ├── gpu_tracker.allocate(request_id, gpu_required)
│ ├── request.status = RUNNING, request.agent_id = child.id
│ └── Persist updated request
└── If nothing fits: no-op (try again next tick)
3. Agent completes (DONE/FAILED event in Runtime):
├── Runtime._handle_event notifies scheduler.on_agent_done()
├── gpu_tracker.release(request_id)
├── request.status = DONE or FAILED
└── Persist — freed GPU enables next queued request
5.7 ScheduledChildHandle
Duck-type compatible with ChildHandle. Returned when an executor spawn goes
through the scheduler.
class ScheduledChildHandle:
.id → request_id (while queued) or agent_id (once spawned)
.status → IDLE (queued), RUNNING, DONE, FAILED (maps to AgentStatus)
.wait() → polls scheduler then agent status
.cancel() → scheduler.cancel() if queued, runtime.stop() if running
5.8 Persistence
Each request is atomically persisted as {store_dir}/{request_id}.json:
{
"request_id": "sched-a1b2c3d4",
"agent_cls_path": "agentsys.agents.executor.ExecutorAgent",
"goal": "train model v2",
"parent_id": "orchestrator-abc12345",
"config": {"command": "python train.py", "workdir": "/exp"},
"gpu_required": 2,
"created_at": 1740000000.0,
"status": "queued",
"agent_id": null,
"rejection_reason": null
}
On restart, _load_from_disk() restores all requests. Requests that were
RUNNING when the scheduler crashed are marked FAILED (orphaned processes).
5.9 Cleanup
cleanup_parent(parent_id) — cancels all QUEUED requests for a dead parent.
Called automatically by Runtime.stop().
cancel(request_id) — cancels a single QUEUED request.
5.10 API Endpoints
| Endpoint                      | Method | Purpose                                     |
|-------------------------------|--------|---------------------------------------------|
| /scheduler/queue              | GET    | List requests (filter by parent_id, status) |
| /scheduler/queue/{request_id} | GET    | Get single request                          |
| /scheduler/stats              | GET    | GPU availability + queue summary            |
5.11 What Does NOT Go Through the Scheduler
- L1→L2 spawning — cls.role == "orchestrator" bypasses the scheduler.
- In-process agents — the L1 SessionAgent is registered via register_local().
- Sidecar agents — cls.role == "sidecar" bypasses the scheduler.
Only cls.role == "executor" triggers the scheduler path, and only when
runtime._scheduler is set.
6. Runtime & IPC
6.1 Runtime — The Multiprocess Supervisor
Runtime sits between every parent-child pair. It is not an agent — it
manages OS processes, IPC queues, and the shared FileStore.
Runtime (server process)
┌──────────────────────┐
│ FileStore │ ← shared filesystem
│ _processes {} │ ← OS Process objects
│ _cmd_queues {} │ ← supervisor → worker
│ _event_queues {} │ ← worker → supervisor
│ _agent_meta {} │ ← in-memory cache
│ _local_agents {} │ ← in-process (L1 only)
│ _event_listener │ ← async polling task
└──────────┬───────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Worker │ │ Worker │ │ Worker │
│ Process │ │ Process │ │ Process │
│ (L2) │ │ (L3) │ │ (L3) │
└─────────┘ └─────────┘ └─────────┘
Multiprocessing model: Uses multiprocessing.get_context("spawn") to avoid
fork-related deadlocks. Each worker gets its own asyncio event loop, its own
FileStore instance (same disk path), and a RuntimeProxy for IPC.
spawn() — Creating a Child Process
Runtime.spawn(cls, goal, parent_id, config, session, sweep, run)
├── 1. Validate hierarchy (parent.allowed_child_roles includes cls.role)
├── 2. Generate agent_id = f"{cls.role}-{uuid4().hex[:8]}"
├── 3. Build scope (inherited from parent)
├── 4. Create IPC queues (cmd_queue, event_queue)
├── 5. Write initial .meta.json to FileStore
├── 6. Create OS Process(target=run_worker, args=(...))
├── 7. process.start()
├── 8. Start event listener (if not running)
└── 9. Return AgentInfo (lightweight metadata)
register_local() — In-Process Agent (L1 only)
Runtime.register_local(agent, goal, config)
├── Wire agent._runtime = self (direct reference, not proxy)
├── No subprocess created
└── Caller must await agent.start() separately
6.2 Worker Process (worker.py)
run_worker(cls_path, agent_id, goal, config, scope_dict, ...)
└── asyncio.run(_async_worker(...))
├── Import agent class: path_to_cls(cls_path)
├── Instantiate: agent = cls()
├── Create FileStore + MemoryView
├── Wire agent: id, goal, config, scope, memory, _runtime=RuntimeProxy
├── Write "spawned" CONTEXT entry
├── Start command listener (polls cmd_queue)
├── Send STARTED event
├── await agent.start() → await agent._task
├── Send DONE or FAILED event
└── Write final .meta.json
6.3 RuntimeProxy — Agent-Side IPC
Workers communicate with the supervisor through RuntimeProxy:
Agent code RuntimeProxy Supervisor Runtime
────────── ──────────── ────────────────
spawn_child(cls, goal)
└──→ proxy.spawn(...)
├── event_queue.put(SPAWN_REQUEST) ──→ _handle_event()
├── await future (blocks) ├── Runtime.spawn()
│ └── cmd_queue.put(SPAWN_RESPONSE)
└── ← future.set_result(payload) ←── _listen_commands()
return AgentInfo
stop(agent_id)
└──→ proxy.stop(...)
└── event_queue.put(STOP_REQUEST) ──→ Runtime.stop(agent_id)
6.4 ChildHandle — Cross-Process Child Reference
When an agent spawns a child, it gets a ChildHandle:
class ChildHandle:
id: str
_task: asyncio.Task | None # in-process mode (L1 → L2)
_agent: Agent | None # in-process reference
_store: FileStore | None # multiprocess mode (reads .meta.json)
_runtime_proxy: RuntimeProxy # multiprocess IPC
@property
def status(self) -> AgentStatus:
# In-process: direct agent.status
# Multiprocess: read .meta.json
async def wait(self, timeout=None):
# In-process: await asyncio.shield(task)
# Multiprocess: poll .meta.json until DONE/FAILED
async def cancel(self):
# runtime.stop(id) or runtime_proxy.stop(id)
6.5 IPC Message Types
Supervisor → Worker (Command via cmd_queue):
| CmdType        | Payload                     | Effect                        |
|----------------|-----------------------------|-------------------------------|
| STOP           | {}                          | agent.stop()                  |
| PAUSE          | {}                          | agent.pause()                 |
| RESUME         | {}                          | agent.resume()                |
| STEER          | {context, urgency}          | agent._inject_steer()         |
| SPAWN_RESPONSE | {request_id, child_id, ...} | Fulfills pending spawn future |
Worker → Supervisor (Event via event_queue):
| EventType     | Payload                           | Effect           |
|---------------|-----------------------------------|------------------|
| STARTED       | {}                                | Status → RUNNING |
| DONE          | {}                                | Status → DONE    |
| FAILED        | {error}                           | Status → FAILED  |
| SPAWN_REQUEST | {request_id, cls_path, goal, ...} | Runtime.spawn()  |
| STOP_REQUEST  | {target_id}                       | Runtime.stop()   |
6.6 Agent ABC — Lifecycle
┌─────────────────────────────────────────────┐
│ IDLE ──start()──→ RUNNING ──→ DONE │
│ │ ↑ ↑ │
│ pause()│ │resume()│stop() │
│ ↓ │ │ │
│ PAUSED ───────┘ │
│ │ │
│ exception │
│ ↓ │
│ FAILED │
└─────────────────────────────────────────────┘
start()
└── status = RUNNING → on_start() → create_task(_run_loop())
_run_loop()
├── start _watchdog_loop() if monitor_interval > 0
├── await run() ← ABSTRACT (subclass implements)
├── status = DONE (or FAILED on exception)
└── await on_stop()
_watchdog_loop()
└── while RUNNING/PAUSED: sleep → check_pause() → on_monitor()
7. Memory System
7.1 Overview
┌───────────────┐ ┌───────────────┐ ┌──────────────────────┐
│ Agent │ │ MemoryView │ │ FileStore │
│ │ │ │ │ │
│ self.memory │────→│ .write() │────→│ atomic JSON write │
│ .write() │ │ .read_self() │ │ to disk │
│ .query() │ │ .assemble_ │ │ │
│ │ │ context() │ │ .query() scans dirs │
│ │ │ │ │ with 2-phase filter │
│ │ │ auto-fills: │ │ │
│ │ │ agent_id │ │ Dir structure: │
│ │ │ project │ │ {root}/{project}/ │
│ │ │ session │ │ {agent_id}/ │
│ │ │ sweep │ │ .meta.json │
│ │ │ run │ │ {entry}.json │
│ │ │ role │ │ │
└───────────────┘ └───────────────┘ └──────────────────────┘
7.2 Entry — The Atomic Unit
@dataclass
class Entry:
key: str # auto-generated 12-char hex
agent_id: str # who produced this
target_id: str | None # recipient (MESSAGE type only)
type: EntryType # PLAN, CONTEXT, METRICS, ALERT, RESULT, REFLECTION, MESSAGE, RAW_FILE
project: str # always set
session: str | None # L1 scope
sweep: str | None # L2 scope
run: str | None # L3 scope
role: str # producer's role
tags: list[str] # free-form labels
data: dict # arbitrary JSON payload
created_at: float # unix timestamp
| EntryType  | Written By | Purpose                                    |
|------------|------------|--------------------------------------------|
| PLAN       | L2         | Task decomposition                         |
| CONTEXT    | Any        | Lifecycle events, status                   |
| METRICS    | L3         | Training metrics (jsonl rows)              |
| ALERT      | L3         | Anomaly detection (NaN, loss spike)        |
| RESULT     | L3         | Task output (exit code, run dir)           |
| REFLECTION | L2         | Lessons learned (persists across sessions) |
| MESSAGE    | Any → Any  | Inter-agent communication                  |
| RAW_FILE   | External   | Log files, scripts, artifacts              |
7.3 FileStore — Flat File KV on Disk
{store_root}/{project}/{agent_id}/
.meta.json
{timestamp_ms}_{entry_type}_{12char_key}.json
- Atomic writes: .tmp → os.rename() (no partial reads)
- 2-phase query: Phase 1 scans filenames (fast type/time filter), Phase 2 reads JSON for scope/tag filtering
- Filename encoding embeds the entry type for fast filtering without a JSON parse
7.4 Hierarchical Scope & Context Assembly
Level 0: project (always present)
Level 1: session (research session)
Level 2: sweep (parameter sweep)
Level 3: run (individual training run)
Scope inheritance on spawn:
SessionAgent scope = {project: "mlexp"}
└── ResearchAgent scope = {project: "mlexp", session: "sess-42"}
└── ExecutorAgent scope = {project: "mlexp", session: "sess-42", run: "run-7"}
Context assembly (MemoryView.assemble_context(token_budget)) builds
prompt context by widening scope from narrow to broad:
Step 1: "My Entries" — agent's own entries (chronological)
Step 2: Widen by scope:
├── Sweep level: sibling runs in same sweep
├── Session level: other sweeps in same session
└── Project level: cross-session learnings (memory bank)
Step 3: Truncate to token budget
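A simplified version of this narrow-to-broad assembly, assuming a rough 4-characters-per-token budget rather than the real tokenizer:

```python
def assemble_context(sections: list[tuple[str, str]], token_budget: int) -> str:
    """Concatenate (title, body) sections in narrow→broad order,
    truncating once the approximate token budget is exhausted."""
    budget_chars = token_budget * 4  # crude chars-per-token assumption
    out, used = [], 0
    for title, body in sections:
        chunk = f"## {title}\n{body}\n"
        if used + len(chunk) > budget_chars:
            chunk = chunk[: budget_chars - used]
        out.append(chunk)
        used += len(chunk)
        if used >= budget_chars:
            break
    return "".join(out)

context = assemble_context(
    [("My Entries", "run-7 loss=0.42"),
     ("Sibling Runs", "run-6 loss=0.55"),
     ("Memory Bank", "prefer warmup over high base LR")],
    token_budget=200,
)
```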
8. Communication Protocol
8.1 Three Tiers of Steer
| Tier     | Mechanism                        | Delivery           | Effect                                            |
|----------|----------------------------------|--------------------|---------------------------------------------------|
| CRITICAL | Runtime.steer(id, ctx, CRITICAL) | Kill + respawn     | Agent restarts with [STEER: ctx] appended to goal |
| PRIORITY | Runtime.steer(id, ctx, PRIORITY) | IPC queue → buffer | Agent reads at next consume_steer()               |
| MESSAGE  | agent.msg(target, data)          | FileStore entry    | Target reads from inbox on next context assembly  |
8.2 Event Relay — Runtime to Frontend
Agent._emit_event("iteration", {session_id, iteration, summary})
│
├── In-process: runtime.emit_event() → EventRelay.emit()
│ └── SSE listeners (GET /agents/events)
│
└── Subprocess: event_queue → runtime._handle_event()
└── EventRelay.emit() → SSE listeners
EventRelay maintains a ring buffer of the last 1000 events. SSE clients can
reconnect and replay from a timestamp via recent(n, since).
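The ring buffer plus replay can be sketched with a bounded deque (a minimal sketch; the real EventRelay's event schema and filtering may differ):

```python
import time
from collections import deque

class EventRelay:
    """Ring buffer of recent events; SSE clients replay via recent()."""

    def __init__(self, maxlen: int = 1000):
        self._buffer: deque = deque(maxlen=maxlen)  # old events fall off

    def emit(self, event_type: str, payload: dict) -> None:
        self._buffer.append({"type": event_type, "ts": time.time(), **payload})

    def recent(self, n: int, since: float = 0.0) -> list[dict]:
        """Up to n most recent events newer than `since` (reconnect replay)."""
        newer = [e for e in self._buffer if e["ts"] > since]
        return newer[-n:]
```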
8.3 Stop Cascade
When any agent is stopped, Runtime cascades to all descendants (leaves first):
stop(orchestrator-xyz)
├── Collect descendants (BFS)
├── Reverse order (leaves first)
└── For each: send STOP → join(5s) → terminate() if alive
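The ordering logic can be sketched as BFS followed by a reversal (a minimal illustration of the traversal only — the real Runtime then runs the send STOP → `join(5s)` → `terminate()` sequence on each agent, which is omitted here, and `children` is a hypothetical parent→children map):

```python
from collections import deque


def stop_cascade(children: dict[str, list[str]], root: str) -> list[str]:
    """Return the stop order for `root` and all its descendants:
    BFS collects parents before children, reversing puts leaves first."""
    order, frontier = [], deque([root])
    while frontier:
        node = frontier.popleft()
        order.append(node)
        frontier.extend(children.get(node, []))
    return list(reversed(order))  # leaves first, root last


# stop_cascade({"orch": ["r1", "r2"], "r1": ["e1"]}, "orch")
# → ["e1", "r2", "r1", "orch"]
```

Stopping leaves first ensures no child is orphaned mid-shutdown: by the time a parent is terminated, everything it supervised is already gone.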
9. Chat System Integration
9.1 ChatStreamRuntime
The in-memory object tracking a live streaming response:
class ChatStreamRuntime:
session_id: str
run_id: str # unique per response
status: str # "running" | "completed" | "failed" | "stopped"
events: list[dict] # sequenced event log
next_sequence: int
subscribers: set[Queue] # live SSE client queues
full_text: str # accumulated LLM text
full_thinking: str # accumulated reasoning
parts_accumulator: StreamPartsAccumulator
started_at: float
updated_at: float
9.2 StreamPartsAccumulator
Collects ordered message parts from the OpenCode SSE stream:
- Text parts: accumulated from `part_delta` events
- Thinking parts: accumulated from reasoning deltas
- Tool parts: keyed by ID, state updates amend existing entry
snapshot() returns immutable ordered list for persistence
finalize() flushes buffers and returns final parts list
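The accumulation rules above — deltas append, tool states amend by ID, snapshots are copies — can be sketched as follows (an illustrative reimplementation, not the real class; method names follow the description above):

```python
class StreamPartsAccumulator:
    """Ordered message parts: text deltas grow a buffer, tool parts are
    amended in place by ID, arrival order is preserved."""

    def __init__(self):
        self._parts: list[dict] = []
        self._by_id: dict[str, dict] = {}

    def add_delta(self, part_id: str, ptype: str, delta: str) -> None:
        part = self._by_id.get(part_id)
        if part is None:
            part = {"id": part_id, "type": ptype, "text": ""}
            self._by_id[part_id] = part
            self._parts.append(part)  # first sight fixes the position
        part["text"] += delta

    def update_tool(self, part_id: str, state: dict) -> None:
        part = self._by_id.get(part_id)
        if part is None:
            part = {"id": part_id, "type": "tool"}
            self._by_id[part_id] = part
            self._parts.append(part)
        part.update(state)  # later state updates amend the same entry

    def snapshot(self) -> list[dict]:
        """Copy each part so callers can persist it without racing deltas."""
        return [dict(p) for p in self._parts]
```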
9.3 SSE Event Format (NDJSON to frontend)
{"seq": 1, "type": "provenance", "skill_id": "l1_auto_session", ...}
{"seq": 2, "type": "part_delta", "id": "prt_1", "ptype": "text", "delta": "I'll "}
{"seq": 3, "type": "part_delta", "id": "prt_1", "ptype": "text", "delta": "investigate..."}
{"seq": 4, "type": "part_update", "id": "prt_2", "ptype": "tool", "name": "bash", ...}
{"seq": 5, "type": "l1_action", "action_type": "spawn", "goal": "...", "agent_id": "..."}
{"seq": 6, "type": "session_status", "status": "idle"}
9.4 Persistence
Chat state is saved to {WORKDIR}/.agents/chat_data.json:
{
"chat_sessions": {
"session_id": {
"title": "Optimize learning rate",
"created_at": 1771745054.69,
"messages": [
{"role": "user", "content": "...", "timestamp": ...},
{"role": "assistant", "content": "...", "thinking": "...", "parts": [...]}
],
"opencode_session_id": "ses_abc123",
"model_provider": "opencode",
"model_id": "minimax-m2.5-free",
"active_stream": { ... } // snapshot during streaming, removed on complete
}
}
}
10. MCP Tools
10.1 Research Agent MCP Server
Registered in opencode.json as a local MCP server, providing tools the LLM
can call during its response:
"research-agent": {
"type": "local",
"command": ["./.ra-venv/bin/python", "./server/tools/research_agent_mcp.py"],
"enabled": true
}
Tools:
| Tool | Purpose | Key Params |
|------|---------|------------|
| `quick_bash` | Shell command with timeout (never hangs) | `command`, `workdir`, `timeout_seconds` (max 120) |
| `list_directory` | Instant directory listing (pure Python) | `path`, `include_hidden` |
| `create_run` | Create a run via server API | `name`, `command`, `workdir`, `launch_policy` |
| `start_run` | Start an existing run | `run_id` |
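The never-hangs contract of `quick_bash` can be sketched with `subprocess.run` and its `timeout` parameter (a guess at the mechanism, not the actual MCP server code — the parameter names mirror the table above):

```python
import subprocess


def quick_bash(command: str, workdir: str = ".", timeout_seconds: int = 30) -> dict:
    """Run a shell command, capping the timeout at 120s; always returns,
    even when the command stalls."""
    timeout = min(timeout_seconds, 120)
    try:
        proc = subprocess.run(
            command, shell=True, cwd=workdir,
            capture_output=True, text=True, timeout=timeout,
        )
        return {"exit_code": proc.returncode,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run; report instead of hanging.
        return {"exit_code": -1, "stdout": "",
                "stderr": f"timed out after {timeout}s"}
```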
10.2 SLURM MCP Server
For HPC cluster environments:
"slurm-cli": {
"type": "local",
"command": ["./.ra-venv/bin/python", "./server/tools/slurm_mcp_cli.py"],
"enabled": true
}
Provides wrappers for squeue, sinfo, sbatch, scancel, sacct, scontrol.
10.3 Context7 (Remote)
"context7": {
"type": "remote",
"url": "https://mcp.context7.com/mcp",
"enabled": true
}
11. Server Bootstrap
11.1 Startup Sequence
python server.py --workdir ../../sandbox-wild-test --port 10000
│
├── Parse arguments (workdir, port, host, tmux-session)
├── init_paths(workdir)
│ → WORKDIR, DATA_DIR, CHAT_DATA_FILE, etc.
│ → mkdir .agents/, .agents/runs/
│
├── _init_agentsys()
│ ├── Create FileStore at .agents/store
│ ├── Create Runtime (multiprocess supervisor)
│ ├── Bridge Runtime events → EventRelay
│ └── Initialize route modules with runtime refs
│
├── Load persisted state
│ ├── chat_data.json → chat_sessions
│ ├── jobs.json → runs, sweeps
│ ├── alerts.json → alerts
│ ├── plans.json → plans
│ ├── journey_state.json → journey
│ └── settings.json → cluster config
│
└── uvicorn.run(app, host, port)
11.2 Key Configuration
| Variable | Default | Source |
|----------|---------|--------|
| `MODEL_PROVIDER` | `"opencode"` | env `MODEL_PROVIDER` |
| `MODEL_ID` | `"minimax-m2.5-free"` | env `MODEL_ID` |
| `OPENCODE_URL` | `"http://127.0.0.1:4096"` | env `OPENCODE_URL` |
| `OPENCODE_USERNAME` | `"opencode"` | env `OPENCODE_SERVER_USERNAME` |
| `USER_AUTH_TOKEN` | `None` | env `RESEARCH_AGENT_USER_AUTH_TOKEN` |
| `TMUX_SESSION_NAME` | `"research-agent"` | env `RESEARCH_AGENT_TMUX_SESSION` |
| `SERVER_CALLBACK_URL` | `"http://127.0.0.1:10000"` | computed |
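One plausible resolution of these settings (illustrative — the real `core/config.py` may structure this differently; `HOST`/`PORT` stand in for the CLI arguments):

```python
import os

# Each setting reads its env var and falls back to the documented default.
MODEL_PROVIDER = os.environ.get("MODEL_PROVIDER", "opencode")
MODEL_ID = os.environ.get("MODEL_ID", "minimax-m2.5-free")
OPENCODE_URL = os.environ.get("OPENCODE_URL", "http://127.0.0.1:4096")
OPENCODE_USERNAME = os.environ.get("OPENCODE_SERVER_USERNAME", "opencode")
USER_AUTH_TOKEN = os.environ.get("RESEARCH_AGENT_USER_AUTH_TOKEN")  # may be None
TMUX_SESSION_NAME = os.environ.get("RESEARCH_AGENT_TMUX_SESSION", "research-agent")

# SERVER_CALLBACK_URL is computed, not read from the environment.
HOST, PORT = "127.0.0.1", 10000  # hypothetical stand-ins for the CLI args
SERVER_CALLBACK_URL = f"http://{HOST}:{PORT}"
```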
12. File Inventory
Core Framework (agentsys/)
| File | Purpose |
|------|---------|
| `types.py` | AgentStatus, EntryType, SteerUrgency, Entry, Steer, Scope, AgentInfo |
| `agent.py` | Agent ABC, ChildHandle, lifecycle, steer, spawn_child |
| `runtime.py` | Multiprocess supervisor, spawn, stop, steer routing, event listener |
| `worker.py` | Subprocess entry point, wires agent + proxy + store |
| `proxy.py` | RuntimeProxy — agent-side IPC (spawn requests, command listener) |
| `ipc.py` | CmdType, EventType, Command, Event, cls_to_path, path_to_cls |
| `filestore.py` | Flat file KV store, atomic writes, 2-phase query |
| `memory.py` | MemoryView — scoped interface, context assembly |
Agent Implementations (agentsys/agents/)
| File | Level | Purpose |
|------|-------|---------|
| `session_agent.py` | L1 | LLM-enabled orchestrator, per-session, build_wild_prompt, handle_llm_response |
| `research_agent.py` | L2 | Autonomous research loop (plan → iterate → reflect), OpenCode integration |
| `executor.py` | L3 | Subprocess runner, GPU handling, metrics tailing, anomaly detection |
Chat System (chat/)
| File | Purpose |
|------|---------|
| `routes.py` | Chat CRUD, POST /chat, _chat_worker, synthetic turns, mode routing |
| `streaming.py` | OpenCode SSE parsing, ChatStreamRuntime, StreamPartsAccumulator, event broadcasting |
Agent Routes & Support (agent/)
| File | Purpose |
|------|---------|
| `wild_routes.py` | Per-session L1 registry, wild mode endpoints (start/stop/steer/status) |
| `v2_prompts.py` | PromptContext, XML tag parsers (spawn_research, steer_child, etc.) |
| `core/event_relay.py` | EventRelay: Runtime events → SSE stream |
| `runtime_routes.py` | SSE endpoint (GET /agents/events), agent management |
Prompt Templates (skills/prompts/)
| Directory | Purpose |
|-----------|---------|
| `l1_wild_session/` | L1 prompt: forced research mode |
| `l1_auto_session/` | L1 prompt: agent-decides mode |
| `l1_agent_session/` | L1 prompt: direct tool use mode |
| `wild_v2_planning/` | L2 prompt: Phase 0 planning |
| `wild_v2_iteration/` | L2 prompt: Phase 1+ iteration |
| `wild_v2_reflection/` | L2 prompt: reflection |
MCP Tools (tools/)
| File | Purpose |
|------|---------|
| `research_agent_mcp.py` | FastMCP server: quick_bash, list_directory, create_run, start_run |
| `slurm_mcp_cli.py` | FastMCP server: SLURM job management wrappers |
Configuration (core/)
| File | Purpose |
|------|---------|
| `config.py` | Paths, auth, model helpers, OpenCode config loading |
| `state.py` | Global state dicts, JSON persistence (chat, runs, alerts, etc.) |
Data Files (.agents/)
{workdir}/.agents/
chat_data.json # chat sessions + messages
jobs.json # runs + sweeps
alerts.json # active alerts
settings.json # cluster config
plans.json # plans
journey_state.json # journey events
store/{project}/ # FileStore root
{agent_id}/
.meta.json
{timestamp}_{type}_{key}.json
wild/{session_id}/ # L2 research session state
state.json
history.json
tasks.md
iteration_log.md
runs/{agent_id}/ # L3 executor output
command.txt
run.log
job.done
agent_metrics.jsonl