Documentation Date: 2026-02-12
Overall: Dual-Phase Autonomous Development Pipeline with Dynamic Complexity Adaptation
Key Characteristics:
- Two-phase pipeline: Spec Creation (planning) → Implementation (building)
- Complexity-adaptive spec creation: SIMPLE (3 phases), STANDARD (6-7 phases), COMPLEX (8 phases)
- Agent-based execution: Each phase runs as a Claude Agent SDK session with phase-specific prompts
- Subtask-based implementation: Planner breaks work into atomic subtasks, Coder executes sequentially
- QA validation loop: Reviewer validates → Fixer resolves → repeats until approval
- Git worktree isolation: Each spec builds in an isolated environment on an auto-code/{spec-name} branch
- Memory system integration: Graphiti provides cross-session context and pattern suggestions
The multi-agent pipeline consists of two major stages:
┌─────────────────────────────────────────────────────────────────┐
│                       SPEC CREATION PHASE                       │
│ (SpecOrchestrator: apps/backend/spec/pipeline/orchestrator.py)  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      IMPLEMENTATION PHASE                       │
│           (Coder Agent: apps/backend/agents/coder.py)           │
│  ┌────────────────┐    ┌─────────────────┐    ┌──────────────┐  │
│  │ Planner Agent  │───▶│   Coder Agent   │───▶│   QA Loop    │  │
│  │ (plan creation)│    │ (subtask impl.) │    │(validate+fix)│  │
│  └────────────────┘    └─────────────────┘    └──────────────┘  │
└─────────────────────────────────────────────────────────────────┘
Orchestrator: apps/backend/spec/pipeline/orchestrator.py:SpecOrchestrator
The spec creation pipeline uses dynamic complexity assessment to determine which phases to execute based on task complexity.
| Complexity | Phases | Use Case |
|---|---|---|
| SIMPLE | 3 phases (Discovery → Quick Spec → Validate) | Quick bug fixes, trivial changes |
| STANDARD | 6-7 phases (Discovery → Requirements → [Research] → Context → Spec → Plan → Validate) | Typical feature development |
| COMPLEX | 8 phases (Full pipeline with Research and Self-Critique) | Multi-service features, architectural changes |
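The complexity-based phase routing in the table above can be sketched as follows. The function and phase names here are illustrative stand-ins, not the actual SpecOrchestrator API:

```python
# Hypothetical sketch of complexity-based phase routing; names are
# illustrative, not the real SpecOrchestrator implementation.
BASE_PHASES = ["discovery", "requirements", "context", "spec", "plan", "validate"]

def select_phases(complexity: str, research_enabled: bool = False,
                  self_critique_enabled: bool = False) -> list[str]:
    """Return the ordered phase list for a given complexity tier."""
    if complexity == "SIMPLE":
        return ["discovery", "quick_spec", "validate"]        # 3 phases
    phases = list(BASE_PHASES)
    if research_enabled:                                      # optional research phase
        phases.insert(2, "research")
    if complexity == "COMPLEX" and self_critique_enabled:     # COMPLEX adds self-critique
        phases.insert(phases.index("spec") + 1, "self_critique")
    return phases
```

Under these assumptions, STANDARD yields 6 phases (7 with research) and COMPLEX with both flags yields the full 8-phase pipeline.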
Phase 1: Discovery
↓
Phase 2: Requirements
↓
Phase 3: Complexity Assessment (AI-based or heuristic)
↓
[Phase 4: Historical Context] # Only if project has prior specs
↓
[Phase 5: Research] # Only if complexity.research_enabled == True
↓
Phase 6: Context Gathering
↓
Phase 7: Spec Writing
↓
[Phase 8: Self-Critique] # Only if complexity.self_critique_enabled == True
↓
Phase 9: Implementation Planning
↓
Phase 10: Validation
↓
Human Review Checkpoint
| Phase | Purpose | Output | Agent Prompt |
|---|---|---|---|
| Discovery | Analyze project structure, identify files involved | File list, stack detection | prompts/spec_gatherer.md |
| Requirements | Gather user requirements via interactive interview | requirements.json | prompts/spec_gatherer.md |
| Complexity Assessment | AI determines which phases to run | complexity_assessment.json | prompts/complexity_assessor.md |
| Historical Context | Review prior specs for patterns | Context summary | prompts/spec_researcher.md |
| Research | Validate external dependencies/APIs | Research findings | prompts/spec_researcher.md |
| Context Gathering | Collect codebase patterns, architecture docs | context.json | prompts/spec_writer.md |
| Spec Writing | Generate comprehensive specification | spec.md | prompts/spec_writer.md |
| Self-Critique | Review and refine spec using ultrathink | Refined spec.md | prompts/spec_critic.md |
| Implementation Planning | Create detailed implementation plan | implementation_plan.json | prompts/planner.md |
| Validation | Verify spec completeness and correctness | Validation report | spec/validate_pkg/ |
SpecOrchestrator:
- Location: apps/backend/spec/pipeline/orchestrator.py
- Purpose: Coordinates spec creation phases with dynamic phase selection
- Pattern: Orchestrator with complexity-based phase routing

PhaseExecutor:
- Location: apps/backend/spec/phases/phases.py
- Purpose: Executes individual spec creation phases
- Pattern: Phase runner with retry logic (MAX_RETRIES=3)

AgentRunner:
- Location: apps/backend/spec/pipeline/agent_runner.py
- Purpose: Creates Claude Agent SDK sessions for spec phases
- Pattern: Session factory with thinking budget management

Conversation Compaction:
- Location: apps/backend/spec/compaction.py
- Purpose: Summarizes completed phases to provide context to subsequent phases
- Pattern: Phase summarization with target word count (500 words per phase)
Orchestrator: apps/backend/agents/coder.py:run_autonomous_agent()
The implementation pipeline uses subtask-based execution where the Planner agent breaks work into atomic subtasks, and the Coder agent executes them sequentially.
1. Planner Agent (Session 1)
↓
Reads spec.md → Creates implementation_plan.json
↓
Breaks feature into subtasks (atomic, scoped to one service)
↓
Each subtask: description, acceptance criteria, files to modify
↓
2. Coder Agent (Session 2-N)
↓
Iterates through subtasks in order
↓
For each subtask:
- Runs Claude Agent SDK session with subtask context
- Can spawn subagents (via Task tool) for parallel work
- Commits changes after completion
- Updates implementation_plan.json status
↓
3. QA Validation Loop
↓
QA Reviewer Agent validates against acceptance criteria
↓
If issues found: QA Fixer Agent resolves
↓
Loop until approved or max iterations (50)
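The validate-and-fix cycle above can be sketched as a simple loop, with `review` and `fix` standing in for the QA Reviewer and QA Fixer agent sessions (hypothetical callables, not the real qa/loop.py API):

```python
# Minimal sketch of the reviewer → fixer loop with the 50-iteration cap.
MAX_QA_ITERATIONS = 50

def qa_loop(review, fix, max_iterations: int = MAX_QA_ITERATIONS) -> str:
    """Run review/fix cycles until approval or the iteration cap is hit."""
    for _ in range(max_iterations):
        verdict = review()               # QA Reviewer validates acceptance criteria
        if verdict == "APPROVED":
            return "APPROVED"
        fix()                            # QA Fixer resolves the reported issues
    return "ESCALATE_TO_HUMAN"           # max iterations reached
```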
| Agent | Purpose | Prompt | Session Type |
|---|---|---|---|
| Planner | Create subtask-based implementation plan | prompts/planner.md | Single session (first) |
| Coder | Implement subtasks, spawn subagents as needed | prompts/coder.md | Multiple sessions (one per subtask) |
| QA Reviewer | Validate implementation against acceptance criteria | prompts/qa_reviewer.md | Single session |
| QA Fixer | Fix issues found by QA reviewer | prompts/qa_fixer.md | Multiple sessions (until approved) |
| Coder Recovery | Recover from stuck/failed subtasks | prompts/coder_recovery.md | On-demand (when stuck) |
Each subtask in implementation_plan.json contains:
{
  "subtask_id": "subtask-1-1",
  "title": "Create authentication service",
  "status": "pending",
  "phase": "Backend Authentication",
  "description": "Implement JWT authentication service",
  "acceptance_criteria": [
    "Users can authenticate with email/password",
    "JWT tokens are generated and validated",
    "Tokens expire after 24 hours"
  ],
  "files_to_create": [
    "apps/backend/services/auth_service.py"
  ],
  "files_to_modify": [
    "apps/backend/core/auth.py"
  ],
  "dependencies": [],
  "verification_steps": [
    "Run authentication tests",
    "Verify token generation"
  ]
}

run_autonomous_agent():
- Location: apps/backend/agents/coder.py
- Purpose: Main orchestration loop for implementation
- Pattern: Iterative subtask execution with recovery support

RecoveryManager:
- Location: apps/backend/agents/recovery.py
- Purpose: Tracks agent sessions for resumption after interruption
- Pattern: Session state persistence with recovery checkpoints

Subagent Spawning:
- Mechanism: Coder agent uses the Claude SDK Task tool to spawn subagents
- Decision: Agent autonomously decides when to use parallel work
- Use case: Independent tasks that can run concurrently (e.g., testing multiple platforms)

QA Loop:
- Location: apps/backend/qa/loop.py
- Purpose: Validation loop with reviewer → fixer cycle
- Pattern: Iterative improvement with max iteration limit (50)
- Escalation: Human review after max iterations or recurring issues
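The iterative subtask execution at the heart of run_autonomous_agent() can be sketched as follows. The session, commit, and plan-update steps are hypothetical stand-ins for the real SDK calls, not the actual implementation:

```python
# Sketch of sequential subtask execution; `run_session` and `commit` are
# illustrative callables standing in for the agent SDK and git operations.
def execute_plan(plan: dict, run_session, commit) -> dict:
    """Execute pending subtasks in order, committing and updating status."""
    for subtask in plan["subtasks"]:
        if subtask["status"] != "pending":
            continue                                 # skip already-completed work
        run_session(subtask)                         # one agent session per subtask
        subtask["commit_hash"] = commit(subtask["subtask_id"])
        subtask["status"] = "completed"              # persisted back to the plan file
    return plan
```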
Memory Provider: apps/backend/integrations/graphiti/
The multi-agent pipeline integrates with Graphiti (graph-based memory) for cross-session context:
| Pipeline Stage | Memory Usage | Purpose |
|---|---|---|
| Spec Creation | Pattern suggestions | Recommend relevant codebase patterns for feature |
| Planning | Historical context | Access prior spec patterns and gotchas |
| Implementation | Session insights | Store discoveries, gotchas, patterns during build |
| QA | Recurring issues | Detect patterns in bugs to prevent future issues |
Pattern Suggestions:
- Query: "spec creation" or task description
- Returns: Relevant codebase patterns ranked by semantic similarity
- Used by: Spec writer, planner agents
Session Insights:
- Automatic extraction after each agent session
- Categories: Discoveries, Gotchas, Patterns, Optimizations
- Stored in: .auto-claude/specs/XXX/graphiti/
Memory Queries:
- get_graphiti_context() - Retrieve relevant context for session
- get_pattern_suggestions() - Get codebase patterns for feature
- save_user_correction() - Store human corrections for learning
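A rough sketch of how pattern suggestions might be ranked. The real integration queries Graphiti with semantic embeddings; this stand-in uses simple word overlap purely to illustrate the ranked-retrieval shape:

```python
# Illustrative pattern-suggestion ranking; real similarity is embedding-based,
# word overlap here is only a toy scoring function.
def suggest_patterns(query: str, patterns: list[str], top_k: int = 3) -> list[str]:
    """Rank stored codebase patterns by crude lexical similarity to the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.lower().split())), p) for p in patterns]
    scored.sort(key=lambda t: (-t[0], t[1]))      # highest overlap first, stable ties
    return [p for score, p in scored[:top_k] if score > 0]
```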
Location: apps/backend/prompts/
Each agent type has a dedicated system prompt that defines its role and behavior:
| Prompt | Agent Type | Key Instructions |
|---|---|---|
| planner.md | Planner Agent | Deep codebase investigation, subtask creation (not tests), dependency ordering |
| coder.md | Coder Agent | Implement subtasks, spawn subagents for parallel work, follow patterns |
| coder_recovery.md | Recovery Agent | Detect stuck state, try alternative approaches, escalate if needed |
| qa_reviewer.md | QA Reviewer | Validate acceptance criteria, check for edge cases, E2E testing (Electron) |
| qa_fixer.md | QA Fixer | Fix reported issues, verify fixes, prevent regressions |
| spec_gatherer.md | Spec Gatherer | Interactive requirements gathering, file discovery |
| spec_researcher.md | Spec Researcher | Validate external APIs, dependencies, third-party services |
| spec_writer.md | Spec Writer | Generate comprehensive specs with acceptance criteria |
| spec_critic.md | Spec Critic | Self-critique using ultrathink, refine spec quality |
| complexity_assessor.md | Complexity Assessor | AI-based task complexity evaluation |
User Task (--task "Add user authentication")
↓
SpecOrchestrator.initialize()
↓
[Discovery Phase]
→ Finds: apps/backend/core/auth.py, apps/frontend/src/auth/
→ Detects: JWT library, OAuth integration
↓
[Requirements Phase]
→ Interactive interview
→ Output: requirements.json
↓
[Complexity Assessment]
→ AI analyzes task complexity
→ Output: complexity_assessment.json
→ Determines: Run 6 phases (STANDARD workflow)
↓
[Context Phase]
→ Loads project_index.json
→ Reads: ARCHITECTURE.md, similar features
→ Output: context.json
↓
[Spec Writing Phase]
→ Generates: spec.md
→ Uses pattern suggestions from Graphiti
↓
[Planning Phase]
→ Planner agent creates: implementation_plan.json
→ Subtasks: Auth service → API routes → Frontend login → Tests
↓
[Validation Phase]
→ Validates spec schema
→ Checks acceptance criteria are testable
↓
[Human Review Checkpoint]
→ User reviews and approves
↓
Output: Ready for implementation
python run.py --spec 001
↓
run_autonomous_agent(spec_dir, project_dir)
↓
[Planner Session]
→ Reads: spec.md, requirements.json, context.json
→ Investigates codebase (find, grep, read patterns)
→ Creates: implementation_plan.json with 8 subtasks
↓
[Coder Loop - Subtask 1]
→ Session: "Implement authentication service"
→ Reads: auth patterns in codebase
→ Creates: apps/backend/services/auth_service.py
→ Tests: python -m pytest tests/test_auth.py
→ Commits: "auto-claude: subtask-1-1 - Create authentication service"
→ Updates: implementation_plan.json[subtask-1-1].status = "completed"
↓
[Coder Loop - Subtask 2-N]
→ Repeat for each subtask
→ Can spawn subagents for parallel work (agent decides)
↓
[All Subtasks Complete]
→ Emits phase: BUILD_COMPLETE
↓
[QA Loop - Iteration 1]
→ QA Reviewer validates acceptance criteria
→ Finds: Missing test for token expiration
→ Creates: QA_FIX_REQUEST.md
→ Status: REJECTED
↓
[QA Loop - Iteration 2]
→ QA Fixer resolves issues
→ Adds test for token expiration
→ Commits fix
↓
[QA Loop - Iteration 3]
→ QA Reviewer re-validates
→ All acceptance criteria met
→ Status: APPROVED
↓
Output: Build complete, ready for merge
| File | Purpose | Updated By |
|---|---|---|
| spec.md | Feature specification with acceptance criteria | Spec Writer Agent |
| requirements.json | Structured user requirements | Spec Gatherer Agent |
| context.json | Codebase context and patterns | Context Phase |
| implementation_plan.json | Subtask-based implementation plan | Planner Agent |
| complexity_assessment.json | AI complexity evaluation | Complexity Assessor Agent |
| complexity_report.md | Human-readable complexity report | Spec Orchestrator |
| File | Purpose | Updated By |
|---|---|---|
| implementation_plan.json | Subtask tracking (status, commits, notes) | Coder Agent |
| build-progress.txt | Human-readable build progress | Coder Agent |
| qa_report.md | QA validation results | QA Reviewer Agent |
| QA_FIX_REQUEST.md | Issues to fix (when rejected) | QA Reviewer Agent |
| task.log | Structured event log for debugging | All agents |
RecoveryManager tracks:
- Current agent session ID
- Last completed subtask
- Token usage statistics
- Session interruption points
Stored in: .auto-claude/specs/XXX/.recovery/
Three-layer defense:
- OS Sandbox - Bash command isolation
- Filesystem Permissions - Operations restricted to project directory
- Command Allowlist - Dynamic allowlist from project analysis
Implementation: apps/backend/security/
- project_analyzer.py - Detects project stack (Python, Node.js, etc.)
- security.py - Base + stack-specific command allowlists
- tool_input_validator.py - Validates Claude tool arguments
- hooks/ - Pre-tool-use security hooks
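The dynamic allowlist idea can be sketched as follows. The base and stack-specific command sets here are illustrative examples, not the actual lists in security.py:

```python
# Illustrative command-allowlist check; the real allowlists are built from
# project analysis and are more extensive than these examples.
import shlex

BASE_ALLOWLIST = {"git", "ls", "cat", "grep", "find"}
STACK_ALLOWLISTS = {
    "python": {"python", "pytest", "pip"},
    "node": {"node", "npm", "npx"},
}

def is_command_allowed(command: str, stacks: list[str]) -> bool:
    """Allow a bash command only if its executable is on the combined allowlist."""
    allowed = BASE_ALLOWLIST.union(*(STACK_ALLOWLISTS.get(s, set()) for s in stacks))
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in allowed
```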
Acceptance Criteria Validation:
- All subtasks marked as completed
- All acceptance criteria verified (manual or automated)
- No console errors (browser, terminal)
- No security vulnerabilities (secrets scan, dependency check)
- Cross-platform compatibility (Windows, macOS, Linux)
E2E Testing (Electron Apps):
- QA agents can use Electron MCP server for automated testing
- mcp__electron__take_screenshot - Visual verification
- mcp__electron__send_command_to_electron - UI interaction
- mcp__electron__read_electron_logs - Console log inspection
| Error Type | Handling |
|---|---|
| Phase failure | Retry up to MAX_RETRIES (3) with exponential backoff |
| Agent session error | Log to task.log, continue to next phase if non-critical |
| Validation error | Report errors, halt pipeline for human review |
| User interrupt | Graceful shutdown, save partial state for recovery |
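The phase-retry policy in the table above (MAX_RETRIES=3 with exponential backoff) can be sketched as follows; the sleep function is injectable so the example runs instantly, and the callable names are illustrative:

```python
# Sketch of retry-with-exponential-backoff; delays double on each failure.
import time

MAX_RETRIES = 3

def run_phase_with_retry(phase, max_retries: int = MAX_RETRIES,
                         base_delay: float = 1.0, sleep=time.sleep):
    """Run a phase callable, retrying on failure with 1s, 2s, 4s... backoff."""
    for attempt in range(max_retries):
        try:
            return phase()
        except Exception:
            if attempt == max_retries - 1:
                raise                              # exhausted: surface for human review
            sleep(base_delay * (2 ** attempt))     # exponential backoff between tries
```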
| Error Type | Handling |
|---|---|
| Subtask failure | Coder Recovery Agent attempts alternative approach |
| Stuck subtask | Recovery escalation after MAX_STUCK_COUNT (3) |
| QA rejection | QA Fixer loop (max 50 iterations) |
| Max QA iterations | Escalate to human review with recurring issue summary |
| Git conflict | Abort with instructions to resolve manually |
Session Resumption:

# RecoveryManager restores state
recovery_manager = RecoveryManager(spec_dir, project_dir)
last_session = recovery_manager.get_last_session()
if last_session and not last_session.completed:
    # Resume from last checkpoint
    session = client.resume_session(last_session.id)

Stuck Subtask Detection:

# After 3 consecutive failures on the same subtask
if consecutive_failures >= MAX_STUCK_COUNT:
    # Spawn coder recovery agent
    agent = spawn_recovery_agent(subtask_id)
    agent.try_alternative_approach()

| Variable | Purpose | Default |
|---|---|---|
| GRAPHITI_ENABLED | Enable Graphiti memory system | true |
| LINEAR_ENABLED | Enable Linear task integration | false |
| ELECTRON_MCP_ENABLED | Enable Electron E2E testing | false |
| PROJECT_DIR | Project root directory (auto-detected) | CWD |
| ANTHROPIC_API_KEY | Claude API authentication | Required |
Thinking Budgets: apps/backend/phase_config.py
| Phase | Default Thinking Budget |
|---|---|
| Spec creation (all phases) | Medium (32K tokens) |
| Planner | High (64K tokens) |
| Coder | None (no extended thinking) |
| QA Reviewer | High (64K tokens) |
| QA Fixer | Medium (32K tokens) |
Model Selection:
- Resolved via API Profile (if configured)
- Fallback to hardcoded shorthands (sonnet, haiku, opus)
- Per-phase model override available
| Metric | Typical Value |
|---|---|
| SIMPLE spec (3 phases) | 2-3 minutes |
| STANDARD spec (6-7 phases) | 5-8 minutes |
| COMPLEX spec (8 phases) | 10-15 minutes |
| Phase summarization | +30 seconds per phase |
| Metric | Typical Value |
|---|---|
| Subtask implementation | 2-10 minutes (varies by complexity) |
| QA iteration | 1-5 minutes |
| Full build (10 subtasks) | 30-60 minutes |
| QA fixer loop (2-3 iterations) | +5-15 minutes |
Parallelization:
- Subagent spawning for independent tasks
- Multi-platform testing via subagents
- Graphiti memory queries are cached
Bottlenecks:
- Sequential subtask execution (by design for dependency management)
- QA validation loop (can require multiple iterations)
- Extended thinking phases (high thinking budget = longer response time)
Task Logger: apps/backend/task_logger.py
- Structured event log: task.log
- Log levels: INFO, WARNING, ERROR, DEBUG
- Phases: PLANNING, IMPLEMENTATION, QA, RECOVERY
- Entry types: PHASE_START, PHASE_END, SUBTASK_COMPLETE, TOOL_USE, ERROR
Event Emission: apps/backend/phase_event.py
- Emits events to frontend for real-time progress
- Events: PHASE_START, PHASE_COMPLETE, SUBTASK_START, SUBTASK_COMPLETE, BUILD_COMPLETE, QA_START, QA_COMPLETE, ERROR
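A minimal sketch of the event emission pattern. The actual phase_event.py delivers events to the frontend; here subscribers are plain callables and the payload fields are illustrative:

```python
# Illustrative event emitter; real delivery goes to the frontend, this
# fan-out to callables only shows the event shape and flow.
import json
from datetime import datetime, timezone

def emit_event(subscribers, event_type: str, **payload) -> dict:
    """Build a timestamped event record and fan it out to all subscribers."""
    event = {
        "type": event_type,                     # e.g. SUBTASK_COMPLETE, QA_START
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **payload,
    }
    for notify in subscribers:
        notify(json.dumps(event))               # serialized, as for IPC transport
    return event
```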
build-progress.txt: Human-readable progress summary:
Build Progress: 3/10 subtasks completed (30%)
Completed:
✓ subtask-1-1: Create authentication service
✓ subtask-1-2: Implement login API endpoint
✓ subtask-1-3: Create JWT token management
In Progress:
→ subtask-2-1: Build login form UI
Pending:
○ subtask-2-2: Connect form to API
○ subtask-2-3: Implement error handling
...
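Rendering this summary from the plan's subtask list can be sketched as follows. The status values and markers mirror the example above, but the exact generator in the codebase may differ:

```python
# Illustrative renderer for the build-progress summary shown above.
def render_progress(subtasks: list[dict]) -> str:
    """Produce a human-readable progress summary from subtask records."""
    done = [s for s in subtasks if s["status"] == "completed"]
    lines = [f"Build Progress: {len(done)}/{len(subtasks)} subtasks completed "
             f"({len(done) * 100 // len(subtasks)}%)"]
    marker = {"completed": "✓", "in_progress": "→", "pending": "○"}
    for status, header in [("completed", "Completed:"),
                           ("in_progress", "In Progress:"),
                           ("pending", "Pending:")]:
        group = [s for s in subtasks if s["status"] == status]
        if group:
            lines.append(header)
            lines += [f"{marker[status]} {s['subtask_id']}: {s['title']}" for s in group]
    return "\n".join(lines)
```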
implementation_plan.json: Machine-readable subtask status:
{
  "subtasks": [
    {
      "subtask_id": "subtask-1-1",
      "status": "completed",
      "commit_hash": "abc123",
      "completed_at": "2026-02-12T10:30:00Z"
    }
  ]
}

Location: apps/backend/linear_updater.py
When enabled, creates and updates Linear tasks:
- Spec creation: Creates task with spec name
- Implementation start: Updates status to "In Progress"
- QA approval: Updates status to "Done"
- Build failure: Updates status to "Cancelled" with error notes
Location: apps/backend/runners/github/
Automates GitHub Issues and PRs:
- Create issue from spec
- Create PR when build approved
- Link PR to issue
Location: apps/frontend/src/main/mcp/electron/
Allows QA agents to interact with running Electron app:
- Take screenshots
- Click buttons, fill forms
- Navigate routes
- Read console logs
Prerequisites:
- ELECTRON_MCP_ENABLED=true in .env
- App running with --remote-debugging-port=9222
- Agent type: qa_reviewer or qa_fixer
DO:
- Let AI assess complexity (don't override unless necessary)
- Run research phase for external dependencies
- Review generated spec before approval
- Keep acceptance criteria testable and specific
- Use pattern suggestions from Graphiti
DON'T:
- Skip complexity assessment for complex tasks
- Leave placeholders in spec.md
- Make acceptance criteria vague ("should work properly")
- Override to SIMPLE workflow for multi-service features
DO:
- Let planner create subtasks (don't manually edit implementation_plan.json)
- Follow codebase patterns found during investigation
- Commit after each subtask completion
- Use recovery agent when stuck
- Run QA validation even if you think it's ready
DON'T:
- Implement multiple subtasks in one session
- Skip subtasks or change order
- Ignore QA feedback
- Commit without testing
- Manually mark subtasks as completed
DO:
- Be thorough (you're the last line of defense)
- Test edge cases, not just happy path
- Use E2E testing for frontend changes
- Verify all acceptance criteria
- Check for security vulnerabilities
DON'T:
- Approve without full validation
- Ignore console errors
- Skip regression testing
- Assume code works without verification
| Issue | Cause | Solution |
|---|---|---|
| Spec creation stuck in phase | Agent not responding to tool calls | Check Claude API status, resume with --resume |
| Implementation plan has no subtasks | Planner didn't break down task | Check spec.md clarity, re-run planner |
| Subtask marked completed but not working | Coder agent skipped verification | Run QA manually, report issue |
| QA loop exceeds max iterations | Recurring issue or bug | Escalate to human review |
| Recovery fails to resume | Recovery state corrupted | Delete .recovery/ folder, start fresh |
| Memory system returns no patterns | Graphiti not initialized | Check GRAPHITI_ENABLED=true |
| Electron MCP not available | App not running or port mismatch | Start app with --remote-debugging-port=9222 |
- Architecture Documentation: ARCHITECTURE.md - System architecture overview
- Spec Creation: apps/backend/spec/pipeline/orchestrator.py - Spec orchestrator implementation
- Implementation: apps/backend/agents/coder.py - Coder agent orchestration
- QA Loop: apps/backend/qa/loop.py - QA validation loop
- Memory System: apps/backend/integrations/graphiti/ - Graphiti memory integration
- Agent Prompts: apps/backend/prompts/ - System prompts for all agent types
- Security: apps/backend/security/ - Command validation and allowlists
Documentation generated: 2026-02-12