# Markdown Agent Studio

Stop writing boilerplate Python. Stop wiring together visual spaghetti graphs. Start building AI teams that actually learn from their mistakes.
Markdown Agent Studio is a local-first, browser-based IDE for building self-improving AI agent systems. An agent is not a stateless API call - it is a living document that can work, remember, collaborate, and evolve.
Every AI agent built on the same model starts with the same intelligence. The industry tries to differentiate them through prompt engineering (telling them what to be) and fine-tuning (showing them what others have done). Neither is actual learning. A prompted agent doesn't get better at writing stories by writing stories. It gets the same result every time, from the same static starting point.
Humans don't work this way. We learn by doing, failing, reflecting, and carrying that experience forward. AI agents have had no equivalent - until now.
You give an agent a task. It runs, produces output, and reflects on what it did. On the next run, its memory from the previous session feeds back in. It sees what it tried, what fell flat, what worked. It spawns sub-agents to research or review. When context fills up, a summarizer compresses working memory into long-term knowledge - deduplicating what it already knows, preserving what's new.
Run after run, the agent's accumulated knowledge grows deeper and more refined. Not because a human engineered the right prompt, but because the agent earned its expertise through iterative practice.
- Why Markdown
- How the Learning Loop Works
- Getting Started
- The Sample Project
- Agent File Reference
- Workflow Files
- Autonomous Mode
- Memory System
- Inter-Agent Communication
- Observability
- Agent Templates
- Configuration
- Keyboard Shortcuts
- MCP Server Integration
- Running as an npm Package
- Architecture
- Development
- Troubleshooting
- License
## Why Markdown

Most agent tooling forces a choice: write code (powerful but inaccessible) or use a visual builder (accessible but opaque). Agents defined in Markdown sit in the middle. They're plain text files you can read, edit, version-control, and share. The YAML frontmatter configures behavior; the body is the system prompt. No framework lock-in, no proprietary format, no deployment step.
```markdown
---
name: Story Writer
model: gemini-2.5-flash
safety_mode: balanced
reads: ["**"]
writes: [artifacts/**, memory/**]
permissions:
  spawn_agents: true
  web_access: true
autonomous:
  max_cycles: 20
  resume_mission: true
---

You are a story writer developing your craft through practice.
Read your memory for lessons from previous sessions before starting.
Write drafts to files. Reflect on what works and what doesn't.
Spawn a critic agent to review your output. Incorporate feedback.
Record what you learned to memory before finishing.
```

That file is the agent.
## How the Learning Loop Works

- **Run** - The agent executes its task, using tools to research, write, and collaborate with sub-agents.
- **Reflect** - Before the session ends, the agent records what it accomplished, what failed, and what to try next.
- **Compress** - When context fills up, a summarizer distills working memory into long-term knowledge. Duplicates are discarded; new insights are preserved.
- **Resume** - On the next run, accumulated memory feeds back in. The agent picks up where it left off, building on everything it has learned.
Each cycle makes the agent more capable at its specific task. Not because the model changed, but because the agent's experiential knowledge grew.
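The loop above can be sketched in code. This is purely illustrative — the type and function names here are assumptions, not MAS's actual API — but it captures the run → reflect → compress → resume cycle:

```typescript
// Illustrative sketch of the learning loop; names are hypothetical, not the MAS API.
type Memory = { text: string; kind: "observation" | "mistake" | "skill" };

function compress(memories: Memory[]): Memory[] {
  // Deduplicate by text: what the agent already knows is discarded,
  // genuinely new insights are preserved.
  const seen = new Set<string>();
  return memories.filter((m) => {
    if (seen.has(m.text)) return false;
    seen.add(m.text);
    return true;
  });
}

function runLoop(
  cycles: number,
  runOnce: (longTerm: Memory[]) => Memory[], // one session: Run + Reflect
): Memory[] {
  let longTerm: Memory[] = [];
  for (let i = 0; i < cycles; i++) {
    const working = runOnce(longTerm);              // Run + Reflect
    longTerm = compress([...longTerm, ...working]); // Compress
  }                                                 // Resume: next cycle sees longTerm
  return longTerm;
}
```

For example, a session that keeps reporting the same mistake leaves exactly one copy of that lesson in long-term memory, no matter how many cycles run.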
## Getting Started

MAS runs entirely locally. No backend infrastructure required.
- Git: https://git-scm.com/downloads
- Node.js 20.19+: https://nodejs.org/
```sh
git clone https://github.com/RobThePCGuy/markdown-agent-studio.git
cd markdown-agent-studio
npm install
npm run dev
```

Open http://localhost:5173. Pick an agent, enter a prompt, and click Run.
No API key? That's fine - the app ships with a scripted demo provider so you can explore the full experience first.
```sh
cp .env.example .env.local
```

Add your provider keys (any or all):
```sh
VITE_GEMINI_API_KEY=your_key_here
VITE_OPENAI_API_KEY=your_key_here
VITE_ANTHROPIC_API_KEY=your_key_here
```

If no key is set, demo mode runs automatically. Select your provider and model in Settings.
## The Sample Project

On first launch, a six-agent team is loaded to demonstrate multi-agent orchestration. The task: build a portfolio website from scratch.
| Agent | Role | Safety Mode |
|---|---|---|
| Project Lead | Plans the project, delegates to specialists, writes the final summary | `balanced` |
| UX Researcher | Searches the web for current design trends, writes research report | `safe` |
| Designer | Reads research findings, produces a design spec with tokens and layout | `balanced` |
| HTML Developer | Builds semantic HTML from the design spec | `safe` |
| CSS Developer | Creates responsive CSS with custom properties (works in parallel with HTML Dev) | `safe` |
| QA Reviewer | Audits HTML/CSS using a custom design_review tool, produces a scored report | `gloves_off` |
Hit Run with the Project Lead selected and watch the team coordinate: delegation, parallel execution, signaling, and consolidation - all visualized on the graph in real time.
The demo produces real output: `site/index.html`, `site/styles.css`, `artifacts/design-spec.md`, `artifacts/qa-report.md`, and `artifacts/summary.md`.
## Agent File Reference

Agent files live in `agents/*.md`. The YAML frontmatter configures behavior; everything below the closing `---` is the system prompt.
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Display name for the agent |
| `model` | string | Settings default | LLM model override (e.g. `gemini-2.5-flash`, `gpt-4o`, `claude-sonnet-4-20250514`) |
| `safety_mode` | string | `gloves_off` | Permission tier: `safe`, `balanced`, or `gloves_off` |
| `reads` | string[] | mode default | Glob patterns the agent can read (e.g. `["agents/**", "memory/**"]`) |
| `writes` | string[] | mode default | Glob patterns the agent can write (e.g. `["artifacts/**"]`) |
| `permissions` | object or string[] | mode default | Fine-grained permission overrides (see Safety Modes) |
| `allowed_tools` | string[] | all | Whitelist of built-in tools this agent can use |
| `blocked_tools` | string[] | none | Blacklist of built-in tools this agent cannot use |
| `gloves_off_triggers` | string[] | none | Keywords in the mission prompt that auto-escalate to `gloves_off` |
| `tools` | object[] | none | Custom tool definitions (see Custom Tools) |
| `autonomous` | object | none | Autonomous cycle config (see Autonomous Mode) |
| `mcp_servers` | object[] | none | MCP server connections (see MCP Server Integration) |
### Safety Modes

Every agent runs under one of three safety modes that control what it's allowed to do. Set `safety_mode` in the frontmatter, or let it default to `gloves_off`.
| Permission | `safe` | `balanced` | `gloves_off` |
|---|---|---|---|
| Spawn agents | - | ✓ | ✓ |
| Edit agents | - | - | ✓ |
| Delete files | - | - | ✓ |
| Web access | - | ✓ | ✓ |
| Signal parent | ✓ | ✓ | ✓ |
| Custom tools | - | ✓ | ✓ |
| Default reads | `agents/**`, `memory/**`, `artifacts/**` | `**` | `**` |
| Default writes | `memory/**`, `artifacts/**` | `memory/**`, `artifacts/**` | `**` |

Aliases: `street` maps to `safe`; `autonomous` and `track` map to `gloves_off`.
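As a rough mental model, the mode table can be read as a map of defaults with per-agent overrides layered on top. This sketch uses hypothetical names (it is not the actual MAS source) and covers only the boolean permissions:

```typescript
// Hypothetical sketch of permission resolution; not the real implementation.
type Perms = {
  spawn_agents: boolean;
  edit_agents: boolean;
  delete_files: boolean;
  web_access: boolean;
};

const MODE_DEFAULTS: Record<string, Perms> = {
  safe:       { spawn_agents: false, edit_agents: false, delete_files: false, web_access: false },
  balanced:   { spawn_agents: true,  edit_agents: false, delete_files: false, web_access: true  },
  gloves_off: { spawn_agents: true,  edit_agents: true,  delete_files: true,  web_access: true  },
};

function effectivePerms(mode: string, overrides: Partial<Perms> = {}): Perms {
  // Per-agent `permissions:` entries win over the mode's defaults.
  return { ...MODE_DEFAULTS[mode], ...overrides };
}
```

For example, `effectivePerms("safe", { web_access: true })` keeps every safe-mode restriction except web access — the same effect as the frontmatter override shown in this section.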
You can override individual permissions regardless of mode:
```yaml
safety_mode: safe
permissions:
  web_access: true # grant web access even in safe mode
```

**Trigger-based escalation:** If you set `gloves_off_triggers`, the agent automatically escalates to `gloves_off` when any trigger keyword appears in the mission prompt:
```yaml
safety_mode: safe
gloves_off_triggers:
  - "delete"
  - "modify agents"
  - "full access"
```

### Built-in Tools

Agents access tools based on their safety mode and permission settings. The full tool inventory:
**File System**

| Tool | Description |
|---|---|
| `vfs_read` | Read file contents from the virtual file system |
| `vfs_write` | Write or overwrite a file |
| `vfs_list` | List files by directory prefix |
| `vfs_delete` | Delete a file (requires delete permission) |

**Agent Orchestration**

| Tool | Description |
|---|---|
| `spawn_agent` | Create and queue a new agent for execution with a task |
| `delegate` | Hand off a task to an existing agent with context |
| `signal_parent` | Send a message back to the agent that spawned you |

**Web**

| Tool | Description |
|---|---|
| `web_search` | Search the web (uses provider's search API) |
| `web_fetch` | Fetch and parse a URL's content |

**Memory**

| Tool | Description |
|---|---|
| `memory_write` | Write an entry to working memory with tags |
| `memory_read` | Search working memory by query or tags |

**Knowledge Base**

| Tool | Description |
|---|---|
| `knowledge_query` | Semantic search across all agents' long-term memory (requires vector memory) |
| `knowledge_contribute` | Add typed knowledge as tagged working memory (skill, fact, procedure, observation, mistake, preference) |

**Messaging**

| Tool | Description |
|---|---|
| `publish` | Broadcast a message to a named channel |
| `subscribe` | Listen to a channel and check for pending messages |

**Shared State**

| Tool | Description |
|---|---|
| `blackboard_write` | Write a key-value pair visible to all agents in the current run |
| `blackboard_read` | Read from the shared blackboard (omit key to list all) |

**Task Management (autonomous mode only)**

| Tool | Description |
|---|---|
| `task_queue_write` | Add, update, or remove tasks (actions: `add`, `update`, `remove`). Only registered during autonomous runs. |
| `task_queue_read` | Query the task queue with filters (`pending`, `in_progress`, `done`, `blocked`, `all`). Only registered during autonomous runs. |
### Custom Tools

You can define custom tools in the agent's frontmatter. Each custom tool spawns a temporary sub-agent when invoked, with template variables substituted from the caller's arguments.
```yaml
tools:
  - name: design_review
    description: Evaluate HTML and CSS against a design specification
    parameters:
      html_path:
        type: string
        description: Path to the HTML file
      css_path:
        type: string
        description: Path to the CSS file
    prompt: |
      Review the HTML at {{html_path}} and CSS at {{css_path}}.
      Score accessibility, responsiveness, performance, and design fidelity.
      Return a structured report with scores out of 100.
    model: gemini-2.5-flash # optional: override model for this tool
    result_schema: # optional: guide output shape (not validated)
      type: object
      properties:
        overall_score:
          type: number
        breakdown:
          type: object
```

When an agent calls `design_review`, a temporary agent is created at `agents/_custom_design_review_<timestamp>.md`, runs the prompt with parameters injected, and returns the result to the caller.
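The `{{param}}` substitution step can be sketched like this — an illustrative helper, not MAS's actual code:

```typescript
// Hypothetical sketch of prompt-template substitution for custom tools.
function renderPrompt(template: string, args: Record<string, string>): string {
  // Replace every {{name}} placeholder with the caller-supplied argument;
  // unknown placeholders are left untouched.
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    name in args ? args[name] : match,
  );
}
```

For example, `renderPrompt("Review the HTML at {{html_path}}.", { html_path: "site/index.html" })` yields `"Review the HTML at site/index.html."`.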
## Workflow Files

Workflow files live in `workflows/*.md` and define multi-step pipelines with dependency ordering.
```yaml
---
name: Research Pipeline
steps:
  - id: research
    agent: agents/researcher.md
    prompt: "Research {topic}"
    outputs: [findings, sources]
  - id: synthesis
    agent: agents/synthesizer.md
    depends_on: [research]
    prompt: "Synthesize {research.findings} with sources from {research.sources}"
    outputs: [synthesis]
  - id: review
    agent: agents/reviewer.md
    depends_on: [synthesis]
    prompt: "Review {synthesis.synthesis} for accuracy"
---
```

Steps execute in topological order. Steps with no unmet dependencies run in parallel (controlled by the Workflow Parallel Steps setting, default 1). Circular dependencies are detected and rejected. Variables use `{step_id.output_name}` syntax for upstream data access.
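Dependency ordering with cycle detection can be sketched as a depth-first search. This is a simplified illustration, not the engine's actual code:

```typescript
// Simplified sketch of workflow step ordering; hypothetical, not the real engine.
type Step = { id: string; depends_on?: string[] };

function topoOrder(steps: Step[]): string[] {
  const byId = new Map(steps.map((s) => [s.id, s]));
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  function visit(id: string): void {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      // A back-edge means the dependency graph has a cycle: reject it.
      throw new Error(`circular dependency at ${id}`);
    }
    state.set(id, "visiting");
    for (const dep of byId.get(id)?.depends_on ?? []) visit(dep);
    state.set(id, "done");
    order.push(id); // all dependencies are placed before this step
  }

  for (const s of steps) visit(s.id);
  return order;
}
```

For the pipeline above, `topoOrder` returns `["research", "synthesis", "review"]`, regardless of the order the steps are declared in.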
## Autonomous Mode

Autonomous mode runs an agent through multiple cycles, with memory carrying forward between each one.
```yaml
autonomous:
  max_cycles: 20 # 1-1000, default: 10
  stop_when_complete: true # stop early if agent assesses task is done
  resume_mission: true # load previous mission state and continue
  seed_task_when_idle: true # auto-generate follow-up tasks when queue empties
```

| Option | What it does |
|---|---|
| `max_cycles` | Hard limit on iteration count. Each cycle is a full run → reflect → compress loop. |
| `stop_when_complete` | The agent self-assesses completion after each cycle. If satisfied, it stops without exhausting all cycles. |
| `resume_mission` | Loads the previous mission state from `_mission_state_<agentId>.json` in the VFS. The agent picks up where it left off with full task queue and cycle history. |
| `seed_task_when_idle` | When the task queue empties mid-run, the agent generates continuation tasks to keep progressing. Prevents premature stops on open-ended missions. |
Mission state - including task queue, cycle notes (last 12), and token totals - persists across browser sessions.
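In outline, the cycle logic amounts to a bounded loop with an early-exit check. The names below are assumptions for illustration; the actual runner is more involved:

```typescript
// Hypothetical outline of the autonomous runner loop.
type CycleResult = { complete: boolean };

function runAutonomous(
  maxCycles: number,
  stopWhenComplete: boolean,
  runCycle: (cycle: number) => CycleResult, // one run → reflect → compress pass
): number {
  let cycles = 0;
  for (let i = 0; i < maxCycles; i++) {
    const result = runCycle(i);
    cycles++;
    // Self-assessed completion ends the mission before max_cycles is exhausted.
    if (stopWhenComplete && result.complete) break;
  }
  return cycles; // mission state would be persisted here for resume_mission
}
```

With `stop_when_complete: true`, a mission that self-assesses as done on cycle 3 stops there even if `max_cycles` is 20.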
## Memory System

Memory is what makes agents learn. The system operates in three layers.
**Working memory** is written during a run via `memory_write`. It holds observations, plans, and intermediate results scoped to the current session.

**Post-run summarization** happens automatically after each completed run. A summarizer agent reviews everything - files written, working memory entries, conversation history - and extracts structured memories typed as skill, fact, procedure, observation, mistake, or preference. Mistakes are prioritized because they prevent repeated failures.

**Long-term memory** stores extracted knowledge across runs. When new memories are consolidated with existing ones, the system operates in capacity tiers:
| Tier | Condition | Behavior |
|---|---|---|
| Generous | < 30% of budget used | Freely add new memories; only skip exact duplicates |
| Selective | 30-50% of budget | Add only high-value knowledge; merge duplicates via UPDATE |
| Heavy cut | > 50% of budget | Aggressively compress; target 10-20% reduction; merge related memories |
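The tier selection reduces to a threshold check on budget usage. A sketch with hypothetical names, using the thresholds from the table above:

```typescript
// Sketch of capacity-tier selection; names are illustrative, not the MAS source.
type Tier = "generous" | "selective" | "heavy_cut";

function consolidationTier(tokensUsed: number, tokenBudget: number): Tier {
  const ratio = tokensUsed / tokenBudget;
  if (ratio < 0.3) return "generous";   // freely add; skip only exact duplicates
  if (ratio <= 0.5) return "selective"; // high-value knowledge only; merge duplicates
  return "heavy_cut";                   // aggressively compress, target 10-20% reduction
}
```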
Each memory is tagged, timestamped, and access-counted. Frequently accessed memories are prioritized for retention. Vector memory (opt-in via Settings) enables semantic retrieval using Transformers.js embeddings backed by IndexedDB, so agents can find related knowledge even with different phrasing.
Shared memory is visible to all agents in a project. Private memory is scoped to a single agent.
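Semantic retrieval boils down to comparing embedding vectors. A toy version using cosine similarity — the real system uses Transformers.js embeddings and IndexedDB, but the scoring idea is the same:

```typescript
// Toy semantic search over embedding vectors; illustrative only.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topMatch(
  query: number[],
  memories: { text: string; vec: number[] }[],
): string {
  // Return the memory whose embedding points in the most similar direction,
  // which is how differently-phrased but related knowledge gets found.
  return memories.reduce((best, m) =>
    cosine(query, m.vec) > cosine(query, best.vec) ? m : best,
  ).text;
}
```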
## Inter-Agent Communication

Agents coordinate through four communication primitives.
**Signal parent** - A spawned agent sends a message back to its creator when it finishes or needs attention. The simplest coordination pattern.

**Pub/sub messaging** - Agents publish messages to named channels and subscribe to receive them. Messages include timestamps and author IDs. Subscribers only receive messages published after their subscription, with acknowledgment tracking to prevent duplicates.

**Blackboard** - A shared key-value store visible to all agents in the current run. Useful for coordination flags, shared config, and status tracking between parallel agents. Cleared when the run ends.

**Task queue (autonomous mode only)** - A priority-based task list that survives across autonomous cycles. These tools are not part of the default built-in registry; they are injected when an agent runs in autonomous mode. Agents can add, update, and remove tasks with statuses (`pending`, `in_progress`, `done`, `blocked`). Lower priority numbers execute first.
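The pub/sub delivery rule — subscribers only see messages published after they subscribe, each delivered at most once — can be sketched with a sequence-number cursor per subscriber (hypothetical names; not the actual MAS store):

```typescript
// Minimal pub/sub sketch with subscribe-time filtering and ack tracking.
type Msg = { channel: string; body: string; seq: number };

class Bus {
  private log: Msg[] = [];
  private seq = 0;
  private cursors = new Map<string, number>(); // subscriber -> last seq seen

  publish(channel: string, body: string): void {
    this.log.push({ channel, body, seq: ++this.seq });
  }

  subscribe(subscriber: string): void {
    // Only messages published after this point will be delivered.
    this.cursors.set(subscriber, this.seq);
  }

  poll(subscriber: string, channel: string): string[] {
    const from = this.cursors.get(subscriber) ?? this.seq;
    const pending = this.log.filter((m) => m.channel === channel && m.seq > from);
    if (pending.length > 0) {
      // Acknowledge: advance the cursor so nothing is delivered twice.
      this.cursors.set(subscriber, pending[pending.length - 1].seq);
    }
    return pending.map((m) => m.body);
  }
}
```

A critic that subscribes to `status` after the run starts will see only messages published from that moment on, and a second poll returns nothing new.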
## Observability

MAS is designed to make agent thinking visible, not buried in terminal output.
The graph view shows agents as color-coded nodes connected by activity edges:
| Node border color | Meaning |
|---|---|
| Green (pulsing) | Running |
| Cyan | Completed |
| Yellow | Paused |
| Orange | Aborted |
| Red | Error |
| Gray | Idle |
Activity nodes appear as agents work, colored by type: green for thinking, blue for web search, cyan for web fetch, orange for signals, yellow for spawns, purple for file system operations, and teal for tool calls.
The HUD overlay (top-left) shows live stats: agent count, running/thinking/web activity counts, spawns, signals, and errors. Total token consumption is shown in workflow mode.
Three tabs on the right side:
- **Chat** - Streaming output from the selected agent's session, with session picker for multi-session agents.
- **Events** - Timeline of all events (activation, tool calls, file changes, spawns, signals, errors, workflow steps, MCP connections, pub/sub, blackboard operations). Includes checkpoint restore and replay controls.
- **Memory** - Working memory entries (current run), long-term memories (cross-run), and shared knowledge.
A policy banner above the tabs shows the selected agent's safety mode and permissions at a glance.
A horizontal timeline below the graph shows the duration and overlap of all agent activations. Each bar represents one agent's execution, colored by agent identity.
MAS includes sonification - distinct audio signals for agent events so you can hear your system working even while focused elsewhere. Spawn triggers a rising chime, tool calls get a soft click, signals are a double blip, completion plays a C-E-G chord, and errors sound a warning tone with vibrato. Toggle with MUTE in the top bar.
## Agent Templates

The template picker offers seven starting points:
| Template | Description |
|---|---|
| Blank Agent | Minimal skeleton - empty sections with default permissions |
| Autonomous Learner | Persistent multi-cycle missions with task queue and memory |
| Researcher | Web search and sub-agent delegation for deep investigation |
| Writer | Safe-mode agent that reads artifacts and writes refined prose |
| Orchestrator | Gloves-off coordinator that breaks tasks into sub-agent work |
| Critic | Safe-mode reviewer that reads output and signals feedback |
| Tool Builder | Demonstrates custom tool definitions with parameters and prompts |
You can also save any agent as a template with Save as Template, and create new agents from saved templates. User templates are stored in `templates/*.md`.
## Configuration

Open Settings (⚙ in the top bar) to configure:
- **API** - Provider (Gemini, Anthropic, OpenAI), API key, model selection.
- **Kernel limits** - Max Concurrency (1-10, default 3), Max Depth (1-20, default 5), Max Fanout (1-20, default 5), Token Budget (default 250,000), Workflow Parallel Steps (1-10, default 1).
- **Agent persistence** - Min Turns Before Stop (0-25, default 5), Force Reflection (auto-inject reflection prompt), Auto-Record Failures (write tool failures to memory).
- **Memory** - Enable Memory, Use Vector Memory (LanceDB + embeddings vs JSON-based), Memory Token Budget (500-8000, default 2000).
- **Autonomous defaults** - Default Max Cycles, Resume Previous Mission, Stop When Complete, Seed Continuation Tasks.
- **Danger zone** - Reset to Sample Project, Clear Workspace.
## Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| `Ctrl/Cmd+K` | Command palette |
| `Ctrl/Cmd+Enter` | Run once |
| `Ctrl/Cmd+Shift+Enter` | Run autonomous |
| `Ctrl/Cmd+Shift+P` | Pause / resume |
| `Ctrl/Cmd+Shift+K` | Kill all |
| `Ctrl/Cmd+Shift+L` | Focus prompt box |
The command palette supports scope prefixes: `agent:`, `file:`, `action:`, `nav:`.
## MCP Server Integration

Connect external tools via the Model Context Protocol. Supported transports: `http`, `sse`, and `stdio`.
Configure in agent frontmatter:
```yaml
mcp_servers:
  - name: docs
    transport: http
    url: http://localhost:3000/mcp
  - name: local-tools
    transport: sse
    url: http://localhost:3001/sse
  - name: cli-tools
    transport: stdio
    command: npx
    args: [my-mcp-server]
    gatewayUrl: http://localhost:3002/mcp
```

Stdio servers can't run directly in the browser. Use `gatewayUrl` to point to an HTTP bridge that wraps the stdio process. MCP tools are dynamically registered and appear alongside built-in tools.
## Running as an npm Package

```sh
npx markdown-agent-studio
```

Or install globally:

```sh
npm install -g markdown-agent-studio
markdown-agent-studio
```

Or import the dist path programmatically:

```js
import distPath from 'markdown-agent-studio';
```

Options: `--port 4173`, `--host 127.0.0.1`, `--no-open`
## Architecture

```
src/
├── core/        Execution engine: kernel, providers, memory, summarizer,
│                autonomous runner, workflow engine, MCP client, plugins
├── stores/      Zustand state: sessions, VFS, memory, events, pub/sub,
│                blackboard, task queue, project metadata
├── components/  React UI: graph visualization, Monaco editor, inspector,
│                workspace explorer, command palette, settings
├── hooks/       React hooks: useKernel, useGraphData, useOnboarding
├── types/       TypeScript definitions: agent, session, memory, events
├── utils/       Helpers: agent parser, validator, templates, diff engine
└── styles/      CSS modules
```
Tech stack: React, TypeScript, Vite, Zustand, React Flow, Monaco Editor, MCP SDK, Transformers.js, IndexedDB.
## Development

```sh
npm run dev       # local dev server
npm run lint      # lint checks
npm test          # test suite (52 test files)
npm run build     # typecheck + production build
npm run check:all # lint + test + build + bundle guard
```

CI runs lint, tests, build, bundle-size guard, and npm dry-run on every push and PR.
```sh
npm run release:patch
npm run release:minor
npm run release:major
```

## Troubleshooting

| Problem | Solution |
|---|---|
| App does not start | Confirm Node version is 20.19+ with `node -v` |
| No AI responses | Add your API key to `.env.local` and select the matching provider in Settings |
| Demo mode won't activate | Clear browser storage and reload - the sample project loads on first visit |
| MCP stdio server unavailable | Stdio can't run in the browser directly; configure a `gatewayUrl` HTTP bridge |
| Slow first vector search | Expected - the embedding model downloads and warms up on first use |
| Agent can't write files | Check `writes` patterns and `safety_mode` permissions in the agent frontmatter |
| Workflow steps won't parallelize | Increase Workflow Parallel Steps in Settings (default is 1 = sequential) |
| Agent stops too early | Increase Min Turns Before Stop in Settings or set `stop_when_complete: false` |
## License

MIT © RobThePCGuy
