Markdown Agent Studio (MAS)


Stop writing boilerplate Python. Stop wiring together visual spaghetti graphs. Start building AI teams that actually learn from their mistakes.

Markdown Agent Studio is a local-first, browser-based IDE for building self-improving AI agent systems. An agent is not a stateless API call - it is a living document that can work, remember, collaborate, and evolve.


The Problem

Every AI agent built on the same model starts with the same intelligence. The industry tries to differentiate them through prompt engineering (telling them what to be) and fine-tuning (showing them what others have done). Neither is actual learning. A prompted agent doesn't get better at writing stories by writing stories. It gets the same result every time, from the same static starting point.

Humans don't work this way. We learn by doing, failing, reflecting, and carrying that experience forward. AI agents have had no equivalent - until now.

What MAS Does

You give an agent a task. It runs, produces output, and reflects on what it did. On the next run, its memory from the previous session feeds back in. It sees what it tried, what fell flat, what worked. It spawns sub-agents to research or review. When context fills up, a summarizer compresses working memory into long-term knowledge - deduplicating what it already knows, preserving what's new.

Run after run, the agent's accumulated knowledge grows deeper and more refined. Not because a human engineered the right prompt, but because the agent earned its expertise through iterative practice.


Why Markdown

Most agent tooling forces a choice: write code (powerful but inaccessible) or use a visual builder (accessible but opaque). Agents defined in Markdown sit in the middle. They're plain text files you can read, edit, version-control, and share. The YAML frontmatter configures behavior; the body is the system prompt. No framework lock-in, no proprietary format, no deployment step.

```markdown
---
name: Story Writer
model: gemini-2.5-flash
safety_mode: balanced
reads: ["**"]
writes: ["artifacts/**", "memory/**"]
permissions:
  spawn_agents: true
  web_access: true
autonomous:
  max_cycles: 20
  resume_mission: true
---

You are a story writer developing your craft through practice.
Read your memory for lessons from previous sessions before starting.
Write drafts to files. Reflect on what works and what doesn't.
Spawn a critic agent to review your output. Incorporate feedback.
Record what you learned to memory before finishing.
```

That file is the agent.

How the Learning Loop Works

  1. Run - The agent executes its task, using tools to research, write, and collaborate with sub-agents.
  2. Reflect - Before the session ends, the agent records what it accomplished, what failed, and what to try next.
  3. Compress - When context fills up, a summarizer distills working memory into long-term knowledge. Duplicates are discarded; new insights are preserved.
  4. Resume - On the next run, accumulated memory feeds back in. The agent picks up where it left off, building on everything it has learned.

Each cycle makes the agent more capable at its specific task. Not because the model changed, but because the agent's experiential knowledge grew.
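The four steps above can be sketched in code. This is an illustrative model only; none of these names (`Memory`, `runAndReflect`, `compress`, `learningLoop`) are the actual MAS API, they simply mirror the numbered steps:

```typescript
// Illustrative sketch of the run / reflect / compress / resume loop.
// Names and shapes are hypothetical, not the MAS implementation.

interface Memory {
  entries: string[];
}

// 1. Run + 2. Reflect: execute the task, then record what happened.
function runAndReflect(memory: Memory, task: string): Memory {
  return {
    entries: [
      ...memory.entries,
      `attempted: ${task}`,
      "lesson: note what worked and what failed",
    ],
  };
}

// 3. Compress: discard duplicates and keep memory within a budget.
function compress(memory: Memory, budget: number): Memory {
  const unique = [...new Set(memory.entries)];
  return { entries: unique.slice(-budget) };
}

// 4. Resume: each cycle starts from the compressed memory of the last one.
function learningLoop(task: string, cycles: number, budget = 4): Memory {
  let memory: Memory = { entries: [] };
  for (let i = 0; i < cycles; i++) {
    memory = compress(runAndReflect(memory, task), budget);
  }
  return memory;
}
```

The key property is that memory is the only state threaded between cycles; the model itself never changes.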


Getting Started

MAS runs entirely locally. No backend infrastructure required.

Prerequisites

  • Node.js 20.19 or later (check with node -v)
Setup

```bash
git clone https://github.com/RobThePCGuy/markdown-agent-studio.git
cd markdown-agent-studio
npm install
npm run dev
```

Open http://localhost:5173. Pick an agent, enter a prompt, click Run.

No API key? That's fine - the app ships with a scripted demo provider so you can explore the full experience first.

Provider Keys

```bash
cp .env.example .env.local
```

Add your provider keys (any or all):

```
VITE_GEMINI_API_KEY=your_key_here
VITE_OPENAI_API_KEY=your_key_here
VITE_ANTHROPIC_API_KEY=your_key_here
```

If no key is set, demo mode runs automatically. Select your provider and model in Settings.


The Sample Project

On first launch, a six-agent team is loaded to demonstrate multi-agent orchestration. The task: build a portfolio website from scratch.

| Agent | Role | Safety Mode |
|---|---|---|
| Project Lead | Plans the project, delegates to specialists, writes the final summary | balanced |
| UX Researcher | Searches the web for current design trends, writes research report | safe |
| Designer | Reads research findings, produces a design spec with tokens and layout | balanced |
| HTML Developer | Builds semantic HTML from the design spec | safe |
| CSS Developer | Creates responsive CSS with custom properties (works in parallel with HTML Dev) | safe |
| QA Reviewer | Audits HTML/CSS using a custom design_review tool, produces a scored report | gloves_off |

Hit Run with the Project Lead selected and watch the team coordinate: delegation, parallel execution, signaling, and consolidation - all visualized on the graph in real time.

The demo produces real output: site/index.html, site/styles.css, artifacts/design-spec.md, artifacts/qa-report.md, and artifacts/summary.md.


Agent File Reference

Agent files live in agents/*.md. The YAML frontmatter configures behavior; everything below the closing --- is the system prompt.
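Splitting an agent file into its two parts is straightforward. A minimal sketch (this is not MAS's actual parser, and a real one would also parse the YAML itself):

```typescript
// Split an agent .md file into frontmatter and system prompt.
// Hypothetical helper; illustrates the file layout, not the MAS source.

function splitAgentFile(text: string): { frontmatter: string; prompt: string } {
  // Frontmatter is delimited by a leading and a closing "---" line.
  const match = text.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { frontmatter: "", prompt: text };
  return { frontmatter: match[1], prompt: match[2].trim() };
}
```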

Frontmatter Schema

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | required | Display name for the agent |
| model | string | Settings default | LLM model override (e.g. gemini-2.5-flash, gpt-4o, claude-sonnet-4-20250514) |
| safety_mode | string | gloves_off | Permission tier: safe, balanced, or gloves_off |
| reads | string[] | mode default | Glob patterns the agent can read (e.g. ["agents/**", "memory/**"]) |
| writes | string[] | mode default | Glob patterns the agent can write (e.g. ["artifacts/**"]) |
| permissions | object or string[] | mode default | Fine-grained permission overrides (see Safety Modes) |
| allowed_tools | string[] | all | Whitelist of built-in tools this agent can use |
| blocked_tools | string[] | none | Blacklist of built-in tools this agent cannot use |
| gloves_off_triggers | string[] | none | Keywords in the mission prompt that auto-escalate to gloves_off |
| tools | object[] | none | Custom tool definitions (see Custom Tools) |
| autonomous | object | none | Autonomous cycle config (see Autonomous Mode) |
| mcp_servers | object[] | none | MCP server connections (see MCP Server Integration) |

Safety Modes

Every agent runs under one of three safety modes that control what it's allowed to do. Set safety_mode in the frontmatter, or let it default to gloves_off.

| Permission | safe | balanced | gloves_off |
|---|---|---|---|
| Spawn agents | – | ✓ | ✓ |
| Edit agents | – | – | ✓ |
| Delete files | – | – | ✓ |
| Web access | – | ✓ | ✓ |
| Signal parent | ✓ | ✓ | ✓ |
| Custom tools | – | ✓ | ✓ |
| Default reads | agents/**, memory/**, artifacts/** | ** | ** |
| Default writes | memory/**, artifacts/** | memory/**, artifacts/** | ** |

Aliases: street maps to safe; autonomous and track map to gloves_off.

You can override individual permissions regardless of mode:

```yaml
safety_mode: safe
permissions:
  web_access: true    # grant web access even in safe mode
```

Trigger-based escalation: If you set gloves_off_triggers, the agent automatically escalates to gloves_off when any trigger keyword appears in the mission prompt:

```yaml
safety_mode: safe
gloves_off_triggers:
  - "delete"
  - "modify agents"
  - "full access"
```
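The escalation check itself amounts to a keyword scan. A minimal sketch, assuming a case-insensitive substring match (the function name is illustrative, not the MAS source):

```typescript
// Trigger-based escalation: if any configured keyword appears in the
// mission prompt, the agent runs as gloves_off; otherwise its base mode.
// Hypothetical helper, not the actual MAS implementation.

function effectiveSafetyMode(
  baseMode: string,
  triggers: string[],
  missionPrompt: string,
): string {
  const text = missionPrompt.toLowerCase();
  const escalate = triggers.some((t) => text.includes(t.toLowerCase()));
  return escalate ? "gloves_off" : baseMode;
}
```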

Built-in Tools

Agents access tools based on their safety mode and permission settings. The full tool inventory:

File System

| Tool | Description |
|---|---|
| vfs_read | Read file contents from the virtual file system |
| vfs_write | Write or overwrite a file |
| vfs_list | List files by directory prefix |
| vfs_delete | Delete a file (requires delete permission) |

Agent Orchestration

| Tool | Description |
|---|---|
| spawn_agent | Create and queue a new agent for execution with a task |
| delegate | Hand off a task to an existing agent with context |
| signal_parent | Send a message back to the agent that spawned you |

Web

| Tool | Description |
|---|---|
| web_search | Search the web (uses provider's search API) |
| web_fetch | Fetch and parse a URL's content |

Memory

| Tool | Description |
|---|---|
| memory_write | Write an entry to working memory with tags |
| memory_read | Search working memory by query or tags |

Knowledge Base

| Tool | Description |
|---|---|
| knowledge_query | Semantic search across all agents' long-term memory (requires vector memory) |
| knowledge_contribute | Add typed knowledge as tagged working memory (skill, fact, procedure, observation, mistake, preference) |

Messaging

| Tool | Description |
|---|---|
| publish | Broadcast a message to a named channel |
| subscribe | Listen to a channel and check for pending messages |

Shared State

| Tool | Description |
|---|---|
| blackboard_write | Write a key-value pair visible to all agents in the current run |
| blackboard_read | Read from the shared blackboard (omit key to list all) |

Task Management (autonomous mode only)

| Tool | Description |
|---|---|
| task_queue_write | Add, update, or remove tasks (actions: add, update, remove). Only registered during autonomous runs. |
| task_queue_read | Query the task queue with filters (pending, in_progress, done, blocked, all). Only registered during autonomous runs. |

Custom Tools

You can define custom tools in the agent's frontmatter. Each custom tool spawns a temporary sub-agent when invoked, with template variables substituted from the caller's arguments.

```yaml
tools:
  - name: design_review
    description: Evaluate HTML and CSS against a design specification
    parameters:
      html_path:
        type: string
        description: Path to the HTML file
      css_path:
        type: string
        description: Path to the CSS file
    prompt: |
      Review the HTML at {{html_path}} and CSS at {{css_path}}.
      Score accessibility, responsiveness, performance, and design fidelity.
      Return a structured report with scores out of 100.
    model: gemini-2.5-flash          # optional: override model for this tool
    result_schema:                    # optional: guide output shape (not validated)
      type: object
      properties:
        overall_score:
          type: number
        breakdown:
          type: object
```

When an agent calls design_review, a temporary agent is created at agents/_custom_design_review_<timestamp>.md, runs the prompt with parameters injected, and returns the result to the caller.
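The parameter injection is plain {{variable}} substitution. A sketch of how such templating could work (an assumption for illustration, not the MAS implementation):

```typescript
// Substitute {{name}} placeholders in a custom tool prompt with the
// caller's arguments. Unknown placeholders are left untouched.
// Hypothetical helper, not the actual MAS source.

function fillTemplate(prompt: string, args: Record<string, string>): string {
  return prompt.replace(/\{\{(\w+)\}\}/g, (whole: string, key: string) =>
    key in args ? args[key] : whole,
  );
}
```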


Workflow Files

Workflow files live in workflows/*.md and define multi-step pipelines with dependency ordering.

```yaml
---
name: Research Pipeline
steps:
  - id: research
    agent: agents/researcher.md
    prompt: "Research {topic}"
    outputs: [findings, sources]
  - id: synthesis
    agent: agents/synthesizer.md
    depends_on: [research]
    prompt: "Synthesize {research.findings} with sources from {research.sources}"
    outputs: [synthesis]
  - id: review
    agent: agents/reviewer.md
    depends_on: [synthesis]
    prompt: "Review {synthesis.synthesis} for accuracy"
---
```

Steps execute in topological order. Steps with no unmet dependencies run in parallel (controlled by the Workflow Parallel Steps setting, default 1). Circular dependencies are detected and rejected. Variables use {step_id.output_name} syntax for upstream data access.
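Dependency ordering with cycle rejection is classic Kahn's algorithm. A minimal sketch over step shapes like the example above (illustrative only; this is not the MAS workflow engine, and a real engine would index dependents rather than rescan all steps):

```typescript
// Order workflow steps so every step runs after its dependencies,
// rejecting circular dependency graphs. Hypothetical sketch.

interface WorkflowStep {
  id: string;
  depends_on?: string[];
}

function topologicalOrder(steps: WorkflowStep[]): string[] {
  // Count unmet dependencies per step.
  const indegree = new Map(steps.map((s) => [s.id, (s.depends_on ?? []).length]));
  const ready = steps.filter((s) => indegree.get(s.id) === 0).map((s) => s.id);
  const order: string[] = [];

  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    // Release steps whose last unmet dependency was just scheduled.
    for (const s of steps) {
      if ((s.depends_on ?? []).includes(id)) {
        const remaining = indegree.get(s.id)! - 1;
        indegree.set(s.id, remaining);
        if (remaining === 0) ready.push(s.id);
      }
    }
  }

  if (order.length !== steps.length) {
    throw new Error("circular dependency detected");
  }
  return order;
}
```

Everything still in `ready` at the same time is eligible to run in parallel, which is where the Workflow Parallel Steps setting applies.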


Autonomous Mode

Autonomous mode runs an agent through multiple cycles, with memory carrying forward between each one.

```yaml
autonomous:
  max_cycles: 20              # 1-1000, default: 10
  stop_when_complete: true    # stop early if agent assesses task is done
  resume_mission: true        # load previous mission state and continue
  seed_task_when_idle: true   # auto-generate follow-up tasks when queue empties
```

| Option | What it does |
|---|---|
| max_cycles | Hard limit on iteration count. Each cycle is a full run → reflect → compress loop. |
| stop_when_complete | The agent self-assesses completion after each cycle. If satisfied, it stops without exhausting all cycles. |
| resume_mission | Loads the previous mission state from _mission_state_<agentId>.json in the VFS. The agent picks up where it left off with full task queue and cycle history. |
| seed_task_when_idle | When the task queue empties mid-run, the agent generates continuation tasks to keep progressing. Prevents premature stops on open-ended missions. |

Mission state - including task queue, cycle notes (last 12), and token totals - persists across browser sessions.


Memory System

Memory is what makes agents learn. The system operates in three layers.

Working memory is written during a run via memory_write. It holds observations, plans, and intermediate results scoped to the current session.

Post-run summarization happens automatically after each completed run. A summarizer agent reviews everything - files written, working memory entries, conversation history - and extracts structured memories typed as skill, fact, procedure, observation, mistake, or preference. Mistakes are prioritized because they prevent repeated failures.

Long-term memory stores extracted knowledge across runs. When new memories are consolidated with existing ones, the system operates in capacity tiers:

| Tier | Condition | Behavior |
|---|---|---|
| Generous | < 30% of budget used | Freely add new memories; only skip exact duplicates |
| Selective | 30-50% of budget | Add only high-value knowledge; merge duplicates via UPDATE |
| Heavy cut | > 50% of budget | Aggressively compress; target 10-20% reduction; merge related memories |
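The tier decision reduces to comparing budget usage against the thresholds in the table. A sketch (function and tier names are illustrative, not the MAS source):

```typescript
// Pick a consolidation tier from the fraction of the memory budget used.
// Thresholds follow the table above; the names are hypothetical.

type Tier = "generous" | "selective" | "heavy_cut";

function consolidationTier(usedFraction: number): Tier {
  if (usedFraction < 0.3) return "generous";
  if (usedFraction <= 0.5) return "selective";
  return "heavy_cut";
}
```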

Each memory is tagged, timestamped, and access-counted. Frequently accessed memories are prioritized for retention. Vector memory (opt-in via Settings) enables semantic retrieval using Transformers.js embeddings backed by IndexedDB, so agents can find related knowledge even with different phrasing.

Shared memory is visible to all agents in a project. Private memory is scoped to a single agent.


Inter-Agent Communication

Agents coordinate through four communication primitives.

Signal parent - A spawned agent sends a message back to its creator when it finishes or needs attention. The simplest coordination pattern.

Pub/sub messaging - Agents publish messages to named channels and subscribe to receive them. Messages include timestamps and author IDs. Subscribers only receive messages published after their subscription, with acknowledgment tracking to prevent duplicates.

Blackboard - A shared key-value store visible to all agents in the current run. Useful for coordination flags, shared config, and status tracking between parallel agents. Cleared when the run ends.

Task queue (autonomous mode only) - A priority-based task list that survives across autonomous cycles. These tools are not part of the default built-in registry; they are injected when an agent runs in autonomous mode. Agents can add, update, and remove tasks with statuses (pending, in_progress, done, blocked). Lower priority numbers execute first.
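The pub/sub semantics above (subscribe-time filtering plus acknowledgment tracking) can be sketched as follows. This is a toy model, not the MAS store; it uses timestamps as message identifiers, which a real implementation would replace with unique ids:

```typescript
// Toy pub/sub bus: subscribers only see messages published after they
// subscribed, and each message is delivered at most once per subscriber.

interface BusMessage {
  channel: string;
  author: string;
  body: string;
  ts: number;
}

class Bus {
  private messages: BusMessage[] = [];
  private subs = new Map<string, { channel: string; since: number; acked: Set<number> }>();

  publish(channel: string, author: string, body: string, ts: number): void {
    this.messages.push({ channel, author, body, ts });
  }

  subscribe(subscriber: string, channel: string, now: number): void {
    this.subs.set(subscriber, { channel, since: now, acked: new Set() });
  }

  // Return undelivered messages published after the subscription began.
  poll(subscriber: string): BusMessage[] {
    const sub = this.subs.get(subscriber);
    if (!sub) return [];
    const pending = this.messages.filter(
      (m) => m.channel === sub.channel && m.ts > sub.since && !sub.acked.has(m.ts),
    );
    pending.forEach((m) => sub.acked.add(m.ts));
    return pending;
  }
}
```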


Observability

MAS is designed to make agent thinking visible, not buried in terminal output.

Graph Visualization

The graph view shows agents as color-coded nodes connected by activity edges:

| Node border color | Meaning |
|---|---|
| Green (pulsing) | Running |
| Cyan | Completed |
| Yellow | Paused |
| Orange | Aborted |
| Red | Error |
| Gray | Idle |

Activity nodes appear as agents work, colored by type: green for thinking, blue for web search, cyan for web fetch, orange for signals, yellow for spawns, purple for file system operations, and teal for tool calls.

The HUD overlay (top-left) shows live stats: agent count, running/thinking/web activity counts, spawns, signals, and errors. Total token consumption is shown in workflow mode.

Inspector Panel

Three tabs on the right side:

  • Chat - Streaming output from the selected agent's session, with session picker for multi-session agents.
  • Events - Timeline of all events (activation, tool calls, file changes, spawns, signals, errors, workflow steps, MCP connections, pub/sub, blackboard operations). Includes checkpoint restore and replay controls.
  • Memory - Working memory entries (current run), long-term memories (cross-run), and shared knowledge.

A policy banner above the tabs shows the selected agent's safety mode and permissions at a glance.

Run Timeline

A horizontal timeline below the graph shows the duration and overlap of all agent activations. Each bar represents one agent's execution, colored by agent identity.

Audio Feedback

MAS includes sonification - distinct audio signals for agent events so you can hear your system working even while focused elsewhere. Spawn triggers a rising chime, tool calls get a soft click, signals are a double blip, completion plays a C-E-G chord, and errors sound a warning tone with vibrato. Toggle with MUTE in the top bar.


Agent Templates

The template picker offers seven starting points:

| Template | Description |
|---|---|
| Blank Agent | Minimal skeleton - empty sections with default permissions |
| Autonomous Learner | Persistent multi-cycle missions with task queue and memory |
| Researcher | Web search and sub-agent delegation for deep investigation |
| Writer | Safe-mode agent that reads artifacts and writes refined prose |
| Orchestrator | Gloves-off coordinator that breaks tasks into sub-agent work |
| Critic | Safe-mode reviewer that reads output and signals feedback |
| Tool Builder | Demonstrates custom tool definitions with parameters and prompts |

You can also save any agent as a template with Save as Template, and create new agents from saved templates. User templates are stored in templates/*.md.


Configuration

Settings Reference

Open Settings (⚙ in the top bar) to configure:

API - Provider (Gemini, Anthropic, OpenAI), API key, model selection.

Kernel limits - Max Concurrency (1-10, default 3), Max Depth (1-20, default 5), Max Fanout (1-20, default 5), Token Budget (default 250,000), Workflow Parallel Steps (1-10, default 1).

Agent persistence - Min Turns Before Stop (0-25, default 5), Force Reflection (auto-inject reflection prompt), Auto-Record Failures (write tool failures to memory).

Memory - Enable Memory, Use Vector Memory (LanceDB + embeddings vs JSON-based), Memory Token Budget (500-8000, default 2000).

Autonomous defaults - Default Max Cycles, Resume Previous Mission, Stop When Complete, Seed Continuation Tasks.

Danger zone - Reset to Sample Project, Clear Workspace.


Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Ctrl/Cmd+K | Command palette |
| Ctrl/Cmd+Enter | Run once |
| Ctrl/Cmd+Shift+Enter | Run autonomous |
| Ctrl/Cmd+Shift+P | Pause / resume |
| Ctrl/Cmd+Shift+K | Kill all |
| Ctrl/Cmd+Shift+L | Focus prompt box |

The command palette supports scope prefixes: agent:, file:, action:, nav:.
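Scope-prefix parsing amounts to peeling a known prefix off the query. A sketch (illustrative only, not the actual palette code):

```typescript
// Split a command palette query into an optional scope and a search term.
// Scopes mirror the prefixes listed above; the function is hypothetical.

function parsePaletteQuery(query: string): { scope: string | null; term: string } {
  const match = query.match(/^(agent|file|action|nav):\s*(.*)$/);
  return match ? { scope: match[1], term: match[2] } : { scope: null, term: query };
}
```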


MCP Server Integration

Connect external tools via the Model Context Protocol. Supported transports: http, sse, and stdio.

Configure in agent frontmatter:

```yaml
mcp_servers:
  - name: docs
    transport: http
    url: http://localhost:3000/mcp
  - name: local-tools
    transport: sse
    url: http://localhost:3001/sse
  - name: cli-tools
    transport: stdio
    command: npx
    args: [my-mcp-server]
    gatewayUrl: http://localhost:3002/mcp
```

Stdio servers can't run directly in the browser. Use gatewayUrl to point to an HTTP bridge that wraps the stdio process. MCP tools are dynamically registered and appear alongside built-in tools.


Running as an npm Package

```bash
npx markdown-agent-studio
```

Or install globally:

```bash
npm install -g markdown-agent-studio
markdown-agent-studio
```

Or import the dist path programmatically:

```js
import distPath from 'markdown-agent-studio';
```

Options: --port 4173, --host 127.0.0.1, --no-open


Architecture

```
src/
├── core/           Execution engine: kernel, providers, memory, summarizer,
│                   autonomous runner, workflow engine, MCP client, plugins
├── stores/         Zustand state: sessions, VFS, memory, events, pub/sub,
│                   blackboard, task queue, project metadata
├── components/     React UI: graph visualization, Monaco editor, inspector,
│                   workspace explorer, command palette, settings
├── hooks/          React hooks: useKernel, useGraphData, useOnboarding
├── types/          TypeScript definitions: agent, session, memory, events
├── utils/          Helpers: agent parser, validator, templates, diff engine
└── styles/         CSS modules
```

Tech stack: React, TypeScript, Vite, Zustand, React Flow, Monaco Editor, MCP SDK, Transformers.js, IndexedDB.


Development

```bash
npm run dev          # local dev server
npm run lint         # lint checks
npm test             # test suite (52 test files)
npm run build        # typecheck + production build
npm run check:all    # lint + test + build + bundle guard
```

CI runs lint, tests, build, bundle-size guard, and npm dry-run on every push and PR.

Release

```bash
npm run release:patch
npm run release:minor
npm run release:major
```

Troubleshooting

| Problem | Solution |
|---|---|
| App does not start | Confirm Node version is 20.19+ with node -v |
| No AI responses | Add your API key to .env.local and select the matching provider in Settings |
| Demo mode won't activate | Clear browser storage and reload - the sample project loads on first visit |
| MCP stdio server unavailable | Stdio can't run in the browser directly; configure a gatewayUrl HTTP bridge |
| Slow first vector search | Expected - the embedding model downloads and warms up on first use |
| Agent can't write files | Check writes patterns and safety_mode permissions in the agent frontmatter |
| Workflow steps won't parallelize | Increase Workflow Parallel Steps in Settings (default is 1 = sequential) |
| Agent stops too early | Increase Min Turns Before Stop in Settings or set stop_when_complete: false |

License

MIT © RobThePCGuy
