This directory contains a complete implementation guide for the Asynchronous Research pattern based on Simon Willison's approach to using fire-and-forget coding agents for autonomous code investigations.
Pattern Source: Simon Willison, "Code research projects with async coding agents", November 6, 2025
The Asynchronous Research pattern enables developers to conduct 2-3 autonomous code investigations per day with minimal human oversight. Fire research prompts at async coding agents, let them work autonomously for 10-30 minutes, then review results when complete.
Key Insight: "The great thing about questions about code is that they can often be definitively answered by writing and executing code." - Simon Willison
asynchronous-research/
├── README.md # This file
├── setup/
│ ├── create-research-repo.sh # Script to initialize research repository
│ ├── claude-code-config.md # Configure Claude Code for research
│ ├── gemini-jules-config.md # Configure Gemini Jules for research
│ └── platform-comparison.md # Compare async agent platforms (2025)
├── prompt-templates/
│ ├── performance-benchmark.md # Template for benchmark research
│ ├── feasibility-study.md # Template for "is this possible?"
│ ├── library-comparison.md # Template for comparative analysis
│ └── integration-proof.md # Template for "can A work with B?"
├── example-research-tasks/
│ ├── redis-notifications/ # Proof-of-concept example
│ ├── markdown-comparison/ # Simon's benchmark example
│ └── webassembly-compilation/ # Complex multi-step research
└── automation/
├── summarize-workflow.yml # GitHub Action to auto-index research
├── AGENTS.md.template # Template for agent guidance
└── result-parser.py # Extract findings from agent output
# Run the setup script
./setup/create-research-repo.sh my-research
# Or manually:
gh repo create my-research --private
cd my-research
# Initialize with README
cat > README.md << 'EOF'
# Research Repository
Autonomous code investigations using async coding agents.
## Pattern
This repository follows the Asynchronous Research pattern from
[AI Development Patterns](https://github.com/paulduvall/ai-development-patterns).
Each folder contains one research task with autonomous agent findings.
EOF
git add README.md
git commit -m "Initial commit"
git push -u origin mainChoose your platform and follow the configuration guide:
- Claude Code (web): See setup/claude-code-config.md
- Gemini Jules: See setup/gemini-jules-config.md
- Codex Cloud: Similar to Claude Code configuration
- GitHub Copilot: Follow GitHub Enterprise setup
Critical: Enable unrestricted network access for your research repo. This is safe because there are no production secrets.
Good research questions are:
- Specific and testable
- Answerable by writing and executing code
- Focused on a single investigation
- Deliverable with clear outputs (code, charts, reports)
Examples from Simon:
- "Could Redis Streams handle 10k concurrent notification subscribers?"
- "How do 7 Python Markdown libraries compare on performance?"
- "Can cmarkgfm C extension compile for Pyodide WebAssembly?"
Start with a template from prompt-templates/ and customize:
Performance Benchmark Template:
Work in new folder: [folder-name]/
[Describe system to build and test]
Measure and report:
- [Metric 1]
- [Metric 2]
- [Metric 3]
Produce [deliverables: code, charts, report].
Library Comparison Template:
Work in new folder: [folder-name]/
Compare these libraries: [lib1, lib2, lib3]
For each library:
1. [Analysis task]
2. [Performance test]
3. [Feature comparison]
Output: README.md with charts, raw data as JSON.
See prompt-templates/ for complete templates.
Submit your research prompt to your async agent platform:
- Submit prompt (2-3 paragraphs with clear deliverables)
- Agent works autonomously (10-30 minutes typically)
- Receive notification when PR/commit is ready
- Skim results for key insights
- Merge if useful, discard if not
Simon's workflow: "I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal."
When the agent completes:
Do:
- Skim the report for key insights
- Verify the research question was answered
- Check if findings are actionable
- Look for unexpected discoveries
Don't:
- Audit code line-by-line (it's exploratory)
- Expect production-quality code
- Get blocked on minor issues
- Worry about "slop" (AI-generated content is expected)
Research tasks can reference each other:
# Research 1: Prove Node.js can run Pyodide
# Result: node-pyodide/
# Research 2: Build on finding
# Prompt: "Building on the node-pyodide example, compile cmarkgfm
# C extension for Pyodide and prove it works in Node.js"
# Result: cmarkgfm-in-pyodide/Create AGENTS.md in your repo with guidance for future runs:
# Agent Guidance
## Research Standards
- Create one folder per research task
- Include README.md with findings
- Generate charts for performance data
- Provide runnable code examples
## Tech Stack Preferences
- Python: Use pytest for tests
- JavaScript: Use Node.js 20+
- Charts: Use matplotlib or chart.jsResearch Question: "Can Redis Streams handle 10k concurrent notification subscribers?"
Prompt (see example-research-tasks/redis-notifications/prompt.md):
Work in new folder: redis-notifications/
Build Redis Streams notification system.
Simulate 10,000 users receiving 100 notifications/second.
Run for 1 hour and measure:
- Memory usage over time
- P95 latency for delivery
- Behavior under network partition
- Recovery after Redis restart
Produce Python code, charts, detailed markdown report.
Expected Deliverables:
publisher.py- Notification publishing scriptsubscriber.py- Subscriber simulationbenchmark.py- Performance measurementresults/- Charts and raw dataREADME.md- Findings and analysis
Research Question: "How do Python Markdown libraries compare on performance?"
Prompt (see example-research-tasks/markdown-comparison/prompt.md):
Simon's actual prompt:
Create a performance benchmark and feature comparison report on PyPI cmarkgfm
compared to other popular Python markdown libraries—check all of them out from
github and read the source to get an idea for features, then design and run a
benchmark including generating some charts, then create a report in a new
python-markdown-comparison folder. Make sure the performance chart images are
directly displayed in the README.md in the folder.
Results from Simon's research:
- Agent found 7 libraries automatically
- cmarkgfm was 10-52x faster (C implementation)
- Generated comparison charts
- Feature matrix from source analysis
Research Question: "Can cmarkgfm (C extension) run in Pyodide WebAssembly?"
Prompt (see example-research-tasks/webassembly-compilation/prompt.md):
Figure out how to get the cmarkgfm markdown library for Python working
in pyodide. This will be hard because it uses C so you will need to
compile it to pyodide compatible webassembly somehow.
Write a report on your results plus code to a new cmarkgfm-in-pyodide
directory. Test it using pytest to exercise a node.js test script that
calls pyodide.
Simon's approach when stuck:
"Complete this project, actually run emscripten, I do not care how long it takes"
Result: 88.4KB .whl file with working C extension in WebAssembly.
| Platform | Free Tier | Network Access | Setup Time | Best For |
|---|---|---|---|---|
| Claude Code (web) | $250 credits ($20/month users, until Nov 18) | Configurable | 5 min | Complex research, multi-file |
| Gemini Jules | Free tier | Default unrestricted | 2 min | Quick POCs, exploration |
| Codex Cloud | Pay-as-you-go | Configurable | 10 min | Enterprise workflows |
| GitHub Copilot Agent | Enterprise | Configurable | 15 min | GitHub integration |
See setup/platform-comparison.md for detailed comparison.
From Simon's 2 weeks of practice (Nov 2025):
- 13 projects in 2 weeks (~1/day average)
- 2-3 tasks/day at peak productivity
- Minutes per task of human time (prompt + review)
- 10-30 minutes typical agent execution
- Frequent useful results informing decisions
Research validates which options are technically feasible:
# Question: Should we use Redis or RabbitMQ for notifications?
# Research Task 1: Redis proof-of-concept
# Research Task 2: RabbitMQ proof-of-concept
# Compare results → Make informed choiceResearch provides real data for planning:
# Question: How long will Markdown rendering optimization take?
# Research: Benchmark current vs. alternatives
# Result: cmarkgfm is 10-52x faster
# Planning: 2-day task to integrate cmarkgfmArchive research for future sessions:
# Add to .claude/context.md or similar:
## Research Findings
### Markdown Performance (2025-11)
cmarkgfm is 10-52x faster than alternatives. Use for production.
See: research-repo/python-markdown-comparison/Run multiple research questions simultaneously:
# Fire 3 research tasks in parallel:
# 1. Redis vs RabbitMQ benchmarks
# 2. Authentication library comparison
# 3. Deployment strategy feasibility
# Review all results together → Make integrated decisionInstall GitHub Action to automatically update README with summaries:
# Copy workflow
cp automation/summarize-workflow.yml .github/workflows/
# Install dependencies for LLM summarization
# Uses: cog, llm, llm-github-models (see Simon's approach)See automation/summarize-workflow.yml for complete implementation.
Extract structured findings from agent output:
# Parse agent results for key metrics
python automation/result-parser.py research-folder/
# Output: JSON with performance data, conclusions, warningsProblem: Vague prompts without clear goals.
# Bad
"Investigate Redis and try some stuff"
# Good
"Build Redis Streams POC. Measure: memory, P95 latency, partition behavior."Problem: Publishing AI research directly to main docs without review.
Solution: Keep research in dedicated repo, mark as noindex.
Problem: Running unrestricted agents on repos with production secrets.
Solution: Only use unrestricted access on dedicated, secret-free research repos.
Problem: Expecting agents to prove something can't be done.
Simon's caveat: "They can't prove something is impossible—just because the coding agent couldn't find a way doesn't mean it can't be done."
Use for: Proving feasibility, not impossibility.
Simon's approach: Provide persistence prompts:
"Complete this project, actually run emscripten, I do not care how long it takes"
Check:
- Was the research question specific enough?
- Did you provide clear deliverables?
- Is the agent exploring the right approach?
Solution: Refine prompt with more constraints.
Solution: Break into smaller research tasks:
# Instead of: "Benchmark all Python libraries"
# Do: "Benchmark top 3 Python Markdown libraries"
# Then: "Benchmark next 4 Python Markdown libraries"Check:
- Is network access enabled in agent settings?
- Are you running in the correct repository?
- Is the repository truly secret-free?
- Simon Willison's Article: https://simonwillison.net/2025/Nov/6/async-code-research/
- Simon's Research Repo: https://github.com/simonw/research
- AI Development Patterns: Main README
- Related Patterns:
This example implementation is part of the AI Development Patterns project. The pattern itself is based on Simon Willison's public article and research repository.