A C++23 orchestrator that intelligently combines Claude (Anthropic API) and local models (ollama) — as an interactive CLI, pipe tool, and MCP server for Claude Code.
Anyone working with AI assistants daily quickly runs into a dilemma: simple tasks like "write a docstring" or "explain this snippet" don't really need a cloud API — but they still cost money and add latency. Complex questions about architecture, security, or large codebases, on the other hand, benefit enormously from Claude's full context window and reasoning power.
aiorch solves this with an orchestrator pattern:

- Simple task → local model (ollama, free, fast)
- Complex task → Claude API (powerful, context-rich)
A scoring-based router makes the routing decision automatically — based on token count, keywords, and context signals. The result: significantly lower API costs while maintaining full quality for complex tasks.
- Intelligent routing — scoring-based (token count, keywords, context signals, question marks, code blocks)
- Interactive REPL — with linenoise, history (`~/.aiorch_history`), ANSI colors, and REPL commands
- Pipe mode — for shell scripting and CI integration
- MCP server — direct integration as a tool in Claude Code
- Automatic fallback — ollama unreachable → transparent switch to Claude
- Session statistics — local/remote distribution, estimated API cost savings
- Self-test — checks all backends and credentials at once
- Sliding-window context — max. 50 messages, system message always preserved (see the sketch after this list)
- Timeouts — ollama 30 s, Anthropic 120 s, no process hang
- Unit tests — 12 tests with doctest (Router, Config, Context)
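
The sliding-window behaviour works roughly like this. A minimal sketch, assuming a simple message struct (the real `context_manager` also does token estimation; names here are illustrative, not the project's actual API):

```cpp
#include <cstddef>
#include <deque>
#include <string>

struct Message {
    std::string role;     // "system", "user", or "assistant"
    std::string content;
};

// Cap the history at max_messages, but never evict the system message.
void trim_context(std::deque<Message>& history, std::size_t max_messages = 50) {
    while (history.size() > max_messages) {
        // Keep index 0 if it holds the system message; drop the oldest other entry.
        std::size_t oldest = (history.front().role == "system") ? 1 : 0;
        if (oldest >= history.size()) break;
        history.erase(history.begin() + oldest);
    }
}
```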
| Component | Version |
|---|---|
| Compiler | clang++ 18+ or g++ 13+ (C++23) |
| CMake | 3.25+ |
| OpenSSL | 3.0+ |
| ollama | any recent version |
| ollama model | qwen2.5-coder:14b (recommended) or any other |
| Anthropic API key | sk-ant-api03-... (from console.anthropic.com) |
Dependencies are fetched automatically via CMake FetchContent:
- cpp-httplib — header-only HTTP/HTTPS
- nlohmann/json — header-only JSON
- linenoise — REPL/readline
- doctest — unit tests
```bash
git clone https://github.com/<your-user>/aiorch.git
cd aiorch

# Debug build (development)
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build

# Release build (production)
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

# Smoke test
./build/aiorch --version
```

Run the unit tests:

```bash
cmake --build build --target tests
ctest --test-dir build --output-on-failure
```

Expected output:
```text
Test project /home/malo/src/aiorch/build
    Start 1: aiorch_tests
1/1 Test #1: aiorch_tests ........... Passed 0.12 sec

100% tests passed, 0 tests failed out of 1
```
```bash
# Install to ~/.local/bin/aiorch
cmake --install build --prefix ~/.local

# Make sure ~/.local/bin is in your PATH
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Verify
aiorch --version
```

Get your API key from console.anthropic.com (it starts with `sk-ant-api03-...`).
```bash
mkdir -p ~/.claude
cat > ~/.claude/aiorch.json << 'EOF'
{
  "apikey": "sk-ant-api03-YOUR-KEY-HERE"
}
EOF
chmod 600 ~/.claude/aiorch.json
```

Key loading order (first found wins):
- `~/.claude/aiorch.json` → field `"apikey"` (primary — the real API key)
- `~/.claude/.credentials.json` → field `"apiKey"` (Claude Code login, usually an OAuth token)
- Environment variable `ANTHROPIC_API_KEY`
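
A sketch of that resolution order (illustrative only; the function names are hypothetical, not the project's actual `config.cpp` API):

```cpp
#include <cstdlib>
#include <fstream>
#include <optional>
#include <string>
#include <nlohmann/json.hpp>

// Hypothetical helper: read one string field from a JSON file, or nullopt.
static std::optional<std::string> key_from(const std::string& path, const char* field) {
    std::ifstream in(path);
    if (!in) return std::nullopt;
    auto j = nlohmann::json::parse(in, nullptr, /*allow_exceptions=*/false);
    if (j.is_discarded() || !j.contains(field) || !j[field].is_string()) return std::nullopt;
    return j[field].get<std::string>();
}

// First source that yields a key wins.
std::optional<std::string> load_api_key(const std::string& home) {
    if (auto k = key_from(home + "/.claude/aiorch.json", "apikey"))       return k;
    if (auto k = key_from(home + "/.claude/.credentials.json", "apiKey")) return k;
    if (const char* env = std::getenv("ANTHROPIC_API_KEY"))               return std::string{env};
    return std::nullopt;
}
```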
aiorch works out of the box without this file — it starts with sensible defaults
(ollama at http://localhost:11434, model qwen2.5-coder:14b, etc.). Create
~/.aiorch.conf only if you want to override those defaults; it is never
generated automatically.
```bash
cat > ~/.aiorch.conf << 'EOF'
# aiorch configuration
ollama_endpoint = http://localhost:11434
ollama_model = qwen2.5-coder:14b
anthropic_model = claude-sonnet-4-5
context_threshold = 2000
history_file = ~/.aiorch_history
log_level = info # debug | info | warn | error
EOF
```

Loading order (later sources override earlier ones):

`~/.aiorch.conf` → environment variables → CLI arguments

Use `AIORCH_CONFIG` to point to a different config file:

```bash
AIORCH_CONFIG=/path/to/my.conf aiorch
```

```bash
# Install ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the recommended model
ollama pull qwen2.5-coder:14b
# Start ollama (runs as a background service)
ollama serve
```

Start the interactive REPL:

```bash
aiorch
```

```text
aiorch> explain this snippet
[Response streams token by token]
(local: qwen2.5-coder:14b)
aiorch> What is the architecture behind this design pattern?
[Response streams token by token]
(remote: claude-sonnet-4-5)
aiorch> /exit
```
REPL commands:
| Command | Function |
|---|---|
| `/clear` | Reset context |
| `/history` | Show conversation history |
| `/backend` | Show routing decision for last input |
| `/model` | Show currently active model |
| `/exit` | Quit (also Ctrl+D, Ctrl+C) |
```bash
# Simple query
echo "Write a docstring for this function: int add(int a, int b)" | aiorch --pipe

# Analyze file contents
cat myfile.cpp | aiorch --pipe

# In shell scripts
RESULT=$(echo "explain this code" | aiorch --pipe --local)

# Exit code: 0 = success, 1 = error
```

Force a backend:

```bash
aiorch --local   # always ollama, no remote routing
aiorch --remote  # always Claude API, no local routing
```

Override the model:

```bash
aiorch --model llama3.2          # different ollama model for this session
aiorch --model llama3.2 --local  # combined with local override
```

Run the self-test:

```bash
aiorch --selftest
```

```text
aiorch selftest
───────────────────────────────────────────
Credentials ~/.claude/aiorch.json OK
ollama http://localhost:11434 OK
Anthropic API OK
───────────────────────────────────────────
All checks passed. Exit code: 0
```
The router uses a scoring system — both sides accumulate points, the higher score wins. On a tie, Remote wins (safer default).
| Criterion | Points |
|---|---|
| Token count < 500 | +2 local |
| Token count 500–2000 | +1 local |
| Token count > 2000 | +2 remote |
| Local keyword found (docstring, explain, snippet, format, rename, …) | +2 local |
| Remote keyword found (architecture, security, refactor, why, design, …) | +2 remote |
| Multiple files in context (`---` separator detected) | +1 remote |
| Question mark in prompt | +1 remote |
| Prompt ends with code block (`` ``` ``) | +1 local |
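
Expressed as code, the scoring pass looks roughly like this. A sketch of the table above with abbreviated keyword lists; the chars/4 token heuristic and all names here are assumptions, not necessarily what `router.cpp` does internally:

````cpp
#include <string>
#include <string_view>

enum class Backend { Local, Remote };

// Assumed heuristic: roughly 4 characters per token.
static int estimate_tokens(std::string_view prompt) {
    return static_cast<int>(prompt.size() / 4);
}

Backend route(const std::string& prompt) {
    int local = 0, remote = 0;

    const int tokens = estimate_tokens(prompt);
    if (tokens < 500)        local  += 2;
    else if (tokens <= 2000) local  += 1;
    else                     remote += 2;

    // Abbreviated keyword lists; the real router's lists are longer.
    for (std::string_view kw : {"docstring", "explain", "snippet", "format", "rename"})
        if (prompt.find(kw) != std::string::npos) { local += 2; break; }
    for (std::string_view kw : {"architecture", "security", "refactor", "why", "design"})
        if (prompt.find(kw) != std::string::npos) { remote += 2; break; }

    if (prompt.find("---") != std::string::npos) remote += 1;  // multi-file context
    if (prompt.find('?')   != std::string::npos) remote += 1;  // question mark
    if (prompt.ends_with("```"))                 local  += 1;  // trailing code block

    return local > remote ? Backend::Local : Backend::Remote;  // tie → remote
}
````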
Statistics are printed at the end of every session — after /exit in the REPL or at the end of a pipe run:
```text
─────────────────────────────────────────
Session Statistics
Total requests: 42
→ local (ollama): 31 (73 %)
→ remote (Claude): 11 (27 %)
Estimated API savings: ~0.09 USD
─────────────────────────────────────────
```
The cost estimate is based on the average price per request for claude-sonnet-4-5 and shows how much the locally handled requests saved in API costs.
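As a back-of-the-envelope check (the exact per-request average is internal to aiorch): at an assumed ~0.003 USD per claude-sonnet-4-5 request, the 31 locally handled requests above work out to 31 × 0.003 ≈ 0.09 USD saved.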
aiorch can be registered as an MCP server directly inside Claude Code:
```bash
# Register once
claude mcp add aiorch -- ~/.local/bin/aiorch --mcp

# Verify connection
claude mcp list
# aiorch: /home/malo/.local/bin/aiorch --mcp - ✓ Connected
```

Three tools are then available inside Claude Code:
| Tool | Function |
|---|---|
| `local_complete` | Send a prompt directly to ollama and return the response |
| `route_query` | Determine the routing decision for a given prompt |
| `clear_context` | Reset the orchestrator's context |
MCP logs are written to `~/.aiorch_mcp.log` (never to stdout — that is reserved for JSON-RPC).
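
A rough sketch of such a stdio tool dispatch, following the tool names above (illustrative only; the MCP handshake, `tools/list`, and the real tool bodies are omitted, and none of this is the project's actual code):

```cpp
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main() {
    std::string line;
    // One JSON-RPC message per line on stdin; replies go to stdout only.
    // Diagnostics would go to ~/.aiorch_mcp.log, never to stdout.
    while (std::getline(std::cin, line)) {
        json req = json::parse(line, nullptr, /*allow_exceptions=*/false);
        if (req.is_discarded() || req.value("method", "") != "tools/call") continue;
        if (!req.contains("params") || !req["params"].contains("name")) continue;

        const std::string tool = req["params"]["name"].get<std::string>();
        std::string text;
        if (tool == "local_complete")      text = "(ollama response here)";   // would call ollama
        else if (tool == "route_query")    text = "(routing decision here)";  // would run the router
        else if (tool == "clear_context")  text = "context cleared";
        else                               text = "unknown tool: " + tool;

        json resp;
        resp["jsonrpc"] = "2.0";
        resp["id"] = req.contains("id") ? req["id"] : json(0);
        resp["result"]["content"] = json::array({ {{"type", "text"}, {"text", text}} });
        std::cout << resp.dump() << "\n" << std::flush;
    }
}
```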
```text
aiorch/
├── CMakeLists.txt
├── docs/
│   └── cpp_orchestrator_architecture_en.svg
├── src/
│   ├── main.cpp                  — CLI entry point, REPL, pipe mode, MCP server
│   ├── config.hpp/.cpp           — configuration, key loading order
│   ├── context_manager.hpp/.cpp  — message history, sliding window, token estimation
│   ├── router.hpp/.cpp           — scoring-based task router
│   ├── anthropic_client.hpp/.cpp — Anthropic API, HTTPS, SSE streaming
│   └── ollama_client.hpp/.cpp    — ollama REST API, HTTP streaming
└── tests/
    ├── test_router.cpp           — 5 router tests (scoring, keywords, tie-break)
    ├── test_config.cpp           — 3 config tests (defaults, file, env vars)
    └── test_context.cpp          — 4 context tests (tokens, sliding window, clear)
```
Similar approaches exist — here is how aiorch differs from each of them.
| Project | Approach | Key Difference |
|---|---|---|
| ollama-prompt | A tool for Claude Code that lets Claude explicitly delegate subtasks to ollama | Not an orchestrator — no routing logic, no automatic decision. Claude decides manually via tool call. |
| MCP Server ollama-claude | An MCP server that forwards all requests to a local ollama instance | No routing — everything goes local regardless of complexity. No fallback to Claude. |
| CliGate | A local proxy that intercepts Claude Code requests and redirects them to ollama | Proxy-based, rule-configured, no scoring. Typically requires Node.js or Python runtime. |
1. Scoring-based routing — not a fixed rule
The others use static rules ("always local" or "always remote"). aiorch weighs multiple signals simultaneously — token count, keyword categories, context size, code blocks, question marks — and produces a score for each side. The higher score wins.
2. Native C++23 — no runtime dependency
ollama-prompt and CliGate are typically Node.js or Python based. aiorch compiles to a single self-contained binary with no interpreter, no npm install, no virtual environment.
3. One binary — three modes
```bash
aiorch          # interactive REPL
aiorch --pipe   # scripting / CI
aiorch --mcp    # MCP server for Claude Code
```

No other tool in this space combines all three in a single binary.
4. Automatic fallback
If ollama is unreachable, aiorch transparently falls back to Claude and prints a warning on stderr. None of the alternatives handle this gracefully — they either crash or return an error.
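
A sketch of that fallback shape using cpp-httplib (illustrative only: `ask_ollama`, `ask_claude`, and `complete` are hypothetical names, `ask_claude` is stubbed out, and only the documented 30 s ollama timeout is shown):

```cpp
#include <iostream>
#include <optional>
#include <string>

#include <httplib.h>          // cpp-httplib, fetched via CMake FetchContent
#include <nlohmann/json.hpp>

// Ask ollama; std::nullopt signals "unreachable or failed".
std::optional<std::string> ask_ollama(const std::string& prompt) {
    httplib::Client cli("http://localhost:11434");
    cli.set_connection_timeout(30);  // documented 30 s ollama timeout
    cli.set_read_timeout(30);
    nlohmann::json body = {
        {"model", "qwen2.5-coder:14b"}, {"prompt", prompt}, {"stream", false}};
    auto res = cli.Post("/api/generate", body.dump(), "application/json");
    if (!res || res->status != 200) return std::nullopt;
    return nlohmann::json::parse(res->body)["response"].get<std::string>();
}

// Stub standing in for the real Anthropic client (120 s timeout).
std::string ask_claude(const std::string&) { return "(remote answer)"; }

std::string complete(const std::string& prompt) {
    if (auto local = ask_ollama(prompt)) return *local;
    std::cerr << "warning: ollama unreachable, falling back to Claude\n";
    return ask_claude(prompt);
}
```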
5. Observability built in
Session statistics, a self-test flag (--selftest), MCP request logging to ~/.aiorch_mcp.log, and a unit test suite are part of the project from the start — not afterthoughts.
MIT License — see LICENSE.
This project was conceived, built, and shipped by Martin Lonkwitz across six development sessions — from the initial project scaffold through the scoring router, MCP server, unit test suite, and production hardening.
Architecture, implementation guidance, code review, and documentation were developed in close collaboration with Claude (Anthropic) — acting as a technical sparring partner throughout the entire build. The orchestrator pattern at the heart of aiorch is itself a reflection of how that collaboration worked: a capable local effort directed and enriched by a powerful remote intelligence.
Open source dependencies:
- cpp-httplib by Yuji Hirose
- nlohmann/json by Niels Lohmann
- linenoise by Salvatore Sanfilippo
- doctest by Viktor Kirilov
- ollama — local LLM inference
- Anthropic — Claude API