Commit a75bc95

tercelclaude and Claude committed

feat: integrate durability+governance into TaskManager, clean stale docs

TaskManager integration (the missing wiring):
- `__init__()` accepts checkpoint_manager, retry_manager, circuit_breaker_registry, budget_manager, policy_engine
- Pre-execution: circuit breaker check, budget check with policy evaluation (block/downgrade/notify), checkpoint loading
- Execution: optional retry wrapping with configurable backoff
- Post-success: update token usage, record CB success, clean checkpoints
- Post-failure: record CB failure

Documentation cleanup:
- Rewrite README.md for v2 (AI Agent Production Middleware)
- Delete 25+ stale v1 doc files (DuckDB, CLI, CrewAI, A2A protocol)
- Update CLAUDE.md with v2 architecture overview
- Delete broken v1 examples/
- Fix stale references in remaining architecture docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent b837511 · commit a75bc95
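The pre/post-execution wiring described in the commit message can be sketched roughly as below. This is an illustrative outline only, not apflow's actual code: the collaborator interfaces (`breaker.allow()`, `budget.check()`, `checkpoints.load()`) and the `CircuitOpenError` name are hypothetical stand-ins.

```python
# Hypothetical sketch of the TaskManager execution wiring; every interface
# here is an illustrative stand-in, not apflow's real API.

class CircuitOpenError(Exception):
    """Raised when the circuit breaker refuses execution."""

def run_task(task, breaker, budget, checkpoints, retry):
    # Pre-execution: circuit breaker check, budget/policy check, checkpoint load
    if not breaker.allow():
        raise CircuitOpenError(task["name"])
    budget.check(task)                    # may block, downgrade, or notify per policy
    state = checkpoints.load(task["id"])  # resume from last checkpoint, if any
    try:
        # Execution: optional retry wrapping with configurable backoff
        result = retry(lambda: task["execute"](state))
    except Exception:
        breaker.record_failure()          # post-failure: record CB failure
        raise
    # Post-success: update token usage, record CB success, clean checkpoints
    budget.record_usage(result.get("tokens", 0))
    breaker.record_success()
    checkpoints.clear(task["id"])
    return result
```

The ordering is the point: governance checks happen before any work starts, and circuit-breaker bookkeeping happens on both success and failure paths.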

47 files changed: +277 −19464 lines
CLAUDE.md — 15 additions & 1 deletion

```diff
@@ -1,7 +1,21 @@
 # High-Quality Code Specification – Simplicity, Readability, and Maintainability First

 ## Project Overview
-The core of `apflow` is **task orchestration and execution specifications**. It provides a unified task orchestration framework that supports execution of multiple task types.
+`apflow` is **AI Agent Production Middleware** — framework-agnostic production middleware that makes AI agents reliable, cost-governed, and auditable.
+
+### v2 Architecture (0.20.0)
+- **bridge/**: apcore Module registration (auto-discovers executors, exposes via MCP/A2A/CLI)
+- **durability/**: Checkpoint/resume, retry with backoff, circuit breaker
+- **governance/**: Token budget management, cost policy engine, model downgrade chains
+- **core/execution/**: TaskManager with integrated durability + governance
+- **core/storage/**: SQLite (default) / PostgreSQL, SQLAlchemy ORM
+- **extensions/**: Tool executors (REST, SSH, Docker, Email, etc.) registered as apcore Modules
+
+### Key Dependencies
+- `apcore` — Schema-enforced module standard
+- `apcore-mcp` — MCP server (AI agent tool integration)
+- `apcore-a2a` — A2A server (internal network service)
+- `apcore-cli` — CLI generation (human operation)

 ## Core Principles
 - Prioritize **simplicity, readability, and maintainability** above all.
```
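The durability bullet above mentions a circuit breaker. As a minimal sketch of that pattern (illustrative only, not apflow's `CircuitBreakerRegistry` implementation): after a threshold of consecutive failures the circuit opens and calls are rejected until a cooldown elapses, after which one probe call is allowed.

```python
# Illustrative circuit breaker; thresholds and method names are assumptions,
# not apflow's actual implementation.
import time

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold    # consecutive failures before opening
        self.cooldown = cooldown      # seconds to stay open
        self.clock = clock            # injectable for testing
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            return True               # half-open: permit a single probe call
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None         # close the circuit again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

The injectable `clock` keeps the breaker deterministic under test; in production the default monotonic clock is used.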

README.md — 117 additions & 110 deletions

````diff
@@ -1,153 +1,160 @@
 # apflow

-<p align="center">
-  <img src="apflow-logo.svg" alt="apflow Logo" width="128" height="128" />
-</p>
+**AI Agent Production Middleware**

-**Distributed Task Orchestration Framework**
+apflow makes any AI agent production-ready — regardless of which framework built it — by providing durable execution, cost governance, and automatic protocol exposure via the apcore ecosystem.

-apflow is a distributed task orchestration framework that scales from a single process to multi-node clusters. It provides a unified execution interface with 12+ built-in executors, A2A protocol support, and automatic leader election with failover.
+## What apflow IS and IS NOT

-Start standalone in 30 seconds. Scale to distributed clusters without code changes.
+| apflow IS | apflow IS NOT |
+|---|---|
+| Production middleware for AI agents | An agent framework (use LangGraph/CrewAI/etc.) |
+| Framework-agnostic reliability layer | A replacement for your existing stack |
+| Cost governance and budget enforcement | An LLM routing layer (use LiteLLM/Portkey) |
+| A bridge to MCP and A2A protocols | An observability platform (use Langfuse) |

-## Deployment Modes
-
-### Standalone (Development and Small Workloads)
+## Install

 ```bash
 pip install apflow
 ```

-Single process, DuckDB storage, zero configuration. Ideal for development, testing, and small-scale automation.
+This installs core + apcore + MCP + A2A + CLI. No extras needed for full functionality.
+
+Optional:
+
+```bash
+pip install apflow[postgres]  # PostgreSQL for distributed deployment
+pip install apflow[all]       # All optional executors (SSH, Docker, Email, etc.)
+```
+
+## Quick Start

 ```python
-from apflow.core.builders import TaskBuilder
 from apflow import TaskManager, create_session
-
-db = create_session()
-task_manager = TaskManager(db)
-result = await (
-    TaskBuilder(task_manager, "rest_executor")
-    .with_name("fetch_data")
-    .with_input("url", "https://api.example.com/data")
-    .with_input("method", "GET")
-    .execute()
+from apflow.bridge import create_apflow_registry
+from apflow.durability import RetryPolicy, CheckpointManager, CircuitBreakerRegistry
+from apflow.governance import BudgetManager, PolicyEngine, CostPolicy, PolicyAction
+
+# 1. Create database session (SQLite by default, zero config)
+session = create_session()
+
+# 2. Create task manager with durability and governance
+task_manager = TaskManager(
+    session,
+    checkpoint_manager=CheckpointManager(session),
+    circuit_breaker_registry=CircuitBreakerRegistry(),
 )
+
+# 3. Register as apcore modules (auto-exposes via MCP, A2A, CLI)
+registry = create_apflow_registry(task_manager, task_creator, task_repository)
+
+# 4. Start MCP server (AI agents can now discover and call apflow)
+from apcore_mcp import serve
+serve(registry, transport="stdio")
 ```

-### Distributed Cluster (Production)
+## Core Features

-```bash
-pip install apflow[standard]
+### Durable Execution (F-003)
+
+Checkpoint/resume for long-running tasks. Retry with configurable backoff. Circuit breaker for fault isolation.
+
+```python
+from apflow.durability import RetryPolicy, BackoffStrategy
+
+policy = RetryPolicy(
+    max_attempts=5,
+    backoff_strategy=BackoffStrategy.EXPONENTIAL,
+    backoff_base_seconds=2.0,
+)
 ```

-PostgreSQL-backed, leader/worker topology with automatic leader election, task leasing, and horizontal scaling. Same application code -- only the runtime environment changes.
+### Cost Governance (F-004)

-```bash
-# Leader node
-APFLOW_CLUSTER_ENABLED=true \
-APFLOW_DATABASE_URL=postgresql+asyncpg://user:pass@db:5432/apflow \
-APFLOW_NODE_ROLE=auto \
-apflow serve --port 8000
-
-# Worker node (on additional machines)
-APFLOW_CLUSTER_ENABLED=true \
-APFLOW_DATABASE_URL=postgresql+asyncpg://user:pass@db:5432/apflow \
-APFLOW_NODE_ROLE=worker \
-apflow serve --port 8001
+Token budget management with automatic model downgrade when budgets are approached.
+
+```python
+from apflow.governance import PolicyEngine, CostPolicy, PolicyAction
+
+engine = PolicyEngine()
+engine.register_policy(CostPolicy(
+    name="auto-downgrade",
+    action=PolicyAction.DOWNGRADE,
+    threshold=0.8,
+    downgrade_chain=["claude-opus-4", "claude-sonnet-4", "claude-haiku-4"],
+))
 ```

-Add worker nodes at any time. The cluster auto-discovers them via the shared database.
-
-## Installation Options
-
-| Extra | Command | Includes |
-|-------|---------|----------|
-| Core | `pip install apflow` | Task orchestration, DuckDB storage, core executors |
-| Standard | `pip install apflow[standard]` | Core + A2A server + CLI + CrewAI + LLM + tools |
-| A2A Server | `pip install apflow[a2a]` | A2A Protocol server (HTTP/SSE/WebSocket) |
-| CLI | `pip install apflow[cli]` | Command-line interface |
-| PostgreSQL | `pip install apflow[postgres]` | PostgreSQL storage (required for distributed) |
-| CrewAI | `pip install apflow[crewai]` | LLM-based agent crews |
-| LLM | `pip install apflow[llm]` | Direct LLM via LiteLLM (100+ providers) |
-| SSH | `pip install apflow[ssh]` | Remote command execution |
-| Docker | `pip install apflow[docker]` | Containerized execution |
-| gRPC | `pip install apflow[grpc]` | gRPC service calls |
-| Email | `pip install apflow[email]` | Email sending (SMTP) |
-| All | `pip install apflow[all]` | Everything |
-
-## Built-in Executors
-
-| Executor | Purpose | Extra |
-|----------|---------|-------|
-| RestExecutor | HTTP/REST API calls with auth and retry | core |
-| CommandExecutor | Local shell command execution | core |
-| SystemInfoExecutor | System information collection | core |
-| ScrapeExecutor | Web page scraping | core |
-| WebSocketExecutor | Bidirectional WebSocket communication | core |
-| McpExecutor | Model Context Protocol tools and data sources | core |
-| ApFlowApiExecutor | Inter-instance API calls for distributed execution | core |
-| AggregateResultsExecutor | Aggregate results from multiple tasks | core |
-| SshExecutor | Remote command execution via SSH | [ssh] |
-| DockerExecutor | Containerized command execution | [docker] |
-| GrpcExecutor | gRPC service calls | [grpc] |
-| SendEmailExecutor | Send emails via SMTP or Resend API | [email] |
-| CrewaiExecutor | LLM agent crews via CrewAI | [crewai] |
-| BatchCrewaiExecutor | Atomic batch of multiple crews | [crewai] |
-| LLMExecutor | Direct LLM interaction via LiteLLM | [llm] |
-| GenerateExecutor | Natural language to task tree via LLM | [llm] |
+### apcore Module Bridge (F-002)

-## Architecture
+One registration, three protocols. All apflow capabilities automatically exposed via MCP, A2A, and CLI.
+
+```python
+from apflow.bridge import create_apflow_registry
+
+registry = create_apflow_registry(task_manager, task_creator, task_repository)
+
+# MCP — AI agent tool integration
+from apcore_mcp import serve
+serve(registry, transport="streamable-http", port=8000)

+# A2A — Internal network service
+from apcore_a2a import serve as a2a_serve
+a2a_serve(registry, name="apflow", url="http://localhost:9000")
+
+# CLI — Human operation
+from apcore_cli import create_cli
+cli = create_cli()
 ```
-+--------------------------+
-|    Client / CLI / API    |
-+------------+-------------+
-             |
-   +---------+--------+------------------+
-   |                  |                  |
-+--v---------------+ +v------------+ +---v---------------+
-| Leader Node      | | Worker Node | | Worker Node       |
-| (auto-elected)   | |             | |                   |
-| - Task placement | | - Execute   | | - Execute         |
-| - Lease mgmt     | | - Heartbeat | | - Heartbeat       |
-| - Execute        | |             | |                   |
-+---------+--------+ +------+------+ +----------+--------+
-          |                 |                   |
-          +-----------------+-------------------+
-                            |
-               +------------v-------------+
-               | PostgreSQL (shared)      |
-               | - Task state             |
-               | - Leader lease           |
-               | - Node registry          |
-               +--------------------------+
+
+## Storage
+
+- **SQLite** (default): Zero config, WAL mode, in-memory for tests
+- **PostgreSQL**: For distributed/production deployments
+
+```python
+# SQLite (default)
+session = create_session()
+
+# SQLite in-memory (testing)
+session = create_session(path=":memory:")
+
+# PostgreSQL (production)
+session = create_session(connection_string="postgresql://user:pass@host/db")
 ```

-*Standalone mode uses the same architecture with a single node and DuckDB storage.*
+## Architecture

-## Documentation
+```
+Protocol Layer (apcore ecosystem)
+    apcore-mcp | apcore-a2a | apcore-cli
+
+Module Standard (apcore)
+    Registry | Executor | ACL | Middleware

-- [Getting Started](docs/getting-started/quick-start.md) -- Up and running in 10 minutes
-- [Distributed Cluster Guide](docs/guides/distributed-cluster.md) -- Multi-node deployment
-- [Executor Selection Guide](docs/guides/executor-selection.md) -- Choose the right executor
-- [API Reference](docs/api/python.md) -- Python API documentation
-- [Architecture Overview](docs/architecture/overview.md) -- Design and internals
-- [Protocol Specification](docs/protocol/index.md) -- A2A Protocol spec
+apflow v2 (this project)
+    Durable Execution | Cost Governance | Module Bridge
+    Task Orchestration Engine (TaskManager, TaskCreator)
+    Executors (REST, SSH, Docker, Email, ...)
+    Storage (SQLite / PostgreSQL)

-Full documentation: [flow-docs.aiperceivable.com](https://flow-docs.aiperceivable.com)
+Agent Frameworks (bring your own)
+    LangGraph | CrewAI | OpenAI Agents | Any
+```

-## Contributing
+## Documentation

-Contributions are welcome. See [Contributing Guide](docs/development/contributing.md) for setup and guidelines.
+- [PRD](docs/apflow-v2/prd.md) — Product requirements
+- [Tech Design](docs/apflow-v2/tech-design.md) — Architecture and design
+- [Feature Specs](docs/features/) — Implementation specifications

 ## License

 Apache-2.0

 ## Links

-- **Documentation**: [flow-docs.aiperceivable.com](https://flow-docs.aiperceivable.com)
 - **Website**: [aiperceivable.com](https://aiperceivable.com)
 - **GitHub**: [aiperceivable/apflow](https://github.com/aiperceivable/apflow)
 - **PyPI**: [apflow](https://pypi.org/project/apflow/)
````

docs/README.md — 0 additions & 22 deletions

This file was deleted.
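For reference, the `RetryPolicy` example in the README diff above configures exponential backoff with a 2.0-second base. A generic sketch of the resulting delay schedule (the `cap` parameter is an assumed safety limit, not a documented apflow option):

```python
# Generic exponential-backoff schedule, matching a base of 2.0s doubling per
# attempt. `cap` is an assumed upper bound, not part of apflow's API.
def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Delay before retry `attempt` (1-indexed): base * 2**(attempt - 1), capped."""
    return min(base * 2 ** (attempt - 1), cap)

# With max_attempts=5 there are up to 4 waits: 2, 4, 8, and 16 seconds.
```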
