Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 11 additions & 11 deletions docs/concepts/checkpointing.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

Save state at every node boundary; resume a crashed run from the last
saved point on a subsequent `invoke()`. Without a checkpointer, the
engine holds no state across invocations a crash means start-from-entry.
engine holds no state across invocations; a crash means start-from-entry.

## Wiring a checkpointer

Expand All @@ -27,7 +27,7 @@ graph = (

The engine writes a record at every `completed` event for outermost-
graph nodes and subgraph-internal nodes. **Fan-out instance internal
events do NOT save** in the shipping version — atomic-restart is the
events do NOT save** in the shipping version. Atomic-restart is the
fan-out contract.

## Saves are synchronous-by-contract
Expand All @@ -40,7 +40,7 @@ because the save resolves before the next node runs.

The corollary: slow backends throttle execution. Wrapping a high-
latency persistence layer in a checkpointer makes the whole graph
run at its latency. Plan accordingly async writes inside the
run at its latency. Plan accordingly: async writes inside the
backend (e.g., `asyncio.to_thread` around a sync driver) are fine;
fire-and-forget patterns that return before durability is established
violate the contract.
Expand Down Expand Up @@ -94,7 +94,7 @@ Field framing worth getting right:
there.
- **`correlation_id` ≠ `invocation_id`.** `invocation_id` identifies
*this* graph run uniquely. `correlation_id` is a cross-system
identifier propagated via ContextVar multiple invocations
identifier propagated via ContextVar; multiple invocations
related by a higher-level request can share one `correlation_id`
while each having its own `invocation_id`. See
[Observability](observability.md) for how `correlation_id`
Expand All @@ -119,15 +119,15 @@ class Checkpointer(Protocol):
async def delete(self, invocation_id: str) -> None: ...
```

- **`save`** persist the record under `invocation_id`. Durable for
- **`save`**: persist the record under `invocation_id`. Durable for
any backend that documents durability. Synchronous-by-contract per
the section above.
- **`load`** return the *most recent* record for `invocation_id`,
- **`load`**: return the *most recent* record for `invocation_id`,
or `None`. Round-trip-stable with what `save` wrote.
- **`list`** enumerate saved invocations, optionally filtered by
- **`list`**: enumerate saved invocations, optionally filtered by
`CheckpointFilter` (currently a single `correlation_id` field; v1
ships intentionally narrow).
- **`delete`** remove all records for `invocation_id`. No-op if the
- **`delete`**: remove all records for `invocation_id`. No-op if the
invocation has no record (no error).

Backends MUST be safe to share across concurrent invocations; the
Expand All @@ -137,10 +137,10 @@ call.

## Two built-in backends

- **`InMemoryCheckpointer`** backed by a dict in process memory.
- **`InMemoryCheckpointer`**: backed by a dict in process memory.
Loses everything on process exit. Useful for tests and short-lived
contexts that want the API surface without disk overhead.
- **`SQLiteCheckpointer`** backed by a SQLite database file.
- **`SQLiteCheckpointer`**: backed by a SQLite database file.
Survives process exit. Reasonable default for any non-trivial use.

Custom backends just implement the four-method Protocol. Targets that
Expand All @@ -155,7 +155,7 @@ adapter layer.
cheap; checkpoints are pure overhead.
- **Pipelines whose external side effects can't safely be re-played.**
If node A sends an email, resuming from after A means the email
has already sent fine if your downstream is idempotent, surprising
has already sent; fine if your downstream is idempotent, surprising
if it isn't. Reason explicitly about replay semantics before turning
on resume.

Expand Down
32 changes: 16 additions & 16 deletions docs/concepts/composition.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ pipeline of reusable sub-pipelines:
2. **Subgraphs** encapsulate a sub-pipeline as a single node.
3. **Projections** translate state across the subgraph boundary.

None of these add new primitives a conditional edge is still one
outgoing edge, a subgraph is still a single node but they change
None of these add new primitives (a conditional edge is still one
outgoing edge, a subgraph is still a single node) but they change
what a graph can express.

## Conditional edges
Expand Down Expand Up @@ -52,7 +52,7 @@ it somewhere invisible," state-driven routing gives you:

**Why sync?** Conditional edges are routing decisions, not units of
work. If you want `async def`, the right move is to do the IO in the
producing node and write the decision to a state field exactly what
producing node and write the decision to a state field, exactly what
`classify` does. Keeping edges sync keeps the loop simple to read:
node (async) → merge → edge (sync) → next.

Expand Down Expand Up @@ -114,7 +114,7 @@ builder.add_subgraph_node("research", research_subgraph, projection=...)
**Separate state schemas are load-bearing.** The subgraph has its own
`State` subclass, distinct from the parent's. At compile time, the
subgraph's reducer table and field validation are built against its
own schema. Parent fields can't leak in by accident they aren't in
own schema. Parent fields can't leak in by accident; they aren't in
scope on either side of the boundary. **The only way data crosses is
through the projection.**

Expand Down Expand Up @@ -153,14 +153,14 @@ If you don't pass a `projection=` argument, you get this. It behaves
asymmetrically:

- **`project_in`: parent state is ignored.** Returns
`subgraph_state_cls()` a fresh instance from the subgraph's
`subgraph_state_cls()`, a fresh instance from the subgraph's
defaults. If the subgraph has a required field, this constructor
fails; the subgraph can't run without an explicit projection.
- **`project_out`: field-name intersection.** Looks at the subgraph's
final state, keeps fields whose names also exist on the parent, and
returns them as a partial update. The parent's reducers then merge.

The asymmetry"closed on the way in, open on the way back" — is by
The asymmetry, "closed on the way in, open on the way back," is by
design. The author opts *in* to sharing data with the subgraph; the
subgraph's observable outputs route back through the parent's reducers
automatically.
Expand All @@ -183,17 +183,17 @@ projection = ExplicitMapping[ParentState, SubgraphState](
builder.add_subgraph_node("analyze_a", subgraph, projection=projection)
```

`inputs` and `outputs` are independent pass either, both, or neither.
`inputs` and `outputs` are independent; pass either, both, or neither.

**Asymmetry inputs additive, outputs replacement.** This mirrors the
**Asymmetry: inputs additive, outputs replacement.** This mirrors the
default's asymmetry.

- `inputs` is *additive over no-projection-in*. Subgraph fields named
in `inputs` get the corresponding parent field's value; unnamed
fields get their schema defaults.
- `outputs` *replaces* field-name matching when present. Only pairs
named in `outputs` are merged back. Unnamed subgraph fields are
discarded no slip of extra fields by accident.
discarded, so no slip of extra fields by accident.

**`None` vs `{}` for `outputs`:**

Expand All @@ -206,7 +206,7 @@ default's asymmetry.
**Compile-time validation.** `ExplicitMapping.validate` runs at
parent-graph compile and raises `MappingReferencesUndeclaredField` if
any mapping names a field that isn't on the relevant schema.
Refactor-safe if you rename a parent field but forget the mapping,
Refactor-safe: if you rename a parent field but forget the mapping,
construction fails, not runtime.

**The case `ExplicitMapping` uniquely unlocks.** Same subgraph at
Expand Down Expand Up @@ -235,13 +235,13 @@ builder.add_subgraph_node(

The two sites address disjoint parent fields, so they cannot collide.
Without explicit mapping, both calls would have to read from and write
to the same parent fields under name matching making "run the same
to the same parent fields under name matching, making "run the same
subgraph twice on different inputs" structurally impossible.

### Custom projection strategies

If you need behavior beyond name-mapping synthesize values, project
conditionally, transform on the way through write a class that
If you need behavior beyond name-mapping (synthesize values, project
conditionally, transform on the way through), write a class that
matches the Protocol:

```python
Expand All @@ -268,12 +268,12 @@ A few design points worth sitting with:
- **Unknown fields from `project_out` raise.** Parent's `extra="forbid"`
catches typos at the merge boundary.
- **The `parent_state` argument of `project_out` is for context, not
for writing.** You can read it to decide what to project "only
return the answer if the parent was in a research route" but you
for writing.** You can read it to decide what to project ("only
return the answer if the parent was in a research route") but you
can't mutate it.

`ProjectionStrategy` is a `Protocol`, not a base class. A class fits
the shape or it doesn't; the type checker verifies at use sites. If
you have Java instincts ("where's the `implements` keyword?"), reach
for TypeScript or Go interface instincts instead that's the same
for TypeScript or Go interface instincts instead; that's the same
family.
36 changes: 18 additions & 18 deletions docs/concepts/fan-out.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ a different input, results merged back deterministically.
The "same subgraph at two-or-three call sites" pattern from
[`ExplicitMapping`](composition.md#explicitmapping-declarative)
handles cases where you know the parent fields up front. Fan-out
handles N call sites where N is determined at runtime "for each
handles N call sites where N is determined at runtime: "for each
item in `state.urls`, run the scraping subgraph; collect the
results."

Expand All @@ -15,7 +15,7 @@ results."
A fan-out can dispatch instances driven by a list in state
(`items_field` mode) or by a count resolved from state (`count` mode).

**`items_field` mode** one instance per item in a parent list field:
**`items_field` mode**: one instance per item in a parent list field:

```python
from openarmature.graph import FanOutConfig, FanOutNode
Expand All @@ -24,7 +24,7 @@ scrape_all = FanOutNode(
name="scrape_all",
config=FanOutConfig(
subgraph=scrape_subgraph, # CompiledGraph[ScrapeState]
items_field="urls", # parent list field one instance per item
items_field="urls", # parent list field, one instance per item
item_field="url", # subgraph field that receives each item
collect_field="content", # subgraph field whose value is collected
target_field="contents", # parent list field that receives the collection
Expand All @@ -36,7 +36,7 @@ scrape_all = FanOutNode(
builder.add_node("scrape_all", scrape_all)
```

**`count` mode** fixed-or-dynamic instance count, no list field:
**`count` mode**: fixed-or-dynamic instance count, no list field:

```python
fan_out = FanOutNode(
Expand All @@ -58,7 +58,7 @@ time.

## Per-instance state, inputs and outputs

Each instance gets its own subgraph state distinct from siblings,
Each instance gets its own subgraph state, distinct from siblings,
distinct from the parent. By default the instance receives only:

- the dispatched item in the field named by `item_field` (in
Expand All @@ -67,25 +67,25 @@ distinct from the parent. By default the instance receives only:

`inputs` is a `Mapping[subgraph_field, parent_field]`. The subgraph
fields not named in `inputs` (and not `item_field`) take their
schema defaults same closed-by-default-on-the-way-in posture as
schema defaults; same closed-by-default-on-the-way-in posture as
the explicit-projection story for ordinary subgraphs.

On exit, each instance's `collect_field` value becomes one element
of the parent's `target_field` list, in instance-index order. To
collect additional per-instance fields, declare
`extra_outputs: Mapping[parent_field, subgraph_field]` each becomes
`extra_outputs: Mapping[parent_field, subgraph_field]`; each becomes
its own parent list of the same length, instance-index-aligned.

## Error policy

Two values:

- **`"fail_fast"`** (default) the first instance failure cancels
- **`"fail_fast"`** (default): the first instance failure cancels
the in-flight siblings (`asyncio.gather` semantics) and propagates
as a `NodeException` wrapping the failing instance's cause, with
`recoverable_state` set to the parent's pre-fan-out snapshot. Use
this when one bad result invalidates the rest.
- **`"collect"`** instance failures are captured; the fan-out runs
- **`"collect"`**: instance failures are captured; the fan-out runs
to completion. Failed instances contribute nothing to
`target_field`. If you declare `errors_field` on the config, each
failed instance produces a record (`{"fan_out_index": str(idx),
Expand All @@ -98,11 +98,11 @@ Choose by whether partial results are useful.
After the fan-out completes, the parent receives a partial update
containing:

- `target_field` list of `collect_field` values, instance-index order.
- Each parent name in `extra_outputs` list of values from the named
- `target_field`: list of `collect_field` values, instance-index order.
- Each parent name in `extra_outputs`: list of values from the named
subgraph field, instance-index order.
- `count_field` (if configured) the instance count.
- `errors_field` (if configured, `"collect"` policy only) per-instance
- `count_field` (if configured): the instance count.
- `errors_field` (if configured, `"collect"` policy only): per-instance
error records.
- `on_empty="noop"` for an empty items_field → all the above with empty
lists; `count_field` set to 0.
Expand All @@ -112,9 +112,9 @@ containing:
If `items_field` is set and the parent list is empty (or `count`
resolves to 0):

- `on_empty="raise"` (default) raises `FanOutEmpty` (a runtime
- `on_empty="raise"` (default): raises `FanOutEmpty` (a runtime
error category).
- `on_empty="noop"` emits an empty partial (no instances dispatched,
- `on_empty="noop"`: emits an empty partial (no instances dispatched,
no errors).

## Observability per instance
Expand All @@ -124,7 +124,7 @@ The fan-out node's own `started` / `completed` events carry a
`item_count` / `concurrency` / `error_policy` / `parent_node_name`.

Per-instance events have `fan_out_index = N` (0-based) and a
namespace whose final element is the fan-out node's name instances
namespace whose final element is the fan-out node's name; instances
do NOT contribute a separate synthetic namespace element. Backends
disambiguate per-instance spans using `fan_out_index` alongside the
namespace.
Expand All @@ -133,7 +133,7 @@ namespace.

A fan-out node's `completed` event triggers a save like any other
outermost-graph or subgraph-internal node. **Per-instance internal
events do NOT save** in the shipping version on resume, the
events do NOT save** in the shipping version; on resume, the
fan-out re-runs end-to-end if it hadn't completed (atomic restart).

A per-instance fan-out resume mode is planned but not yet shipped.
Expand All @@ -147,7 +147,7 @@ The signal: N similar pieces of work, N depends on state at runtime
(not at build time), the work is independent enough to run
concurrently. If N is known at build time and small (≤3),
`ExplicitMapping` at multiple subgraph sites is simpler. If the
work isn't independent instance 2 needs instance 1's output
work isn't independent (instance 2 needs instance 1's output),
that's a linear pipeline, not fan-out.

## What fan-out is NOT
Expand Down
Loading
Loading