Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **State migration for checkpointed graphs (proposal 0014, introduced in spec v0.15.0; refined by proposal 0018 in spec v0.16.0).** Saved checkpoints whose `schema_version` doesn't match the current state class now route through a registered migration chain instead of failing on resume. Surface: `State.schema_version: ClassVar[str] = ""` (declare a non-empty value to opt in), `GraphBuilder.with_state_migration(from_version, to_version, migrate)` and `with_state_migrations(*migrations)` for registration, `StateMigration` and `MigrationRegistry` types exported from `openarmature.checkpoint`. Chain resolution is BFS over the registered edges; the shortest path wins. Three new error categories: `CheckpointStateMigrationChainAmbiguous` (proposal 0018: duplicate `(from, to)` pair at registration time, or multiple distinct shortest paths between the saved and current versions at resume time), `CheckpointStateMigrationMissing` (no chain bridges the versions), and `CheckpointStateMigrationFailed` (a migration function raised). All non-transient. Post-migration deserialization failures still route to `CheckpointRecordInvalid` per §10.12.4. The same chain applies to each entry in `parent_states` in lockstep with the outer state per §10.12.2. Routing precedence per §10.10 (v0.16.0): chain-ambiguous → missing → failed → record-invalid.
- **`Checkpointer.supports_state_migration` Protocol attribute.** Marks whether a backend can expose the structural intermediate form (a plain dict, JSON tree) the migration registry consumes. `SQLiteCheckpointer(serialization="json")` opts in; `SQLiteCheckpointer(serialization="pickle")` and `InMemoryCheckpointer` opt out. On version mismatch against a non-migration-eligible backend the engine raises `CheckpointRecordInvalid` per spec §10.12.1.
- **`openarmature.checkpoint.migrate` OTel span (proposal 0014 §6 cross-ref).** Versioned resumes whose migration chain runs emit a zero-duration `openarmature.checkpoint.migrate` span on the OTel observer, parented under the invocation root span. Attributes: `openarmature.checkpoint.migrate.from_version`, `openarmature.checkpoint.migrate.to_version` (the final target), `openarmature.checkpoint.migrate.chain_length`. The §10.12.3 fast path (versions match, registry not consulted) emits no span. Engine-side: a synthetic `checkpoint_migrated` observer phase carries a `_MigrationSummary` payload from `_migrate_record` through to the OTel observer; the new phase is gated off default subscriptions (observers opt in explicitly via `phases={..., "checkpoint_migrated"}`).
- **Prompt-management capability (proposal 0017, introduced in spec v0.15.0).** New `openarmature.prompts` subpackage. `PromptManager` composes one or more `PromptBackend`s, exposes `fetch` / `render` / `get`, applies the §8 fallback semantics (`prompt_store_unavailable` continues to the next backend; `prompt_not_found` stops the chain), and renders templates with Jinja2's `StrictUndefined` per §7. `Prompt` / `PromptResult` / `PromptGroup` are Pydantic models matching spec §3 / §4 / §9. Three error categories (`PromptNotFound`, `PromptRenderError`, `PromptStoreUnavailable`) with `PROMPT_TRANSIENT_CATEGORIES` exported for retry-middleware classifiers. `FilesystemPromptBackend` is the minimum local-filesystem reference backend (layout: `<root>/<label>/<name>.j2`; `version` derived from the first 16 hex chars of `template_hash`). New runtime dependency: `jinja2>=3.1`.
- **`openarmature.prompts.context` — observability propagation per spec §11.** `with_active_prompt(result)` and `with_active_prompt_group(group)` context managers + `current_prompt_result()` / `current_prompt_group()` inspectors. When the OTel observer is active and an LLM call fires inside `with_active_prompt`, the `openarmature.llm.complete` span carries the normative `openarmature.prompt.*` attributes (`name`, `version`, `label`, `template_hash`, `rendered_hash`, `group_name`). Nesting is innermost-wins.
- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
Expand All @@ -22,7 +25,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Changed

- **Pinned spec version: 0.10.0 → 0.15.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.15.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017) in one bump. Only the surface introduced by proposal 0016 is implemented in this changelog entry; fixtures from 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as their respective PRs land.
- **Pinned spec version: 0.10.0 → 0.16.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.16.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017, 0018) in one bump. Only the surfaces introduced by proposals 0014–0017 are implemented in the batch's release; fixtures from 0011 are deferred-skip in the conformance suite and unmark with PR-5.
- **`CheckpointRecord.schema_version` semantic shift (proposal 0014).** Previously a backend-internal record-shape version (`CHECKPOINT_SCHEMA_VERSION = "1"` constant), now the user-facing state-schema version per spec §10.2. The framework reads `type(state).schema_version` at save time. Pre-PR-4 records carrying `"1"` are reinterpreted as user-facing v1 identifiers; users with such records either declare `schema_version="1"` on their state class or discard the pre-PR-4 records. `SQLiteCheckpointer` no longer rejects records with non-default `schema_version` at the backend boundary; version-mismatch routing is now an engine concern at resume time. The `CHECKPOINT_SCHEMA_VERSION` module constant is removed; future record-shape evolution can add backend-private metadata fields if needed.
- **`NodeEvent.pre_state` typed `Any` (was `State`).** Required by the new `checkpoint_migrated` phase which carries a `_MigrationSummary` payload rather than a `State` instance. Observer authors who type-narrowed `pre_state` to `State` should treat it as `Any` and narrow per-phase (e.g., `if event.phase == "completed": ...`). The `checkpoint_saved` phase already carried a State-flavored shape (not necessarily a typed `State` subclass instance), so this widens the declared type to match runtime reality rather than introducing a new constraint.

### Notes

Expand Down
128 changes: 128 additions & 0 deletions docs/concepts/checkpointing.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,134 @@ multi-process), S3 (cross-region durability). For event-sourced
runtimes (Temporal, DBOS, Restate, Inngest) the Protocol is the
adapter layer.

## State migrations

When a checkpoint was saved against an earlier version of your state
schema and the code has since evolved, the engine consults a
**migration registry** to bridge the saved record into the current
shape. Without migrations, a schema change invalidates every prior
checkpoint; with one short registration per change, you keep your
saved records working across releases.

The wire-up is two pieces: declare a version on your state class,
and register one migration per version bump.

```python
from typing import ClassVar
from openarmature.graph import State, GraphBuilder
from openarmature.checkpoint import SQLiteCheckpointer


class MyState(State):
schema_version: ClassVar[str] = "v2"
x: int = 0
new_field: str = "default" # added in v2


def add_new_field_default(state: dict) -> dict:
return {**state, "new_field": "default"}


graph = (
GraphBuilder(MyState)
.add_node(...)
.with_checkpointer(SQLiteCheckpointer("ck.db", serialization="json"))
.with_state_migration("v1", "v2", add_new_field_default)
.compile()
)
```

On resume, the engine reads the saved record's `schema_version`. If
it equals `MyState.schema_version`, the record loads via the §10.4
fast path (no migration consulted). If it differs, the engine
resolves a chain through the registry (BFS for the shortest path),
applies each migration in order to the record's state, then
deserializes the result into your current state class.

### Chain resolution

Registered migrations form a directed graph. Each
`with_state_migration(a, b, fn)` is an edge from `a` to `b`. Chain
resolution finds the shortest path between the saved version and the
current version. Branching is fine: a v1 record can have one
migration leading to v2 and another leading to v2-experimental;
chain resolution picks the path that ends at the current declared
version.

Two ambiguity cases are configuration errors. Both surface as
`CheckpointStateMigrationChainAmbiguous`:

- **Duplicate edges.** Registering two migrations with the same
`(from_version, to_version)` pair raises at registration time so
the configuration error surfaces before any resume attempt.
Either delete one or pick distinct version identifiers.
- **Multiple shortest paths.** A diamond like
`v1 → v2 → v4` and `v1 → v3 → v4` is ambiguous: both paths have
length 2. The engine raises during resume so the user can
register fewer migrations or pick a single canonical route.

### The three new error categories

- **`CheckpointStateMigrationChainAmbiguous`**: the registered
migration graph is ambiguous (duplicate `(from, to)` pair at
registration time, OR multiple distinct shortest paths between
the saved and current versions at resume time). Surfaces before
any migration function runs. Carries `from_version` and
`to_version` when known.
- **`CheckpointStateMigrationMissing`**: the saved version doesn't
match the current version, and no chain bridges them. Carries
`from_version`, `to_version`, a count of registered migrations,
and a human-readable `registry_description` so operators see what
IS available.
- **`CheckpointStateMigrationFailed`**: a user-supplied migration
function raised. Subsequent migrations in the chain don't run;
the resume fails. The migration's exception rides `__cause__`.

Routing precedence on resume: chain-ambiguous → missing → failed →
record-invalid.

A third category, `CheckpointRecordInvalid`, continues to cover the
**post**-migration case: a migration ran cleanly but produced
output that the current state class can't deserialize (missing a
required field, wrong type, etc.). The three categories are
mutually exclusive on any given resume.

### Backend support

Not every backend can migrate. Migration needs the backend to expose
a **structural intermediate form** of the loaded state (a plain
dict, JSON tree, or similar) that's independent of the current
state class.

- **`SQLiteCheckpointer(serialization="json")`** can. JSON-encoded
state loads to a dict; the migration function operates on the
dict directly.
- **`SQLiteCheckpointer(serialization="pickle")`** can NOT. Pickle
holds class identity and round-trips back to typed instances.
- **`InMemoryCheckpointer`** can NOT. It holds live typed-state
references by reference; there's no serialization step.

On version mismatch against a non-migration-eligible backend, the
engine raises `CheckpointRecordInvalid` (not
`CheckpointStateMigrationMissing`): the registry has no chance to
bridge.

### Parent-state migration

Subgraph saves carry a `parent_states` chain of the outer-graph
state captured at the moment of the inner save. On resume, the same
migration chain applies to each entry in `parent_states` in lockstep
with the outer state. The spec treats `parent_states` as carrying
the same `schema_version` as the outer record (no per-parent
version metadata in v1).

### Migrations MUST be pure

A migration function MUST be deterministic, with no I/O, no implicit
state, no random or wall-clock-derived output. The framework
doesn't enforce purity, but violating it breaks determinism
guarantees for resume.

## When NOT to use checkpointing

- **Pure pipelines that complete in seconds.** Restart-from-entry is
Expand Down
2 changes: 1 addition & 1 deletion openarmature-spec
Submodule openarmature-spec updated 44 files
+76 −0 .github/workflows/docs.yml
+8 −1 .gitignore
+53 −37 CHANGELOG.md
+2 −2 README.md
+1 −0 docs/capabilities/graph-engine.md
+1 −0 docs/capabilities/llm-provider.md
+1 −0 docs/capabilities/observability.md
+1 −0 docs/capabilities/pipeline-utilities.md
+1 −0 docs/capabilities/prompt-management.md
+1 −0 docs/changelog.md
+1 −0 docs/governance.md
+157 −0 docs/index.md
+17 −0 docs/javascripts/header-link.js
+6 −0 docs/javascripts/tablesort.js
+6 −0 docs/javascripts/tablesort.min.js
+0 −34 docs/openarmature.md
+29 −0 docs/proposals.md
+1 −0 docs/proposals/0001-graph-engine-foundation.md
+1 −0 docs/proposals/0002-subgraph-explicit-mapping.md
+1 −0 docs/proposals/0003-node-boundary-observer-hooks.md
+1 −0 docs/proposals/0004-pipeline-utilities-middleware.md
+1 −0 docs/proposals/0005-pipeline-utilities-parallel-fan-out.md
+1 −0 docs/proposals/0006-llm-provider-core.md
+1 −0 docs/proposals/0007-observability-otel-span-mapping.md
+1 −0 docs/proposals/0008-pipeline-utilities-checkpointing.md
+1 −0 docs/proposals/0009-pipeline-utilities-per-instance-fan-out-resume.md
+1 −0 docs/proposals/0010-drain-timeout.md
+1 −0 docs/proposals/0011-pipeline-utilities-parallel-branches.md
+1 −0 docs/proposals/0012-graph-engine-completed-event-after-edges.md
+1 −0 docs/proposals/0013-fan-out-config-on-node-event.md
+1 −0 docs/proposals/0014-pipeline-utilities-state-migration.md
+1 −0 docs/proposals/0015-llm-provider-multimodal-images.md
+1 −0 docs/proposals/0016-llm-provider-structured-output.md
+1 −0 docs/proposals/0017-prompt-management-core.md
+1 −0 docs/proposals/0018-state-migration-chain-ambiguity.md
+353 −0 docs/stylesheets/extra.css
+132 −0 mkdocs.yml
+42 −0 mkdocs_hooks.py
+231 −0 proposals/0018-state-migration-chain-ambiguity.md
+18 −0 pyproject.toml
+65 −0 spec/pipeline-utilities/conformance/047-state-migration-chain-ambiguous.md
+96 −0 spec/pipeline-utilities/conformance/047-state-migration-chain-ambiguous.yaml
+39 −16 spec/pipeline-utilities/spec.md
+739 −0 uv.lock
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ Repository = "https://github.com/LunarCommand/openarmature-python"
Specification = "https://github.com/LunarCommand/openarmature-spec"

[tool.openarmature]
spec_version = "0.15.0"
spec_version = "0.16.0"

[dependency-groups]
dev = [
Expand Down
2 changes: 1 addition & 1 deletion src/openarmature/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""OpenArmature — workflow framework for LLM pipelines and tool-calling agents."""

__version__ = "0.5.0"
__spec_version__ = "0.15.0"
__spec_version__ = "0.16.0"
11 changes: 9 additions & 2 deletions src/openarmature/checkpoint/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,12 @@
CheckpointNotFound,
CheckpointRecordInvalid,
CheckpointSaveFailed,
CheckpointStateMigrationChainAmbiguous,
CheckpointStateMigrationFailed,
CheckpointStateMigrationMissing,
)
from .migration import MigrationRegistry, StateMigration
from .protocol import (
CHECKPOINT_SCHEMA_VERSION,
Checkpointer,
CheckpointFilter,
CheckpointRecord,
Expand All @@ -37,17 +40,21 @@
)

__all__ = [
"CHECKPOINT_SCHEMA_VERSION",
"CheckpointError",
"CheckpointFilter",
"CheckpointNotFound",
"CheckpointRecord",
"CheckpointRecordInvalid",
"CheckpointSaveFailed",
"CheckpointStateMigrationChainAmbiguous",
"CheckpointStateMigrationFailed",
"CheckpointStateMigrationMissing",
"CheckpointSummary",
"Checkpointer",
"InMemoryCheckpointer",
"MigrationRegistry",
"NodePosition",
"SQLiteCheckpointer",
"SerializationMode",
"StateMigration",
]
17 changes: 17 additions & 0 deletions src/openarmature/checkpoint/backends/memory.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,25 @@ class InMemoryCheckpointer:
Pydantic state instance the engine produces is what comes back
from :meth:`load` — no serialization round-trip. (This is the
feature: tests can assert on the saved state's identity.)

**State-migration eligibility:** none. Per spec §10.12.1, a
backend supports migration only when it can expose a structural
intermediate form of the loaded state independent of the current
state class. This backend holds live typed instances by
reference, so a version mismatch on resume raises
``CheckpointRecordInvalid`` rather than consulting the
migration registry.
"""

# Per spec §10.12.1: in-memory storage holds live typed-state
# references, so there's no class-independent intermediate form
# the migration registry could consume. Declared at the class
# level (not as a per-instance attribute) since the answer is
# constructor-independent; the Protocol declaration in
# ``protocol.py`` types this as ``bool`` (not ``ClassVar[bool]``)
# so Pyright accepts a class-attribute override here.
supports_state_migration: bool = False

def __init__(self) -> None:
self._records: dict[str, CheckpointRecord] = {}
self._lock = asyncio.Lock()
Expand Down
20 changes: 13 additions & 7 deletions src/openarmature/checkpoint/backends/sqlite.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@

from ..errors import CheckpointRecordInvalid
from ..protocol import (
CHECKPOINT_SCHEMA_VERSION,
CheckpointFilter,
CheckpointRecord,
CheckpointSummary,
Expand Down Expand Up @@ -109,6 +108,13 @@ def __init__(
self._serialization: SerializationMode = serialization
self._lock = asyncio.Lock()
self._initialized = False
# Per spec §10.12.1, a backend supports state migration only
# when it can expose a structural intermediate form of the
# loaded state that is independent of the current state
# class. JSON serialization satisfies this (loads to dicts);
# pickle holds class identity and round-trips to typed
# instances, so it cannot bridge a schema-version mismatch.
self.supports_state_migration: bool = serialization == "json"

def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(self._path)
Expand Down Expand Up @@ -230,12 +236,12 @@ def _do() -> tuple[Any, ...] | None:
schema_version,
recorded_serialization,
) = row
if schema_version != CHECKPOINT_SCHEMA_VERSION:
raise CheckpointRecordInvalid(
invocation_id,
f"persisted schema_version={schema_version!r} does not match "
f"current {CHECKPOINT_SCHEMA_VERSION!r}",
)
# Note: per spec §10.12 (proposal 0014), version mismatches
# are no longer rejected at the backend boundary. The engine
# routes mismatches through the migration registry on resume
# (CheckpointStateMigrationMissing if no chain, else applies
# the chain). The backend just round-trips the version
# identifier as opaque data.
state = self._decode(state_blob, recorded_serialization, invocation_id)
position_dicts = self._decode(positions_blob, recorded_serialization, invocation_id)
parent_states = self._decode(parent_states_blob, recorded_serialization, invocation_id)
Expand Down
Loading