feat(llm): structured output (proposal 0016)#42
Conversation
Spec submodule moves from v0.10.0 to v0.15.0 — covers the full 5-proposal batch (0011, 0014, 0015, 0016, 0017) in one bump per the skip-ahead governance principle. spec_version in pyproject.toml bumped to match. Adds jsonschema>=4.0 as a runtime dependency (used by the forthcoming structured-output validation path on the dict-schema side; Pydantic-class path uses its own validator). Adds skip markers to the conformance test files for fixtures whose runtime support lands in a later PR of the batch: - llm-provider 009-020 → 0015 multimodal (PR-2) - llm-provider 021-028 → 0016 structured output (this PR, wired up in a later commit) - pipeline-utilities 032-038 → 0011 parallel branches (PR-5) - pipeline-utilities 039-046 → 0014 state migration (PR-4) - graph-engine 021-observer-branch-name → 0011 parallel branches (PR-5) Skip markers also apply to test_fixture_parsing.py for the same set — the typed harness models in tests/conformance/harness/ don't yet know about the new directive shapes (state_migration, parallel branches state-schema variation, NodeEvent.branch_name); each deferring PR drops its own skip rows when it lands the harness work.
Adds the structured_output_invalid canonical category. Raised when a complete() call requested a response_schema and the provider's returned content could not be parsed as JSON OR did not validate against the schema. The exception carries response_schema, raw_content, and failure_description attributes for caller introspection. Non-transient by default — NOT added to TRANSIENT_CATEGORIES. The default RetryMiddleware classifier will not retry this category; callers wanting retry-on-validation-failure can include the category in a custom classifier's transient set.
Adds the parsed field to the Response record. Default None, populated by structured-output calls (response_schema set on complete() and the model returned structured content). The runtime type is a discriminated union over dict (when the caller passed a JSON-Schema dict) and BaseModel instance (when the caller passed a Pydantic class). Pydantic Response config now allows arbitrary types so a BaseModel instance can sit in the parsed slot. No public surface change for free-form callers — parsed defaults to None and remains None when response_schema is not supplied.
Adds two provider-agnostic helpers in openarmature.llm.provider used by structured-output Provider implementations: - validate_response_schema(schema) — pre-send structural check that the value is a dict and its top-level type is "object". Raises ProviderInvalidRequest on failure. - strict_mode_supported(schema) — whether the schema satisfies the strict-mode constraint set (additionalProperties not true, properties fully covered by required) across the full schema tree. Walks anyOf/oneOf/allOf branches and follows $ref targets with cycle protection. An unresolvable $ref or unknown shape returns False (conservative fail). Both are exported from openarmature.llm so OpenAI-compatible providers and any future Anthropic/Gemini provider share the same constraint heuristic.
Extends the Provider Protocol's complete() method signature to accept an optional response_schema parameter. Accepts either a JSON Schema dict or a Pydantic BaseModel subclass; the implementation converts the class form to a JSON Schema at the boundary. Free-form callers (response_schema=None or absent) see no behavior change — the parameter defaults to None and the v0.4.0 contract is preserved. OpenAIProvider's complete() still has the v0.4.0 signature; the next commit wires the response_schema parameter through it.
Threads response_schema through OpenAIProvider.complete() → _do_complete()
→ _parse_response(). Accepts either a JSON Schema dict OR a Pydantic
BaseModel subclass; the latter is converted via model_json_schema()
at the boundary.
Native wire path: when response_schema is supplied, the request body
includes response_format: { type: "json_schema", json_schema: { name,
schema, strict } }. The name field comes from schema.title when
non-empty, otherwise a deterministic sha256 hash of the schema. The
strict flag is set per strict_mode_supported() — true only when the
schema cleanly satisfies the constraints across the full tree.
Post-receive: parses message.content as JSON, then validates against
the schema. Dict-input path validates with jsonschema and returns a
dict. BaseModel-class-input path validates with model.model_validate()
and returns a BaseModel instance. Either way, JSON parse failure or
schema validation failure raises StructuredOutputInvalid carrying the
schema, raw content, and failure description.
parsed is absent on tool-call responses regardless of whether
response_schema was supplied (mutually exclusive paths). Free-form
calls (response_schema=None) see no behavior change — body omits
response_format, parsed stays None.
The prompt-augmentation fallback path is the next commit.
Adds the prompt-augmentation fallback for OpenAI-compatible servers that don't implement response_format (older vLLM, some LM Studio releases, llama.cpp variants). Constructor: force_prompt_augmentation_fallback: bool = False When True, structured-output calls build the wire body by augmenting the message list with a system directive that includes the serialized JSON Schema, and omit response_format entirely. Native path is the default (False). Inspect property: uses_prompt_augmentation_fallback -> bool Read-only; lets callers verify which wire path is active without poking private state. _augment_messages_with_schema_directive returns a fresh list. When the first message is system, its content is extended with the schema directive (preserving caller intent); otherwise a new system message is prepended. The caller's original messages list is NOT mutated — Message instances are reused unchanged (immutable Pydantic models). Response parsing is unchanged from the native path: parse + validate post-receive raise StructuredOutputInvalid on failure. parsed is populated identically whether the wire took the native or fallback route.
…ries Adds tests/conformance/harness/wire.py with helpers used by structured- output and content-block fixtures (and any future capability fixtures that need the same shapes): - match_wire_body(actual, expected) — recursive deep-equal with "*" wildcard support for string slots. - assert_response_format_absent(body) — asserts the wire body has no response_format key. - assert_system_references_schema(body, schema) — asserts the first message in the body is a system message whose content contains the canonical-JSON form of the schema as a substring. - assert_error_carries(exc, carries) — introspects a raised exception's attributes against an expected_carries block; supports _present / _mentions / literal-equal forms; handles the raw_response_content → raw_content fixture-vs-impl naming alias. Extends test_llm_provider.py to drive these from the existing fixture loop: - response_schema is read from call_spec and threaded through provider.complete(). - expected_wire_request literal compare + expected_wire_request_checks sibling checks fire after each captured chat-completions request. - caller_messages_unmodified takes a model_dump snapshot pre-call and asserts byte-equality post-call. - expected.response.parsed is compared for equality. - expected.raises.carries is fed to assert_error_carries. - retry_middleware: block wraps the call in a default-classifier retry simulator (transient = TRANSIENT_CATEGORIES membership); the captured-request count provides provider_call_count. - mock_provider.capabilities.supports_native_response_format: false constructs the provider with force_prompt_augmentation_fallback=True. The 0016 structured-output fixtures (021–028) remain skipped at this commit. The next commit removes their skip markers.
Removes the deferred-fixture skip markers for the 8 structured-output conformance fixtures (021–028). All pass against the OpenAIProvider + harness extensions landed in earlier commits. Adds tests/unit/test_structured_output.py covering bits the conformance fixtures don't exercise directly: - validate_response_schema edge cases: non-dict, non-object top-level, missing type. - strict_mode_supported: required-coverage rule, additionalProperties true, nested-object violation, anyOf branch violation, internal $ref resolution, unresolvable $ref, $ref cycle (self-referential schema). - _derive_schema_name: title-when-present, hash-fallback, determinism, empty-title behavior. - _augment_messages_with_schema_directive: prepend-when-no-system, extend-existing-system, caller-list-not-mutated, serialized-schema- substring. - Pydantic-class overload: class-in returns validated BaseModel instance; pydantic ValidationError wraps in StructuredOutputInvalid; wire body produced from class equals wire body produced from the equivalent .model_json_schema() dict. - uses_prompt_augmentation_fallback inspect property: False by default, True when constructor flag is set.
Documents the structured-output surface added in this PR: the response_schema parameter, Response.parsed field, StructuredOutputInvalid error category, OpenAIProvider native + fallback wire paths, the provider-agnostic schema helpers, the capability-agnostic conformance harness extensions, and the jsonschema runtime dependency. Also records: - Spec pin bump 0.10.0 → 0.15.0 (skip-ahead governance) with per-proposal deferred-skip in the conformance suite until each PR lands. - Release gate: do not tag the consolidated release until all five PRs of the batch (0011, 0014, 0015, 0016, 0017) are merged.
There was a problem hiding this comment.
Pull request overview
This PR implements structured-output support for the LLM provider surface, including schema-aware completion requests, parsed responses, structured-output validation errors, OpenAI native response_format, prompt-augmentation fallback, and conformance/unit coverage for proposal 0016.
Changes:
- Adds
response_schemahandling,Response.parsed, schema validation helpers, andStructuredOutputInvalid. - Extends
OpenAIProviderwith native structured-output requests and fallback prompt augmentation. - Adds conformance harness utilities, structured-output unit tests, dependency updates, and changelog/spec-version updates.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Locks jsonschema and transitive dependencies. |
tests/unit/test_structured_output.py |
Adds focused tests for schema validation, strict-mode heuristics, fallback helpers, and Pydantic parsing. |
tests/conformance/test_pipeline_utilities.py |
Defers later proposal fixtures in pipeline conformance. |
tests/conformance/test_llm_provider.py |
Adds structured-output fixture support, retry simulation, wire assertions, and deferred multimodal skips. |
tests/conformance/test_fixture_parsing.py |
Skips parser checks for deferred fixture shapes. |
tests/conformance/test_conformance.py |
Defers graph-engine fixture requiring later proposal support. |
tests/conformance/harness/wire.py |
Adds reusable wire-body and error-carry assertion helpers. |
tests/conformance/harness/__init__.py |
Exports new conformance wire helpers. |
src/openarmature/llm/response.py |
Adds ParsedValue and Response.parsed. |
src/openarmature/llm/providers/openai.py |
Implements structured-output request/response handling and fallback augmentation. |
src/openarmature/llm/provider.py |
Extends provider protocol and adds schema/strict-mode helpers. |
src/openarmature/llm/errors.py |
Adds StructuredOutputInvalid category and exception class. |
src/openarmature/llm/__init__.py |
Exports new structured-output APIs. |
pyproject.toml |
Adds jsonschema dependency and bumps spec version. |
CHANGELOG.md |
Documents structured-output feature and release gate. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses the 8 CoPilot review threads on the structured-output PR:
- strict_mode_supported now requires additionalProperties to be
EXPLICITLY false (not just missing-or-false). Missing implies the
JSON Schema default of permitting extras, which OpenAI's strict
mode rejects. Pydantic's .model_json_schema() omits the key by
default, so the class-input path would have 400ed against OpenAI
even with conformance fixtures passing.
- _normalize_response_schema now raises ProviderInvalidRequest when
the class form is not a BaseModel subclass, instead of letting
AttributeError leak from model_json_schema.
- validate_response_schema now runs jsonschema.Draft202012Validator
.check_schema() at the boundary, wrapping SchemaError as
ProviderInvalidRequest. Malformed schemas now fail at the API
boundary instead of escaping at decode time.
- _derive_schema_name now regex-checks the title against OpenAI's
name constraint (^[a-zA-Z0-9_-]{1,64}$) and falls back to the
hashed name when the title doesn't match. Sanitizing-in-place
would silently mutate user intent; the hash is a more honest
fallback.
- Two comments claiming Message instances are immutable Pydantic
models were updated. The models are not configured with
frozen=True; the safety actually comes from the helpers not
modifying them in place.
- match_wire_body now fails on extra keys in actual. The previous
permissive default defeated the point of expected_wire_request
being a literal compare; partial assertions continue to live in
the sibling expected_wire_request_checks block.
- _iter_calls now propagates expected_wire_request,
expected_wire_request_checks, response_schema, and
retry_middleware from sibling-of-call into the call dict. Only
expected was being copied before. Cases-form fixtures with
case-level wire expectations were silently running without those
assertions.
The _iter_calls fix surfaced two pre-existing gaps in the harness's
handling of cases-shape fixtures, fixed inline:
- The harness was never wiring config from the call spec into
provider.complete(); fixture 005's runtime_config_passthrough
case was effectively a no-op.
- OpenAIProvider was using json.dumps default formatting for
tool_call.function.arguments (with spaces after colons), which
doesn't match the canonical compact form OpenAI emits or the
spec's fixture 005 expectations. Switched to compact form.
New unit tests cover the missing-additionalProperties strict-mode
case, the non-BaseModel class rejection, the malformed JSON Schema
rejection, and the title-falls-back hash cases.
Replaces the no-LLM hello-world in README.md with a version that makes a real LLM call via OpenAIProvider and uses a Pydantic class as the response_schema. The resulting Response.parsed flows through state as a typed Classification instance and drives the conditional edge that routes between research and summarize. Defaults to OpenAI public API (gpt-4o-mini) with env-var config: LLM_BASE_URL, LLM_MODEL, LLM_API_KEY. A trailing line in the README calls out OpenRouter, vLLM, LM Studio, llama.cpp as drop-in swaps via base_url/model. The example also lands as a runnable file at examples/00-hello-world/main.py and is added to the smoke test suite. examples/README.md gets a corresponding entry.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (1)
tests/unit/test_structured_output.py:470
- This test also leaves the provider’s
httpx.AsyncClientopen. Please close the provider after the assertion (consistent with the otherOpenAIProvidertests) so the test suite does not accumulate unclosed clients.
provider = OpenAIProvider(
base_url="http://mock-llm.test",
model="test-model",
api_key="test-key",
force_prompt_augmentation_fallback=True,
Two bugs surfaced during live validation against OpenAI: - The default LLM_BASE_URL was https://api.openai.com/v1, but our OpenAIProvider's wire path posts to /v1/chat/completions itself. httpx URL join produced https://api.openai.com/v1/v1/chat/completions → 404. Convention is base_url = host root; impl adds /v1. Default now matches; doc-string + README comment make it explicit. - The observer trace fired on the OpenAIProvider LLM-span event (sentinel namespace, post_state=None) and crashed accessing .sources. Added a post_state is not None guard.
The hello-world's research and summarize nodes were returning hard-coded source lists. Replaces both with real provider.complete() calls that emit typed structured output, so the example demonstrates the value of a structured-output pipeline end-to-end instead of just the framework's plumbing. The example now exercises both response_schema forms in one demo: - classify and summarize use Pydantic classes (Classification, Summary); Response.parsed comes back as a validated instance. - research uses a raw JSON Schema dict; Response.parsed comes back as a plain dict. State gains two intermediate-artifact fields (research_plan, summary). Final output prints whichever branch fired, in addition to the existing sources/metadata. The reducer-policy story stays intact (last_write_wins on the LLM outputs, append on sources, merge on metadata). Live-validated against OpenAI gpt-4o-mini; both branches verified (structured class instance + structured dict on Response.parsed).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
tests/unit/test_structured_output.py:472
- This test also creates an
OpenAIProviderwithout closing it. Please close the provider after the assertion (or make the test async and useawait provider.aclose()) to avoid unclosed-client resource warnings.
def test_inspect_property_fallback_when_forced() -> None:
provider = OpenAIProvider(
base_url="http://mock-llm.test",
model="test-model",
api_key="test-key",
force_prompt_augmentation_fallback=True,
)
assert provider.uses_prompt_augmentation_fallback is True
Adds docs/concepts/llms.md covering how LLM calls fit into the graph model: LLM calls as async IO inside nodes, structured output (both response_schema forms + native/fallback wire paths + strict mode), routing on parsed fields, and errors at the LLM boundary. Nav entry added to mkdocs.yml's Concepts section; concepts/index.md TOC extended. Updates docs/model-providers/index.md: Protocol signature now shows the response_schema parameter; errors table adds StructuredOutputInvalid; new Structured output section walks through both response_schema forms, the native/fallback wire paths, and strict-mode constraints. Updates docs/model-providers/authoring.md: skeleton's complete() signature now matches the Protocol (response_schema parameter); a new "Structured output" entry in Beyond the skeleton points custom- provider authors at validate_response_schema and strict_mode_supported. mkdocs builds clean in strict mode; the runnable example in the new Structured output section is verified by tests/test_docs_examples.py.
The Returns block on Provider.complete started with "A :class:Response carrying ...", which mkdocstrings' Google-parser misread as a name-type pair: it pulled out "A" as the Name column entry and split the multi-line description across three table rows. Moving the return-value sentence into the prose summary at the top of the docstring (matching the pattern OpenAIProvider.complete already uses) renders cleanly: no spurious Name column entry, single description block.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
tests/unit/test_structured_output.py:472
- This test also leaves the
OpenAIProvider's underlyinghttpx.AsyncClientunclosed. Make the test close the provider after the assertion to keep the unit suite free of leaked async-client resources.
def test_inspect_property_fallback_when_forced() -> None:
provider = OpenAIProvider(
base_url="http://mock-llm.test",
model="test-model",
api_key="test-key",
force_prompt_augmentation_fallback=True,
)
assert provider.uses_prompt_augmentation_fallback is True
Addresses 19 review threads from the second CoPilot pass; about half
were duplicates of the same underlying issue:
- examples/00-hello-world/main.py + README hello-world: api_key now
uses `os.environ.get("LLM_API_KEY") or None` so an exported-but-
empty env var falls through to no-auth (matters for local servers
that reject an empty bearer header).
- Both examples now close the OpenAIProvider in the finally block
alongside graph.drain(). Long-running consumers that copy the
snippet had been leaking the underlying httpx.AsyncClient.
- errors.py header dropped the hard-coded "seven canonical
categories" count after StructuredOutputInvalid landed.
- strict_mode_supported docstring and the surrounding spec-anchor
comment block both updated to match the implementation:
additionalProperties must be EXPLICITLY false (an omitted key
counts as non-strict, since JSON Schema's default permits extras).
- _resolve_ref now handles ref == "#" as the document root before
rejecting external refs. Root-recursive schemas that use the bare
JSON-Pointer-root form now resolve correctly. Unit test added.
- _strict_mode_check tightened to return False on unrecognized
shapes (empty {}, const-only, enum-only, unknown keywords) instead
of falling through to True. Primitive types (string/integer/
number/boolean/null) classified as terminal-strict-compatible.
Two unit tests added.
- _build_request_body now explicitly strips response_format from the
body when the provider is in fallback mode. RuntimeConfig is
extra="allow", so a caller could have piped response_format
through the extras loop past the include_response_format gate.
- provider.py module docstring's summary signature line updated to
match the Protocol's response_schema parameter.
- validate_response_schema's spec-anchor comment updated to reflect
that JSON Schema validity is now checked at the boundary via
Draft202012Validator.check_schema(), not delegated to parse time.
- test_pydantic_class_wire_body_matches_dict_form: widened the
assertion from response_format-only to full body equality, so any
regression in the class-input wire mapping (not just
response_format) gets caught.
- test_inspect_property_native_default and
test_inspect_property_fallback_when_forced converted to async
with try/finally + aclose() to match the rest of the file's
provider-lifecycle pattern.
Addresses 5 remaining review threads (3 substantive, 2 stale on already-fixed code): - LlmProviderResponseAssertion (the typed assertion model in harness/expectations.py) now lists `parsed: Any | None`. The runtime assertion in test_llm_provider.py already handled it, but the typed parser had it under extra="forbid" and would have rejected any future case-shape LLM fixture using `parsed`. The 021-028 fixtures slip past today on `calls:` form's permissive `LlmCallSpec.expected: dict[str, Any]`; this lines the two paths up. - docs/model-providers/authoring.md skeleton comment tightened: removed the "ignore it and return free-form text" option from the response_schema guidance. A provider that silently drops the parameter violates the Protocol contract; callers expect either Response.parsed populated or StructuredOutputInvalid raised. Now only two valid options surfaced: raise ProviderInvalidRequest until implemented, or wire it through. - docs/concepts/llms.md softened the static-typing claim in the Pydantic-class form section. Response.parsed is `dict[str, Any] | BaseModel | None`, so a type checker won't narrow from `response_schema=Classification` alone. The page now separates the runtime guarantee (validated instance) from static access (requires cast/isinstance/typed assignment); generic Response[T] flagged as a follow-up. The two stale threads (examples/00-hello-world/main.py provider cleanup, test_structured_output.py provider cleanup) were already fixed in commit 8ed334c; replies sent + threads resolved without code changes.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (2)
tests/unit/test_structured_output.py:150
- The root object is missing
additionalProperties: false, so this assertion would still pass even if theanyOfwalker were removed or broken. Make the root object strict-compatible and keep the failing condition only inside theanyOfbranch to ensure this test covers the intended combinator behavior.
schema = {
"type": "object",
"properties": {
"x": {
"anyOf": [
{"type": "string"},
{"type": "object", "properties": {"y": {"type": "string"}}}, # no required
]
},
},
"required": ["x"],
}
tests/unit/test_structured_output.py:177
- This test is meant to prove external
$reftargets make strict mode unsupported, but the root schema already fails because it lacksadditionalProperties: false. Add the root strict fields here so the only failing condition is the unresolvable reference.
schema = {
"type": "object",
"properties": {"x": {"$ref": "https://example.com/external-schema.json"}},
"required": ["x"],
}
Addresses 6 review threads, several of which surfaced second-order
issues from previous rounds:
- openai.py complete(): the fallback flag was driving
include_response_format=False for every call, including free-form
ones. That triggered the response_format strip on calls that
weren't structured-output at all, clobbering caller-supplied
RuntimeConfig extras. Gating the flag on schema_dict being set so
free-form calls preserve extras. Unit test added.
- src/openarmature/__init__.py + tests/test_smoke.py: bumped
__spec_version__ from "0.10.0" to "0.15.0" to match the
pyproject.toml [tool.openarmature].spec_version bump. AGENTS.md
flags these three values as required to stay in sync; the
submodule-bump commit missed the runtime sources.
- _strict_mode_check array branch: {"type": "array"} without
`items` no longer returns True. Unconstrained array content is
the array analog of an object with no additionalProperties: false:
the walker can't statically verify nested shapes, so strict mode
rejects. Unit test added.
- docs/model-providers/authoring.md: skeleton's complete() now
actually enforces what its comment promised. Added
`if response_schema is not None: raise ProviderInvalidRequest`
to the body and surfaced the exception in the import list, so a
provider copied from the skeleton can't silently violate the
Protocol contract.
- docs/concepts/llms.md Pydantic-class snippet: added
`from typing import Literal` so the example is copy-paste-
runnable (the snippet uses Literal in the class but only imported
BaseModel).
- tests/unit/test_structured_output.py nested-recursion tests:
test_strict_mode_recurses_into_nested_object and
test_strict_mode_anyof_branch_must_satisfy were short-circuiting
at the root because the root schema itself failed strict rules.
Tightened both root schemas so the recursive walk actually fires;
the tests now guard the recursion they claim to.
Captures two follow-ups surfaced by the four CoPilot review rounds: - docs/concepts/llms.md "Strict mode" section expanded into the full constraint list. After four rounds of tightening the strict_mode_supported heuristic, the rule set is stable and the user-facing surface should list it directly rather than make callers read provider.py. The page frames the list as the authoritative set: anything not on it trips to non-strict. - docs/model-providers/index.md "Strict mode" subsection trimmed and now links into the concepts page for the full list, following the established split (concepts/ owns the deep-dive, model-providers/ stays terse). - tests/test_smoke.py adds test_spec_version_matches_pyproject: reads pyproject.toml's [tool.openarmature].spec_version and asserts it equals openarmature.__spec_version__. AGENTS.md flags these as required to stay in sync; the previous smoke test only checked internal consistency between __spec_version__ and its asserted value, so the pyproject side could drift silently (and did, in the original submodule-bump commit).
Addresses 4 review threads:
- examples/00-hello-world/main.py: provider construction moved
from module level to a lazy _get_provider() helper backed by a
module global. Avoids opening an httpx.AsyncClient when tooling
imports the module without running main() — the smoke test now
doesn't trigger construction across 6 example loads. main()'s
finally only closes when the cached instance is set.
- src/openarmature/llm/provider.py: validate_response_schema now
walks all $ref values via _check_refs_resolvable and raises
ProviderInvalidRequest for any non-internal-resolvable ref.
Draft202012Validator.check_schema doesn't traverse refs, so
previously an external ref slipped past the boundary and
surfaced as a raw referencing-library exception at validate
time. Pre-validation surfaces the clean category at the API
boundary.
- src/openarmature/llm/providers/openai.py: _parse_and_validate
now also catches jsonschema.SchemaError and maps it to
StructuredOutputInvalid. Safety net for any schema-side
exception (including ref-resolution failures) that pre-
validation might miss.
- tests/unit/test_structured_output.py:
- test_strict_mode_unresolvable_ref_fails: root tightened with
additionalProperties: false so the walk reaches the $ref
branch (was short-circuiting at the root).
- Added test_validate_response_schema_rejects_external_ref
covering the new pre-validation path.
- tests/test_smoke.py: added test_spec_version_matches_submodule_pin
shelling to `git -C openarmature-spec describe --tags
--exact-match HEAD` and asserting it equals
v{__spec_version__}. Skips cleanly when the submodule isn't a
git checkout (installed-package CI lanes). Completes the
three-place drift check from AGENTS.md
(__spec_version__ ↔ pyproject ↔ submodule pin).
The git-describe-based submodule check from the previous commit passed locally but failed in CI because actions/checkout pins the submodule to its recorded SHA without fetching the spec repo's tags. `git describe --tags --exact-match` then finds nothing and the test fails with "submodule HEAD is not at any tag." Switching to parsing openarmature-spec/CHANGELOG.md: the spec follows Keep a Changelog, so the first non-[Unreleased] `## [X.Y.Z]` heading is the version at the pinned commit. This works regardless of CI tag-fetch state and catches the same drift class (submodule moved to a different release). Skips cleanly when CHANGELOG.md isn't present (installed-package lanes that don't ship the submodule checkout).
CodeQL flagged the for/else: pytest.fail() pattern as a potentially-uninitialized-local-variable warning because it doesn't model pytest.fail as NoReturn — the analyzer sees a path where submodule_latest is referenced after the loop without ever being bound. Pulling the parse into _read_latest_spec_version_from_changelog that explicitly returns the version or raises AssertionError. Eliminates the unreachable-after-fail pattern and reads cleaner.
Six second-order correctness fixes surfaced by the round-7 review, mostly hardening _resolve_ref, _check_refs_resolvable, and the Pydantic-class validation path. - _resolve_ref now distinguishes "unresolvable" (path doesn't exist / external ref) from "resolved to non-dict" via a module-level _UNRESOLVABLE sentinel. Boolean schemas (true/false) are valid JSON Schema subschemas; a $ref to one was being incorrectly rejected as ProviderInvalidRequest. Now resolves cleanly and strict-mode still returns False on bool targets (the correct conservative answer). - validate_response_schema's metaschema check now uses jsonschema.validators.validator_for(schema) instead of the hard-coded Draft 2020-12. A valid draft-07 schema (e.g. tuple- form items, common in tooling) was being rejected at the boundary but accepted at runtime. Boundary and runtime now agree. - _resolve_ref percent-decodes JSON Pointer tokens before applying the ~1 / ~0 unescape pair. Per RFC 6901 §6, a JSON Pointer in a URI fragment is percent-encoded; refs like #/$defs/Name%20With%20Spaces now resolve correctly. - _check_refs_resolvable now walks only known subschema-bearing keywords (properties, patternProperties, additionalProperties, items, prefixItems, contains, if/then/else, allOf/anyOf/oneOf/not, $defs/definitions, dependentSchemas, propertyNames, unevaluatedItems, unevaluatedProperties). A "$ref" key under data positions (default, const, enum, $comment, x-* extensions) is data, not a schema reference, and is no longer incorrectly resolved. - docs/concepts/llms.md "LLM calls are async IO inside a node" section reframed: module-level provider construction leaks the httpx.AsyncClient in tooling/test/docs-build imports. The page now documents application-startup / lifecycle-managed construction (lazy on-first-use plus aclose in finally / shutdown hook), matching the pattern the hello-world example was made lazy for. - _parse_and_validate's Pydantic-class path now runs jsonschema.validate against the generated JSON Schema BEFORE calling model_validate. Pydantic's default model_validate is coercive (accepts "30" for an int field), which diverged from the strict dict-schema path. Both paths now apply the same jsonschema check first; model_validate then constructs the typed instance. - jsonschema.ValidationError's failure description now includes exc.json_path (e.g. "$.age: '30' is not of type 'integer'"). The bare exc.message lost the field name, breaking caller diagnostics for the missing-field / wrong-type-at-path cases. Five new unit tests cover the bool-ref, draft-07, percent-encoded ref, ref-under-data, and Pydantic-coercion-rejection cases.
Consolidated release for the five-PR batch: - Structured output (proposal 0016, PR #42) - Image content blocks (proposal 0015, PR #44) - Prompt management (proposal 0017, PR #45) - State migration for checkpoints (proposal 0014, PR #46) - Parallel branches (proposal 0011, PR #47) Bumps: - ``pyproject.toml`` project.version: 0.5.0 → 0.6.0 - ``__version__`` in src/openarmature/__init__.py - ``uv.lock`` editable package version - ``tests/test_smoke.py`` version assertion Flips CHANGELOG ``[Unreleased]`` to ``[0.6.0] — 2026-05-16``, drops the release-gate Notes entry, and tightens the pre-1.0 MINOR note to list the two behavioral changes (retry-MW attempt-index propagation, CheckpointRecord.schema_version semantic shift) instead of the structured-output-specific note carried over from PR-1. Pinned spec stays at v0.16.1 (set in PR #47).
Summary
openarmature.llm:response_schemaparameter onProvider.complete(),Response.parsedfield,StructuredOutputInvalidnon-transient error category, OpenAI nativeresponse_formatwire path withstrict: trueheuristic, prompt-augmentation fallback, and a Pydantic-class overload (class-in →BaseModelinstance out).match_wire_bodywith"*"wildcards,assert_response_format_absent,assert_system_references_schema,assert_error_carries) land as capability-agnostic infrastructure undertests/conformance/harness/wire.pyso upcoming 0014 / 0015 / 0017 PRs reuse without refactoring.Release gate
This is PR-1 of a five-PR batch (0016 → 0015 → 0017 → 0014 → 0011). Do not tag a release until all five PRs land — the v0.15.0 submodule pin presumes the full batch will ship. The constraint is also recorded in the CHANGELOG
[Unreleased]Notes section.What's new
Provider.complete()response_schema: dict | type[BaseModel] | Noneparameter. Defaults toNone; the v0.4.0 free-form contract is preserved exactly.Response.parseddict | BaseModel | Nonefield. Populated whenresponse_schemais supplied and the model returned structured content; absent on tool-call responses regardless.StructuredOutputInvalidTRANSIENT_CATEGORIES. Carriesresponse_schema,raw_content,failure_description.validate_response_schema,strict_mode_supportedopenarmature.llm.provider. The strict-mode heuristic walksanyOf/oneOf/allOfbranches and follows$refwith cycle protection; unresolvable refs conservatively returnFalse.OpenAIProviderconstructorforce_prompt_augmentation_fallback: bool = Falseflag +uses_prompt_augmentation_fallbackread-only property. Switches structured-output calls to the fallback path for OpenAI-compatible servers that reject or silently ignoreresponse_format.pyproject.tomljsonschema>=4.0runtime dep;spec_versionbumped to 0.15.0.Commits
The PR is reviewable commit-by-commit. Each commit independently builds and passes its targeted subset.
chore: bump spec to v0.15.0; add jsonschema; skip deferred fixturesfeat(llm): add StructuredOutputInvalid error categoryfeat(llm): add Response.parsed fieldfeat(llm): validate_response_schema + strict_mode_supported helpersfeat(llm): Provider Protocol gains response_schema parameterfeat(llm/openai): native response_format wire path + Pydantic overloadfeat(llm/openai): prompt-augmentation fallback + inspect propertytest(conformance): capability-agnostic harness helpers for wire + carriestest: drive 0016 fixtures 021-028 + add structured-output unit testsdocs: changelog entry for proposal 0016 under [Unreleased]Test plan
uv run pytest tests/conformance/test_llm_provider.py— 16 pass, 12 skipped (0015 multimodal, lands in PR-2)uv run pytest tests/unit/test_structured_output.py— 25 passuv run pytest— 483 pass, 77 skipped, 0 faileduv run pyright— cleanuv run ruff check+uv run ruff format— cleanResponse.parsedverified end-to-end.Pre-1.0 SemVer
Additive change. Free-form callers (no
response_schema) see no behavior change — the new parameter defaults toNone, the wire body omitsresponse_format, andResponse.parsedremains absent.