feat(llm): structured output (proposal 0016) by chris-colinsky · Pull Request #42 · LunarCommand/openarmature-python

chris-colinsky · 2026-05-15T15:16:42Z

Summary

Implements spec proposal 0016 (LLM provider structured output) in openarmature.llm: response_schema parameter on Provider.complete(), Response.parsed field, StructuredOutputInvalid non-transient error category, OpenAI native response_format wire path with strict: true heuristic, prompt-augmentation fallback, and a Pydantic-class overload (class-in → BaseModel instance out).
Spec submodule bumps v0.10.0 → v0.15.0 under skip-ahead governance, covering the full 5-proposal batch. Fixtures from proposals 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as each subsequent PR lands.
Conformance harness helpers (match_wire_body with "*" wildcards, assert_response_format_absent, assert_system_references_schema, assert_error_carries) land as capability-agnostic infrastructure under tests/conformance/harness/wire.py so upcoming 0014 / 0015 / 0017 PRs reuse without refactoring.

Release gate

This is PR-1 of a five-PR batch (0016 → 0015 → 0017 → 0014 → 0011). Do not tag a release until all five PRs land — the v0.15.0 submodule pin presumes the full batch will ship. The constraint is also recorded in the CHANGELOG [Unreleased] Notes section.

What's new

Surface	Change
`Provider.complete()`	New `response_schema: dict \| type[BaseModel] \| None` parameter. Defaults to `None`; the v0.4.0 free-form contract is preserved exactly.
`Response.parsed`	New `dict \| BaseModel \| None` field. Populated when `response_schema` is supplied and the model returned structured content; absent on tool-call responses regardless.
`StructuredOutputInvalid`	New error category. Non-transient by default — NOT in `TRANSIENT_CATEGORIES`. Carries `response_schema`, `raw_content`, `failure_description`.
`validate_response_schema`, `strict_mode_supported`	Provider-agnostic helpers in `openarmature.llm.provider`. The strict-mode heuristic walks `anyOf`/`oneOf`/`allOf` branches and follows `$ref` with cycle protection; unresolvable refs conservatively return `False`.
`OpenAIProvider` constructor	New `force_prompt_augmentation_fallback: bool = False` flag + `uses_prompt_augmentation_fallback` read-only property. Switches structured-output calls to the fallback path for OpenAI-compatible servers that reject or silently ignore `response_format`.
`pyproject.toml`	Adds `jsonschema>=4.0` runtime dep; `spec_version` bumped to 0.15.0.

Commits

The PR is reviewable commit-by-commit. Each commit independently builds and passes its targeted subset.

chore: bump spec to v0.15.0; add jsonschema; skip deferred fixtures
feat(llm): add StructuredOutputInvalid error category
feat(llm): add Response.parsed field
feat(llm): validate_response_schema + strict_mode_supported helpers
feat(llm): Provider Protocol gains response_schema parameter
feat(llm/openai): native response_format wire path + Pydantic overload
feat(llm/openai): prompt-augmentation fallback + inspect property
test(conformance): capability-agnostic harness helpers for wire + carries
test: drive 0016 fixtures 021-028 + add structured-output unit tests
docs: changelog entry for proposal 0016 under [Unreleased]

Test plan

uv run pytest tests/conformance/test_llm_provider.py — 16 pass, 12 skipped (0015 multimodal, lands in PR-2)
uv run pytest tests/unit/test_structured_output.py — 25 pass
uv run pytest — 483 pass, 77 skipped, 0 failed
uv run pyright — clean
uv run ruff check + uv run ruff format — clean
Manual: structured-output call against a live OpenAI-compatible endpoint (dict schema + Pydantic class) with Response.parsed verified end-to-end.

Pre-1.0 SemVer

Additive change. Free-form callers (no response_schema) see no behavior change — the new parameter defaults to None, the wire body omits response_format, and Response.parsed remains absent.

Spec submodule moves from v0.10.0 to v0.15.0 — covers the full 5-proposal batch (0011, 0014, 0015, 0016, 0017) in one bump per the skip-ahead governance principle. spec_version in pyproject.toml bumped to match. Adds jsonschema>=4.0 as a runtime dependency (used by the forthcoming structured-output validation path on the dict-schema side; Pydantic-class path uses its own validator). Adds skip markers to the conformance test files for fixtures whose runtime support lands in a later PR of the batch: - llm-provider 009-020 → 0015 multimodal (PR-2) - llm-provider 021-028 → 0016 structured output (this PR, wired up in a later commit) - pipeline-utilities 032-038 → 0011 parallel branches (PR-5) - pipeline-utilities 039-046 → 0014 state migration (PR-4) - graph-engine 021-observer-branch-name → 0011 parallel branches (PR-5) Skip markers also apply to test_fixture_parsing.py for the same set — the typed harness models in tests/conformance/harness/ don't yet know about the new directive shapes (state_migration, parallel branches state-schema variation, NodeEvent.branch_name); each deferring PR drops its own skip rows when it lands the harness work.

Adds the structured_output_invalid canonical category. Raised when a complete() call requested a response_schema and the provider's returned content could not be parsed as JSON OR did not validate against the schema. The exception carries response_schema, raw_content, and failure_description attributes for caller introspection. Non-transient by default — NOT added to TRANSIENT_CATEGORIES. The default RetryMiddleware classifier will not retry this category; callers wanting retry-on-validation-failure can include the category in a custom classifier's transient set.

Adds the parsed field to the Response record. Default None, populated by structured-output calls (response_schema set on complete() and the model returned structured content). The runtime type is a discriminated union over dict (when the caller passed a JSON-Schema dict) and BaseModel instance (when the caller passed a Pydantic class). Pydantic Response config now allows arbitrary types so a BaseModel instance can sit in the parsed slot. No public surface change for free-form callers — parsed defaults to None and remains None when response_schema is not supplied.

Adds two provider-agnostic helpers in openarmature.llm.provider used by structured-output Provider implementations: - validate_response_schema(schema) — pre-send structural check that the value is a dict and its top-level type is "object". Raises ProviderInvalidRequest on failure. - strict_mode_supported(schema) — whether the schema satisfies the strict-mode constraint set (additionalProperties not true, properties fully covered by required) across the full schema tree. Walks anyOf/oneOf/allOf branches and follows $ref targets with cycle protection. An unresolvable $ref or unknown shape returns False (conservative fail). Both are exported from openarmature.llm so OpenAI-compatible providers and any future Anthropic/Gemini provider share the same constraint heuristic.

Extends the Provider Protocol's complete() method signature to accept an optional response_schema parameter. Accepts either a JSON Schema dict or a Pydantic BaseModel subclass; the implementation converts the class form to a JSON Schema at the boundary. Free-form callers (response_schema=None or absent) see no behavior change — the parameter defaults to None and the v0.4.0 contract is preserved. OpenAIProvider's complete() still has the v0.4.0 signature; the next commit wires the response_schema parameter through it.

Threads response_schema through OpenAIProvider.complete() → _do_complete() → _parse_response(). Accepts either a JSON Schema dict OR a Pydantic BaseModel subclass; the latter is converted via model_json_schema() at the boundary. Native wire path: when response_schema is supplied, the request body includes response_format: { type: "json_schema", json_schema: { name, schema, strict } }. The name field comes from schema.title when non-empty, otherwise a deterministic sha256 hash of the schema. The strict flag is set per strict_mode_supported() — true only when the schema cleanly satisfies the constraints across the full tree. Post-receive: parses message.content as JSON, then validates against the schema. Dict-input path validates with jsonschema and returns a dict. BaseModel-class-input path validates with model.model_validate() and returns a BaseModel instance. Either way, JSON parse failure or schema validation failure raises StructuredOutputInvalid carrying the schema, raw content, and failure description. parsed is absent on tool-call responses regardless of whether response_schema was supplied (mutually exclusive paths). Free-form calls (response_schema=None) see no behavior change — body omits response_format, parsed stays None. The prompt-augmentation fallback path is the next commit.

Adds the prompt-augmentation fallback for OpenAI-compatible servers that don't implement response_format (older vLLM, some LM Studio releases, llama.cpp variants). Constructor: force_prompt_augmentation_fallback: bool = False When True, structured-output calls build the wire body by augmenting the message list with a system directive that includes the serialized JSON Schema, and omit response_format entirely. Native path is the default (False). Inspect property: uses_prompt_augmentation_fallback -> bool Read-only; lets callers verify which wire path is active without poking private state. _augment_messages_with_schema_directive returns a fresh list. When the first message is system, its content is extended with the schema directive (preserving caller intent); otherwise a new system message is prepended. The caller's original messages list is NOT mutated — Message instances are reused unchanged (immutable Pydantic models). Response parsing is unchanged from the native path: parse + validate post-receive raise StructuredOutputInvalid on failure. parsed is populated identically whether the wire took the native or fallback route.

…ries Adds tests/conformance/harness/wire.py with helpers used by structured- output and content-block fixtures (and any future capability fixtures that need the same shapes): - match_wire_body(actual, expected) — recursive deep-equal with "*" wildcard support for string slots. - assert_response_format_absent(body) — asserts the wire body has no response_format key. - assert_system_references_schema(body, schema) — asserts the first message in the body is a system message whose content contains the canonical-JSON form of the schema as a substring. - assert_error_carries(exc, carries) — introspects a raised exception's attributes against an expected_carries block; supports _present / _mentions / literal-equal forms; handles the raw_response_content → raw_content fixture-vs-impl naming alias. Extends test_llm_provider.py to drive these from the existing fixture loop: - response_schema is read from call_spec and threaded through provider.complete(). - expected_wire_request literal compare + expected_wire_request_checks sibling checks fire after each captured chat-completions request. - caller_messages_unmodified takes a model_dump snapshot pre-call and asserts byte-equality post-call. - expected.response.parsed is compared for equality. - expected.raises.carries is fed to assert_error_carries. - retry_middleware: block wraps the call in a default-classifier retry simulator (transient = TRANSIENT_CATEGORIES membership); the captured-request count provides provider_call_count. - mock_provider.capabilities.supports_native_response_format: false constructs the provider with force_prompt_augmentation_fallback=True. The 0016 structured-output fixtures (021–028) remain skipped at this commit. The next commit removes their skip markers.

Removes the deferred-fixture skip markers for the 8 structured-output conformance fixtures (021–028). All pass against the OpenAIProvider + harness extensions landed in earlier commits. Adds tests/unit/test_structured_output.py covering bits the conformance fixtures don't exercise directly: - validate_response_schema edge cases: non-dict, non-object top-level, missing type. - strict_mode_supported: required-coverage rule, additionalProperties true, nested-object violation, anyOf branch violation, internal $ref resolution, unresolvable $ref, $ref cycle (self-referential schema). - _derive_schema_name: title-when-present, hash-fallback, determinism, empty-title behavior. - _augment_messages_with_schema_directive: prepend-when-no-system, extend-existing-system, caller-list-not-mutated, serialized-schema- substring. - Pydantic-class overload: class-in returns validated BaseModel instance; pydantic ValidationError wraps in StructuredOutputInvalid; wire body produced from class equals wire body produced from the equivalent .model_json_schema() dict. - uses_prompt_augmentation_fallback inspect property: False by default, True when constructor flag is set.

Documents the structured-output surface added in this PR: the response_schema parameter, Response.parsed field, StructuredOutputInvalid error category, OpenAIProvider native + fallback wire paths, the provider-agnostic schema helpers, the capability-agnostic conformance harness extensions, and the jsonschema runtime dependency. Also records: - Spec pin bump 0.10.0 → 0.15.0 (skip-ahead governance) with per-proposal deferred-skip in the conformance suite until each PR lands. - Release gate: do not tag the consolidated release until all five PRs of the batch (0011, 0014, 0015, 0016, 0017) are merged.

Copilot

Pull request overview

This PR implements structured-output support for the LLM provider surface, including schema-aware completion requests, parsed responses, structured-output validation errors, OpenAI native response_format, prompt-augmentation fallback, and conformance/unit coverage for proposal 0016.

Changes:

Adds response_schema handling, Response.parsed, schema validation helpers, and StructuredOutputInvalid.
Extends OpenAIProvider with native structured-output requests and fallback prompt augmentation.
Adds conformance harness utilities, structured-output unit tests, dependency updates, and changelog/spec-version updates.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`uv.lock`	Locks `jsonschema` and transitive dependencies.
`tests/unit/test_structured_output.py`	Adds focused tests for schema validation, strict-mode heuristics, fallback helpers, and Pydantic parsing.
`tests/conformance/test_pipeline_utilities.py`	Defers later proposal fixtures in pipeline conformance.
`tests/conformance/test_llm_provider.py`	Adds structured-output fixture support, retry simulation, wire assertions, and deferred multimodal skips.
`tests/conformance/test_fixture_parsing.py`	Skips parser checks for deferred fixture shapes.
`tests/conformance/test_conformance.py`	Defers graph-engine fixture requiring later proposal support.
`tests/conformance/harness/wire.py`	Adds reusable wire-body and error-carry assertion helpers.
`tests/conformance/harness/__init__.py`	Exports new conformance wire helpers.
`src/openarmature/llm/response.py`	Adds `ParsedValue` and `Response.parsed`.
`src/openarmature/llm/providers/openai.py`	Implements structured-output request/response handling and fallback augmentation.
`src/openarmature/llm/provider.py`	Extends provider protocol and adds schema/strict-mode helpers.
`src/openarmature/llm/errors.py`	Adds `StructuredOutputInvalid` category and exception class.
`src/openarmature/llm/__init__.py`	Exports new structured-output APIs.
`pyproject.toml`	Adds `jsonschema` dependency and bumps spec version.
`CHANGELOG.md`	Documents structured-output feature and release gate.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Addresses the 8 CoPilot review threads on the structured-output PR: - strict_mode_supported now requires additionalProperties to be EXPLICITLY false (not just missing-or-false). Missing implies the JSON Schema default of permitting extras, which OpenAI's strict mode rejects. Pydantic's .model_json_schema() omits the key by default, so the class-input path would have 400ed against OpenAI even with conformance fixtures passing. - _normalize_response_schema now raises ProviderInvalidRequest when the class form is not a BaseModel subclass, instead of letting AttributeError leak from model_json_schema. - validate_response_schema now runs jsonschema.Draft202012Validator .check_schema() at the boundary, wrapping SchemaError as ProviderInvalidRequest. Malformed schemas now fail at the API boundary instead of escaping at decode time. - _derive_schema_name now regex-checks the title against OpenAI's name constraint (^[a-zA-Z0-9_-]{1,64}$) and falls back to the hashed name when the title doesn't match. Sanitizing-in-place would silently mutate user intent; the hash is a more honest fallback. - Two comments claiming Message instances are immutable Pydantic models were updated. The models are not configured with frozen=True; the safety actually comes from the helpers not modifying them in place. - match_wire_body now fails on extra keys in actual. The previous permissive default defeated the point of expected_wire_request being a literal compare; partial assertions continue to live in the sibling expected_wire_request_checks block. - _iter_calls now propagates expected_wire_request, expected_wire_request_checks, response_schema, and retry_middleware from sibling-of-call into the call dict. Only expected was being copied before. Cases-form fixtures with case-level wire expectations were silently running without those assertions. The _iter_calls fix surfaced two pre-existing gaps in the harness's handling of cases-shape fixtures, fixed inline: - The harness was never wiring config from the call spec into provider.complete(); fixture 005's runtime_config_passthrough case was effectively a no-op. - OpenAIProvider was using json.dumps default formatting for tool_call.function.arguments (with spaces after colons), which doesn't match the canonical compact form OpenAI emits or the spec's fixture 005 expectations. Switched to compact form. New unit tests cover the missing-additionalProperties strict-mode case, the non-BaseModel class rejection, the malformed JSON Schema rejection, and the title-falls-back hash cases.

Replaces the no-LLM hello-world in README.md with a version that makes a real LLM call via OpenAIProvider and uses a Pydantic class as the response_schema. The resulting Response.parsed flows through state as a typed Classification instance and drives the conditional edge that routes between research and summarize. Defaults to OpenAI public API (gpt-4o-mini) with env-var config: LLM_BASE_URL, LLM_MODEL, LLM_API_KEY. A trailing line in the README calls out OpenRouter, vLLM, LM Studio, llama.cpp as drop-in swaps via base_url/model. The example also lands as a runnable file at examples/00-hello-world/main.py and is added to the smoke test suite. examples/README.md gets a corresponding entry.

Copilot

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

tests/unit/test_structured_output.py:470

This test also leaves the provider’s httpx.AsyncClient open. Please close the provider after the assertion (consistent with the other OpenAIProvider tests) so the test suite does not accumulate unclosed clients.

    provider = OpenAIProvider(
        base_url="http://mock-llm.test",
        model="test-model",
        api_key="test-key",
        force_prompt_augmentation_fallback=True,

Two bugs surfaced during live validation against OpenAI: - The default LLM_BASE_URL was https://api.openai.com/v1, but our OpenAIProvider's wire path posts to /v1/chat/completions itself. httpx URL join produced https://api.openai.com/v1/v1/chat/completions → 404. Convention is base_url = host root; impl adds /v1. Default now matches; doc-string + README comment make it explicit. - The observer trace fired on the OpenAIProvider LLM-span event (sentinel namespace, post_state=None) and crashed accessing .sources. Added a post_state is not None guard.

The hello-world's research and summarize nodes were returning hard-coded source lists. Replaces both with real provider.complete() calls that emit typed structured output, so the example demonstrates the value of a structured-output pipeline end-to-end instead of just the framework's plumbing. The example now exercises both response_schema forms in one demo: - classify and summarize use Pydantic classes (Classification, Summary); Response.parsed comes back as a validated instance. - research uses a raw JSON Schema dict; Response.parsed comes back as a plain dict. State gains two intermediate-artifact fields (research_plan, summary). Final output prints whichever branch fired, in addition to the existing sources/metadata. The reducer-policy story stays intact (last_write_wins on the LLM outputs, append on sources, merge on metadata). Live-validated against OpenAI gpt-4o-mini; both branches verified (structured class instance + structured dict on Response.parsed).

Copilot

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

tests/unit/test_structured_output.py:472

This test also creates an OpenAIProvider without closing it. Please close the provider after the assertion (or make the test async and use await provider.aclose()) to avoid unclosed-client resource warnings.

def test_inspect_property_fallback_when_forced() -> None:
    provider = OpenAIProvider(
        base_url="http://mock-llm.test",
        model="test-model",
        api_key="test-key",
        force_prompt_augmentation_fallback=True,
    )
    assert provider.uses_prompt_augmentation_fallback is True

Adds docs/concepts/llms.md covering how LLM calls fit into the graph model: LLM calls as async IO inside nodes, structured output (both response_schema forms + native/fallback wire paths + strict mode), routing on parsed fields, and errors at the LLM boundary. Nav entry added to mkdocs.yml's Concepts section; concepts/index.md TOC extended. Updates docs/model-providers/index.md: Protocol signature now shows the response_schema parameter; errors table adds StructuredOutputInvalid; new Structured output section walks through both response_schema forms, the native/fallback wire paths, and strict-mode constraints. Updates docs/model-providers/authoring.md: skeleton's complete() signature now matches the Protocol (response_schema parameter); a new "Structured output" entry in Beyond the skeleton points custom- provider authors at validate_response_schema and strict_mode_supported. mkdocs builds clean in strict mode; the runnable example in the new Structured output section is verified by tests/test_docs_examples.py.

The Returns block on Provider.complete started with "A :class:Response carrying ...", which mkdocstrings' Google-parser misread as a name-type pair: it pulled out "A" as the Name column entry and split the multi-line description across three table rows. Moving the return-value sentence into the prose summary at the top of the docstring (matching the pattern OpenAIProvider.complete already uses) renders cleanly: no spurious Name column entry, single description block.

Copilot

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

tests/unit/test_structured_output.py:472

This test also leaves the OpenAIProvider's underlying httpx.AsyncClient unclosed. Make the test close the provider after the assertion to keep the unit suite free of leaked async-client resources.

def test_inspect_property_fallback_when_forced() -> None:
    provider = OpenAIProvider(
        base_url="http://mock-llm.test",
        model="test-model",
        api_key="test-key",
        force_prompt_augmentation_fallback=True,
    )
    assert provider.uses_prompt_augmentation_fallback is True

Addresses 19 review threads from the second CoPilot pass; about half were duplicates of the same underlying issue: - examples/00-hello-world/main.py + README hello-world: api_key now uses `os.environ.get("LLM_API_KEY") or None` so an exported-but- empty env var falls through to no-auth (matters for local servers that reject an empty bearer header). - Both examples now close the OpenAIProvider in the finally block alongside graph.drain(). Long-running consumers that copy the snippet had been leaking the underlying httpx.AsyncClient. - errors.py header dropped the hard-coded "seven canonical categories" count after StructuredOutputInvalid landed. - strict_mode_supported docstring and the surrounding spec-anchor comment block both updated to match the implementation: additionalProperties must be EXPLICITLY false (an omitted key counts as non-strict, since JSON Schema's default permits extras). - _resolve_ref now handles ref == "#" as the document root before rejecting external refs. Root-recursive schemas that use the bare JSON-Pointer-root form now resolve correctly. Unit test added. - _strict_mode_check tightened to return False on unrecognized shapes (empty {}, const-only, enum-only, unknown keywords) instead of falling through to True. Primitive types (string/integer/ number/boolean/null) classified as terminal-strict-compatible. Two unit tests added. - _build_request_body now explicitly strips response_format from the body when the provider is in fallback mode. RuntimeConfig is extra="allow", so a caller could have piped response_format through the extras loop past the include_response_format gate. - provider.py module docstring's summary signature line updated to match the Protocol's response_schema parameter. - validate_response_schema's spec-anchor comment updated to reflect that JSON Schema validity is now checked at the boundary via Draft202012Validator.check_schema(), not delegated to parse time. - test_pydantic_class_wire_body_matches_dict_form: widened the assertion from response_format-only to full body equality, so any regression in the class-input wire mapping (not just response_format) gets caught. - test_inspect_property_native_default and test_inspect_property_fallback_when_forced converted to async with try/finally + aclose() to match the rest of the file's provider-lifecycle pattern.

Addresses 5 remaining review threads (3 substantive, 2 stale on already-fixed code): - LlmProviderResponseAssertion (the typed assertion model in harness/expectations.py) now lists `parsed: Any | None`. The runtime assertion in test_llm_provider.py already handled it, but the typed parser had it under extra="forbid" and would have rejected any future case-shape LLM fixture using `parsed`. The 021-028 fixtures slip past today on `calls:` form's permissive `LlmCallSpec.expected: dict[str, Any]`; this lines the two paths up. - docs/model-providers/authoring.md skeleton comment tightened: removed the "ignore it and return free-form text" option from the response_schema guidance. A provider that silently drops the parameter violates the Protocol contract; callers expect either Response.parsed populated or StructuredOutputInvalid raised. Now only two valid options surfaced: raise ProviderInvalidRequest until implemented, or wire it through. - docs/concepts/llms.md softened the static-typing claim in the Pydantic-class form section. Response.parsed is `dict[str, Any] | BaseModel | None`, so a type checker won't narrow from `response_schema=Classification` alone. The page now separates the runtime guarantee (validated instance) from static access (requires cast/isinstance/typed assignment); generic Response[T] flagged as a follow-up. The two stale threads (examples/00-hello-world/main.py provider cleanup, test_structured_output.py provider cleanup) were already fixed in commit 8ed334c; replies sent + threads resolved without code changes.

Copilot

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (2)

tests/unit/test_structured_output.py:150

The root object is missing additionalProperties: false, so this assertion would still pass even if the anyOf walker were removed or broken. Make the root object strict-compatible and keep the failing condition only inside the anyOf branch to ensure this test covers the intended combinator behavior.

    schema = {
        "type": "object",
        "properties": {
            "x": {
                "anyOf": [
                    {"type": "string"},
                    {"type": "object", "properties": {"y": {"type": "string"}}},  # no required
                ]
            },
        },
        "required": ["x"],
    }

tests/unit/test_structured_output.py:177

This test is meant to prove external $ref targets make strict mode unsupported, but the root schema already fails because it lacks additionalProperties: false. Add the root strict fields here so the only failing condition is the unresolvable reference.

    schema = {
        "type": "object",
        "properties": {"x": {"$ref": "https://example.com/external-schema.json"}},
        "required": ["x"],
    }

Addresses 6 review threads, several of which surfaced second-order issues from previous rounds: - openai.py complete(): the fallback flag was driving include_response_format=False for every call, including free-form ones. That triggered the response_format strip on calls that weren't structured-output at all, clobbering caller-supplied RuntimeConfig extras. Gating the flag on schema_dict being set so free-form calls preserve extras. Unit test added. - src/openarmature/__init__.py + tests/test_smoke.py: bumped __spec_version__ from "0.10.0" to "0.15.0" to match the pyproject.toml [tool.openarmature].spec_version bump. AGENTS.md flags these three values as required to stay in sync; the submodule-bump commit missed the runtime sources. - _strict_mode_check array branch: {"type": "array"} without `items` no longer returns True. Unconstrained array content is the array analog of an object with no additionalProperties: false: the walker can't statically verify nested shapes, so strict mode rejects. Unit test added. - docs/model-providers/authoring.md: skeleton's complete() now actually enforces what its comment promised. Added `if response_schema is not None: raise ProviderInvalidRequest` to the body and surfaced the exception in the import list, so a provider copied from the skeleton can't silently violate the Protocol contract. - docs/concepts/llms.md Pydantic-class snippet: added `from typing import Literal` so the example is copy-paste- runnable (the snippet uses Literal in the class but only imported BaseModel). - tests/unit/test_structured_output.py nested-recursion tests: test_strict_mode_recurses_into_nested_object and test_strict_mode_anyof_branch_must_satisfy were short-circuiting at the root because the root schema itself failed strict rules. Tightened both root schemas so the recursive walk actually fires; the tests now guard the recursion they claim to.

Captures two follow-ups surfaced by the four CoPilot review rounds: - docs/concepts/llms.md "Strict mode" section expanded into the full constraint list. After four rounds of tightening the strict_mode_supported heuristic, the rule set is stable and the user-facing surface should list it directly rather than make callers read provider.py. The page frames the list as the authoritative set: anything not on it trips to non-strict. - docs/model-providers/index.md "Strict mode" subsection trimmed and now links into the concepts page for the full list, following the established split (concepts/ owns the deep-dive, model-providers/ stays terse). - tests/test_smoke.py adds test_spec_version_matches_pyproject: reads pyproject.toml's [tool.openarmature].spec_version and asserts it equals openarmature.__spec_version__. AGENTS.md flags these as required to stay in sync; the previous smoke test only checked internal consistency between __spec_version__ and its asserted value, so the pyproject side could drift silently (and did, in the original submodule-bump commit).

Copilot

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.

Addresses 4 review threads: - examples/00-hello-world/main.py: provider construction moved from module level to a lazy _get_provider() helper backed by a module global. Avoids opening an httpx.AsyncClient when tooling imports the module without running main() — the smoke test now doesn't trigger construction across 6 example loads. main()'s finally only closes when the cached instance is set. - src/openarmature/llm/provider.py: validate_response_schema now walks all $ref values via _check_refs_resolvable and raises ProviderInvalidRequest for any non-internal-resolvable ref. Draft202012Validator.check_schema doesn't traverse refs, so previously an external ref slipped past the boundary and surfaced as a raw referencing-library exception at validate time. Pre-validation surfaces the clean category at the API boundary. - src/openarmature/llm/providers/openai.py: _parse_and_validate now also catches jsonschema.SchemaError and maps it to StructuredOutputInvalid. Safety net for any schema-side exception (including ref-resolution failures) that pre- validation might miss. - tests/unit/test_structured_output.py: - test_strict_mode_unresolvable_ref_fails: root tightened with additionalProperties: false so the walk reaches the $ref branch (was short-circuiting at the root). - Added test_validate_response_schema_rejects_external_ref covering the new pre-validation path. - tests/test_smoke.py: added test_spec_version_matches_submodule_pin shelling to `git -C openarmature-spec describe --tags --exact-match HEAD` and asserting it equals v{__spec_version__}. Skips cleanly when the submodule isn't a git checkout (installed-package CI lanes). Completes the three-place drift check from AGENTS.md (__spec_version__ ↔ pyproject ↔ submodule pin).

The git-describe-based submodule check from the previous commit passed locally but failed in CI because actions/checkout pins the submodule to its recorded SHA without fetching the spec repo's tags. `git describe --tags --exact-match` then finds nothing and the test fails with "submodule HEAD is not at any tag." Switching to parsing openarmature-spec/CHANGELOG.md: the spec follows Keep a Changelog, so the first non-[Unreleased] `## [X.Y.Z]` heading is the version at the pinned commit. This works regardless of CI tag-fetch state and catches the same drift class (submodule moved to a different release). Skips cleanly when CHANGELOG.md isn't present (installed-package lanes that don't ship the submodule checkout).

CodeQL flagged the for/else: pytest.fail() pattern as a potentially-uninitialized-local-variable warning because it doesn't model pytest.fail as NoReturn — the analyzer sees a path where submodule_latest is referenced after the loop without ever being bound. Pulling the parse into _read_latest_spec_version_from_changelog that explicitly returns the version or raises AssertionError. Eliminates the unreachable-after-fail pattern and reads cleaner.

Copilot

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 6 comments.

Six second-order correctness fixes surfaced by the round-7 review, mostly hardening _resolve_ref, _check_refs_resolvable, and the Pydantic-class validation path. - _resolve_ref now distinguishes "unresolvable" (path doesn't exist / external ref) from "resolved to non-dict" via a module-level _UNRESOLVABLE sentinel. Boolean schemas (true/false) are valid JSON Schema subschemas; a $ref to one was being incorrectly rejected as ProviderInvalidRequest. Now resolves cleanly and strict-mode still returns False on bool targets (the correct conservative answer). - validate_response_schema's metaschema check now uses jsonschema.validators.validator_for(schema) instead of the hard-coded Draft 2020-12. A valid draft-07 schema (e.g. tuple- form items, common in tooling) was being rejected at the boundary but accepted at runtime. Boundary and runtime now agree. - _resolve_ref percent-decodes JSON Pointer tokens before applying the ~1 / ~0 unescape pair. Per RFC 6901 §6, a JSON Pointer in a URI fragment is percent-encoded; refs like #/$defs/Name%20With%20Spaces now resolve correctly. - _check_refs_resolvable now walks only known subschema-bearing keywords (properties, patternProperties, additionalProperties, items, prefixItems, contains, if/then/else, allOf/anyOf/oneOf/not, $defs/definitions, dependentSchemas, propertyNames, unevaluatedItems, unevaluatedProperties). A "$ref" key under data positions (default, const, enum, $comment, x-* extensions) is data, not a schema reference, and is no longer incorrectly resolved. - docs/concepts/llms.md "LLM calls are async IO inside a node" section reframed: module-level provider construction leaks the httpx.AsyncClient in tooling/test/docs-build imports. The page now documents application-startup / lifecycle-managed construction (lazy on-first-use plus aclose in finally / shutdown hook), matching the pattern the hello-world example was made lazy for. - _parse_and_validate's Pydantic-class path now runs jsonschema.validate against the generated JSON Schema BEFORE calling model_validate. Pydantic's default model_validate is coercive (accepts "30" for an int field), which diverged from the strict dict-schema path. Both paths now apply the same jsonschema check first; model_validate then constructs the typed instance. - jsonschema.ValidationError's failure description now includes exc.json_path (e.g. "$.age: '30' is not of type 'integer'"). The bare exc.message lost the field name, breaking caller diagnostics for the missing-field / wrong-type-at-path cases. Five new unit tests cover the bool-ref, draft-07, percent-encoded ref, ref-under-data, and Pydantic-coercion-rejection cases.

Consolidated release for the five-PR batch: - Structured output (proposal 0016, PR #42) - Image content blocks (proposal 0015, PR #44) - Prompt management (proposal 0017, PR #45) - State migration for checkpoints (proposal 0014, PR #46) - Parallel branches (proposal 0011, PR #47) Bumps: - ``pyproject.toml`` project.version: 0.5.0 → 0.6.0 - ``__version__`` in src/openarmature/__init__.py - ``uv.lock`` editable package version - ``tests/test_smoke.py`` version assertion Flips CHANGELOG ``[Unreleased]`` to ``[0.6.0] — 2026-05-16``, drops the release-gate Notes entry, and tightens the pre-1.0 MINOR note to list the two behavioral changes (retry-MW attempt-index propagation, CheckpointRecord.schema_version semantic shift) instead of the structured-output-specific note carried over from PR-1. Pinned spec stays at v0.16.1 (set in PR #47).

chris-colinsky added 10 commits May 14, 2026 19:01

Copilot AI review requested due to automatic review settings May 15, 2026 15:16

Copilot started reviewing on behalf of chris-colinsky May 15, 2026 15:17 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

chris-colinsky added 2 commits May 15, 2026 11:07

Copilot AI review requested due to automatic review settings May 15, 2026 18:16

Copilot started reviewing on behalf of chris-colinsky May 15, 2026 18:17 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

chris-colinsky added 2 commits May 15, 2026 11:28

Copilot AI review requested due to automatic review settings May 15, 2026 18:43

Copilot started reviewing on behalf of chris-colinsky May 15, 2026 18:44 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

chris-colinsky added 2 commits May 15, 2026 11:56

Copilot AI review requested due to automatic review settings May 15, 2026 19:04

Copilot started reviewing on behalf of chris-colinsky May 15, 2026 19:05 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread tests/conformance/test_llm_provider.py

Comment thread docs/model-providers/authoring.md Outdated

Comment thread examples/00-hello-world/main.py

Comment thread tests/unit/test_structured_output.py Outdated

Comment thread docs/concepts/llms.md Outdated

chris-colinsky added 2 commits May 15, 2026 13:21

Copilot AI review requested due to automatic review settings May 15, 2026 20:37

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread src/openarmature/llm/providers/openai.py Outdated

Comment thread docs/model-providers/authoring.md

Comment thread docs/concepts/llms.md

Comment thread src/openarmature/llm/provider.py

Comment thread pyproject.toml

Comment thread tests/unit/test_structured_output.py

chris-colinsky added 2 commits May 15, 2026 14:22

Copilot AI review requested due to automatic review settings May 15, 2026 21:36

Copilot started reviewing on behalf of chris-colinsky May 15, 2026 21:36 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread tests/test_examples_smoke.py

Comment thread src/openarmature/llm/providers/openai.py

Comment thread tests/unit/test_structured_output.py

Comment thread tests/test_smoke.py Outdated

github-advanced-security AI found potential problems May 15, 2026

View reviewed changes

Comment thread tests/test_smoke.py Fixed

Copilot AI review requested due to automatic review settings May 15, 2026 22:18

Copilot started reviewing on behalf of chris-colinsky May 15, 2026 22:19 View session

github-advanced-security AI found potential problems May 15, 2026

View reviewed changes

Comment thread tests/test_smoke.py Fixed

Copilot AI reviewed May 15, 2026

View reviewed changes

chris-colinsky merged commit 2ecb7b1 into main May 15, 2026
6 checks passed

chris-colinsky deleted the feature/0016-structured-output branch May 15, 2026 22:44

Conversation

chris-colinsky commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Release gate

What's new

Commits

Test plan

Pre-1.0 SemVer

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

chris-colinsky commented May 15, 2026 •

edited

Loading