feat(examples): close spec coverage gaps + add 09-tool-use by chris-colinsky · Pull Request #52 · LunarCommand/openarmature-python

chris-colinsky · 2026-05-18T20:16:41Z

Summary

Response to the spec coverage review (review-examples-coverage coord thread). Spec agent flagged one major gap (tool-calling) and a set of smaller gaps that fold naturally into existing examples. This PR closes them all.

Major gap closed

09-tool-use is a new example demonstrating the full tool-calling contract:

Tool definitions with JSON Schema parameters
complete(messages, tools=...)
Parsing response.message.tool_calls
Dispatching to local Python functions
Feeding results back as ToolMessage(tool_call_id=...)
The multi-turn loop as a graph cycle: call_llm → dispatch_tools → call_llm via a conditional edge

Use case is a lunar-mission assistant with two tools — lookup_mission (factual recall against a baked-in Apollo / Artemis record store) and compute_delta_v (Hohmann transfer arithmetic). Default question naturally exercises both. A MAX_TURNS=5 cap prevents runaway loops.

Folds across existing examples

File	Coverage gap closed	Change
00 hello-world	`RuntimeConfig`	Pass `config=RuntimeConfig(temperature=0.0)` to every `complete()` call. Makes the routing deterministic and surfaces the per-call sampling knob.
05 fan-out-with-retry	`error_policy="collect"` + `errors_field`	New `instance_errors` state field; `error_policy` param on `build_graph` with `COLLECT_MODE` env-var toggle in `main()`.
05 fan-out-with-retry	`fan_out_config` on `NodeEvent`	New `fan_out_config_observer` that reads `event.fan_out_config` on the fan-out node's dispatch event and prints the resolved item_count / concurrency / error_policy.
06 parallel-branches	`branch_name` on `NodeEvent`	New `branch_attribution_observer` that reads `event.branch_name` and prints which branch each inner-node event came from.
07 multimodal-prompt	`PromptGroup`	Two prompts (`caption-lunar-image` + `identify-mission`); pre-rendered in `main()` and wrapped in `with_active_prompt_group(group)` so observers see a shared `group_name` on both LLM calls.
07 multimodal-prompt	Composite prompt backends	`PromptManager(primary, fallback)` wired with two `FilesystemPromptBackend` instances. Fallback path fires when the primary raises `PromptStoreUnavailable` (e.g., remote Langfuse off-line).
07 multimodal-prompt	`ImageSourceInline`	New `_build_image_block` helper switches between `ImageSourceURL` (default) and `ImageSourceInline` (when `IMAGE_PATH` env var is set) with `media_type` inferred from the file extension.
08 checkpointing-and-migration	Multi-version migration chain	Docstring note on `migrate_v1_to_v2` explaining how a v3 schema + second migration would compose via BFS chain resolution + the chain-ambiguity error category. No code change.

Commits

chore(examples): fold spec coverage gaps
feat(examples): add 09-tool-use

Test plan

uv run pytest tests/test_examples_smoke.py -q — 10 example smoke tests green
uv run pyright examples/ — clean
uv run ruff check examples/ tests/ — clean
Manual: every changed example (00, 05, 06, 07, 09) runs end-to-end against a real LLM endpoint and produces the expected output shape

Coverage status after merge

Every coverage gap the spec agent flagged in 01-spec-coverage-gaps.md is now closed except the optional multi-version migration chain (covered by docstring note per the thread's "leave or extend" framing). The fold PR will trigger a re-engagement on the coord thread for spec sign-off.

Spec agent's coverage review (review-examples-coverage thread) flagged one major gap (tool-calling, follow-on) and several smaller gaps that fold naturally into existing examples. This commit closes the folds. - 00 hello-world: pass ``RuntimeConfig(temperature=0.0)`` to every ``complete()`` call. Surfaces the per-call sampling knob and makes the demo's routing reproducible. - 05 fan-out-with-retry: add ``error_policy`` param to ``build_graph`` with a ``COLLECT_MODE`` env-var toggle in main(); new ``instance_errors`` state field for the ``errors_field`` collection. Add ``fan_out_config_observer`` that reads ``NodeEvent.fan_out_config`` on the fan-out node's dispatch event and prints the resolved item_count/concurrency/error_policy. - 06 parallel-branches: add ``branch_attribution_observer`` that reads ``NodeEvent.branch_name`` on inner-node events and prints which branch each inner step came from. Outermost nodes (receive, enrich, present) have ``branch_name=None``. - 07 multimodal-prompt: wire ``PromptManager`` with a primary + fallback ``FilesystemPromptBackend`` to demonstrate composite- backend setup. Add second prompt ``identify-mission.j2`` and a second node ``identify`` that uses the caption from the first node. Wrap the whole invoke in ``with_active_prompt_group(...)`` so an observability ``group_name`` propagates onto both LLM calls' spans. New ``_build_image_block`` helper switches between ``ImageSourceURL`` (default) and ``ImageSourceInline`` (when ``IMAGE_PATH`` env var is set) with media_type inferred from the file extension. - 08 checkpointing-and-migration: docstring note on ``migrate_v1_to_v2`` explaining how a v3 schema would compose via BFS chain resolution + the chain-ambiguity error category. No code change. Tool-calling lands as a new example (09-tool-use) in the next commit on this branch.

Closes the major gap from the spec coverage review: tool-calling was absent from the demo set even though structured output (the other half of complete()'s arity) ships in 00. 09 demonstrates the full tool-calling contract — Tool definitions, complete with tools, ToolCall parsing, dispatch, ToolMessage round-trip, and the multi-turn loop. Use case: a lunar-mission assistant with two tools. - ``lookup_mission(name)``: reads a baked-in dict of Apollo / Artemis records (launch / splashdown dates, crew, outcome). Stand-in for a real fact lookup against a doc store. - ``compute_delta_v(initial_altitude_km, final_altitude_km)``: Hohmann transfer arithmetic between two circular Earth orbits. Returns a JSON record with the two burns and the total. Both Tool definitions use JSON Schema object parameters with ``required`` properties and ``additionalProperties=False``. The default question naturally exercises both: a factual recall about Apollo 13 plus a delta-v computation for a free-return-style injection. Graph shape: a three-node cycle with a conditional edge. call_llm → [route_after_llm] ├── if assistant.tool_calls present → dispatch_tools │ ├── parse each ToolCall │ ├── invoke local function │ ├── append ToolMessage(content, tool_call_id) │ └── → call_llm (cycle) └── else → present → END A ``MAX_TURNS=5`` hard cap on ``state.turn`` prevents runaway loops if the model never settles to a plain ``finish_reason="stop"`` response. Smoke test list grows to ten demos.

Copilot

Pull request overview

Adds a new tool-calling example and updates existing examples to cover additional spec-surface fields (runtime config, fan-out/branch observability, prompt groups, and prompt backend fallback behavior).

Changes:

Add examples/09-tool-use demonstrating the full tool-calling loop (tools schema, tool_calls dispatch, ToolMessage round-trip, graph cycle with turn cap).
Enhance existing examples to surface additional spec fields (e.g., RuntimeConfig, NodeEvent.branch_name, NodeEvent.fan_out_config, PromptGroup, inline image sources, composite prompt backends).
Extend example smoke test and examples index to include the new demo.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/test_examples_smoke.py	Adds `09-tool-use` to the example load/compile smoke suite.
examples/README.md	Documents the new `09-tool-use` example.
examples/09-tool-use/main.py	New tool-calling assistant demo implemented as a graph cycle.
examples/08-checkpointing-and-migration/main.py	Adds docstring guidance for multi-version migration chains.
examples/07-multimodal-prompt/prompts/production/identify-mission.j2	Adds a second prompt for mission identification.
examples/07-multimodal-prompt/prompts_fallback/production/caption-lunar-image.j2	Adds a fallback prompt variant for backend fallback demonstration.
examples/07-multimodal-prompt/main.py	Adds PromptGroup usage, composite prompt backends, and inline image loading support; adds identify step.
examples/06-parallel-branches/main.py	Adds an observer demonstrating `NodeEvent.branch_name` attribution.
examples/05-fan-out-with-retry/main.py	Adds collect-mode error capture (`errors_field`) and an observer demonstrating `NodeEvent.fan_out_config`.
examples/00-hello-world/main.py	Uses `RuntimeConfig(temperature=0.0)` for deterministic-ish example runs and to surface per-call config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- 09 tool-use: handle ``ToolCall.arguments is None`` (provider- reported parse error path) and wrap dispatch in a try/except that catches KeyError/ValueError/TypeError from bad args. Errors surface as ToolMessage content so the model can retry or give up gracefully rather than crashing the graph. - 07 multimodal-prompt: add ``prompts_fallback/production/ identify-mission.j2`` so the fallback backend actually covers both prompts. Without it, a primary outage would let the caption call fall through but break the identify call with PromptNotFound. Update the in-code comment that previously said fallback shipped only one prompt. - 05 fan-out-with-retry: fix BatchState docstring that referenced ``branch_errors`` (the parallel-branches-side name) — the actual field is ``instance_errors``.

Address the two non-trivial items from spec PR #52 review. - 05 fan-out-with-retry: ``COLLECT_MODE`` previously flipped the fan-out's error_policy but the demo had no failure path to exercise the new branch — ``instance_errors`` always stayed empty. Add a sentinel-detection in ``summarize`` that raises ``ProviderUnavailable`` (transient category) on headlines containing ``[FORCE_FAIL]``. Under ``COLLECT_MODE=1``, main() prepends one sentinel headline so retry exhausts on that instance, the failure lands in ``instance_errors``, and the rest of the batch completes. Default (fail_fast) keeps the headline list clean so the happy path runs unchanged. Print loop now handles partial summaries / topics lists by aligning successes to original indices via the ``fan_out_index`` carried on each error record. - 07 multimodal-prompt: PromptGroup used a placeholder-render pattern for the second prompt because the original design had the second call depend on the first's output. Spec flagged this as teaching the wrong PromptGroup mental model. Restructured to two INDEPENDENT analyses of the same image: ``describe-surface`` and ``describe-equipment``, both taking only ``mission`` as a variable. Both prompts render up front with real variables; PromptGroup contains two genuine PromptResults; no placeholder identity sneaks into the group's metadata. State renamed (``CaptionState`` → ``AnalysisState``, ``caption`` → ``surface_description``, ``identified_mission`` → ``equipment_description``); nodes renamed (``caption`` → ``describe_surface``, ``identify`` → ``describe_equipment``); prompt files renamed accordingly in both primary and fallback backends.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

examples/09-tool-use/main.py:175

compute_delta_v documents “384400 = lunar distance” as an altitude above Earth’s surface, but 384,400 km is the Moon’s mean distance from Earth’s center (altitude above the surface would be ~378,000 km). Consider adjusting the example constant/text (and the default question) so the units/frames are consistent with the function’s “altitude above surface” contract.

    """Hohmann transfer delta-v from initial_altitude_km to
    final_altitude_km, both above Earth's surface (so 0 = surface,
    300 = LEO, 384400 = lunar distance). Returns a JSON record with
    the two burns and the total."""

- 00 hello-world: docstring + module comment described temperature=0.0 as making the run "reproduce deterministically", which over-promises. LLM APIs don't guarantee strict determinism even at temp 0 (provider-side batching, GPU sampling heuristics, model-version drift). Reworded to "reduces sampling variance" and "as reproducible as the API allows" so the pedagogical point (RuntimeConfig is the tuning knob) lands without an inaccurate guarantee. ``_DETERMINISTIC`` variable name kept as a recognizable shorthand for the demo. - 09 tool-use: docstring said the loop terminates when ``finish_reason="stop"``, but the route function actually checks whether the last AssistantMessage carries any ``tool_calls``. finish_reason isn't tracked in state. Reworded to match the implementation: "loop terminates when the assistant message has no tool_calls (the model is done requesting tools) or after a hard turn cap."

chris-colinsky added 2 commits May 18, 2026 12:50

Copilot AI review requested due to automatic review settings May 18, 2026 20:16

Copilot started reviewing on behalf of chris-colinsky May 18, 2026 20:17 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread examples/09-tool-use/main.py Outdated

Comment thread examples/05-fan-out-with-retry/main.py Outdated

Comment thread examples/07-multimodal-prompt/main.py

chris-colinsky added 2 commits May 18, 2026 13:39

Copilot AI review requested due to automatic review settings May 18, 2026 20:57

Copilot started reviewing on behalf of chris-colinsky May 18, 2026 20:57 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread examples/09-tool-use/main.py Outdated

Comment thread examples/00-hello-world/main.py Outdated

chris-colinsky merged commit 0b10e04 into main May 18, 2026
5 checks passed

chris-colinsky deleted the feature/examples-coverage-folds branch May 18, 2026 21:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(examples): close spec coverage gaps + add 09-tool-use#52

feat(examples): close spec coverage gaps + add 09-tool-use#52
chris-colinsky merged 5 commits into
mainfrom
feature/examples-coverage-folds

chris-colinsky commented May 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chris-colinsky commented May 18, 2026

Summary

Major gap closed

Folds across existing examples

Commits

Test plan

Coverage status after merge

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants