feat(examples): close spec coverage gaps + add 09-tool-use #52

Merged
chris-colinsky merged 5 commits into main from feature/examples-coverage-folds
May 18, 2026

@chris-colinsky
Member

Summary

Response to the spec coverage review (review-examples-coverage coord thread). Spec agent flagged one major gap (tool-calling) and a set of smaller gaps that fold naturally into existing examples. This PR closes them all.

Major gap closed

09-tool-use is a new example demonstrating the full tool-calling contract:

  • Tool definitions with JSON Schema parameters
  • complete(messages, tools=...)
  • Parsing response.message.tool_calls
  • Dispatching to local Python functions
  • Feeding results back as ToolMessage(tool_call_id=...)
  • The multi-turn loop as a graph cycle: call_llm → dispatch_tools → call_llm via a conditional edge

Use case is a lunar-mission assistant with two tools — lookup_mission (factual recall against a baked-in Apollo / Artemis record store) and compute_delta_v (Hohmann transfer arithmetic). Default question naturally exercises both. A MAX_TURNS=5 cap prevents runaway loops.

Folds across existing examples

| File | Coverage gap closed | Change |
| --- | --- | --- |
| 00 hello-world | `RuntimeConfig` | Pass `config=RuntimeConfig(temperature=0.0)` to every `complete()` call. Makes the routing deterministic and surfaces the per-call sampling knob. |
| 05 fan-out-with-retry | `error_policy="collect"` + `errors_field` | New `instance_errors` state field; `error_policy` param on `build_graph` with `COLLECT_MODE` env-var toggle in `main()`. |
| 05 fan-out-with-retry | `fan_out_config` on `NodeEvent` | New `fan_out_config_observer` that reads `event.fan_out_config` on the fan-out node's dispatch event and prints the resolved item_count / concurrency / error_policy. |
| 06 parallel-branches | `branch_name` on `NodeEvent` | New `branch_attribution_observer` that reads `event.branch_name` and prints which branch each inner-node event came from. |
| 07 multimodal-prompt | `PromptGroup` | Two prompts (caption-lunar-image + identify-mission); pre-rendered in `main()` and wrapped in `with_active_prompt_group(group)` so observers see a shared `group_name` on both LLM calls. |
| 07 multimodal-prompt | Composite prompt backends | `PromptManager(primary, fallback)` wired with two `FilesystemPromptBackend` instances. Fallback path fires when the primary raises `PromptStoreUnavailable` (e.g., remote Langfuse off-line). |
| 07 multimodal-prompt | `ImageSourceInline` | New `_build_image_block` helper switches between `ImageSourceURL` (default) and `ImageSourceInline` (when `IMAGE_PATH` env var is set) with media_type inferred from the file extension. |
| 08 checkpointing-and-migration | Multi-version migration chain | Docstring note on `migrate_v1_to_v2` explaining how a v3 schema + second migration would compose via BFS chain resolution + the chain-ambiguity error category. No code change. |
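
The composite-backend row for 07 describes a primary store with a fallback that fires only when the primary is unavailable. A minimal sketch of that pattern follows. It does not use the library's `PromptManager` or `FilesystemPromptBackend`; `DictBackend` and `FallbackPrompts` are hypothetical names, and the two exception classes are local stand-ins that only borrow the names used in this PR.

```python
class PromptStoreUnavailable(Exception):
    """Local stand-in for the error named in the PR (not the library's class)."""


class PromptNotFound(Exception):
    """Local stand-in for the error named in the PR (not the library's class)."""


class DictBackend:
    """Hypothetical in-memory backend used only for this sketch."""

    def __init__(self, prompts: dict[str, str], available: bool = True) -> None:
        self.prompts = prompts
        self.available = available

    def get(self, name: str) -> str:
        if not self.available:
            raise PromptStoreUnavailable(name)
        if name not in self.prompts:
            raise PromptNotFound(name)
        return self.prompts[name]


class FallbackPrompts:
    """Hypothetical composite: try the primary, fall back only on an outage."""

    def __init__(self, primary: DictBackend, fallback: DictBackend) -> None:
        self.primary = primary
        self.fallback = fallback

    def get(self, name: str) -> str:
        try:
            return self.primary.get(name)
        except PromptStoreUnavailable:
            # A prompt missing from the fallback still raises PromptNotFound.
            return self.fallback.get(name)


# Simulate a primary outage with a fallback that ships only one prompt.
primary = DictBackend({"caption-lunar-image": "primary caption prompt"}, available=False)
fallback = DictBackend({"caption-lunar-image": "fallback caption prompt"})
prompts = FallbackPrompts(primary, fallback)
print(prompts.get("caption-lunar-image"))  # prints "fallback caption prompt"
```

In this sketch a fallback that ships only one of the two prompts still fails the second lookup with `PromptNotFound`, so a fallback store is only useful if it covers every prompt the example renders.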

Commits

  1. chore(examples): fold spec coverage gaps
  2. feat(examples): add 09-tool-use

Test plan

  • uv run pytest tests/test_examples_smoke.py -q — 10 example smoke tests green
  • uv run pyright examples/ — clean
  • uv run ruff check examples/ tests/ — clean
  • Manual: every changed example (00, 05, 06, 07, 09) runs end-to-end against a real LLM endpoint and produces the expected output shape

Coverage status after merge

Every coverage gap the spec agent flagged in 01-spec-coverage-gaps.md is now closed except the optional multi-version migration chain (covered by docstring note per the thread's "leave or extend" framing). The fold PR will trigger a re-engagement on the coord thread for spec sign-off.

Spec agent's coverage review (review-examples-coverage thread) flagged
one major gap (tool-calling, follow-on) and several smaller gaps that
fold naturally into existing examples. This commit closes the folds.

- 00 hello-world: pass ``RuntimeConfig(temperature=0.0)`` to every
  ``complete()`` call. Surfaces the per-call sampling knob and makes
  the demo's routing reproducible.
- 05 fan-out-with-retry: add ``error_policy`` param to ``build_graph``
  with a ``COLLECT_MODE`` env-var toggle in main(); new
  ``instance_errors`` state field for the ``errors_field`` collection.
  Add ``fan_out_config_observer`` that reads
  ``NodeEvent.fan_out_config`` on the fan-out node's dispatch event
  and prints the resolved item_count/concurrency/error_policy.
- 06 parallel-branches: add ``branch_attribution_observer`` that
  reads ``NodeEvent.branch_name`` on inner-node events and prints
  which branch each inner step came from. Outermost nodes (receive,
  enrich, present) have ``branch_name=None``.
- 07 multimodal-prompt: wire ``PromptManager`` with a primary +
  fallback ``FilesystemPromptBackend`` to demonstrate composite-
  backend setup. Add second prompt ``identify-mission.j2`` and a
  second node ``identify`` that uses the caption from the first
  node. Wrap the whole invoke in ``with_active_prompt_group(...)``
  so an observability ``group_name`` propagates onto both LLM calls'
  spans. New ``_build_image_block`` helper switches between
  ``ImageSourceURL`` (default) and ``ImageSourceInline`` (when
  ``IMAGE_PATH`` env var is set) with media_type inferred from the
  file extension.
- 08 checkpointing-and-migration: docstring note on
  ``migrate_v1_to_v2`` explaining how a v3 schema would compose via
  BFS chain resolution + the chain-ambiguity error category. No
  code change.
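
The 08 note mentions BFS chain resolution and a chain-ambiguity error. The sketch below shows one way such a resolver could work. It is not the library's implementation: `resolve_chain`, `migrate`, and `AmbiguousChain` are hypothetical names, and treating "more than one shortest chain" as the ambiguous case is an assumption.

```python
from collections import deque
from typing import Callable

Migration = Callable[[dict], dict]


class AmbiguousChain(Exception):
    """Hypothetical error: more than one shortest migration chain exists."""


def resolve_chain(
    migrations: dict[tuple[int, int], Migration], start: int, target: int
) -> list[tuple[int, int]]:
    """Breadth-first search over schema versions, counting shortest chains."""
    dist = {start: 0}
    paths = {start: 1}
    parent: dict[int, int] = {}
    queue = deque([start])
    while queue:
        version = queue.popleft()
        for src, dst in migrations:
            if src != version:
                continue
            if dst not in dist:
                dist[dst] = dist[version] + 1
                paths[dst] = paths[version]
                parent[dst] = version
                queue.append(dst)
            elif dist[dst] == dist[version] + 1:
                # Another equally short route reaches dst.
                paths[dst] += paths[version]
    if target not in dist:
        raise LookupError(f"no migration chain from v{start} to v{target}")
    if paths[target] > 1:
        raise AmbiguousChain(f"{paths[target]} shortest chains to v{target}")
    chain = []
    node = target
    while node != start:
        chain.append((parent[node], node))
        node = parent[node]
    return chain[::-1]


def migrate(state: dict, migrations: dict, start: int, target: int) -> dict:
    """Apply each step of the resolved chain in order."""
    for step in resolve_chain(migrations, start, target):
        state = migrations[step](state)
    return state


# Illustrative migrations; the field names are made up for this sketch.
MIGRATIONS = {
    (1, 2): lambda s: {**s, "schema": 2, "turn": 0},
    (2, 3): lambda s: {**s, "schema": 3, "notes": []},
}
print(resolve_chain(MIGRATIONS, 1, 3))  # prints [(1, 2), (2, 3)]
```

Adding a direct `(1, 3)` migration to this table would make the one-step chain the unique shortest path, while two different two-step routes to the same version would raise `AmbiguousChain` under the assumption above.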

Tool-calling lands as a new example (09-tool-use) in the next
commit on this branch.

Closes the major gap from the spec coverage review: tool-calling
was absent from the demo set even though structured output (the
other half of complete()'s arity) ships in 00. 09 demonstrates
the full tool-calling contract — Tool definitions, complete with
tools, ToolCall parsing, dispatch, ToolMessage round-trip, and
the multi-turn loop.

Use case: a lunar-mission assistant with two tools.

- ``lookup_mission(name)``: reads a baked-in dict of
  Apollo / Artemis records (launch / splashdown dates, crew,
  outcome). Stand-in for a real fact lookup against a doc store.
- ``compute_delta_v(initial_altitude_km, final_altitude_km)``:
  Hohmann transfer arithmetic between two circular Earth orbits.
  Returns a JSON record with the two burns and the total.
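
For readers who want the arithmetic behind the second tool, here is a sketch of a Hohmann transfer between two circular Earth orbits using the vis-viva equation. It is not the example's `compute_delta_v`; the function name `hohmann_delta_v`, the JSON key names, and the choice of Earth constants (standard gravitational parameter and equatorial radius) are assumptions made for illustration.

```python
import json
import math

MU_EARTH_KM3_S2 = 398_600.4418  # Earth's standard gravitational parameter
EARTH_RADIUS_KM = 6_378.137     # equatorial radius; altitudes are measured above it


def hohmann_delta_v(initial_altitude_km: float, final_altitude_km: float) -> str:
    """Return a JSON record with both burns and the total, in km/s."""
    r1 = EARTH_RADIUS_KM + initial_altitude_km
    r2 = EARTH_RADIUS_KM + final_altitude_km
    semi_major = (r1 + r2) / 2.0  # semi-major axis of the transfer ellipse

    # Circular-orbit speeds at each radius.
    v_circ_1 = math.sqrt(MU_EARTH_KM3_S2 / r1)
    v_circ_2 = math.sqrt(MU_EARTH_KM3_S2 / r2)

    # Speeds on the transfer ellipse at each radius (vis-viva).
    v_transfer_1 = math.sqrt(MU_EARTH_KM3_S2 * (2.0 / r1 - 1.0 / semi_major))
    v_transfer_2 = math.sqrt(MU_EARTH_KM3_S2 * (2.0 / r2 - 1.0 / semi_major))

    burn_1 = abs(v_transfer_1 - v_circ_1)
    burn_2 = abs(v_circ_2 - v_transfer_2)
    return json.dumps(
        {
            "burn_1_km_s": round(burn_1, 4),
            "burn_2_km_s": round(burn_2, 4),
            "total_km_s": round(burn_1 + burn_2, 4),
        }
    )


# 300 km low Earth orbit to geostationary altitude (35,786 km).
print(hohmann_delta_v(300.0, 35_786.0))
```

The LEO-to-GEO case is a convenient sanity check because its total of roughly 3.9 km/s is widely quoted.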

Both Tool definitions use JSON Schema object parameters with
``required`` properties and ``additionalProperties=False``. The
default question naturally exercises both: a factual recall about
Apollo 13 plus a delta-v computation for a free-return-style
injection.
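
The parameter schemas described above might look roughly like the plain dictionaries below. This is a sketch: the property descriptions are invented, the library's `Tool` wrapper is deliberately omitted because its constructor is not shown in this PR, and `check_arguments` is a hypothetical helper that covers only `required` and `additionalProperties`, not full JSON Schema validation.

```python
LOOKUP_MISSION_PARAMS = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Mission name, e.g. 'Apollo 13'"},
    },
    "required": ["name"],
    "additionalProperties": False,
}

COMPUTE_DELTA_V_PARAMS = {
    "type": "object",
    "properties": {
        "initial_altitude_km": {"type": "number"},
        "final_altitude_km": {"type": "number"},
    },
    "required": ["initial_altitude_km", "final_altitude_km"],
    "additionalProperties": False,
}


def check_arguments(schema: dict, arguments: dict) -> list[str]:
    """Report missing required keys and keys the schema does not declare."""
    problems = []
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required property: {key}")
    if schema.get("additionalProperties") is False:
        for key in arguments:
            if key not in schema["properties"]:
                problems.append(f"unexpected property: {key}")
    return problems


print(check_arguments(COMPUTE_DELTA_V_PARAMS, {"initial_altitude_km": 300}))
# prints ['missing required property: final_altitude_km']
```

Setting `additionalProperties` to `False` means a model that invents an extra argument produces a schema violation rather than a silently ignored key.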

Graph shape: a three-node cycle with a conditional edge.

  call_llm → [route_after_llm]
                  ├── if assistant.tool_calls present → dispatch_tools
                  │     ├── parse each ToolCall
                  │     ├── invoke local function
                  │     ├── append ToolMessage(content, tool_call_id)
                  │     └── → call_llm  (cycle)
                  └── else → present → END

A ``MAX_TURNS=5`` hard cap on ``state.turn`` prevents runaway
loops if the model never settles to a plain ``finish_reason="stop"``
response.
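
The cycle above can be approximated without the graph runtime as a plain loop. The sketch below is framework-free: the dataclasses are local stand-ins that borrow the names `ToolCall`, `AssistantMessage`, and `ToolMessage`, `fake_llm` is a scripted substitute for `complete(messages, tools=...)`, and the mission record is illustrative data. Routing here is decided by whether the last assistant message carries `tool_calls`, combined with the turn cap.

```python
import json
from dataclasses import dataclass, field

MAX_TURNS = 5


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict


@dataclass
class AssistantMessage:
    content: str
    tool_calls: list = field(default_factory=list)


@dataclass
class ToolMessage:
    content: str
    tool_call_id: str


@dataclass
class State:
    messages: list = field(default_factory=list)
    turn: int = 0


def fake_llm(messages: list) -> AssistantMessage:
    """Scripted stand-in: request one tool, then answer in plain text."""
    if not any(isinstance(m, ToolMessage) for m in messages):
        call = ToolCall("call-1", "lookup_mission", {"name": "Apollo 13"})
        return AssistantMessage("", [call])
    return AssistantMessage("Apollo 13 aborted its landing and returned safely.")


TOOLS = {
    "lookup_mission": lambda name: json.dumps({"name": name, "outcome": "landing aborted"}),
}


def call_llm(state: State, llm) -> None:
    state.messages.append(llm(state.messages))
    state.turn += 1


def route_after_llm(state: State) -> str:
    # Keep cycling only while tools are requested and the cap is not reached.
    if state.messages[-1].tool_calls and state.turn < MAX_TURNS:
        return "dispatch_tools"
    return "present"


def dispatch_tools(state: State) -> None:
    for call in state.messages[-1].tool_calls:
        result = TOOLS[call.name](**call.arguments)
        state.messages.append(ToolMessage(result, call.id))


def run(question: str, llm=fake_llm) -> State:
    state = State(messages=[question])
    while True:
        call_llm(state, llm)
        if route_after_llm(state) == "present":
            return state
        dispatch_tools(state)


final = run("What happened on Apollo 13?")
print(final.turn, final.messages[-1].content)
```

With the scripted model the loop finishes in two turns. A model that requests tools on every turn is stopped by the cap once `state.turn` reaches `MAX_TURNS`.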

Smoke test list grows to ten demos.
Copilot AI review requested due to automatic review settings May 18, 2026 20:16

Copilot AI left a comment

Pull request overview

Adds a new tool-calling example and updates existing examples to cover additional spec-surface fields (runtime config, fan-out/branch observability, prompt groups, and prompt backend fallback behavior).

Changes:

  • Add examples/09-tool-use demonstrating the full tool-calling loop (tools schema, tool_calls dispatch, ToolMessage round-trip, graph cycle with turn cap).
  • Enhance existing examples to surface additional spec fields (e.g., RuntimeConfig, NodeEvent.branch_name, NodeEvent.fan_out_config, PromptGroup, inline image sources, composite prompt backends).
  • Extend example smoke test and examples index to include the new demo.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `tests/test_examples_smoke.py` | Adds 09-tool-use to the example load/compile smoke suite. |
| `examples/README.md` | Documents the new 09-tool-use example. |
| `examples/09-tool-use/main.py` | New tool-calling assistant demo implemented as a graph cycle. |
| `examples/08-checkpointing-and-migration/main.py` | Adds docstring guidance for multi-version migration chains. |
| `examples/07-multimodal-prompt/prompts/production/identify-mission.j2` | Adds a second prompt for mission identification. |
| `examples/07-multimodal-prompt/prompts_fallback/production/caption-lunar-image.j2` | Adds a fallback prompt variant for backend fallback demonstration. |
| `examples/07-multimodal-prompt/main.py` | Adds PromptGroup usage, composite prompt backends, and inline image loading support; adds identify step. |
| `examples/06-parallel-branches/main.py` | Adds an observer demonstrating `NodeEvent.branch_name` attribution. |
| `examples/05-fan-out-with-retry/main.py` | Adds collect-mode error capture (`errors_field`) and an observer demonstrating `NodeEvent.fan_out_config`. |
| `examples/00-hello-world/main.py` | Uses `RuntimeConfig(temperature=0.0)` for deterministic-ish example runs and to surface per-call config. |

Comment thread examples/09-tool-use/main.py Outdated
Comment thread examples/05-fan-out-with-retry/main.py Outdated
Comment thread examples/07-multimodal-prompt/main.py

- 09 tool-use: handle ``ToolCall.arguments is None`` (provider-
  reported parse error path) and wrap dispatch in a try/except that
  catches KeyError/ValueError/TypeError from bad args. Errors surface
  as ToolMessage content so the model can retry or give up gracefully
  rather than crashing the graph.
- 07 multimodal-prompt: add ``prompts_fallback/production/
  identify-mission.j2`` so the fallback backend actually covers both
  prompts. Without it, a primary outage would let the caption call
  fall through but break the identify call with PromptNotFound.
  Update the in-code comment that previously said fallback shipped
  only one prompt.
- 05 fan-out-with-retry: fix BatchState docstring that referenced
  ``branch_errors`` (the parallel-branches-side name) — the actual
  field is ``instance_errors``.
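
The 09 fix in the list above turns bad tool arguments into a message the model can read instead of an exception that crashes the graph. A minimal sketch of that defensive dispatch follows. `safe_dispatch`, `REGISTRY`, and the error payload shape are hypothetical, and the two dataclasses are local stand-ins rather than the library's types.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: Optional[dict]  # None models a provider-reported parse error


@dataclass
class ToolMessage:
    content: str
    tool_call_id: str


def safe_dispatch(call: ToolCall, registry: dict[str, Callable[..., str]]) -> ToolMessage:
    """Run one tool call, reporting failures as ToolMessage content."""
    if call.arguments is None:
        error = {"error": "tool arguments could not be parsed"}
        return ToolMessage(json.dumps(error), call.id)
    try:
        result = registry[call.name](**call.arguments)
    except (KeyError, ValueError, TypeError) as exc:
        # Unknown tool name, bad value, or wrong keyword arguments.
        error = {"error": f"{type(exc).__name__}: {exc}"}
        return ToolMessage(json.dumps(error), call.id)
    return ToolMessage(result, call.id)


# Illustrative registry with a single tool.
REGISTRY = {"lookup_mission": lambda name: f"record for {name}"}

print(safe_dispatch(ToolCall("a", "lookup_mission", None), REGISTRY).content)
# prints {"error": "tool arguments could not be parsed"}
```

Every branch still returns a `ToolMessage` carrying the original `tool_call_id`, so the conversation keeps one reply per call whether or not the tool succeeded.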

Address the two non-trivial items from spec PR #52 review.

- 05 fan-out-with-retry: ``COLLECT_MODE`` previously flipped the
  fan-out's error_policy but the demo had no failure path to
  exercise the new branch — ``instance_errors`` always stayed empty.
  Add a sentinel-detection in ``summarize`` that raises
  ``ProviderUnavailable`` (transient category) on headlines containing
  ``[FORCE_FAIL]``. Under ``COLLECT_MODE=1``, main() prepends one
  sentinel headline so retry exhausts on that instance, the failure
  lands in ``instance_errors``, and the rest of the batch completes.
  Default (fail_fast) keeps the headline list clean so the happy path
  runs unchanged. Print loop now handles partial summaries / topics
  lists by aligning successes to original indices via the
  ``fan_out_index`` carried on each error record.
- 07 multimodal-prompt: PromptGroup used a placeholder-render pattern
  for the second prompt because the original design had the second
  call depend on the first's output. Spec flagged this as teaching
  the wrong PromptGroup mental model. Restructured to two INDEPENDENT
  analyses of the same image: ``describe-surface`` and
  ``describe-equipment``, both taking only ``mission`` as a variable.
  Both prompts render up front with real variables; PromptGroup
  contains two genuine PromptResults; no placeholder identity sneaks
  into the group's metadata. State renamed (``CaptionState`` →
  ``AnalysisState``, ``caption`` → ``surface_description``,
  ``identified_mission`` → ``equipment_description``); nodes renamed
  (``caption`` → ``describe_surface``, ``identify`` →
  ``describe_equipment``); prompt files renamed accordingly in both
  primary and fallback backends.
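
The index alignment described in the 05 item above can be sketched as follows. The sketch assumes that successful results arrive as a compacted list in original order and that each error record is a dict with a `fan_out_index` key; the `message` key and the `align_results` helper are hypothetical.

```python
from typing import Optional


def align_results(
    headlines: list[str], summaries: list[str], instance_errors: list[dict]
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Pair each headline with its summary or with the error that replaced it."""
    failed = {err["fan_out_index"]: err for err in instance_errors}
    successes = iter(summaries)
    rows = []
    for index, headline in enumerate(headlines):
        if index in failed:
            rows.append((headline, None, failed[index]["message"]))
        else:
            # Successes are consumed in order, skipping the failed slots.
            rows.append((headline, next(successes, None), None))
    return rows


rows = align_results(
    ["[FORCE_FAIL] Lander lost", "Probe reaches orbit", "Rover wakes up"],
    ["Summary of the probe story", "Summary of the rover story"],
    [{"fan_out_index": 0, "message": "ProviderUnavailable"}],
)
for headline, summary, error in rows:
    print(headline, "->", summary if error is None else f"FAILED ({error})")
```

Without the index on each error record, a partial result list could not be mapped back to the headline that produced each entry.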

Copilot AI review requested due to automatic review settings May 18, 2026 20:57

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

examples/09-tool-use/main.py:175

  • compute_delta_v documents “384400 = lunar distance” as an altitude above Earth’s surface, but 384,400 km is the Moon’s mean distance from Earth’s center (altitude above the surface would be ~378,000 km). Consider adjusting the example constant/text (and the default question) so the units/frames are consistent with the function’s “altitude above surface” contract.
    """Hohmann transfer delta-v from initial_altitude_km to
    final_altitude_km, both above Earth's surface (so 0 = surface,
    300 = LEO, 384400 = lunar distance). Returns a JSON record with
    the two burns and the total."""
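
The frame mismatch in this comment comes down to one subtraction. The sketch below converts a center-to-center distance into an altitude above the surface, assuming Earth's mean radius of 6,371 km; the helper name is hypothetical.

```python
EARTH_MEAN_RADIUS_KM = 6_371.0
MOON_MEAN_DISTANCE_KM = 384_400.0  # measured from Earth's center


def altitude_from_center_distance(
    distance_km: float, body_radius_km: float = EARTH_MEAN_RADIUS_KM
) -> float:
    """Convert a distance from the body's center to an altitude above its surface."""
    return distance_km - body_radius_km


print(altitude_from_center_distance(MOON_MEAN_DISTANCE_KM))  # prints 378029.0
```

A function whose contract is "altitude above the surface" would therefore take about 378,000 km for lunar distance, not 384,400 km.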

Comment thread examples/09-tool-use/main.py Outdated
Comment thread examples/00-hello-world/main.py Outdated

- 00 hello-world: docstring + module comment described
  temperature=0.0 as making the run "reproduce deterministically",
  which over-promises. LLM APIs don't guarantee strict determinism
  even at temp 0 (provider-side batching, GPU sampling heuristics,
  model-version drift). Reworded to "reduces sampling variance" and
  "as reproducible as the API allows" so the pedagogical point
  (RuntimeConfig is the tuning knob) lands without an inaccurate
  guarantee. ``_DETERMINISTIC`` variable name kept as a recognizable
  shorthand for the demo.
- 09 tool-use: docstring said the loop terminates when
  ``finish_reason="stop"``, but the route function actually checks
  whether the last AssistantMessage carries any ``tool_calls``.
  finish_reason isn't tracked in state. Reworded to match the
  implementation: "loop terminates when the assistant message has
  no tool_calls (the model is done requesting tools) or after a
  hard turn cap."

@chris-colinsky merged commit 0b10e04 into main May 18, 2026
5 checks passed
@chris-colinsky deleted the feature/examples-coverage-folds branch May 18, 2026 21:14

2 participants