feat(examples): close spec coverage gaps + add 09-tool-use#52
Merged
Conversation
Spec agent's coverage review (review-examples-coverage thread) flagged one major gap (tool-calling, follow-on) and several smaller gaps that fold naturally into existing examples. This commit closes the folds. - 00 hello-world: pass ``RuntimeConfig(temperature=0.0)`` to every ``complete()`` call. Surfaces the per-call sampling knob and makes the demo's routing reproducible. - 05 fan-out-with-retry: add ``error_policy`` param to ``build_graph`` with a ``COLLECT_MODE`` env-var toggle in main(); new ``instance_errors`` state field for the ``errors_field`` collection. Add ``fan_out_config_observer`` that reads ``NodeEvent.fan_out_config`` on the fan-out node's dispatch event and prints the resolved item_count/concurrency/error_policy. - 06 parallel-branches: add ``branch_attribution_observer`` that reads ``NodeEvent.branch_name`` on inner-node events and prints which branch each inner step came from. Outermost nodes (receive, enrich, present) have ``branch_name=None``. - 07 multimodal-prompt: wire ``PromptManager`` with a primary + fallback ``FilesystemPromptBackend`` to demonstrate composite- backend setup. Add second prompt ``identify-mission.j2`` and a second node ``identify`` that uses the caption from the first node. Wrap the whole invoke in ``with_active_prompt_group(...)`` so an observability ``group_name`` propagates onto both LLM calls' spans. New ``_build_image_block`` helper switches between ``ImageSourceURL`` (default) and ``ImageSourceInline`` (when ``IMAGE_PATH`` env var is set) with media_type inferred from the file extension. - 08 checkpointing-and-migration: docstring note on ``migrate_v1_to_v2`` explaining how a v3 schema would compose via BFS chain resolution + the chain-ambiguity error category. No code change. Tool-calling lands as a new example (09-tool-use) in the next commit on this branch.
Closes the major gap from the spec coverage review: tool-calling
was absent from the demo set even though structured output (the
other half of complete()'s arity) ships in 00. 09 demonstrates
the full tool-calling contract — Tool definitions, complete with
tools, ToolCall parsing, dispatch, ToolMessage round-trip, and
the multi-turn loop.
Use case: a lunar-mission assistant with two tools.
- ``lookup_mission(name)``: reads a baked-in dict of
Apollo / Artemis records (launch / splashdown dates, crew,
outcome). Stand-in for a real fact lookup against a doc store.
- ``compute_delta_v(initial_altitude_km, final_altitude_km)``:
Hohmann transfer arithmetic between two circular Earth orbits.
Returns a JSON record with the two burns and the total.
Both Tool definitions use JSON Schema object parameters with
``required`` properties and ``additionalProperties=False``. The
default question naturally exercises both: a factual recall about
Apollo 13 plus a delta-v computation for a free-return-style
injection.
Graph shape: a three-node cycle with a conditional edge.
call_llm → [route_after_llm]
├── if assistant.tool_calls present → dispatch_tools
│ ├── parse each ToolCall
│ ├── invoke local function
│ ├── append ToolMessage(content, tool_call_id)
│ └── → call_llm (cycle)
└── else → present → END
A ``MAX_TURNS=5`` hard cap on ``state.turn`` prevents runaway
loops if the model never settles to a plain ``finish_reason="stop"``
response.
Smoke test list grows to ten demos.
There was a problem hiding this comment.
Pull request overview
Adds a new tool-calling example and updates existing examples to cover additional spec-surface fields (runtime config, fan-out/branch observability, prompt groups, and prompt backend fallback behavior).
Changes:
- Add
examples/09-tool-usedemonstrating the full tool-calling loop (tools schema, tool_calls dispatch, ToolMessage round-trip, graph cycle with turn cap). - Enhance existing examples to surface additional spec fields (e.g.,
RuntimeConfig,NodeEvent.branch_name,NodeEvent.fan_out_config,PromptGroup, inline image sources, composite prompt backends). - Extend example smoke test and examples index to include the new demo.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_examples_smoke.py | Adds 09-tool-use to the example load/compile smoke suite. |
| examples/README.md | Documents the new 09-tool-use example. |
| examples/09-tool-use/main.py | New tool-calling assistant demo implemented as a graph cycle. |
| examples/08-checkpointing-and-migration/main.py | Adds docstring guidance for multi-version migration chains. |
| examples/07-multimodal-prompt/prompts/production/identify-mission.j2 | Adds a second prompt for mission identification. |
| examples/07-multimodal-prompt/prompts_fallback/production/caption-lunar-image.j2 | Adds a fallback prompt variant for backend fallback demonstration. |
| examples/07-multimodal-prompt/main.py | Adds PromptGroup usage, composite prompt backends, and inline image loading support; adds identify step. |
| examples/06-parallel-branches/main.py | Adds an observer demonstrating NodeEvent.branch_name attribution. |
| examples/05-fan-out-with-retry/main.py | Adds collect-mode error capture (errors_field) and an observer demonstrating NodeEvent.fan_out_config. |
| examples/00-hello-world/main.py | Uses RuntimeConfig(temperature=0.0) for deterministic-ish example runs and to surface per-call config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- 09 tool-use: handle ``ToolCall.arguments is None`` (provider- reported parse error path) and wrap dispatch in a try/except that catches KeyError/ValueError/TypeError from bad args. Errors surface as ToolMessage content so the model can retry or give up gracefully rather than crashing the graph. - 07 multimodal-prompt: add ``prompts_fallback/production/ identify-mission.j2`` so the fallback backend actually covers both prompts. Without it, a primary outage would let the caption call fall through but break the identify call with PromptNotFound. Update the in-code comment that previously said fallback shipped only one prompt. - 05 fan-out-with-retry: fix BatchState docstring that referenced ``branch_errors`` (the parallel-branches-side name) — the actual field is ``instance_errors``.
Address the two non-trivial items from spec PR #52 review. - 05 fan-out-with-retry: ``COLLECT_MODE`` previously flipped the fan-out's error_policy but the demo had no failure path to exercise the new branch — ``instance_errors`` always stayed empty. Add a sentinel-detection in ``summarize`` that raises ``ProviderUnavailable`` (transient category) on headlines containing ``[FORCE_FAIL]``. Under ``COLLECT_MODE=1``, main() prepends one sentinel headline so retry exhausts on that instance, the failure lands in ``instance_errors``, and the rest of the batch completes. Default (fail_fast) keeps the headline list clean so the happy path runs unchanged. Print loop now handles partial summaries / topics lists by aligning successes to original indices via the ``fan_out_index`` carried on each error record. - 07 multimodal-prompt: PromptGroup used a placeholder-render pattern for the second prompt because the original design had the second call depend on the first's output. Spec flagged this as teaching the wrong PromptGroup mental model. Restructured to two INDEPENDENT analyses of the same image: ``describe-surface`` and ``describe-equipment``, both taking only ``mission`` as a variable. Both prompts render up front with real variables; PromptGroup contains two genuine PromptResults; no placeholder identity sneaks into the group's metadata. State renamed (``CaptionState`` → ``AnalysisState``, ``caption`` → ``surface_description``, ``identified_mission`` → ``equipment_description``); nodes renamed (``caption`` → ``describe_surface``, ``identify`` → ``describe_equipment``); prompt files renamed accordingly in both primary and fallback backends.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
examples/09-tool-use/main.py:175
compute_delta_vdocuments “384400 = lunar distance” as an altitude above Earth’s surface, but 384,400 km is the Moon’s mean distance from Earth’s center (altitude above the surface would be ~378,000 km). Consider adjusting the example constant/text (and the default question) so the units/frames are consistent with the function’s “altitude above surface” contract.
"""Hohmann transfer delta-v from initial_altitude_km to
final_altitude_km, both above Earth's surface (so 0 = surface,
300 = LEO, 384400 = lunar distance). Returns a JSON record with
the two burns and the total."""
- 00 hello-world: docstring + module comment described temperature=0.0 as making the run "reproduce deterministically", which over-promises. LLM APIs don't guarantee strict determinism even at temp 0 (provider-side batching, GPU sampling heuristics, model-version drift). Reworded to "reduces sampling variance" and "as reproducible as the API allows" so the pedagogical point (RuntimeConfig is the tuning knob) lands without an inaccurate guarantee. ``_DETERMINISTIC`` variable name kept as a recognizable shorthand for the demo. - 09 tool-use: docstring said the loop terminates when ``finish_reason="stop"``, but the route function actually checks whether the last AssistantMessage carries any ``tool_calls``. finish_reason isn't tracked in state. Reworded to match the implementation: "loop terminates when the assistant message has no tool_calls (the model is done requesting tools) or after a hard turn cap."
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Response to the spec coverage review (
review-examples-coveragecoord thread). Spec agent flagged one major gap (tool-calling) and a set of smaller gaps that fold naturally into existing examples. This PR closes them all.Major gap closed
09-tool-use is a new example demonstrating the full tool-calling contract:
Tooldefinitions with JSON Schema parameterscomplete(messages, tools=...)response.message.tool_callsToolMessage(tool_call_id=...)call_llm → dispatch_tools → call_llmvia a conditional edgeUse case is a lunar-mission assistant with two tools —
lookup_mission(factual recall against a baked-in Apollo / Artemis record store) andcompute_delta_v(Hohmann transfer arithmetic). Default question naturally exercises both. AMAX_TURNS=5cap prevents runaway loops.Folds across existing examples
RuntimeConfigconfig=RuntimeConfig(temperature=0.0)to everycomplete()call. Makes the routing deterministic and surfaces the per-call sampling knob.error_policy="collect"+errors_fieldinstance_errorsstate field;error_policyparam onbuild_graphwithCOLLECT_MODEenv-var toggle inmain().fan_out_configonNodeEventfan_out_config_observerthat readsevent.fan_out_configon the fan-out node's dispatch event and prints the resolved item_count / concurrency / error_policy.branch_nameonNodeEventbranch_attribution_observerthat readsevent.branch_nameand prints which branch each inner-node event came from.PromptGroupcaption-lunar-image+identify-mission); pre-rendered inmain()and wrapped inwith_active_prompt_group(group)so observers see a sharedgroup_nameon both LLM calls.PromptManager(primary, fallback)wired with twoFilesystemPromptBackendinstances. Fallback path fires when the primary raisesPromptStoreUnavailable(e.g., remote Langfuse off-line).ImageSourceInline_build_image_blockhelper switches betweenImageSourceURL(default) andImageSourceInline(whenIMAGE_PATHenv var is set) withmedia_typeinferred from the file extension.migrate_v1_to_v2explaining how a v3 schema + second migration would compose via BFS chain resolution + the chain-ambiguity error category. No code change.Commits
chore(examples): fold spec coverage gapsfeat(examples): add 09-tool-useTest plan
uv run pytest tests/test_examples_smoke.py -q— 10 example smoke tests greenuv run pyright examples/— cleanuv run ruff check examples/ tests/— cleanCoverage status after merge
Every coverage gap the spec agent flagged in
01-spec-coverage-gaps.mdis now closed except the optional multi-version migration chain (covered by docstring note per the thread's "leave or extend" framing). The fold PR will trigger a re-engagement on the coord thread for spec sign-off.