Add [langfuse] extras and SDK adapter (PR 3.6) #82
Merged
Validates the Langfuse observer against the real langfuse>=4.6 SDK
and ships a bridge so production users get the same Protocol-shaped
observer surface as the InMemoryLangfuseClient.
[langfuse] optional-dependency group pins langfuse>=4.6,<5. The v4
SDK is structurally different from v2 / v3 — traces are auto-created
when the first observation starts, span and generation collapse into
start_observation with as_type, trace_id threads through
TraceContext, and trace-level metadata is set through the
propagate_attributes context manager. Per Chris's directive ("no existing-user constraint;
do what's right for OA"), the adapter targets v4 only; earlier SDK
versions are out of scope.
LangfuseSDKAdapter wraps the v4 client to satisfy the four-method
LangfuseClient Protocol. Key translations:
- UUID4 invocation_id -> OTel-hex trace_id (32 chars, no dashes).
v4 fails int(uuid, 16) parsing on the dashed form; OA's observer
error-isolation pattern swallowed that as a warnings.warn,
silently dropping traces.
- propagate_attributes(trace_name=, metadata=) runs on EVERY
observation under each trace_id (not just the first). Without
this, v4's last-attribute-wins display logic let later
observations clobber the trace's display name to whatever the
final observation was called.
- usage values translate from the Protocol's LangfuseUsage record
to v4's usage_details dict (int values only).
- Returned LangfuseSpan / LangfuseGeneration handles are wrapped in
_SpanHandle, which exposes the .update() / .end() the observer calls.
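The first translation in the list can be sketched with the standard library alone. This is a minimal illustration, not the adapter's actual code: the helper name to_otel_trace_id is hypothetical, and lower-casing an already-hex input is an assumption made here for consistency.

```python
import uuid

_HEX_DIGITS = frozenset("0123456789abcdef")


def to_otel_trace_id(invocation_id: str) -> str:
    """Return a 32-char hex trace id when the input allows it (hypothetical helper)."""
    candidate = invocation_id.lower()
    # Already OTel-shaped (32 hex chars, no dashes): return it, so the
    # conversion is idempotent.
    if len(candidate) == 32 and set(candidate) <= _HEX_DIGITS:
        return candidate
    try:
        # Dashed UUID form -> 32 hex chars, no dashes.
        return uuid.UUID(invocation_id).hex
    except ValueError:
        # Not a UUID at all: pass the value through unchanged.
        return invocation_id
```

The dashed form is the one that breaks integer parsing: int("123e4567-e89b-42d3-a456-426614174000", 16) raises ValueError, while the converted value parses cleanly.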
Trace-info cache persists per trace_id rather than popping on first
observation. Memory is linear in unique trace_ids; a close_trace
cleanup hook is deferred to a future PR.
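The cache behavior described above, reading without popping so every observation sees the same trace name and metadata, might look roughly like the sketch below. All names here (TraceInfo, TraceInfoCache, remember, attributes_for) are hypothetical and do not come from the PR. The close_trace method only illustrates what the deferred cleanup hook could do.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class TraceInfo:
    name: str
    metadata: dict[str, Any] = field(default_factory=dict)


class TraceInfoCache:
    """Hypothetical per-trace cache; entries persist until explicitly closed."""

    def __init__(self) -> None:
        self._by_trace_id: dict[str, TraceInfo] = {}

    def remember(self, trace_id: str, name: str, metadata: dict[str, Any]) -> None:
        self._by_trace_id[trace_id] = TraceInfo(name, dict(metadata))

    def attributes_for(self, trace_id: str) -> dict[str, Any]:
        # Read with get(), not pop(): later observations under the same
        # trace_id must receive the same trace name and metadata.
        info = self._by_trace_id.get(trace_id)
        if info is None:
            return {}
        return {"trace_name": info.name, "metadata": dict(info.metadata)}

    def close_trace(self, trace_id: str) -> None:
        # Illustrative cleanup hook; without it, memory grows with the
        # number of unique trace_ids.
        self._by_trace_id.pop(trace_id, None)

    def __len__(self) -> int:
        return len(self._by_trace_id)
```

In this sketch an adapter would call attributes_for(trace_id) before starting each observation and pass the result to whatever sets trace-level attributes.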
Tests:
- Five unit tests covering Protocol satisfaction, observer
construction, trace_info cache lifecycle, update_trace merge,
UUID4 -> OTel-hex conversion (with idempotency on already-hex and
non-UUID passthrough).
- One opt-in integration test against real Langfuse Cloud, gated by
@pytest.mark.integration + LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY
env vars. Calls auth_check() to fail loud on bad credentials,
client.shutdown() for synchronous batch-exporter drain. Accepts
LANGFUSE_HOST or LANGFUSE_BASE_URL for the host.
- New pytest marker config: addopts defaults to -m "not integration"
so CI auto-skips integration tests; run with -m integration to
include them.
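The credential gating for the integration test can be sketched as a small resolver over the environment variables named above. The function name langfuse_settings and the returned dict keys are hypothetical. Preferring LANGFUSE_HOST over LANGFUSE_BASE_URL when both are set is an assumption, since the PR does not state a precedence.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def langfuse_settings(env: Mapping[str, str] = os.environ) -> Optional[dict[str, str]]:
    """Return connection settings, or None when credentials are missing."""
    public_key = env.get("LANGFUSE_PUBLIC_KEY")
    secret_key = env.get("LANGFUSE_SECRET_KEY")
    if not public_key or not secret_key:
        # No credentials: the caller should skip the integration test.
        return None
    settings = {"public_key": public_key, "secret_key": secret_key}
    # Either variable may carry the host; precedence here is an assumption.
    host = env.get("LANGFUSE_HOST") or env.get("LANGFUSE_BASE_URL")
    if host:
        settings["host"] = host
    return settings
```

A test marked with @pytest.mark.integration could call this and skip when it returns None, which keeps a credential-less run from failing for the wrong reason.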
Docs:
- docs/concepts/observability.md flips the "no SDK version validated"
disclosure to the validated v4 state, shows the LangfuseSDKAdapter
wire-up snippet.
- docs/examples/10-langfuse-observability.md gets the same update.
- examples/10-langfuse-observability/main.py docstring + inline
comment updated with the production swap recipe.
- AGENTS.md regenerated.
Validated end-to-end against Langfuse Cloud (US region) with
langfuse 4.7.0: a two-node graph produces one Trace with
entry-node-name set as the display name, both nodes as Span
observations under it, and the spec §8.4.1 trace metadata
(correlation_id, entry_node, spec_version) populated.
Fifth of 6 core PRs in the v0.10.0 batch (PR 3.6).
Pull request overview
Adds an optional Langfuse v4 integration path by pinning a [langfuse] extra and introducing a LangfuseSDKAdapter that maps the v4 SDK surface (start_observation, propagate_attributes, OTel trace IDs) onto OpenArmature’s LangfuseClient Protocol, so LangfuseObserver can be used consistently in tests (in-memory) and production (real SDK).
Changes:
- Add [langfuse] optional dependency (pinned >=4.6,<5) and lockfile updates.
- Implement LangfuseSDKAdapter bridging Langfuse v4’s API to the 4-method LangfuseClient Protocol (including UUID→OTel-hex trace_id conversion).
- Add unit tests + an opt-in integration test, register an integration pytest marker, and update docs/examples to describe the production wire-up.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Locks new optional dependency set for langfuse and its transitive deps. |
| pyproject.toml | Adds [langfuse] extra and configures pytest integration marker + default skip. |
| src/openarmature/observability/langfuse/adapter.py | Implements LangfuseSDKAdapter + handle wrapper to satisfy LangfuseClient. |
| src/openarmature/observability/langfuse/__init__.py | Conditionally exports LangfuseSDKAdapter when the extra is installed. |
| tests/unit/test_observability_langfuse_adapter.py | Adds adapter unit tests and an opt-in Langfuse Cloud integration test. |
| examples/10-langfuse-observability/main.py | Updates example guidance to use LangfuseSDKAdapter for production. |
| docs/examples/10-langfuse-observability.md | Documents installing [langfuse] and wiring the adapter. |
| docs/concepts/observability.md | Updates Langfuse mapping docs to describe adapter-based production usage. |
Three stale-doc residues remained from when the adapter consumed trace_info on the first observation only. The current behavior, propagating on every observation under each trace_id so that v4's last-attribute-wins display logic cannot clobber the trace name, was introduced when the integration-test run caught the bug and the cache and propagation were refactored. The module header comments and one example-doc paragraph still described the original "first observation only" path. Comment / doc text updated; no code change. Addresses Copilot PR review feedback on #82.
Summary
Validates the Langfuse observer against real langfuse>=4.6 and ships a bridge so production users get the same Protocol-shaped observer surface as InMemoryLangfuseClient. Fifth of 6 core PRs in the v0.10.0 batch.
[langfuse] extras pinned to langfuse>=4.6,<5. The v4 SDK is structurally different from v2 / v3: traces auto-create on first observation, span / generation collapse into start_observation(as_type=...), trace_id threads through TraceContext, and trace-level metadata is set via the propagate_attributes context manager. Per the "no existing-user constraint" directive, the adapter targets v4 only.
LangfuseSDKAdapter wraps the v4 client to satisfy the four-method LangfuseClient Protocol. Key translations:
- UUID4 invocation_id → OTel-hex trace_id (32 chars, no dashes). v4 fails int(uuid, 16) parsing on the dashed form; OA's observer error-isolation pattern swallowed that as a warnings.warn, silently dropping traces. The conversion makes the bridge correct.
- propagate_attributes(trace_name=, metadata=) on every observation (not just the first). Without this, v4's last-attribute-wins display logic let later observations clobber the trace's display name to whatever the final observation was called.
- usage → usage_details translation from the LangfuseUsage record to v4's int-only dict.
- LangfuseSpan / LangfuseGeneration handles wrap into _SpanHandle exposing the .update() / .end() the observer calls.
Trace-info cache persists per trace_id (rather than popping on first observation). Memory is linear in unique trace_ids; a close_trace cleanup hook is deferred to a future PR.
Tests
- Five unit tests covering Protocol satisfaction, observer construction, trace_info cache lifecycle, update_trace merge, UUID4 → OTel-hex conversion (with idempotency on already-hex + non-UUID passthrough).
- One opt-in integration test against real Langfuse Cloud, gated by @pytest.mark.integration + LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY env vars. Calls auth_check() to fail loud on bad credentials, client.shutdown() for synchronous batch-exporter drain. Accepts LANGFUSE_HOST or LANGFUSE_BASE_URL for the host (downstream convention).
- New pytest marker config: addopts defaults to -m "not integration" so CI auto-skips integration tests; run with -m integration to include.
Docs
- docs/concepts/observability.md and docs/examples/10-langfuse-observability.md flip the "no SDK version validated" disclosure to the validated v4 state and show the LangfuseSDKAdapter wire-up snippet.
- examples/10-langfuse-observability/main.py docstring + inline comment updated with the production swap recipe.
- AGENTS.md regenerated.
Validated end-to-end
Real run against Langfuse Cloud (US region) with langfuse==4.7.0: two-node graph produces one Trace with the entry-node-name as the display name, both nodes as Span observations under it, and the spec §8.4.1 trace metadata (correlation_id, entry_node, spec_version) populated. Earlier runs caught two real bugs that this PR fixed: the UUID format mismatch and the trace-name clobbering.
Test plan
-m "not integration"