Add [langfuse] extras and SDK adapter (PR 3.6) #82

Merged: chris-colinsky merged 2 commits into main from feature/langfuse-extras on May 27, 2026

@chris-colinsky commented May 27, 2026

Summary

Validates the Langfuse observer against real langfuse>=4.6 and ships a bridge so production users get the same Protocol-shaped observer surface as InMemoryLangfuseClient. Fifth of 6 core PRs in the v0.10.0 batch.

[langfuse] extras pinned to langfuse>=4.6,<5. The v4 SDK is structurally different from v2 / v3: traces auto-create on first observation, span / generation collapse into start_observation(as_type=...), trace_id threads through TraceContext, and trace-level metadata is set via the propagate_attributes context manager. Per the "no existing-user constraint" directive, the adapter targets v4 only.

LangfuseSDKAdapter wraps the v4 client to satisfy the four-method LangfuseClient Protocol. Key translations:

  • UUID4 invocation_id → OTel-hex trace_id (32 chars, no dashes). v4 fails int(uuid, 16) parsing on the dashed form; OA's observer error-isolation pattern swallowed that as a warnings.warn, silently dropping traces. The conversion makes the bridge correct.
  • propagate_attributes(trace_name=, metadata=) on every observation (not just the first). Without this, v4's last-attribute-wins display logic let later observations clobber the trace's display name to whatever the final observation was called.
  • usage → usage_details translation from the LangfuseUsage record to v4's int-only dict.
  • Returned LangfuseSpan / LangfuseGeneration handles wrap into _SpanHandle exposing the .update() / .end() the observer calls.
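
A minimal sketch of the UUID4 to OTel-hex conversion from the first bullet, assuming the behaviours the unit tests name (idempotent on already-hex input, passthrough for non-UUID strings). The function name `to_otel_trace_id` is hypothetical and is not taken from the adapter source; only the standard library is used.

```python
import re
import uuid

# 32 lowercase hex characters: the OTel trace-id shape described above.
_HEX32 = re.compile(r"[0-9a-f]{32}")


def to_otel_trace_id(invocation_id: str) -> str:
    """Return a dash-free 32-char hex id when the input is UUID-shaped.

    Hypothetical helper: the real adapter may structure this differently.
    """
    candidate = invocation_id.strip().lower()
    if _HEX32.fullmatch(candidate):
        # Already OTel-hex, so converting twice changes nothing.
        return candidate
    try:
        # uuid.UUID accepts the dashed form; .hex drops the dashes.
        return uuid.UUID(candidate).hex
    except ValueError:
        # Not a UUID at all: hand the caller's value back untouched.
        return invocation_id
```

The dashed form is what breaks a plain `int(value, 16)` parse, because the `-` characters are not hex digits. Stripping them before the id reaches the SDK avoids the failure that the observer's error isolation would otherwise reduce to a warning.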

Trace-info cache persists per trace_id (rather than popping on first observation). Memory is linear in unique trace_ids; a close_trace cleanup hook is deferred to a future PR.
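
The cache behaviour above can be sketched as follows. This is an illustration under stated assumptions, not the adapter's code: `TraceInfo`, `TraceInfoCache`, and their method names are hypothetical, and `close_trace` only shows what the deferred cleanup hook might look like.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TraceInfo:
    """Hypothetical record of trace-level attributes for one trace_id."""
    name: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class TraceInfoCache:
    """Keeps trace info per trace_id so every observation can reuse it."""

    def __init__(self) -> None:
        self._by_trace_id: Dict[str, TraceInfo] = {}

    def update(self, trace_id: str, *, name: Optional[str] = None,
               metadata: Optional[Dict[str, Any]] = None) -> TraceInfo:
        # Merge into any existing entry instead of replacing it.
        info = self._by_trace_id.setdefault(trace_id, TraceInfo())
        if name is not None:
            info.name = name
        if metadata:
            info.metadata.update(metadata)
        return info

    def get(self, trace_id: str) -> Optional[TraceInfo]:
        # Plain lookup, no pop: later observations still see the entry.
        return self._by_trace_id.get(trace_id)

    def close_trace(self, trace_id: str) -> None:
        # Possible shape of the deferred cleanup hook (assumption).
        self._by_trace_id.pop(trace_id, None)

    def __len__(self) -> int:
        return len(self._by_trace_id)
```

Because `get` never removes entries, the cache holds one entry per unique trace_id until something like `close_trace` is called, which is the linear memory growth noted above.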

Tests

  • Five new unit tests: Protocol satisfaction, observer construction, trace_info cache lifecycle, update_trace merge, UUID4 → OTel-hex conversion (with idempotency on already-hex + non-UUID passthrough).
  • One opt-in integration test against real Langfuse Cloud, gated by @pytest.mark.integration + LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY env vars. Calls auth_check() to fail loud on bad credentials, client.shutdown() for synchronous batch-exporter drain. Accepts LANGFUSE_HOST or LANGFUSE_BASE_URL for the host (downstream convention).
  • New pytest marker config: addopts defaults to -m "not integration" so CI auto-skips integration tests; run with -m integration to include.
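
The environment gating for the integration test might look like the sketch below. The helper name `resolve_langfuse_env` and the returned dict shape are hypothetical, and preferring LANGFUSE_HOST over LANGFUSE_BASE_URL when both are set is an assumption; the description only says either variable is accepted.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_langfuse_env(
    env: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, Optional[str]]]:
    """Return Langfuse credentials from the environment, or None if absent.

    Hypothetical helper; the real test may inline these checks.
    """
    public_key = env.get("LANGFUSE_PUBLIC_KEY")
    secret_key = env.get("LANGFUSE_SECRET_KEY")
    if not public_key or not secret_key:
        # Missing credentials: the caller should skip the test.
        return None
    # Either variable may carry the host; precedence here is an assumption.
    host = env.get("LANGFUSE_HOST") or env.get("LANGFUSE_BASE_URL")
    return {"public_key": public_key, "secret_key": secret_key, "host": host}
```

In a test module, a `None` result could feed `pytest.mark.skipif(...)` alongside `pytest.mark.integration`, so the test is deselected by the default `-m "not integration"` and skipped when selected without credentials.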

Docs

  • docs/concepts/observability.md and docs/examples/10-langfuse-observability.md flip the "no SDK version validated" disclosure to the validated v4 state and show the LangfuseSDKAdapter wire-up snippet.
  • examples/10-langfuse-observability/main.py docstring + inline comment updated with the production swap recipe.
  • AGENTS.md regenerated.

Validated end-to-end

Real run against Langfuse Cloud (US region) with langfuse==4.7.0: two-node graph produces one Trace with the entry-node-name as the display name, both nodes as Span observations under it, and the spec §8.4.1 trace metadata (correlation_id, entry_node, spec_version) populated. Earlier runs caught two real bugs that this PR fixed: the UUID format mismatch and the trace-name clobbering.

Test plan

  • CI green (lint, format, types, conformance, unit, smoke, agents-md drift)
  • 5 new adapter unit tests pass, 1 integration test correctly deselected by default -m "not integration"
  • Optional: manual run of the integration test against Langfuse Cloud to confirm the trace lands with the expected shape

Validates the Langfuse observer against the real langfuse>=4.6 SDK
and ships a bridge so production users get the same Protocol-shaped
observer surface as the InMemoryLangfuseClient.

[langfuse] optional-dependency group pins langfuse>=4.6,<5. The v4
SDK is structurally different from v2 / v3 — traces are auto-created
when the first observation starts, span and generation collapse into
start_observation with as_type, trace_id threads through
TraceContext, and trace-level metadata is set via the
propagate_attributes context manager. Per Chris's directive
("no existing-user constraint;
do what's right for OA"), the adapter targets v4 only; earlier SDK
versions are out of scope.

LangfuseSDKAdapter wraps the v4 client to satisfy the four-method
LangfuseClient Protocol. Key translations:
- UUID4 invocation_id -> OTel-hex trace_id (32 chars, no dashes).
  v4 fails int(uuid, 16) parsing on the dashed form; OA's observer
  error-isolation pattern swallowed that as a warnings.warn,
  silently dropping traces.
- propagate_attributes(trace_name=, metadata=) runs on EVERY
  observation under each trace_id (not just the first). Without
  this, v4's last-attribute-wins display logic let later
  observations clobber the trace's display name to whatever the
  final observation was called.
- usage values translate from the Protocol's LangfuseUsage record
  to v4's usage_details dict (int values only).
- Returned LangfuseSpan / LangfuseGeneration handles wrap into
  _SpanHandle to expose the .update() / .end() the observer calls.
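
As an illustration of the usage translation in the list above, here is a minimal sketch. `UsageRecord` is a hypothetical stand-in for the Protocol's LangfuseUsage record, whose actual fields are not shown in this PR, and the `input` / `output` / `total` keys are assumptions about the target dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical stand-in for LangfuseUsage; field names are assumed."""
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


def to_usage_details(usage: UsageRecord) -> Dict[str, int]:
    """Build an int-only dict, omitting counts that were never reported."""
    pairs = (
        ("input", usage.input_tokens),
        ("output", usage.output_tokens),
        ("total", usage.total_tokens),
    )
    # Skip None so the resulting dict contains integers only.
    return {key: int(value) for key, value in pairs if value is not None}
```

Dropping unset fields rather than emitting `None` keeps every value an int, which is the constraint the bullet describes.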

Trace-info cache persists per trace_id rather than popping on first
observation. Memory is linear in unique trace_ids; a close_trace
cleanup hook is deferred to a future PR.

Tests:
- Five unit tests covering Protocol satisfaction, observer
  construction, trace_info cache lifecycle, update_trace merge,
  UUID4 -> OTel-hex conversion (with idempotency on already-hex and
  non-UUID passthrough).
- One opt-in integration test against real Langfuse Cloud, gated by
  @pytest.mark.integration + LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY
  env vars. Calls auth_check() to fail loud on bad credentials,
  client.shutdown() for synchronous batch-exporter drain. Accepts
  LANGFUSE_HOST or LANGFUSE_BASE_URL for the host.
- New pytest marker config: addopts defaults to -m "not integration"
  so CI auto-skips integration tests; run with -m integration to
  include.

Docs:
- docs/concepts/observability.md flips the "no SDK version validated"
  disclosure to the validated v4 state, shows the LangfuseSDKAdapter
  wire-up snippet.
- docs/examples/10-langfuse-observability.md same.
- examples/10-langfuse-observability/main.py docstring + inline
  comment updated with the production swap recipe.
- AGENTS.md regenerated.

Validated end-to-end against Langfuse Cloud (US region) with
langfuse 4.7.0: a two-node graph produces one Trace with
entry-node-name set as the display name, both nodes as Span
observations under it, and the spec §8.4.1 trace metadata
(correlation_id, entry_node, spec_version) populated.

Fifth of 6 core PRs in the v0.10.0 batch (PR 3.6).
Copilot AI review requested due to automatic review settings May 27, 2026 20:33

Copilot AI left a comment

Pull request overview

Adds an optional Langfuse v4 integration path by pinning a [langfuse] extra and introducing a LangfuseSDKAdapter that maps the v4 SDK surface (start_observation, propagate_attributes, OTel trace IDs) onto OpenArmature’s LangfuseClient Protocol, so LangfuseObserver can be used consistently in tests (in-memory) and production (real SDK).

Changes:

  • Add [langfuse] optional dependency (pinned >=4.6,<5) and lockfile updates.
  • Implement LangfuseSDKAdapter bridging Langfuse v4’s API to the 4-method LangfuseClient Protocol (including UUID→OTel-hex trace_id conversion).
  • Add unit tests + an opt-in integration test, register an integration pytest marker, and update docs/examples to describe the production wire-up.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| uv.lock | Locks new optional dependency set for langfuse and its transitive deps. |
| pyproject.toml | Adds [langfuse] extra and configures pytest integration marker + default skip. |
| src/openarmature/observability/langfuse/adapter.py | Implements LangfuseSDKAdapter + handle wrapper to satisfy LangfuseClient. |
| src/openarmature/observability/langfuse/__init__.py | Conditionally exports LangfuseSDKAdapter when the extra is installed. |
| tests/unit/test_observability_langfuse_adapter.py | Adds adapter unit tests and an opt-in Langfuse Cloud integration test. |
| examples/10-langfuse-observability/main.py | Updates example guidance to use LangfuseSDKAdapter for production. |
| docs/examples/10-langfuse-observability.md | Documents installing [langfuse] and wiring the adapter. |
| docs/concepts/observability.md | Updates Langfuse mapping docs to describe adapter-based production usage. |

Comment thread src/openarmature/observability/langfuse/adapter.py Outdated
Comment thread src/openarmature/observability/langfuse/adapter.py Outdated
Comment thread docs/examples/10-langfuse-observability.md Outdated
Three stale-doc residues from when the adapter consumed trace_info
on the first observation only. The current behavior (propagate on
every observation under each trace_id, so that v4's
last-attribute-wins display logic cannot clobber the trace name)
came out of the integration-test run, after which the cache and
propagation were refactored, but the module header comments and one
example-doc paragraph still described the original "first
observation only" path.

Comment / doc text updated; no code change.

Addresses Copilot PR review feedback on #82.
@chris-colinsky chris-colinsky merged commit 2b4bc1b into main May 27, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/langfuse-extras branch May 27, 2026 20:50