Skip to content

mcp: capture upstream http_status in tool-call telemetry (enrich_titles API_ERROR floor)#100

Draft
ArtyETH06 wants to merge 1 commit into
mainfrom
ArtyETH06/mcp-enrich-titles-http-status-telemetry
Draft

mcp: capture upstream http_status in tool-call telemetry (enrich_titles API_ERROR floor)#100
ArtyETH06 wants to merge 1 commit into
mainfrom
ArtyETH06/mcp-enrich-titles-http-status-telemetry

Conversation

@ArtyETH06

Copy link
Copy Markdown
Contributor

What

Observability-first step for the leadbay_enrich_titles ~7% API_ERROR floor (10/136 calls last 7d).

API_ERROR is the catch-all in packages/core/src/client.ts mapErrorResponse for any backend non-2xx that isn't 401/402/403/404/429. The error envelope already carries the upstream status at _meta.http_status (client.ts makeError, ~L876), and the Sentry capture path forwards it via buildBusinessCtx (server.ts L832) — but the high-volume PostHog product-analytics events (mcp tool called / mcp composite call) did not. So on the dashboard where the floor was measured, we can't tell whether it's 503s, 500s, or a 4xx edge.

The fix (one concern, low risk)

  • ToolCallProps + CompositeCallProps gain http_status?: number.
  • The business-error capture sites in server.ts forward err._meta.http_status onto both events.

No retry logic. No change to client.ts request behavior. This is deliberately observability-only: capture the status first, then let the data confirm which upstream status dominates before deciding whether a retry fix is warranted. enrich_titles is a composite, so it flows through both events — both now carry the status.

Tests

New file packages/mcp/test/tool-call-http-status.test.ts drives a tool that throws a LeadbayError-shaped envelope carrying _meta.http_status: 503 and asserts the captured tool-call event includes http_status === 503.

  • before the fix: FAIL (expected undefined to be 503)
  • after the fix: PASS

pnpm -r test (core 380 / promptforge 16 / mcp 383) and pnpm -r typecheck both green.

Follow-up (deferred)

Once a few days of data confirm the dominant status (503 vs 500 vs 4xx), decide whether to add bounded retry/backoff for the transient subset. Not in this PR.

leadbay_enrich_titles shows a ~7% API_ERROR floor (10/136 calls last 7d).
API_ERROR is the catch-all in client.ts mapErrorResponse for any backend
non-2xx that isn't 401/402/403/404/429. The error envelope already carries
the upstream status at _meta.http_status (client.ts makeError), and the
Sentry capture path forwards it via buildBusinessCtx — but the high-volume
PostHog product-analytics events (`mcp tool called` / `mcp composite call`)
did NOT, so the dashboard couldn't tell whether the floor is 503s, 500s,
or a 4xx edge.

Observability-only fix: add http_status?: number to ToolCallProps and
CompositeCallProps, and forward err._meta.http_status onto both events at
the business-error capture sites in server.ts. No retry logic, no change
to client.ts request behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ArtyETH06 ArtyETH06 self-assigned this Jun 11, 2026
@ArtyETH06

Copy link
Copy Markdown
Contributor Author

TEST PR from the mcp-analysis / eval skill

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant