Skip to content

feat: refactoring and enabling diagnostics logger and traces#3300

Open
ommeirelles wants to merge 42 commits into
devfrom
feat/diagnostics
Open

feat: refactoring and enabling diagnostics logger and traces#3300
ommeirelles wants to merge 42 commits into
devfrom
feat/diagnostics

Conversation

@ommeirelles

@ommeirelles ommeirelles commented May 6, 2026

Copy link
Copy Markdown
Contributor
image

Summary by CodeRabbit

  • New Features

    • Added a logging client that captures app logs and routes console calls for OTLP export.
  • Improvements

    • Diagnostics now use shared singleton clients for telemetry, tracing and logging.
    • Trace sampling increased from 1% to 30% for greater coverage.
    • Analytics service identity and runtime analytics message updated.
    • Tracing lifecycle and error handling updated for resolver spans.
  • Dependencies

    • Updated diagnostics dependency and added a semantic-conventions package.

Review Change Stack

@ommeirelles ommeirelles requested a review from a team as a code owner May 6, 2026 22:39
@ommeirelles ommeirelles requested review from eduardoformiga and lemagnetic and removed request for a team May 6, 2026 22:39
@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Migrate diagnostics from per-package maps to global singletons; add OTLP traces/log exporter modules, console proxying, and refactor telemetry bootstrap, exports, OTEL context injection, and prod config/deps.

Changes

Global Singleton Diagnostics Architecture

Layer / File(s) Summary
Type Definitions & Global Structure
packages/diagnostics/@types/global.d.ts, packages/diagnostics/src/globals.ts
fsDiagnostics global changes from map-based TELEMETRY_CLIENTS/TRACE_CLIENTS to singleton TELEMETRY_CLIENT, TRACE_CLIENT, and new LOGGER_CLIENT; adds OTLP endpoint strings and a LogClient type import.
Telemetry Client Refactoring
packages/diagnostics/src/start.ts
getTelemetryClient(...) now returns/caches a single telemetry client, delegates logger/tracer setup, registers HTTP instrumentation, removes per-package maps and direct exporter wiring.
Logger Implementation
packages/diagnostics/src/logger.ts
New setupLogsExporter() and exported getLogger(...) lazily initialize a cached LogsClient and install a Proxy for globalThis.console that forwards formatted messages to telemetry while preserving original console calls.
Tracer Implementation
packages/diagnostics/src/tracer.ts
New setupTracesExporter(), getTracesClient(telemetryClient) for lazy trace exporter/client creation and caching; getTraceClient(...) accessor added.
Export Wiring
packages/diagnostics/src/index.ts
Re-exports split: getTelemetryClient from ./start, getTraceClient from ./tracer.
Deps & Production Config
packages/diagnostics/package.json, packages/diagnostics/configs/prod.json
Bumped @vtex/diagnostics-nodejs to ^5.1.9, added @vtex/diagnostics-semconv@5.1.4; production traces sampling defaultRate changed from 0.01 to 0.3.
Server OTEL Injection
packages/core/src/server/options.ts
withTraceClient now awaits getTelemetryClient, starts an OTEL span, injects active context into options.OTEL.__otelContext, records errors/status, and enforces OTEL.enabled: true.
API Resolver Tracing
packages/api/src/observability/telemetry.ts
ResolverTrace refactored to use tracer named 'Graphql', set span from injected OTEL context, record exceptions/status, and end the span in finally immediately after invocation.
Typings & Misc Core Edits
packages/api/src/typings/globals.ts, packages/core/*, packages/components/*
Removed traceparent/tracestate from Options.OTEL; changed discovery analytics serviceName to 'faststore'; simplified Analytics log message; instrumentation gate simplified to runtime check; widened Toast timeoutRef type.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Start as getTelemetryClient()
    participant Logger as getLogger()
    participant Tracer as getTracesClient()
    participant Console as globalThis.console
    participant OTLP as OTLP Endpoint

    App->>Start: Initialize telemetry
    Start->>Logger: getLogger(client, opts)
    Logger->>OTLP: setupLogsExporter()
    OTLP-->>Logger: Exporter ready
    Logger->>Logger: Create LogsClient
    Logger->>Console: Replace with Proxy
    Logger-->>Start: Logger cached

    Start->>Tracer: getTracesClient(telemetry)
    Tracer->>OTLP: setupTracesExporter()
    OTLP-->>Tracer: Exporter ready
    Tracer->>Tracer: Create TraceClient
    Tracer-->>Start: Traces cached

    Start-->>App: TelemetryClient ready

    rect rgba(100, 150, 200, 0.5)
    Note over Console,OTLP: Runtime console override
    App->>Console: console.info("message")
    Console->>Logger: logger.info(formatted)
    Logger->>OTLP: Send log
    Console->>Console: Original console.info()
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • vtex/faststore#3215: Similar diagnostics work — both modify diagnostics client handling, globals, and telemetry/tracing wiring.

Suggested reviewers

  • lemagnetic
  • eduardoformiga
  • hellofanny

Poem

From maps dispersed to a single sign,
Console echoes routed through a line,
Traces and logs now join the rhyme,
Startup trimmed to save the time,
Diagnostics hum — one client, one design.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main objective: refactoring and enabling diagnostics logger and traces across the diagnostics package and related modules.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/diagnostics

Comment @coderabbitai help to get the list of available commands and usage tips.

@pkg-pr-new

pkg-pr-new Bot commented May 6, 2026

Copy link
Copy Markdown

Open in StackBlitz

@faststore/api

npm i https://pkg.pr.new/vtex/faststore/@faststore/api@219085b

@faststore/cli

npm i https://pkg.pr.new/vtex/faststore/@faststore/cli@219085b

@faststore/components

npm i https://pkg.pr.new/vtex/faststore/@faststore/components@219085b

@faststore/core

npm i https://pkg.pr.new/vtex/faststore/@faststore/core@219085b

@faststore/diagnostics

npm i https://pkg.pr.new/vtex/faststore/@faststore/diagnostics@219085b

@faststore/lighthouse

npm i https://pkg.pr.new/vtex/faststore/@faststore/lighthouse@219085b

@faststore/sdk

npm i https://pkg.pr.new/vtex/faststore/@faststore/sdk@219085b

@faststore/ui

npm i https://pkg.pr.new/vtex/faststore/@faststore/ui@219085b

commit: 219085b

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/diagnostics/configs/prod.json (1)

4-8: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reduce the production sampling jump.

Sampling every request at 0.3 turns a 1%-style baseline into a 30x higher trace volume for all traffic. That is a meaningful cost and backpressure increase in production. Please gate this behind an env flag or narrow it to specific rules/endpoints first.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/diagnostics/configs/prod.json` around lines 4 - 8, The production
diagnostics config currently raises global sampling from defaultRate to 0.3 and
defines a broad rule "trace_all" with sampleRate 0.3 which will massively
increase trace volume; change this by lowering defaultRate and/or sampleRate on
the "trace_all" rule, and gate the higher sampling behind an environment flag
(e.g., DIAGNOSTICS_HIGH_SAMPLING) or restrict the rule to specific
endpoints/tags instead of "trace_all" so you only sample targeted traffic when
the flag is set; update the JSON keys defaultRate, rules[*].name "trace_all",
and rules[*].sampleRate accordingly and add logic to read the env flag before
enabling the high-sampling rule.
🧹 Nitpick comments (1)
packages/diagnostics/package.json (1)

30-31: ⚡ Quick win

Please capture the bundle-size impact for these new runtime deps.

This adds two runtime dependencies, but there’s no accompanying budget/update note to show the downstream size impact. Please attach the delta or update the package budget check with this change.

As per coding guidelines, "Maintain strict performance budgets by monitoring bundle size when adding new dependencies".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/diagnostics/package.json` around lines 30 - 31, The PR adds two
runtime deps "@vtex/diagnostics-nodejs" and "@vtex/diagnostics-semconv" but
lacks a bundle-size impact report or budget update; run your project's
bundle-size/size-budget tool (or local webpack/rollup build +
source-map-explorer) to measure the delta introduced by these packages, attach
the resulting size delta to the PR, and update the package budget/check file
(the project's size budget config) to account for the added bytes if acceptable;
reference the dependency entries "@vtex/diagnostics-nodejs" and
"@vtex/diagnostics-semconv" when updating the budget or PR description so
reviewers can verify the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/diagnostics/src/globals.ts`:
- Around line 3-5: The globals TELEMETRY_CLIENT, TRACE_CLIENT, and LOGGER_CLIENT
are currently singletons which makes getTelemetryClient(serviceName, clientName,
account, packageName) sticky to the first caller’s metadata; change these
globals back to keyed caches (e.g., Map or object) and update
getTelemetryClient, getTraceClient, and getLoggerClient to compute a composite
key from serviceName/clientName/account/packageName and look up/create clients
per-key instead of using a single process-wide value; alternatively, if
singleton behavior is intended, remove per-call metadata parameters from the
public API (getTelemetryClient signature) so callers cannot pass differing
metadata that would be ignored. Ensure creation paths set the keyed cache entry
and retrieval paths use the same composite key.

In `@packages/diagnostics/src/logger.ts`:
- Around line 15-18: The logs exporter is forcing plaintext transport by passing
insecure: true to Exporters.CreateLogsExporterConfig with OTLP_LOGGER_ENDPOINT;
change this to be environment-aware (e.g. derive insecure from IS_DEV or an env
var) so insecure is true only in development/testing and false in production,
updating the call in logger.ts (the Exporters.CreateLogsExporterConfig
invocation) to compute insecure dynamically and preserve current behavior for
traces which already use IS_DEV.

In `@packages/diagnostics/src/start.ts`:
- Around line 17-18: getTelemetryClient() currently returns the existing
global.fsDiagnostics.TELEMETRY_CLIENT synchronously before getLogger() and
getTracesClient() finish, causing tracing/logger setup race conditions; change
the function to await completion of getLogger() and getTracesClient() before
returning the client and promote the singleton to an initialization promise
(e.g. store a TELEMETRY_CLIENT_INIT promise on global.fsDiagnostics or make
TELEMETRY_CLIENT hold the promise) so concurrent first calls reuse the same
initialization flow and do not create duplicate clients or surface unhandled
rejections.

In `@packages/diagnostics/src/tracer.ts`:
- Around line 18-24: Move the cache check to the top of getTracesClient so we
don't create an exporter when a trace client already exists: check
global.fsDiagnostics.TRACE_CLIENT first and return it if present, and only call
setupTracesExporter() and telemetryClient.newTracesClient(...) when the cache is
empty; update getTracesClient to set global.fsDiagnostics.TRACE_CLIENT after
creating the exporter/client (use the existing setupTracesExporter and
telemetryClient.newTracesClient symbols).

---

Outside diff comments:
In `@packages/diagnostics/configs/prod.json`:
- Around line 4-8: The production diagnostics config currently raises global
sampling from defaultRate to 0.3 and defines a broad rule "trace_all" with
sampleRate 0.3 which will massively increase trace volume; change this by
lowering defaultRate and/or sampleRate on the "trace_all" rule, and gate the
higher sampling behind an environment flag (e.g., DIAGNOSTICS_HIGH_SAMPLING) or
restrict the rule to specific endpoints/tags instead of "trace_all" so you only
sample targeted traffic when the flag is set; update the JSON keys defaultRate,
rules[*].name "trace_all", and rules[*].sampleRate accordingly and add logic to
read the env flag before enabling the high-sampling rule.

---

Nitpick comments:
In `@packages/diagnostics/package.json`:
- Around line 30-31: The PR adds two runtime deps "@vtex/diagnostics-nodejs" and
"@vtex/diagnostics-semconv" but lacks a bundle-size impact report or budget
update; run your project's bundle-size/size-budget tool (or local webpack/rollup
build + source-map-explorer) to measure the delta introduced by these packages,
attach the resulting size delta to the PR, and update the package budget/check
file (the project's size budget config) to account for the added bytes if
acceptable; reference the dependency entries "@vtex/diagnostics-nodejs" and
"@vtex/diagnostics-semconv" when updating the budget or PR description so
reviewers can verify the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d2e341d-d5bb-4397-aaea-bb6035adb74b

📥 Commits

Reviewing files that changed from the base of the PR and between e912b5e and ebfb56d.

⛔ Files ignored due to path filters (4)
  • .github/workflows/ci.yml is excluded by none and included by none
  • .github/workflows/packages-preview.yml is excluded by none and included by none
  • .npmrc is excluded by none and included by none
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml and included by none
📒 Files selected for processing (8)
  • packages/diagnostics/@types/global.d.ts
  • packages/diagnostics/configs/prod.json
  • packages/diagnostics/package.json
  • packages/diagnostics/src/globals.ts
  • packages/diagnostics/src/index.ts
  • packages/diagnostics/src/logger.ts
  • packages/diagnostics/src/start.ts
  • packages/diagnostics/src/tracer.ts

Comment thread packages/diagnostics/src/globals.ts
Comment thread packages/diagnostics/src/logger.ts
Comment thread packages/diagnostics/src/start.ts
Comment thread packages/diagnostics/src/tracer.ts
@eduardoformiga eduardoformiga force-pushed the dev branch 3 times, most recently from 6c7ef8e to 4d4ac3a Compare May 7, 2026 15:18

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/core/src/instrumentation.ts (2)

18-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Swallowed error loses the actual failure reason.

The catch discards error, so on-call will see “Failed to initialize OTEL Instrumentation” with zero context. Please include the error in the log so misconfigurations are debuggable.

🪵 Suggested fix
-    } catch (error) {
-      console.error('Failed to initialize OTEL Instrumentation')
-    }
+    } catch (error) {
+      console.error('Failed to initialize OTEL Instrumentation', error)
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/instrumentation.ts` around lines 18 - 20, The catch in
packages/core/src/instrumentation.ts currently swallows the error; update the
catch block in the OTEL initialization code (the try/catch around
instrumentation setup in instrumentation.ts) to include the thrown error when
logging (e.g., pass the caught error or its message/stack to console.error or
your logger) so the log reads the failure message plus the actual error details.

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo in log: Instrumemtation.

Tiny but it will show up in every node boot log.

✏️ Suggested fix
-      console.log('Instrumemtation.ts: Getting telemetry client')
+      console.log('Instrumentation.ts: Getting telemetry client')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/instrumentation.ts` at line 9, Fix the typo in the
console.log message that reads 'Instrumemtation.ts: Getting telemetry client' by
updating the string to the correct spelling 'Instrumentation.ts: Getting
telemetry client' in the file where that log is emitted (look for the exact log
string to locate it).
packages/api/src/observability/telemetry.ts (1)

21-24: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Async resolvers will get truncated spans and uncaught rejections.

GraphQL resolvers very often return a Promise. With the new try/catch/finally:

  • finally { span.end() } runs the moment fn(...) returns the Promise, not when it resolves — so span duration measures Promise construction, not the actual work, and any child spans created during await chains have no live parent.
  • catch only catches synchronous throws. A rejected Promise from fn walks past this handler entirely, so setStatus(ERROR) / recordException / the console.error log never fire for the most common failure mode.

You'll want to branch on the return value (or await it) so spans only end on settle and rejections are recorded. Since the wrapper returns TReturn (which may legitimately be a Promise), keeping the sync fast path is fine:

🧵 Sketch — handle sync and async paths
-    try {
-      return fn(source, vars, graphqlContext, info)
-    } catch (error: any) {
-      span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
-      span?.recordException(error)
-      console.error(
-        `Error when executing resolver: ${resolverName}: \n %o`,
-        error
-      )
-      throw error
-    } finally {
-      span?.end()
-    }
+    const recordError = (error: any) => {
+      span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
+      span?.recordException(error)
+      console.error(`Error when executing resolver: ${resolverName}: \n %o`, error)
+    }
+
+    try {
+      const result = fn(source, vars, graphqlContext, info)
+      if (result && typeof (result as any).then === 'function') {
+        return (result as any).then(
+          (v: any) => { span?.end(); return v },
+          (err: any) => { recordError(err); span?.end(); throw err }
+        ) as TReturn
+      }
+      span?.end()
+      return result
+    } catch (error: any) {
+      recordError(error)
+      span?.end()
+      throw error
+    }

Also applies to: 46-58

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/observability/telemetry.ts` around lines 21 - 24, The
wrapper currently ends spans and catches errors synchronously which truncates
async spans and misses promise rejections; change the wrapper around fn(...) so
it examines the return value and handles both sync and async paths: call const
result = fn(source, vars, graphqlContext, info) inside your try block, then if
result is a Promise (e.g., result && typeof result.then === 'function') attach
result.then(...) to end the span and return the resolved value, and attach a
.catch(...) that calls span.setStatus/recordException/console.error as needed
and rethrows after ending the span; for non-Promise results keep the existing
try/catch/finally behavior (ending the span in finally). Ensure you reference
and update the span.end(), span.setStatus(ERROR), span.recordException(...) and
console.error(...) usage so they run on promise rejection as well; preserve the
function signature that returns TReturn (which may be a Promise).
🧹 Nitpick comments (3)
packages/components/src/molecules/Toast/Toast.tsx (1)

8-8: ⚡ Quick win

Consider using ReturnType<typeof setTimeout> for automatic type inference.

The union type number | NodeJS.Timeout handles both browser and Node.js environments correctly, but ReturnType<typeof setTimeout> would automatically adapt to the current environment's type definitions without manual union types.

♻️ Alternative approach using ReturnType
-  const timeoutRef = useRef<number | NodeJS.Timeout>()
+  const timeoutRef = useRef<ReturnType<typeof setTimeout>>()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/components/src/molecules/Toast/Toast.tsx` at line 8, Replace the
explicit union type on the timeout ref with an environment-agnostic inferred
type: update the timeoutRef declaration in Toast.tsx (the useRef for the
timeout) to use ReturnType<typeof setTimeout> so the type automatically matches
the runtime (browser or Node) instead of manually using number | NodeJS.Timeout.
packages/api/src/observability/telemetry.ts (1)

26-26: 💤 Low value

Nit: tracer name 'Graphql''GraphQL'.

Standard capitalization; will read better in trace UIs and matches the wider ecosystem (@opentelemetry/instrumentation-graphql uses GraphQL).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/observability/telemetry.ts` at line 26, Update the tracer
name passed to OTELAPI.trace.getTracer from 'Graphql' to the standard 'GraphQL'
to match ecosystem capitalization; locate the getTracer call (const tracer =
OTELAPI.trace.getTracer('Graphql')) and change the literal to 'GraphQL' so trace
UIs and instrumentation naming are consistent.
packages/core/src/instrumentation.ts (1)

4-22: ⚡ Quick win

Unconditional OTEL bootstrap removes the only opt-out.

With config.analytics.otelEnabled gone, every nodejs runtime — including local dev, CI, and self-hosted deployments without an OTLP collector — will attempt to initialize the diagnostics client and emit traces/logs. Consider keeping a single env-based kill switch (e.g. OTEL_SDK_DISABLED or a FASTSTORE_OTEL_ENABLED env) so consumers can opt out without forking the package. Otherwise misconfigured collectors will spew connection errors on every request.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/instrumentation.ts` around lines 4 - 22, The register()
function unconditionally initializes OTEL in Node runtimes; add an opt-out check
(e.g. respect a FASTSTORE_OTEL_ENABLED or OTEL_SDK_DISABLED env var or a
config.analytics.otelEnabled flag) and return early when disabled to avoid
attempting to import/@faststore/diagnostics; update the top of register() to
bail out if the env/config indicates OTEL is disabled, and only proceed to
import getTelemetryClient and call it when enabled (preserve existing try/catch
and include the same return behavior using pkgJSON, config.api.storeId, and
getTelemetryClient).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/api/src/observability/telemetry.ts`:
- Around line 38-44: The current code discards the Context returned by
OTELAPI.trace.setSpan and never activates it, so child operations don't inherit
the parent span; capture the returned Context (e.g., newCtx =
OTELAPI.trace.setSpan(...)) and run the wrapped work via
OTELAPI.context.with(newCtx, () => ... ) so the extracted parent is actually
activated; also ensure span?.end() runs after fn completes by awaiting the
result if fn returns a Promise (e.g., run fn inside the context, await its
result in a try/finally, then end the span in finally) and return the awaited
result so callers receive the original return value.

In `@packages/core/discovery.config.default.js`:
- Line 169: The change to the serviceName entry (serviceName: 'faststore') will
fragment historical telemetry keyed by service.name; either revert this value
back to the previous identity (e.g. 'faststore-proxy') in
discovery.config.default.js or, if the rename is intentional, coordinate with
observability owners and update dashboards/alerts/saved queries to use the new
name and add a clear migration note/release-log entry so historical and new
telemetry are reconciled; locate the serviceName assignment to make the revert
or to add the migration comment and notify owners.

In `@packages/core/src/server/options.ts`:
- Around line 2-6: The import list in options.ts includes an unused symbol
getTraceClient which triggers lint errors; edit the import from
'@faststore/diagnostics' to remove getTraceClient and keep only
getTelemetryClient and OTELAPI so the file only imports used symbols (ensure any
other references to getTraceClient in this file are also removed or replaced if
present).
- Around line 36-71: The withTraceClient function currently starts and ends a
span inside initialization (span in withTraceClient) which dies before request
handling (so child spans from GraphqlVtexContextFactory lose their parent), has
a try/catch that can fall through without returning in strict mode, and narrows
options.OTEL.__otelContext to {} which breaks propagation typing; fix by: stop
starting/ending a request-scoped span inside withTraceClient (move
startSpan/span.end to the HTTP/graphql request handler so the span wraps the
full request lifecycle), ensure the function always returns or rethrows in the
catch (e.g., remove the try/catch or rethrow the error after recording it in
span), and change the OTEL carrier declaration to an explicit
Record<string,string> (set options.OTEL.__otelContext type to
Record<string,string>) so OTELAPI.propagation.inject accepts it.

In `@packages/diagnostics/src/logger.ts`:
- Around line 27-38: The current getLogger implementation caches a single logger
in global.fsDiagnostics.LOGGER_CLIENT which captures and reuses the first call's
opt.serviceName/opt.client for all subsequent callers; change this to cache
loggers per unique caller (e.g., key = `${opt.serviceName}:${opt.client}`)
instead of a single global value so each service/client pair gets its own
LogsClient and console proxy. Update getLogger to look up and return from a map
(or object) stored on global.fsDiagnostics (instead of LOGGER_CLIENT), create a
new logger and call overrideConsole(logger, opt) only for the missing key, and
ensure overrideConsole is passed the current opt so metadata is not closed over
by the first initialization. Ensure references to getLogger,
global.fsDiagnostics.LOGGER_CLIENT, overrideConsole, opt.serviceName and
opt.client are updated accordingly.
- Around line 31-40: Multiple concurrent callers can pass the initial
global.fsDiagnostics.LOGGER_CLIENT check and each start creating
exporters/clients; fix this by caching the in-flight initialization promise in
global.fsDiagnostics.LOGGER_CLIENT before awaiting client.newLogsClient so other
callers await the same promise. Concretely: when entering the init, assign a
promise (that calls setupLogsExporter(), client.newLogsClient(...), then calls
overrideConsole(logger) and returns logger) to
global.fsDiagnostics.LOGGER_CLIENT immediately; await that promise to get the
final logger; on rejection ensure you clear global.fsDiagnostics.LOGGER_CLIENT
so future attempts can retry. Use the existing symbols client.newLogsClient,
setupLogsExporter, overrideConsole and global.fsDiagnostics.LOGGER_CLIENT to
implement this change.

---

Outside diff comments:
In `@packages/api/src/observability/telemetry.ts`:
- Around line 21-24: The wrapper currently ends spans and catches errors
synchronously which truncates async spans and misses promise rejections; change
the wrapper around fn(...) so it examines the return value and handles both sync
and async paths: call const result = fn(source, vars, graphqlContext, info)
inside your try block, then if result is a Promise (e.g., result && typeof
result.then === 'function') attach result.then(...) to end the span and return
the resolved value, and attach a .catch(...) that calls
span.setStatus/recordException/console.error as needed and rethrows after ending
the span; for non-Promise results keep the existing try/catch/finally behavior
(ending the span in finally). Ensure you reference and update the span.end(),
span.setStatus(ERROR), span.recordException(...) and console.error(...) usage so
they run on promise rejection as well; preserve the function signature that
returns TReturn (which may be a Promise).

In `@packages/core/src/instrumentation.ts`:
- Around line 18-20: The catch in packages/core/src/instrumentation.ts currently
swallows the error; update the catch block in the OTEL initialization code (the
try/catch around instrumentation setup in instrumentation.ts) to include the
thrown error when logging (e.g., pass the caught error or its message/stack to
console.error or your logger) so the log reads the failure message plus the
actual error details.
- Line 9: Fix the typo in the console.log message that reads
'Instrumemtation.ts: Getting telemetry client' by updating the string to the
correct spelling 'Instrumentation.ts: Getting telemetry client' in the file
where that log is emitted (look for the exact log string to locate it).

---

Nitpick comments:
In `@packages/api/src/observability/telemetry.ts`:
- Line 26: Update the tracer name passed to OTELAPI.trace.getTracer from
'Graphql' to the standard 'GraphQL' to match ecosystem capitalization; locate
the getTracer call (const tracer = OTELAPI.trace.getTracer('Graphql')) and
change the literal to 'GraphQL' so trace UIs and instrumentation naming are
consistent.

In `@packages/components/src/molecules/Toast/Toast.tsx`:
- Line 8: Replace the explicit union type on the timeout ref with an
environment-agnostic inferred type: update the timeoutRef declaration in
Toast.tsx (the useRef for the timeout) to use ReturnType<typeof setTimeout> so
the type automatically matches the runtime (browser or Node) instead of manually
using number | NodeJS.Timeout.

In `@packages/core/src/instrumentation.ts`:
- Around line 4-22: The register() function unconditionally initializes OTEL in
Node runtimes; add an opt-out check (e.g. respect a FASTSTORE_OTEL_ENABLED or
OTEL_SDK_DISABLED env var or a config.analytics.otelEnabled flag) and return
early when disabled to avoid attempting to import/@faststore/diagnostics; update
the top of register() to bail out if the env/config indicates OTEL is disabled,
and only proceed to import getTelemetryClient and call it when enabled (preserve
existing try/catch and include the same return behavior using pkgJSON,
config.api.storeId, and getTelemetryClient).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bc7f7cf8-7fc1-4ed9-95c3-eeff6f2c33b8

📥 Commits

Reviewing files that changed from the base of the PR and between f48f6c7 and cf8caa9.

⛔ Files ignored due to path filters (1)
  • docs/observability.md is excluded by none and included by none
📒 Files selected for processing (13)
  • packages/api/src/observability/telemetry.ts
  • packages/api/src/typings/globals.ts
  • packages/components/src/molecules/Toast/Toast.tsx
  • packages/core/discovery.config.default.js
  • packages/core/next.config.js
  • packages/core/src/instrumentation.ts
  • packages/core/src/server/options.ts
  • packages/diagnostics/@types/global.d.ts
  • packages/diagnostics/package.json
  • packages/diagnostics/src/globals.ts
  • packages/diagnostics/src/logger.ts
  • packages/diagnostics/src/start.ts
  • packages/diagnostics/src/tracer.ts
💤 Files with no reviewable changes (1)
  • packages/api/src/typings/globals.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/diagnostics/src/tracer.ts
  • packages/diagnostics/src/globals.ts
  • packages/diagnostics/package.json
  • packages/diagnostics/src/start.ts
  • packages/diagnostics/@types/global.d.ts

Comment on lines +38 to +44
OTELAPI.trace.setSpan(
OTELAPI.propagation.extract(
OTELAPI.context.active(),
graphqlContext.OTEL.__otelContext
),
span
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

@opentelemetry/api context.with vs trace.setSpan immutable context pattern

💡 Result:

In OpenTelemetry JS, the “immutable context” pattern applies to both choices: 1) api.context.with(ctx, callback) - context.with makes the provided Context the active one for the duration of callback execution, and returns the callback’s return value. [1] - It is scoped (so you don’t permanently change the global active context), and it requires a ContextManager to actually propagate the active context. [1] 2) trace.setSpan(context, span) - trace.setSpan (and the related context setValue pattern) returns a new Context instance that includes the given span; it does not mutate the input Context. This matches the general rule that Context is immutable and write operations create a new Context. [2][3] - The usual pattern is: create a new context with trace.setSpan(existingContext, span) and then make that new context active via context.with. (This is how to run code “in context of some span.”) [1][4] So what’s the practical difference? - trace.setSpan answers “which Context should contain this span?”—it produces a new (immutable) Context. [2][3] - context.with answers “for this code region (sync block), which Context is active?”—it activates the provided Context only during callback execution. [1] A canonical pattern (JS) - Create/obtain a Context (often context.active). - Call trace.setSpan(ctx, span) to get a new Context that contains that span (immutable update). [2] - Call context.with(newCtx, => {... }) so that context.active inside the callback is the new context (scoped activation). [1] Key takeaway - There is no conflict between “context.with” and “trace.setSpan immutable context pattern”: trace.setSpan is the immutable update step; context.with is the scoped activation step. [1][2] Notes on “startActiveSpan” - The SDK also provides tracer.startActiveSpan, which “starts a new Span and calls the given function … Additionally the new span gets set in context and this context is activated for the duration of the function call.” [5] This is essentially a convenience wrapper around the same two concerns (set span into a context, then activate it for the callback duration). [5]

Citations:


🏁 Script executed:

find . -type f -name "telemetry.ts" -path "*/observability/*" | head -5

Repository: vtex/faststore

Length of output: 104


🏁 Script executed:

cat -n ./packages/api/src/observability/telemetry.ts | sed -n '30,65p'

Repository: vtex/faststore

Length of output: 1041


🏁 Script executed:

cat -n ./packages/api/src/observability/telemetry.ts | sed -n '1,60p'

Repository: vtex/faststore

Length of output: 2101


setSpan() return value is discarded — propagated context is never activated.

setSpan returns a new immutable Context; the extracted parent context is computed and immediately thrown away. As written, fn(...) runs under the pre-existing active context, not the one carrying the parent span. Use context.with(newCtx, () => fn(...)) to activate the context so child operations inherit the parent.

Additionally, if fn() can return a Promise, span?.end() fires immediately in the finally block before resolution completes—the span will close before async operations finish.

🔧 Suggested fix
-    OTELAPI.trace.setSpan(
-      OTELAPI.propagation.extract(
-        OTELAPI.context.active(),
-        graphqlContext.OTEL.__otelContext
-      ),
-      span
-    )
-
-    try {
-      return fn(source, vars, graphqlContext, info)
-    } catch (error: any) {
+    const parentCtx = OTELAPI.propagation.extract(
+      OTELAPI.context.active(),
+      graphqlContext.OTEL.__otelContext
+    )
+    const ctx = OTELAPI.trace.setSpan(parentCtx, span)
+
+    try {
+      return OTELAPI.context.with(ctx, () => fn(source, vars, graphqlContext, info))
+    } catch (error: any) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/observability/telemetry.ts` around lines 38 - 44, The
current code discards the Context returned by OTELAPI.trace.setSpan and never
activates it, so child operations don't inherit the parent span; capture the
returned Context (e.g., newCtx = OTELAPI.trace.setSpan(...)) and run the wrapped
work via OTELAPI.context.with(newCtx, () => ... ) so the extracted parent is
actually activated; also ensure span?.end() runs after fn completes by awaiting
the result if fn returns a Promise (e.g., run fn inside the context, await its
result in a try/finally, then end the span in finally) and return the awaited
result so callers receive the original return value.

Comment thread packages/core/discovery.config.default.js
Comment on lines +2 to +6
import {
getTelemetryClient,
getTraceClient,
OTELAPI,
} from '@faststore/diagnostics'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP "\bgetTraceClient\b" --type=ts -C2

Repository: vtex/faststore

Length of output: 825


Remove unused import getTraceClient.

It's imported but never used in this file, which will trigger Biome/lint. Safe cleanup.

Suggested fix
 import {
   getTelemetryClient,
-  getTraceClient,
   OTELAPI,
 } from '@faststore/diagnostics'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import {
getTelemetryClient,
getTraceClient,
OTELAPI,
} from '@faststore/diagnostics'
import {
getTelemetryClient,
OTELAPI,
} from '@faststore/diagnostics'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/server/options.ts` around lines 2 - 6, The import list in
options.ts includes an unused symbol getTraceClient which triggers lint errors;
edit the import from '@faststore/diagnostics' to remove getTraceClient and keep
only getTelemetryClient and OTELAPI so the file only imports used symbols
(ensure any other references to getTraceClient in this file are also removed or
replaced if present).

Comment on lines 36 to 71
export async function withTraceClient<T extends APIOptions = typeof apiOptions>(
apiOptions: T
): Promise<T> {
const OTEL = {}
getTraceClient(
apiOptions?.discoveryConfig?.analytics?.serviceName ?? name
)?.inject(OTEL)
// Safe guard in dev mode to prevent the
// global scope to be erased in hot-module-reload.
await getTelemetryClient({
serviceName: storeConfig.analytics?.serviceName ?? 'faststore',
version,
account: storeConfig.api.storeId,
clientName: storeConfig.api.storeId,
packageName: name,
})

return {
const options = {
...apiOptions,
OTEL: {
...OTEL,
enabled: storeConfig.analytics?.otelEnabled?.toString() === 'true',
__otelContext: {},
enabled: true,
},
} as T
} satisfies T

const tracer = OTELAPI.trace.getTracer('@faststore/core')
const span = tracer.startSpan('@faststore/core graphql')

const context = OTELAPI.trace.setSpan(OTELAPI.context.active(), span)
OTELAPI.propagation.inject(options.OTEL.__otelContext, context)

try {
return options as T
} catch (error) {
span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
span?.recordException(error)
} finally {
span?.end()
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

@opentelemetry/api propagation.inject carrier type and recommended pattern for request-scoped parent spans

💡 Result:

In @opentelemetry/api, propagation.inject takes: 1) context: @opentelemetry/api.Context 2) carrier: a generic “Carrier” type (depends on the propagator); for TextMap propagators it’s typically a mutable key/value map (commonly an object used like HTTP headers) 3) optional setter: TextMapSetter (defaults to defaultTextMapSetter) So the “carrier type” is not a single fixed TS type in the API—it is whatever Carrier type the configured TextMapPropagator expects; the API treats it as a generic parameter and passes it to the propagator’s inject method. [1][2][3] Recommended pattern for request-scoped parent spans (i.e., for making child spans use the inbound parent): A) Extract on the receiving service at the request entry point, using the incoming request metadata (for HTTP: headers). This yields an extracted Context; spans started/created within that context will use the extracted SpanContext as parent. [4][5][6] B) For outbound calls from that request scope, inject the current active context into the outbound carrier (for HTTP: headers) immediately before the call. This propagates trace/span IDs so the next service can extract them and continue the same trace. [7][5] Concrete JS-style pattern (matches the OpenTelemetry JS docs examples): // Sending side: inject current context into a mutable carrier (e.g., headers object) propagation.inject(context.active, output); [7][1] // Receiving side: extract context from incoming carrier (e.g., parsed headers) const activeContext = propagation.extract(context.active, req.headers); [7][5] // Then start spans using that context as current so they become children of the extracted parent. context.with(activeContext, => { // startSpan(...) }); [8] Notes: - Carriers used at inject time are expected to be mutable. [9][2] - The TextMapPropagator injects/extracts string key/value pairs (encoded like HTTP headers). [1][2] - Instrumentation should call inject on outbound requests and extract on inbound requests; the propagators API is the manual mechanism when you must do it yourself. [10][5] - For request-scoped parent spans across async code in JS, ensure you have a proper Context Manager; otherwise api.context.active may always be ROOT_CONTEXT. [8]

Citations:


🏁 Script executed:

# First, verify the file exists and check the code at the specified lines
fd -type f -name "options.ts" -path "*/packages/core/src/server/*"

Repository: vtex/faststore

Length of output: 229


🏁 Script executed:

# Read the file to verify the code matches the review snippet
# File: packages/core/src/server/options.ts, lines 36-71
cat -n packages/core/src/server/options.ts | sed -n '30,80p'

Repository: vtex/faststore

Length of output: 1409


🏁 Script executed:

# Check if there's any usage of withTraceClient to understand context
rg "withTraceClient" --type ts --type tsx -B 2 -A 5

Repository: vtex/faststore

Length of output: 86


🏁 Script executed:

# Fix the rg command - search for withTraceClient usage
rg "withTraceClient" -B 2 -A 5

Repository: vtex/faststore

Length of output: 1594


🏁 Script executed:

# Also check imports at the top of the file to understand OTELAPI
head -30 packages/core/src/server/options.ts

Repository: vtex/faststore

Length of output: 1091


🏁 Script executed:

# Check tsconfig.json for strict mode setting
find . -name "tsconfig.json" -type f | head -5

Repository: vtex/faststore

Length of output: 228


🏁 Script executed:

# Look at tsconfig in the packages/core directory
cat packages/tsconfig.json 2>/dev/null || cat tsconfig.json 2>/dev/null | head -50

Repository: vtex/faststore

Length of output: 634


🏁 Script executed:

# Check how GraphqlVtexContextFactory uses the options
rg "GraphqlVtexContextFactory" -B 3 -A 10

Repository: vtex/faststore

Length of output: 12467


Span lifecycle ends prematurely, catch block lacks return in strict mode, and carrier type needs explicit definition.

Three stacked problems in this function:

  1. Span ends before work begins. span.end() fires in the finally block immediately after return options resolves, but the actual GraphQL work (including the propagation context extraction in GraphqlVtexContextFactory) happens later. Child spans created from __otelContext won't have a live parent, defeating distributed tracing. The span should wrap the entire request lifecycle, not this initialization step.

  2. Catch block missing return violates noImplicitReturns. With strict mode enabled, all code paths must explicitly return a value matching Promise<T>. If an error is caught, the function falls through without returning, yielding undefined instead. Either rethrow or remove the try/catch—there's no error recovery logic here.

  3. satisfies T narrows __otelContext to {}. When you mutate it with OTELAPI.propagation.inject(options.OTEL.__otelContext, context), TypeScript loses type info. Explicitly type it as Record<string, string> to match what the propagation API expects.

🩹 Recommended fix (propagation only)
-  const options = {
-    ...apiOptions,
-    OTEL: {
-      __otelContext: {},
-      enabled: true,
-    },
-  } satisfies T
-
-  const tracer = OTELAPI.trace.getTracer('@faststore/core')
-  const span = tracer.startSpan('@faststore/core graphql')
-
-  const context = OTELAPI.trace.setSpan(OTELAPI.context.active(), span)
-  OTELAPI.propagation.inject(options.OTEL.__otelContext, context)
-
-  try {
-    return options as T
-  } catch (error) {
-    span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
-    span?.recordException(error)
-  } finally {
-    span?.end()
-  }
+  const carrier: Record<string, string> = {}
+  OTELAPI.propagation.inject(OTELAPI.context.active(), carrier)
+
+  return {
+    ...apiOptions,
+    OTEL: {
+      __otelContext: carrier,
+      enabled: true,
+    },
+  } as T

If a request-scoped parent span is genuinely needed, lift startSpan and span.end() to the caller (e.g., the HTTP route handler) so it wraps the full await graphqlHandler(...) execution.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function withTraceClient<T extends APIOptions = typeof apiOptions>(
apiOptions: T
): Promise<T> {
const OTEL = {}
getTraceClient(
apiOptions?.discoveryConfig?.analytics?.serviceName ?? name
)?.inject(OTEL)
// Safe guard in dev mode to prevent the
// global scope to be erased in hot-module-reload.
await getTelemetryClient({
serviceName: storeConfig.analytics?.serviceName ?? 'faststore',
version,
account: storeConfig.api.storeId,
clientName: storeConfig.api.storeId,
packageName: name,
})
return {
const options = {
...apiOptions,
OTEL: {
...OTEL,
enabled: storeConfig.analytics?.otelEnabled?.toString() === 'true',
__otelContext: {},
enabled: true,
},
} as T
} satisfies T
const tracer = OTELAPI.trace.getTracer('@faststore/core')
const span = tracer.startSpan('@faststore/core graphql')
const context = OTELAPI.trace.setSpan(OTELAPI.context.active(), span)
OTELAPI.propagation.inject(options.OTEL.__otelContext, context)
try {
return options as T
} catch (error) {
span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
span?.recordException(error)
} finally {
span?.end()
}
}
export async function withTraceClient<T extends APIOptions = typeof apiOptions>(
apiOptions: T
): Promise<T> {
// Safe guard in dev mode to prevent the
// global scope to be erased in hot-module-reload.
await getTelemetryClient({
serviceName: storeConfig.analytics?.serviceName ?? 'faststore',
version,
account: storeConfig.api.storeId,
clientName: storeConfig.api.storeId,
packageName: name,
})
const carrier: Record<string, string> = {}
OTELAPI.propagation.inject(OTELAPI.context.active(), carrier)
return {
...apiOptions,
OTEL: {
__otelContext: carrier,
enabled: true,
},
} as T
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/server/options.ts` around lines 36 - 71, The
withTraceClient function currently starts and ends a span inside initialization
(span in withTraceClient) which dies before request handling (so child spans
from GraphqlVtexContextFactory lose their parent), has a try/catch that can fall
through without returning in strict mode, and narrows options.OTEL.__otelContext
to {} which breaks propagation typing; fix by: stop starting/ending a
request-scoped span inside withTraceClient (move startSpan/span.end to the
HTTP/graphql request handler so the span wraps the full request lifecycle),
ensure the function always returns or rethrows in the catch (e.g., remove the
try/catch or rethrow the error after recording it in span), and change the OTEL
carrier declaration to an explicit Record<string,string> (set
options.OTEL.__otelContext type to Record<string,string>) so
OTELAPI.propagation.inject accepts it.

Comment thread packages/diagnostics/src/logger.ts Outdated
Comment on lines +27 to +38
export async function getLogger(
client: TelemetryClient,
opt: { serviceName: string; client: string }
) {
if (global.fsDiagnostics.LOGGER_CLIENT)
return global.fsDiagnostics.LOGGER_CLIENT

const logger = await client.newLogsClient({
exporter: await setupLogsExporter(),
})

overrideConsole(logger, opt)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The singleton console proxy hard-codes the first caller’s metadata.

The proxy closes over the first opt.serviceName / opt.client, and later getLogger(...) calls just return the cached singleton. If different services initialize diagnostics in the same process, every forwarded console entry is attributed to the first one.

Also applies to: 45-76

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/diagnostics/src/logger.ts` around lines 27 - 38, The current
getLogger implementation caches a single logger in
global.fsDiagnostics.LOGGER_CLIENT which captures and reuses the first call's
opt.serviceName/opt.client for all subsequent callers; change this to cache
loggers per unique caller (e.g., key = `${opt.serviceName}:${opt.client}`)
instead of a single global value so each service/client pair gets its own
LogsClient and console proxy. Update getLogger to look up and return from a map
(or object) stored on global.fsDiagnostics (instead of LOGGER_CLIENT), create a
new logger and call overrideConsole(logger, opt) only for the missing key, and
ensure overrideConsole is passed the current opt so metadata is not closed over
by the first initialization. Ensure references to getLogger,
global.fsDiagnostics.LOGGER_CLIENT, overrideConsole, opt.serviceName and
opt.client are updated accordingly.

Comment thread packages/diagnostics/src/logger.ts Outdated
Comment on lines +31 to +40
if (global.fsDiagnostics.LOGGER_CLIENT)
return global.fsDiagnostics.LOGGER_CLIENT

const logger = await client.newLogsClient({
exporter: await setupLogsExporter(),
})

overrideConsole(logger, opt)

global.fsDiagnostics.LOGGER_CLIENT ??= logger

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cache the in-flight logger initialization.

Concurrent callers can all pass the LOGGER_CLIENT check before it is assigned, so this can create multiple exporters/clients and wrap globalThis.console more than once. That leads to duplicated log writes and double-formatted console output.

Suggested change
+let loggerPromise: Promise<LogClient> | undefined
+
 export async function getLogger(
   client: TelemetryClient,
   opt: { serviceName: string; client: string }
 ) {
   if (global.fsDiagnostics.LOGGER_CLIENT)
     return global.fsDiagnostics.LOGGER_CLIENT
-
-  const logger = await client.newLogsClient({
-    exporter: await setupLogsExporter(),
-  })
-
-  overrideConsole(logger, opt)
-
-  global.fsDiagnostics.LOGGER_CLIENT ??= logger
-
-  return logger
+  if (!loggerPromise) {
+    loggerPromise = (async () => {
+      const logger = await client.newLogsClient({
+        exporter: await setupLogsExporter(),
+      })
+
+      overrideConsole(logger, opt)
+      global.fsDiagnostics.LOGGER_CLIENT = logger
+
+      return logger
+    })().catch((error) => {
+      loggerPromise = undefined
+      throw error
+    })
+  }
+
+  return loggerPromise
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/diagnostics/src/logger.ts` around lines 31 - 40, Multiple concurrent
callers can pass the initial global.fsDiagnostics.LOGGER_CLIENT check and each
start creating exporters/clients; fix this by caching the in-flight
initialization promise in global.fsDiagnostics.LOGGER_CLIENT before awaiting
client.newLogsClient so other callers await the same promise. Concretely: when
entering the init, assign a promise (that calls setupLogsExporter(),
client.newLogsClient(...), then calls overrideConsole(logger) and returns
logger) to global.fsDiagnostics.LOGGER_CLIENT immediately; await that promise to
get the final logger; on rejection ensure you clear
global.fsDiagnostics.LOGGER_CLIENT so future attempts can retry. Use the
existing symbols client.newLogsClient, setupLogsExporter, overrideConsole and
global.fsDiagnostics.LOGGER_CLIENT to implement this change.

Comment thread packages/core/src/instrumentation.ts Outdated
Comment thread docs/install.md
Comment thread docs/observability.md
- `@faststore_account_name: required_trace_account_name`
- `@faststore_environment:production|development`


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pesquisar por @faststore_environment="development"

Image

Comment thread docs/observability.md Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants