feat: refactoring and enabling diagnostics logger and traces by ommeirelles · Pull Request #3300 · vtex/faststore

ommeirelles · 2026-05-06T22:39:53Z

Summary by CodeRabbit

New Features
- Added a logging client that captures app logs and routes console calls for OTLP export.
Improvements
- Diagnostics now use shared singleton clients for telemetry, tracing and logging.
- Trace sampling increased from 1% to 30% for greater coverage.
- Analytics service identity and runtime analytics message updated.
- Tracing lifecycle and error handling updated for resolver spans.
Dependencies
- Updated diagnostics dependency and added a semantic-conventions package.

coderabbitai · 2026-05-06T22:39:59Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

Migrate diagnostics from per-package maps to global singletons; add OTLP traces/log exporter modules, console proxying, and refactor telemetry bootstrap, exports, OTEL context injection, and prod config/deps.

Changes

Global Singleton Diagnostics Architecture

| Layer / File(s) | Summary |
| --- | --- |
| Type Definitions & Global Structure `packages/diagnostics/@types/global.d.ts`, `packages/diagnostics/src/globals.ts` | `fsDiagnostics` global changes from map-based `TELEMETRY_CLIENTS`/`TRACE_CLIENTS` to singleton `TELEMETRY_CLIENT`, `TRACE_CLIENT`, and new `LOGGER_CLIENT`; adds OTLP endpoint strings and a `LogClient` type import. |
| Telemetry Client Refactoring `packages/diagnostics/src/start.ts` | `getTelemetryClient(...)` now returns/caches a single telemetry client, delegates logger/tracer setup, registers HTTP instrumentation, removes per-package maps and direct exporter wiring. |
| Logger Implementation `packages/diagnostics/src/logger.ts` | New `setupLogsExporter()` and exported `getLogger(...)` lazily initialize a cached LogsClient and install a `Proxy` for `globalThis.console` that forwards formatted messages to telemetry while preserving original console calls. |
| Tracer Implementation `packages/diagnostics/src/tracer.ts` | New `setupTracesExporter()`, `getTracesClient(telemetryClient)` for lazy trace exporter/client creation and caching; `getTraceClient(...)` accessor added. |
| Export Wiring `packages/diagnostics/src/index.ts` | Re-exports split: `getTelemetryClient` from `./start`, `getTraceClient` from `./tracer`. |
| Deps & Production Config `packages/diagnostics/package.json`, `packages/diagnostics/configs/prod.json` | Bumped `@vtex/diagnostics-nodejs` to `^5.1.9`, added `@vtex/diagnostics-semconv@5.1.4`; production traces sampling defaultRate changed from `0.01` to `0.3`. |
| Server OTEL Injection `packages/core/src/server/options.ts` | `withTraceClient` now awaits `getTelemetryClient`, starts an OTEL span, injects active context into `options.OTEL.__otelContext`, records errors/status, and enforces `OTEL.enabled: true`. |
| API Resolver Tracing `packages/api/src/observability/telemetry.ts` | `ResolverTrace` refactored to use tracer named `'Graphql'`, set span from injected OTEL context, record exceptions/status, and end the span in `finally` immediately after invocation. |
| Typings & Misc Core Edits `packages/api/src/typings/globals.ts`, `packages/core/`, `packages/components/` | Removed `traceparent`/`tracestate` from `Options.OTEL`; changed discovery analytics `serviceName` to `'faststore'`; simplified Analytics log message; instrumentation gate simplified to runtime check; widened Toast `timeoutRef` type. |

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Start as getTelemetryClient()
    participant Logger as getLogger()
    participant Tracer as getTracesClient()
    participant Console as globalThis.console
    participant OTLP as OTLP Endpoint

    App->>Start: Initialize telemetry
    Start->>Logger: getLogger(client, opts)
    Logger->>OTLP: setupLogsExporter()
    OTLP-->>Logger: Exporter ready
    Logger->>Logger: Create LogsClient
    Logger->>Console: Replace with Proxy
    Logger-->>Start: Logger cached

    Start->>Tracer: getTracesClient(telemetry)
    Tracer->>OTLP: setupTracesExporter()
    OTLP-->>Tracer: Exporter ready
    Tracer->>Tracer: Create TraceClient
    Tracer-->>Start: Traces cached

    Start-->>App: TelemetryClient ready

    rect rgba(100, 150, 200, 0.5)
    Note over Console,OTLP: Runtime console override
    App->>Console: console.info("message")
    Console->>Logger: logger.info(formatted)
    Logger->>OTLP: Send log
    Console->>Console: Original console.info()
    end
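
The runtime console override in the diagram can be sketched as follows. This is a minimal illustration, not the code in `packages/diagnostics/src/logger.ts`: the names `proxyConsole`, `LogSink`, and `LEVELS` are hypothetical, and the sink stands in for whatever the real logger forwards to the OTLP exporter.

```typescript
// Hypothetical names throughout: LogSink, LEVELS, and proxyConsole are
// illustrative and are not taken from the PR's logger.ts.
type Level = 'debug' | 'info' | 'warn' | 'error'
type LogSink = (level: Level, message: string) => void

const LEVELS: readonly string[] = ['debug', 'info', 'warn', 'error']

function proxyConsole<T extends object>(target: T, sink: LogSink): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const original = Reflect.get(obj, prop, receiver)
      // Only the level methods are intercepted; everything else passes through.
      if (
        typeof prop !== 'string' ||
        LEVELS.indexOf(prop) === -1 ||
        typeof original !== 'function'
      ) {
        return original
      }
      return (...args: unknown[]) => {
        try {
          // Forward a formatted copy of the message to the telemetry sink.
          sink(prop as Level, args.map(String).join(' '))
        } catch (_err) {
          // A failing sink must not prevent the original console call.
        }
        // Preserve the original console behavior.
        return original.apply(obj, args)
      }
    },
  })
}
```

Under these assumptions, a process-wide override would assign the result of `proxyConsole(globalThis.console, sink)` back to `globalThis.console`. Passing the target in as a parameter keeps the sketch testable against a fake console object.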

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

vtex/faststore#3215: Similar diagnostics work — both modify diagnostics client handling, globals, and telemetry/tracing wiring.

Suggested reviewers

lemagnetic
eduardoformiga
hellofanny

Poem

From maps dispersed to a single sign,
Console echoes routed through a line,
Traces and logs now join the rhyme,
Startup trimmed to save the time,
Diagnostics hum — one client, one design.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main objective: refactoring and enabling diagnostics logger and traces across the diagnostics package and related modules. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


pkg-pr-new · 2026-05-06T23:35:44Z

@faststore/api

npm i https://pkg.pr.new/vtex/faststore/@faststore/api@219085b

@faststore/cli

npm i https://pkg.pr.new/vtex/faststore/@faststore/cli@219085b

@faststore/components

npm i https://pkg.pr.new/vtex/faststore/@faststore/components@219085b

@faststore/core

npm i https://pkg.pr.new/vtex/faststore/@faststore/core@219085b

@faststore/diagnostics

npm i https://pkg.pr.new/vtex/faststore/@faststore/diagnostics@219085b

@faststore/lighthouse

npm i https://pkg.pr.new/vtex/faststore/@faststore/lighthouse@219085b

@faststore/sdk

npm i https://pkg.pr.new/vtex/faststore/@faststore/sdk@219085b

@faststore/ui

npm i https://pkg.pr.new/vtex/faststore/@faststore/ui@219085b

commit: 219085b

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/diagnostics/configs/prod.json (1)
4-8: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reduce the production sampling jump.

Raising the default sample rate from 0.01 to 0.3 multiplies trace volume by roughly 30 for all traffic. That is a meaningful cost and backpressure increase in production. Please gate this behind an env flag or narrow it to specific rules/endpoints first.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/diagnostics/configs/prod.json` around lines 4 - 8, The production
diagnostics config currently raises the global defaultRate to 0.3 and
defines a broad rule "trace_all" with sampleRate 0.3, which will massively
increase trace volume; change this by lowering defaultRate and/or sampleRate on
the "trace_all" rule, and gate the higher sampling behind an environment flag
(e.g., DIAGNOSTICS_HIGH_SAMPLING) or restrict the rule to specific
endpoints/tags instead of "trace_all" so you only sample targeted traffic when
the flag is set; update the JSON keys defaultRate, rules[*].name "trace_all",
and rules[*].sampleRate accordingly and add logic to read the env flag before
enabling the high-sampling rule.
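
One way to gate the higher rate behind a flag is sketched below. It assumes the rate can be chosen in code before the config is handed to the diagnostics client, which the PR does not confirm. `DIAGNOSTICS_HIGH_SAMPLING` is only the example flag name from the comment above, and `resolveSampleRate` is a hypothetical helper, not part of `@vtex/diagnostics-nodejs`.

```typescript
// Assumption: the two rates mirror the old and new prod.json defaultRate values.
const BASELINE_RATE = 0.01
const HIGH_RATE = 0.3

type Env = Record<string, string | undefined>

// Hypothetical helper: returns the high rate only when the flag is exactly 'true'.
function resolveSampleRate(env: Env): number {
  return env['DIAGNOSTICS_HIGH_SAMPLING'] === 'true' ? HIGH_RATE : BASELINE_RATE
}
```

In a Node.js process this would typically be called as `resolveSampleRate(process.env)`. Any value other than the literal string `'true'` keeps the baseline, so a missing or mistyped flag fails toward the cheaper setting.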

🧹 Nitpick comments (1)

packages/diagnostics/package.json (1)
30-31: ⚡ Quick win

Please capture the bundle-size impact for these new runtime deps.

This adds two runtime dependencies, but there’s no accompanying budget/update note to show the downstream size impact. Please attach the delta or update the package budget check with this change.

As per coding guidelines, "Maintain strict performance budgets by monitoring bundle size when adding new dependencies".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/diagnostics/package.json` around lines 30 - 31, The PR adds two
runtime deps "@vtex/diagnostics-nodejs" and "@vtex/diagnostics-semconv" but
lacks a bundle-size impact report or budget update; run your project's
bundle-size/size-budget tool (or local webpack/rollup build +
source-map-explorer) to measure the delta introduced by these packages, attach
the resulting size delta to the PR, and update the package budget/check file
(the project's size budget config) to account for the added bytes if acceptable;
reference the dependency entries "@vtex/diagnostics-nodejs" and
"@vtex/diagnostics-semconv" when updating the budget or PR description so
reviewers can verify the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/diagnostics/src/globals.ts`:
- Around line 3-5: The globals TELEMETRY_CLIENT, TRACE_CLIENT, and LOGGER_CLIENT
are currently singletons which makes getTelemetryClient(serviceName, clientName,
account, packageName) sticky to the first caller’s metadata; change these
globals back to keyed caches (e.g., Map or object) and update
getTelemetryClient, getTraceClient, and getLoggerClient to compute a composite
key from serviceName/clientName/account/packageName and look up/create clients
per-key instead of using a single process-wide value; alternatively, if
singleton behavior is intended, remove per-call metadata parameters from the
public API (getTelemetryClient signature) so callers cannot pass differing
metadata that would be ignored. Ensure creation paths set the keyed cache entry
and retrieval paths use the same composite key.
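
The keyed-cache alternative described above might look like this. It is a sketch under stated assumptions: `ClientMeta`, `cacheKey`, and `getOrCreate` are hypothetical names, and the `create` callback stands in for whatever actually constructs a telemetry, trace, or logger client.

```typescript
// Hypothetical shape of the per-call metadata passed to getTelemetryClient.
interface ClientMeta {
  serviceName: string
  clientName: string
  account: string
  packageName: string
}

function cacheKey(meta: ClientMeta): string {
  // JSON.stringify over an array avoids the collisions a plain join(':')
  // could produce when a field itself contains the separator.
  return JSON.stringify([
    meta.serviceName,
    meta.clientName,
    meta.account,
    meta.packageName,
  ])
}

function getOrCreate<T>(
  cache: Map<string, T>,
  meta: ClientMeta,
  create: (meta: ClientMeta) => T
): T {
  const key = cacheKey(meta)
  const existing = cache.get(key)
  if (existing !== undefined) return existing
  // Creation and retrieval use the same composite key.
  const created = create(meta)
  cache.set(key, created)
  return created
}
```

With this shape, two callers that pass different metadata get different clients, while repeated calls with identical metadata reuse the cached one.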

In `@packages/diagnostics/src/logger.ts`:
- Around line 15-18: The logs exporter is forcing plaintext transport by passing
insecure: true to Exporters.CreateLogsExporterConfig with OTLP_LOGGER_ENDPOINT;
change this to be environment-aware (e.g. derive insecure from IS_DEV or an env
var) so insecure is true only in development/testing and false in production,
updating the call in logger.ts (the Exporters.CreateLogsExporterConfig
invocation) to compute insecure dynamically and preserve current behavior for
traces which already use IS_DEV.

In `@packages/diagnostics/src/start.ts`:
- Around line 17-18: getTelemetryClient() currently returns the existing
global.fsDiagnostics.TELEMETRY_CLIENT synchronously before getLogger() and
getTracesClient() finish, causing tracing/logger setup race conditions; change
the function to await completion of getLogger() and getTracesClient() before
returning the client and promote the singleton to an initialization promise
(e.g. store a TELEMETRY_CLIENT_INIT promise on global.fsDiagnostics or make
TELEMETRY_CLIENT hold the promise) so concurrent first calls reuse the same
initialization flow and do not create duplicate clients or surface unhandled
rejections.

In `@packages/diagnostics/src/tracer.ts`:
- Around line 18-24: Move the cache check to the top of getTracesClient so we
don't create an exporter when a trace client already exists: check
global.fsDiagnostics.TRACE_CLIENT first and return it if present, and only call
setupTracesExporter() and telemetryClient.newTracesClient(...) when the cache is
empty; update getTracesClient to set global.fsDiagnostics.TRACE_CLIENT after
creating the exporter/client (use the existing setupTracesExporter and
telemetryClient.newTracesClient symbols).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d2e341d-d5bb-4397-aaea-bb6035adb74b

📥 Commits

Reviewing files that changed from the base of the PR and between e912b5e and ebfb56d.

⛔ Files ignored due to path filters (4)

.github/workflows/ci.yml is excluded by none and included by none
.github/workflows/packages-preview.yml is excluded by none and included by none
.npmrc is excluded by none and included by none
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml and included by none

📒 Files selected for processing (8)

packages/diagnostics/@types/global.d.ts
packages/diagnostics/configs/prod.json
packages/diagnostics/package.json
packages/diagnostics/src/globals.ts
packages/diagnostics/src/index.ts
packages/diagnostics/src/logger.ts
packages/diagnostics/src/start.ts
packages/diagnostics/src/tracer.ts

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

packages/core/src/instrumentation.ts (2)

18-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Swallowed error loses the actual failure reason.

The catch discards error, so on-call will see “Failed to initialize OTEL Instrumentation” with zero context. Please include the error in the log so misconfigurations are debuggable.

🪵 Suggested fix

-    } catch (error) {
-      console.error('Failed to initialize OTEL Instrumentation')
-    }
+    } catch (error) {
+      console.error('Failed to initialize OTEL Instrumentation', error)
+    }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/instrumentation.ts` around lines 18 - 20, The catch in
packages/core/src/instrumentation.ts currently swallows the error; update the
catch block in the OTEL initialization code (the try/catch around
instrumentation setup in instrumentation.ts) to include the thrown error when
logging (e.g., pass the caught error or its message/stack to console.error or
your logger) so the log reads the failure message plus the actual error details.

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo in log: Instrumemtation.

Tiny but it will show up in every node boot log.

✏️ Suggested fix

-      console.log('Instrumemtation.ts: Getting telemetry client')
+      console.log('Instrumentation.ts: Getting telemetry client')

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/instrumentation.ts` at line 9, Fix the typo in the
console.log message that reads 'Instrumemtation.ts: Getting telemetry client' by
updating the string to the correct spelling 'Instrumentation.ts: Getting
telemetry client' in the file where that log is emitted (look for the exact log
string to locate it).

packages/api/src/observability/telemetry.ts (1)

21-24: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Async resolvers will get truncated spans and uncaught rejections.

GraphQL resolvers very often return a Promise. With the new try/catch/finally:

- `finally { span.end() }` runs the moment `fn(...)` returns the Promise, not when it resolves, so span duration measures Promise construction, not the actual work, and any child spans created during await chains have no live parent.
- `catch` only catches synchronous throws. A rejected Promise from `fn` walks past this handler entirely, so `setStatus(ERROR)` / `recordException` / the `console.error` log never fire for the most common failure mode.

You'll want to branch on the return value (or await it) so spans only end on settle and rejections are recorded. Since the wrapper returns TReturn (which may legitimately be a Promise), keeping the sync fast path is fine:

🧵 Sketch — handle sync and async paths

-    try {
-      return fn(source, vars, graphqlContext, info)
-    } catch (error: any) {
-      span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
-      span?.recordException(error)
-      console.error(
-        `Error when executing resolver: ${resolverName}: \n %o`,
-        error
-      )
-      throw error
-    } finally {
-      span?.end()
-    }
+    const recordError = (error: any) => {
+      span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
+      span?.recordException(error)
+      console.error(`Error when executing resolver: ${resolverName}: \n %o`, error)
+    }
+
+    try {
+      const result = fn(source, vars, graphqlContext, info)
+      if (result && typeof (result as any).then === 'function') {
+        return (result as any).then(
+          (v: any) => { span?.end(); return v },
+          (err: any) => { recordError(err); span?.end(); throw err }
+        ) as TReturn
+      }
+      span?.end()
+      return result
+    } catch (error: any) {
+      recordError(error)
+      span?.end()
+      throw error
+    }

Also applies to: 46-58

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/observability/telemetry.ts` around lines 21 - 24, The
wrapper currently ends spans and catches errors synchronously which truncates
async spans and misses promise rejections; change the wrapper around fn(...) so
it examines the return value and handles both sync and async paths: call const
result = fn(source, vars, graphqlContext, info) inside your try block, then if
result is a Promise (e.g., result && typeof result.then === 'function') attach
result.then(...) to end the span and return the resolved value, and attach a
.catch(...) that calls span.setStatus/recordException/console.error as needed
and rethrows after ending the span; for non-Promise results keep the existing
try/catch/finally behavior (ending the span in finally). Ensure you reference
and update the span.end(), span.setStatus(ERROR), span.recordException(...) and
console.error(...) usage so they run on promise rejection as well; preserve the
function signature that returns TReturn (which may be a Promise).

🧹 Nitpick comments (3)

packages/components/src/molecules/Toast/Toast.tsx (1)
8-8: ⚡ Quick win

Consider using ReturnType<typeof setTimeout> for automatic type inference.

The union type number | NodeJS.Timeout handles both browser and Node.js environments correctly, but ReturnType<typeof setTimeout> would automatically adapt to the current environment's type definitions without manual union types.
♻️ Alternative approach using ReturnType
-  const timeoutRef = useRef<number | NodeJS.Timeout>()
+  const timeoutRef = useRef<ReturnType<typeof setTimeout>>()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/components/src/molecules/Toast/Toast.tsx` at line 8, Replace the
explicit union type on the timeout ref with an environment-agnostic inferred
type: update the timeoutRef declaration in Toast.tsx (the useRef for the
timeout) to use ReturnType<typeof setTimeout> so the type automatically matches
the runtime (browser or Node) instead of manually using number | NodeJS.Timeout.
packages/api/src/observability/telemetry.ts (1)
26-26: 💤 Low value

Nit: tracer name 'Graphql' → 'GraphQL'.

Standard capitalization; will read better in trace UIs and matches the wider ecosystem (@opentelemetry/instrumentation-graphql uses GraphQL).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/observability/telemetry.ts` at line 26, Update the tracer
name passed to OTELAPI.trace.getTracer from 'Graphql' to the standard 'GraphQL'
to match ecosystem capitalization; locate the getTracer call (const tracer =
OTELAPI.trace.getTracer('Graphql')) and change the literal to 'GraphQL' so trace
UIs and instrumentation naming are consistent.
packages/core/src/instrumentation.ts (1)
4-22: ⚡ Quick win

Unconditional OTEL bootstrap removes the only opt-out.

With config.analytics.otelEnabled gone, every Node.js runtime (including local dev, CI, and self-hosted deployments without an OTLP collector) will attempt to initialize the diagnostics client and emit traces/logs. Consider keeping a single env-based kill switch (e.g. OTEL_SDK_DISABLED or a FASTSTORE_OTEL_ENABLED env var) so consumers can opt out without forking the package. Otherwise misconfigured collectors will spew connection errors on every request.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/instrumentation.ts` around lines 4 - 22, The register()
function unconditionally initializes OTEL in Node runtimes; add an opt-out check
(e.g. respect a FASTSTORE_OTEL_ENABLED or OTEL_SDK_DISABLED env var or a
config.analytics.otelEnabled flag) and return early when disabled to avoid
attempting to import `@faststore/diagnostics`; update
bail out if the env/config indicates OTEL is disabled, and only proceed to
import getTelemetryClient and call it when enabled (preserve existing try/catch
and include the same return behavior using pkgJSON, config.api.storeId, and
getTelemetryClient).
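
The opt-out check could be as small as the sketch below. `OTEL_SDK_DISABLED` is the standard OpenTelemetry SDK switch. `FASTSTORE_OTEL_ENABLED` is only the hypothetical variable named in the comment above, and `isOtelEnabled` is an illustrative helper, not existing FastStore code.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper: OTEL stays enabled unless one of the switches opts out.
function isOtelEnabled(env: Env): boolean {
  // Standard OpenTelemetry kill switch: the value "true" disables the SDK.
  if (env['OTEL_SDK_DISABLED']?.toLowerCase() === 'true') return false
  // Assumed FastStore-specific switch: an explicit "false" opts out.
  if (env['FASTSTORE_OTEL_ENABLED']?.toLowerCase() === 'false') return false
  return true
}
```

Under these assumptions, `register()` would return early when `isOtelEnabled(process.env)` is false, before importing the diagnostics package.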

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/api/src/observability/telemetry.ts`:
- Around line 38-44: The current code discards the Context returned by
OTELAPI.trace.setSpan and never activates it, so child operations don't inherit
the parent span; capture the returned Context (e.g., newCtx =
OTELAPI.trace.setSpan(...)) and run the wrapped work via
OTELAPI.context.with(newCtx, () => ... ) so the extracted parent is actually
activated; also ensure span?.end() runs after fn completes by awaiting the
result if fn returns a Promise (e.g., run fn inside the context, await its
result in a try/finally, then end the span in finally) and return the awaited
result so callers receive the original return value.

In `@packages/core/discovery.config.default.js`:
- Line 169: The change to the serviceName entry (serviceName: 'faststore') will
fragment historical telemetry keyed by service.name; either revert this value
back to the previous identity (e.g. 'faststore-proxy') in
discovery.config.default.js or, if the rename is intentional, coordinate with
observability owners and update dashboards/alerts/saved queries to use the new
name and add a clear migration note/release-log entry so historical and new
telemetry are reconciled; locate the serviceName assignment to make the revert
or to add the migration comment and notify owners.

In `@packages/core/src/server/options.ts`:
- Around line 2-6: The import list in options.ts includes an unused symbol
getTraceClient which triggers lint errors; edit the import from
'@faststore/diagnostics' to remove getTraceClient and keep only
getTelemetryClient and OTELAPI so the file only imports used symbols (ensure any
other references to getTraceClient in this file are also removed or replaced if
present).
- Around line 36-71: The withTraceClient function currently starts and ends a
span inside initialization (span in withTraceClient) which dies before request
handling (so child spans from GraphqlVtexContextFactory lose their parent), has
a try/catch that can fall through without returning in strict mode, and narrows
options.OTEL.__otelContext to {} which breaks propagation typing; fix by: stop
starting/ending a request-scoped span inside withTraceClient (move
startSpan/span.end to the HTTP/graphql request handler so the span wraps the
full request lifecycle), ensure the function always returns or rethrows in the
catch (e.g., remove the try/catch or rethrow the error after recording it in
span), and change the OTEL carrier declaration to an explicit
Record<string,string> (set options.OTEL.__otelContext type to
Record<string,string>) so OTELAPI.propagation.inject accepts it.

In `@packages/diagnostics/src/logger.ts`:
- Around line 27-38: The current getLogger implementation caches a single logger
in global.fsDiagnostics.LOGGER_CLIENT which captures and reuses the first call's
opt.serviceName/opt.client for all subsequent callers; change this to cache
loggers per unique caller (e.g., key = `${opt.serviceName}:${opt.client}`)
instead of a single global value so each service/client pair gets its own
LogsClient and console proxy. Update getLogger to look up and return from a map
(or object) stored on global.fsDiagnostics (instead of LOGGER_CLIENT), create a
new logger and call overrideConsole(logger, opt) only for the missing key, and
ensure overrideConsole is passed the current opt so metadata is not closed over
by the first initialization. Ensure references to getLogger,
global.fsDiagnostics.LOGGER_CLIENT, overrideConsole, opt.serviceName and
opt.client are updated accordingly.
- Around line 31-40: Multiple concurrent callers can pass the initial
global.fsDiagnostics.LOGGER_CLIENT check and each start creating
exporters/clients; fix this by caching the in-flight initialization promise in
global.fsDiagnostics.LOGGER_CLIENT before awaiting client.newLogsClient so other
callers await the same promise. Concretely: when entering the init, assign a
promise (that calls setupLogsExporter(), client.newLogsClient(...), then calls
overrideConsole(logger) and returns logger) to
global.fsDiagnostics.LOGGER_CLIENT immediately; await that promise to get the
final logger; on rejection ensure you clear global.fsDiagnostics.LOGGER_CLIENT
so future attempts can retry. Use the existing symbols client.newLogsClient,
setupLogsExporter, overrideConsole and global.fsDiagnostics.LOGGER_CLIENT to
implement this change.
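
The in-flight promise caching described in the last comment can be sketched generically. Here `slot` stands in for `global.fsDiagnostics.LOGGER_CLIENT` and `init` for the exporter setup, `newLogsClient`, and console override sequence. `Slot` and `initOnce` are hypothetical names, not code from the PR.

```typescript
// Hypothetical holder for the cached initialization promise.
interface Slot<T> {
  pending: Promise<T> | undefined
}

function initOnce<T>(slot: Slot<T>, init: () => Promise<T>): Promise<T> {
  // Concurrent callers reuse the same in-flight initialization.
  if (slot.pending) return slot.pending

  const pending: Promise<T> = init().catch((error: unknown) => {
    // Clear the slot on failure so a later call can retry.
    if (slot.pending === pending) slot.pending = undefined
    throw error
  })

  // The promise is stored before anything is awaited, which closes the
  // window in which a second caller could start its own initialization.
  slot.pending = pending
  return pending
}
```

The key design choice is that the cache holds the promise rather than the resolved client, so the check-then-create sequence has no await between the check and the assignment.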

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bc7f7cf8-7fc1-4ed9-95c3-eeff6f2c33b8

📥 Commits

Reviewing files that changed from the base of the PR and between f48f6c7 and cf8caa9.

⛔ Files ignored due to path filters (1)

docs/observability.md is excluded by none and included by none

📒 Files selected for processing (13)

packages/api/src/observability/telemetry.ts
packages/api/src/typings/globals.ts
packages/components/src/molecules/Toast/Toast.tsx
packages/core/discovery.config.default.js
packages/core/next.config.js
packages/core/src/instrumentation.ts
packages/core/src/server/options.ts
packages/diagnostics/@types/global.d.ts
packages/diagnostics/package.json
packages/diagnostics/src/globals.ts
packages/diagnostics/src/logger.ts
packages/diagnostics/src/start.ts
packages/diagnostics/src/tracer.ts

💤 Files with no reviewable changes (1)

packages/api/src/typings/globals.ts

🚧 Files skipped from review as they are similar to previous changes (5)

packages/diagnostics/src/tracer.ts
packages/diagnostics/src/globals.ts
packages/diagnostics/package.json
packages/diagnostics/src/start.ts
packages/diagnostics/@types/global.d.ts

coderabbitai · 2026-05-11T15:20:30Z

+    OTELAPI.trace.setSpan(
+      OTELAPI.propagation.extract(
+        OTELAPI.context.active(),
+        graphqlContext.OTEL.__otelContext
+      ),
+      span
+    )


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

@opentelemetry/api context.with vs trace.setSpan immutable context pattern

💡 Result:

In OpenTelemetry JS, the "immutable context" pattern applies to both choices:

1) api.context.with(ctx, callback)
- context.with makes the provided Context the active one for the duration of callback execution, and returns the callback's return value. [1]
- It is scoped (so you don't permanently change the global active context), and it requires a ContextManager to actually propagate the active context. [1]

2) trace.setSpan(context, span)
- trace.setSpan (and the related context setValue pattern) returns a new Context instance that includes the given span; it does not mutate the input Context. This matches the general rule that Context is immutable and write operations create a new Context. [2][3]
- The usual pattern is: create a new context with trace.setSpan(existingContext, span) and then make that new context active via context.with. (This is how to run code "in context of some span.") [1][4]

So what's the practical difference?
- trace.setSpan answers "which Context should contain this span?": it produces a new (immutable) Context. [2][3]
- context.with answers "for this code region (sync block), which Context is active?": it activates the provided Context only during callback execution. [1]

A canonical pattern (JS)
- Create/obtain a Context (often context.active()).
- Call trace.setSpan(ctx, span) to get a new Context that contains that span (immutable update). [2]
- Call context.with(newCtx, () => { ... }) so that context.active() inside the callback is the new context (scoped activation). [1]

Key takeaway
- There is no conflict between "context.with" and the "trace.setSpan immutable context pattern": trace.setSpan is the immutable update step; context.with is the scoped activation step. [1][2]

Notes on "startActiveSpan"
- The SDK also provides tracer.startActiveSpan, which "starts a new Span and calls the given function … Additionally the new span gets set in context and this context is activated for the duration of the function call." [5] This is essentially a convenience wrapper around the same two concerns (set span into a context, then activate it for the callback duration). [5]

Citations:

1: https://opentelemetry.io/docs/instrumentation/js/api/context/

2: https://opentelemetry.io/docs/specs/otel/context/

3: https://opentelemetry.io/docs/languages/js/context/

4: how to set the current active context and current active span without callbacks open-telemetry/opentelemetry-js#3558

5: https://open-telemetry.github.io/opentelemetry-js/interfaces/_opentelemetry_sdk-node._opentelemetry_api.Tracer.html

🏁 Script executed:

find . -type f -name "telemetry.ts" -path "*/observability/*" | head -5

Repository: vtex/faststore

Length of output: 104

🏁 Script executed:

cat -n ./packages/api/src/observability/telemetry.ts | sed -n '30,65p'

Repository: vtex/faststore

Length of output: 1041

🏁 Script executed:

cat -n ./packages/api/src/observability/telemetry.ts | sed -n '1,60p'

Repository: vtex/faststore

Length of output: 2101

setSpan() return value is discarded — propagated context is never activated.

setSpan returns a new immutable Context; the extracted parent context is computed and immediately thrown away. As written, fn(...) runs under the pre-existing active context, not the one carrying the parent span. Use context.with(newCtx, () => fn(...)) to activate the context so child operations inherit the parent.
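
To make the discarded-return-value problem concrete, here is a toy model of an immutable context. It is not the `@opentelemetry/api` implementation: `ToyContext`, `withContext` and `currentSpanName` are hypothetical names, and the synchronous save/restore below stands in for what a real ContextManager does.

```typescript
// Toy model only; real OpenTelemetry contexts are managed by a ContextManager.
class ToyContext {
  static readonly ROOT = new ToyContext(new Map<string, unknown>())

  private constructor(private readonly values: ReadonlyMap<string, unknown>) {}

  get(key: string): unknown {
    return this.values.get(key)
  }

  // Immutable update: the receiver is untouched, a new context is returned.
  set(key: string, value: unknown): ToyContext {
    const next = new Map(this.values)
    next.set(key, value)
    return new ToyContext(next)
  }
}

let active: ToyContext = ToyContext.ROOT

// Scoped activation: `ctx` is active only while `fn` runs.
function withContext<T>(ctx: ToyContext, fn: () => T): T {
  const previous = active
  active = ctx
  try {
    return fn()
  } finally {
    active = previous
  }
}

function currentSpanName(): unknown {
  return active.get('span')
}

// Mistake analogous to the reviewed code: the returned context is dropped,
// so nothing observable changes.
ToyContext.ROOT.set('span', 'ignored')

// Intended pattern: keep the returned context and activate it.
const parentCtx = ToyContext.ROOT.set('span', 'parent')
const seenInside = withContext(parentCtx, () => currentSpanName())
```

Here `seenInside` is `'parent'`, while `currentSpanName()` called outside the callback is still `undefined`.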

Additionally, if fn() can return a Promise, span?.end() fires immediately in the finally block before resolution completes—the span will close before async operations finish.
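
One way to keep the span open until asynchronous work settles, without forcing every resolver to become async, is sketched below. `SpanLike` and `runAndEndSpan` are hypothetical names; the sketch only assumes that a span exposes `end()`, and it detects native promises only.

```typescript
// Hypothetical helper; only assumes the span has an end() method.
interface SpanLike {
  end(): void
}

function runAndEndSpan<T>(
  span: SpanLike,
  fn: () => T | Promise<T>
): T | Promise<T> {
  let result: T | Promise<T>

  try {
    result = fn()
  } catch (error) {
    // Synchronous failure: close the span, then propagate the error.
    span.end()
    throw error
  }

  if (result instanceof Promise) {
    // Asynchronous result: close the span only once the promise settles.
    return (result as Promise<T>).then(
      (value) => {
        span.end()
        return value
      },
      (error: unknown) => {
        span.end()
        throw error
      }
    )
  }

  // Synchronous result: safe to close immediately.
  span.end()
  return result
}
```

The trade-off is a slightly wider return type in exchange for leaving synchronous resolvers synchronous.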

🔧 Suggested fix

-    OTELAPI.trace.setSpan(
-      OTELAPI.propagation.extract(
-        OTELAPI.context.active(),
-        graphqlContext.OTEL.__otelContext
-      ),
-      span
-    )
-
-    try {
-      return fn(source, vars, graphqlContext, info)
-    } catch (error: any) {
+    const parentCtx = OTELAPI.propagation.extract(
+      OTELAPI.context.active(),
+      graphqlContext.OTEL.__otelContext
+    )
+    const ctx = OTELAPI.trace.setSpan(parentCtx, span)
+
+    try {
+      return OTELAPI.context.with(ctx, () => fn(source, vars, graphqlContext, info))
+    } catch (error: any) {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/src/observability/telemetry.ts` around lines 38 - 44, The current code discards the Context returned by OTELAPI.trace.setSpan and never activates it, so child operations don't inherit the parent span; capture the returned Context (e.g., newCtx = OTELAPI.trace.setSpan(...)) and run the wrapped work via OTELAPI.context.with(newCtx, () => ... ) so the extracted parent is actually activated; also ensure span?.end() runs after fn completes by awaiting the result if fn returns a Promise (e.g., run fn inside the context, await its result in a try/finally, then end the span in finally) and return the awaited result so callers receive the original return value.

coderabbitai · 2026-05-11T15:20:30Z

+import {
+  getTelemetryClient,
+  getTraceClient,
+  OTELAPI,
+} from '@faststore/diagnostics'


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP "\bgetTraceClient\b" --type=ts -C2

Repository: vtex/faststore

Length of output: 825

Remove unused import getTraceClient.

It's imported but never used in this file, which will trigger a Biome/lint warning. Safe cleanup.

Suggested fix

 import {
   getTelemetryClient,
-  getTraceClient,
   OTELAPI,
 } from '@faststore/diagnostics'

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import {
  getTelemetryClient,
  OTELAPI,
} from '@faststore/diagnostics'

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/server/options.ts` around lines 2 - 6, The import list in options.ts includes an unused symbol getTraceClient which triggers lint errors; edit the import from '@faststore/diagnostics' to remove getTraceClient and keep only getTelemetryClient and OTELAPI so the file only imports used symbols (ensure any other references to getTraceClient in this file are also removed or replaced if present).

coderabbitai · 2026-05-11T15:20:30Z

 export async function withTraceClient<T extends APIOptions = typeof apiOptions>(
  apiOptions: T
 ): Promise<T> {
-  const OTEL = {}
-  getTraceClient(
-    apiOptions?.discoveryConfig?.analytics?.serviceName ?? name
-  )?.inject(OTEL)
+  // Safe guard in dev mode to prevent the
+  // global scope to be erased in hot-module-reload.
+  await getTelemetryClient({
+    serviceName: storeConfig.analytics?.serviceName ?? 'faststore',
+    version,
+    account: storeConfig.api.storeId,
+    clientName: storeConfig.api.storeId,
+    packageName: name,
+  })

-  return {
+  const options = {
    ...apiOptions,
    OTEL: {
-      ...OTEL,
-      enabled: storeConfig.analytics?.otelEnabled?.toString() === 'true',
+      __otelContext: {},
+      enabled: true,
    },
-  } as T
+  } satisfies T
+
+  const tracer = OTELAPI.trace.getTracer('@faststore/core')
+  const span = tracer.startSpan('@faststore/core graphql')
+
+  const context = OTELAPI.trace.setSpan(OTELAPI.context.active(), span)
+  OTELAPI.propagation.inject(options.OTEL.__otelContext, context)
+
+  try {
+    return options as T
+  } catch (error) {
+    span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
+    span?.recordException(error)
+  } finally {
+    span?.end()
+  }
 }


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

@opentelemetry/api propagation.inject carrier type and recommended pattern for request-scoped parent spans

💡 Result:

In @opentelemetry/api, propagation.inject takes:

1) context: @opentelemetry/api.Context
2) carrier: a generic "Carrier" type (depends on the propagator); for TextMap propagators it's typically a mutable key/value map (commonly an object used like HTTP headers)
3) optional setter: TextMapSetter (defaults to defaultTextMapSetter)

So the "carrier type" is not a single fixed TS type in the API; it is whatever Carrier type the configured TextMapPropagator expects. The API treats it as a generic parameter and passes it to the propagator's inject method. [1][2][3]

Recommended pattern for request-scoped parent spans (i.e., for making child spans use the inbound parent):

A) Extract on the receiving service at the request entry point, using the incoming request metadata (for HTTP: headers). This yields an extracted Context; spans started/created within that context will use the extracted SpanContext as parent. [4][5][6]
B) For outbound calls from that request scope, inject the current active context into the outbound carrier (for HTTP: headers) immediately before the call. This propagates trace/span IDs so the next service can extract them and continue the same trace. [7][5]

Concrete JS-style pattern (matches the OpenTelemetry JS docs examples):

// Sending side: inject current context into a mutable carrier (e.g., headers object) [7][1]
propagation.inject(context.active(), output);

// Receiving side: extract context from incoming carrier (e.g., parsed headers) [7][5]
const activeContext = propagation.extract(context.active(), req.headers);

// Then start spans using that context as current so they become children of the extracted parent. [8]
context.with(activeContext, () => { /* startSpan(...) */ });

Notes:
- Carriers used at inject time are expected to be mutable. [9][2]
- The TextMapPropagator injects/extracts string key/value pairs (encoded like HTTP headers). [1][2]
- Instrumentation should call inject on outbound requests and extract on inbound requests; the propagators API is the manual mechanism when you must do it yourself. [10][5]
- For request-scoped parent spans across async code in JS, ensure you have a proper Context Manager; otherwise api.context.active() may always be ROOT_CONTEXT. [8]

Citations:

1: https://open-telemetry.github.io/opentelemetry-js/interfaces/_opentelemetry_api._opentelemetry_api.TextMapPropagator.html

2: https://opentelemetry.io/docs/specs/otel/context/api-propagators/

3: https://github.com/open-telemetry/opentelemetry-js/blob/08f597f3a3d71a4852b0afbba120af15ca038121/api/src/api/propagation.ts

4: https://opentelemetry.io/docs/concepts/context-propagation/

5: http://opentelemetry.io/docs/concepts/context-propagation/

6: https://oneuptime.com/blog/post/2026-01-24-opentelemetry-span-context-propagation/view

7: https://opentelemetry.io/docs/languages/js/propagation/

8: https://opentelemetry.io/docs/instrumentation/js/api/context/

9: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.6.1/specification/context/api-propagators.md

10: http://opentelemetry.io/docs/specs/otel/context/api-propagators/

🏁 Script executed:

# First, verify the file exists and check the code at the specified lines
fd -type f -name "options.ts" -path "*/packages/core/src/server/*"

Repository: vtex/faststore

Length of output: 229

🏁 Script executed:

# Read the file to verify the code matches the review snippet
# File: packages/core/src/server/options.ts, lines 36-71
cat -n packages/core/src/server/options.ts | sed -n '30,80p'

Repository: vtex/faststore

Length of output: 1409

🏁 Script executed:

# Check if there's any usage of withTraceClient to understand context
rg "withTraceClient" --type ts --type tsx -B 2 -A 5

Repository: vtex/faststore

Length of output: 86

🏁 Script executed:

# Fix the rg command - search for withTraceClient usage
rg "withTraceClient" -B 2 -A 5

Repository: vtex/faststore

Length of output: 1594

🏁 Script executed:

# Also check imports at the top of the file to understand OTELAPI
head -30 packages/core/src/server/options.ts

Repository: vtex/faststore

Length of output: 1091

🏁 Script executed:

# Check tsconfig.json for strict mode setting
find . -name "tsconfig.json" -type f | head -5

Repository: vtex/faststore

Length of output: 228

🏁 Script executed:

# Look at tsconfig in the packages/core directory
cat packages/tsconfig.json 2>/dev/null || cat tsconfig.json 2>/dev/null | head -50

Repository: vtex/faststore

Length of output: 634

🏁 Script executed:

# Check how GraphqlVtexContextFactory uses the options
rg "GraphqlVtexContextFactory" -B 3 -A 10

Repository: vtex/faststore

Length of output: 12467

Span lifecycle ends prematurely, catch block lacks return in strict mode, and carrier type needs explicit definition.

Three stacked problems in this function:

Span ends before work begins. span.end() fires in the finally block immediately after return options resolves, but the actual GraphQL work (including the propagation context extraction in GraphqlVtexContextFactory) happens later. Child spans created from __otelContext won't have a live parent, defeating distributed tracing. The span should wrap the entire request lifecycle, not this initialization step.

Catch block missing return violates noImplicitReturns. With strict mode enabled, all code paths must explicitly return a value matching Promise<T>. If an error is caught, the function falls through without returning, yielding undefined instead. Either rethrow or remove the try/catch—there's no error recovery logic here.

satisfies T narrows __otelContext to {}. When you mutate it with OTELAPI.propagation.inject(options.OTEL.__otelContext, context), TypeScript loses type info. Explicitly type it as Record<string, string> to match what the propagation API expects.
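
For intuition about why the carrier should be a `Record<string, string>`, the sketch below hand-rolls what a W3C Trace Context propagator writes and reads: a single `traceparent` string of the form `00-<trace id>-<span id>-<flags>`. This is an illustration under that assumption, not the `@opentelemetry/api` propagator; `SpanIds`, `injectTraceparent` and `extractTraceparent` are hypothetical names.

```typescript
// Illustration of the W3C `traceparent` header layout; not the OTel propagator.
interface SpanIds {
  traceId: string // 32 lowercase hex characters
  spanId: string // 16 lowercase hex characters
  sampled: boolean
}

type Carrier = Record<string, string>

// Write side: the carrier is mutated in place with string key/value pairs.
function injectTraceparent(ids: SpanIds, carrier: Carrier): void {
  const flags = ids.sampled ? '01' : '00'
  carrier['traceparent'] = `00-${ids.traceId}-${ids.spanId}-${flags}`
}

// Read side: parse the header back, or return undefined when absent/malformed.
function extractTraceparent(carrier: Carrier): SpanIds | undefined {
  const header = carrier['traceparent']
  if (header === undefined) return undefined

  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header)
  if (match === null) return undefined

  const traceId = match[1] as string
  const spanId = match[2] as string
  const flags = parseInt(match[3] as string, 16)

  return { traceId, spanId, sampled: (flags & 1) === 1 }
}
```

An empty object typed as `{}` gives the compiler no way to know that string headers such as `traceparent` will appear on it, which is the typing concern raised above.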

🩹 Recommended fix (propagation only)

-  const options = {
-    ...apiOptions,
-    OTEL: {
-      __otelContext: {},
-      enabled: true,
-    },
-  } satisfies T
-
-  const tracer = OTELAPI.trace.getTracer('@faststore/core')
-  const span = tracer.startSpan('@faststore/core graphql')
-
-  const context = OTELAPI.trace.setSpan(OTELAPI.context.active(), span)
-  OTELAPI.propagation.inject(options.OTEL.__otelContext, context)
-
-  try {
-    return options as T
-  } catch (error) {
-    span?.setStatus({ code: OTELAPI.SpanStatusCode.ERROR })
-    span?.recordException(error)
-  } finally {
-    span?.end()
-  }
+  const carrier: Record<string, string> = {}
+  OTELAPI.propagation.inject(OTELAPI.context.active(), carrier)
+
+  return {
+    ...apiOptions,
+    OTEL: {
+      __otelContext: carrier,
+      enabled: true,
+    },
+  } as T

If a request-scoped parent span is genuinely needed, lift startSpan and span.end() to the caller (e.g., the HTTP route handler) so it wraps the full await graphqlHandler(...) execution.
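
A sketch of what "lifting the span to the caller" could look like, assuming only a span-like object with three methods. `RequestSpan` and `traceRequest` are hypothetical names; in the real code the span would come from the tracer and the handler would be the GraphQL handler call.

```typescript
// Hypothetical wrapper; the span interface is a stand-in for the real Span.
interface RequestSpan {
  setError(): void
  recordException(error: unknown): void
  end(): void
}

async function traceRequest<T>(
  span: RequestSpan,
  handler: () => Promise<T>
): Promise<T> {
  try {
    // The span stays open for the whole awaited request.
    return await handler()
  } catch (error) {
    span.setError()
    span.recordException(error)
    // Rethrow so every code path either returns T or throws.
    throw error
  } finally {
    span.end()
  }
}
```

Because the catch block rethrows, there is no fall-through path that resolves to `undefined`, which also addresses the missing-return concern above.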

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

export async function withTraceClient<T extends APIOptions = typeof apiOptions>(
  apiOptions: T
): Promise<T> {
  // Safe guard in dev mode to prevent the
  // global scope to be erased in hot-module-reload.
  await getTelemetryClient({
    serviceName: storeConfig.analytics?.serviceName ?? 'faststore',
    version,
    account: storeConfig.api.storeId,
    clientName: storeConfig.api.storeId,
    packageName: name,
  })

  const carrier: Record<string, string> = {}
  OTELAPI.propagation.inject(OTELAPI.context.active(), carrier)

  return {
    ...apiOptions,
    OTEL: {
      __otelContext: carrier,
      enabled: true,
    },
  } as T
}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/server/options.ts` around lines 36 - 71, The withTraceClient function currently starts and ends a span inside initialization (span in withTraceClient) which dies before request handling (so child spans from GraphqlVtexContextFactory lose their parent), has a try/catch that can fall through without returning in strict mode, and narrows options.OTEL.__otelContext to {} which breaks propagation typing; fix by: stop starting/ending a request-scoped span inside withTraceClient (move startSpan/span.end to the HTTP/graphql request handler so the span wraps the full request lifecycle), ensure the function always returns or rethrows in the catch (e.g., remove the try/catch or rethrow the error after recording it in span), and change the OTEL carrier declaration to an explicit Record<string,string> (set options.OTEL.__otelContext type to Record<string,string>) so OTELAPI.propagation.inject accepts it.

coderabbitai · 2026-05-11T15:20:30Z

+export async function getLogger(
+  client: TelemetryClient,
+  opt: { serviceName: string; client: string }
+) {
+  if (global.fsDiagnostics.LOGGER_CLIENT)
+    return global.fsDiagnostics.LOGGER_CLIENT
+
+  const logger = await client.newLogsClient({
+    exporter: await setupLogsExporter(),
+  })
+
+  overrideConsole(logger, opt)


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The singleton console proxy hard-codes the first caller’s metadata.

The proxy closes over the first opt.serviceName / opt.client, and later getLogger(...) calls just return the cached singleton. If different services initialize diagnostics in the same process, every forwarded console entry is attributed to the first one.
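
A per-caller cache, as opposed to a single cached instance, might be sketched as follows. `LoggerOptions`, `KeyedLogger` and `getKeyedLogger` are hypothetical stand-ins; the real `LogClient` API is not assumed here.

```typescript
// Hypothetical stand-ins for the real logger types.
interface LoggerOptions {
  serviceName: string
  client: string
}

interface KeyedLogger {
  info(message: string): string
}

const loggers = new Map<string, KeyedLogger>()

function getKeyedLogger(opt: LoggerOptions): KeyedLogger {
  // One cache entry per service/client pair instead of one global instance.
  const key = `${opt.serviceName}:${opt.client}`

  let logger = loggers.get(key)
  if (logger === undefined) {
    logger = {
      // Metadata comes from this entry's own options, not the first caller's.
      info: (message) => `[${opt.serviceName}/${opt.client}] ${message}`,
    }
    loggers.set(key, logger)
  }

  return logger
}
```

Note that `console` itself remains a single global, so a console proxy would still need to decide at call time which entry's metadata applies; this sketch only covers the cache.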

Also applies to: 45-76

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/diagnostics/src/logger.ts` around lines 27 - 38, The current getLogger implementation caches a single logger in global.fsDiagnostics.LOGGER_CLIENT which captures and reuses the first call's opt.serviceName/opt.client for all subsequent callers; change this to cache loggers per unique caller (e.g., key = `${opt.serviceName}:${opt.client}`) instead of a single global value so each service/client pair gets its own LogsClient and console proxy. Update getLogger to look up and return from a map (or object) stored on global.fsDiagnostics (instead of LOGGER_CLIENT), create a new logger and call overrideConsole(logger, opt) only for the missing key, and ensure overrideConsole is passed the current opt so metadata is not closed over by the first initialization. Ensure references to getLogger, global.fsDiagnostics.LOGGER_CLIENT, overrideConsole, opt.serviceName and opt.client are updated accordingly.

coderabbitai · 2026-05-11T15:20:30Z

+  if (global.fsDiagnostics.LOGGER_CLIENT)
+    return global.fsDiagnostics.LOGGER_CLIENT
+
+  const logger = await client.newLogsClient({
+    exporter: await setupLogsExporter(),
+  })
+
+  overrideConsole(logger, opt)
+
+  global.fsDiagnostics.LOGGER_CLIENT ??= logger


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cache the in-flight logger initialization.

Concurrent callers can all pass the LOGGER_CLIENT check before it is assigned, so this can create multiple exporters/clients and wrap globalThis.console more than once. That leads to duplicated log writes and double-formatted console output.

Suggested change

+let loggerPromise: Promise<LogClient> | undefined
+
 export async function getLogger(
   client: TelemetryClient,
   opt: { serviceName: string; client: string }
 ) {
   if (global.fsDiagnostics.LOGGER_CLIENT)
     return global.fsDiagnostics.LOGGER_CLIENT
-
-  const logger = await client.newLogsClient({
-    exporter: await setupLogsExporter(),
-  })
-
-  overrideConsole(logger, opt)
-
-  global.fsDiagnostics.LOGGER_CLIENT ??= logger
-
-  return logger
+  if (!loggerPromise) {
+    loggerPromise = (async () => {
+      const logger = await client.newLogsClient({
+        exporter: await setupLogsExporter(),
+      })
+
+      overrideConsole(logger, opt)
+      global.fsDiagnostics.LOGGER_CLIENT = logger
+
+      return logger
+    })().catch((error) => {
+      loggerPromise = undefined
+      throw error
+    })
+  }
+
+  return loggerPromise
 }
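
The same idea can be isolated from the logger specifics as a small generic helper, which makes the "one initialization for concurrent callers" property easy to check. `memoizeAsync` is a hypothetical name, not part of the diagnostics package.

```typescript
// Hypothetical generic helper illustrating in-flight promise caching.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | undefined

  return () => {
    if (inFlight === undefined) {
      // Store the promise before it settles so concurrent callers share it.
      inFlight = factory().catch((error: unknown) => {
        // Clear the cache on failure so a later call can retry.
        inFlight = undefined
        throw error
      })
    }
    return inFlight
  }
}

// Usage: count how many times the expensive initialization actually runs.
let initializations = 0
const getSharedClient = memoizeAsync(async () => {
  initializations += 1
  return { id: initializations }
})

const firstCall = getSharedClient()
const secondCall = getSharedClient()
```

Both calls receive the same promise object, so the factory body runs once even though neither call has resolved yet.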

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/diagnostics/src/logger.ts` around lines 31 - 40, Multiple concurrent callers can pass the initial global.fsDiagnostics.LOGGER_CLIENT check and each start creating exporters/clients; fix this by caching the in-flight initialization promise in global.fsDiagnostics.LOGGER_CLIENT before awaiting client.newLogsClient so other callers await the same promise. Concretely: when entering the init, assign a promise (that calls setupLogsExporter(), client.newLogsClient(...), then calls overrideConsole(logger) and returns logger) to global.fsDiagnostics.LOGGER_CLIENT immediately; await that promise to get the final logger; on rejection ensure you clear global.fsDiagnostics.LOGGER_CLIENT so future attempts can retry. Use the existing symbols client.newLogsClient, setupLogsExporter, overrideConsole and global.fsDiagnostics.LOGGER_CLIENT to implement this change.

hellofanny · 2026-05-27T14:04:45Z

+  - `@faststore_account_name: required_trace_account_name`
+  - `@faststore_environment:production|development`
+
+


search for @faststore_environment="development"

ommeirelles requested a review from a team as a code owner May 6, 2026 22:39

ommeirelles requested review from eduardoformiga and lemagnetic and removed request for a team May 6, 2026 22:39

coderabbitai Bot reviewed May 6, 2026

Comment thread packages/diagnostics/src/globals.ts

Comment thread packages/diagnostics/src/logger.ts

Comment thread packages/diagnostics/src/start.ts

Comment thread packages/diagnostics/src/tracer.ts

eduardoformiga force-pushed the dev branch 3 times, most recently from 6c7ef8e to 4d4ac3a Compare May 7, 2026 15:18

coderabbitai Bot reviewed May 11, 2026

hellofanny reviewed May 13, 2026

Comment thread packages/core/src/instrumentation.ts Outdated

ommeirelles and others added 16 commits May 18, 2026 11:15

feat: refactoring and enabling diagnostics logger and traces

d713a5d

chore: changing CI steps

70103ab

chore: changing preview CI steps

97f8798

chore: changing preview CI steps

3baff9a

chore: changing preview CI steps and adding docs

7e5c03f

chore: removing codesandbox release as it will always fail cause is n…

b1e2cd9

…ot auth at aws

feat!: bump to v4

f06f5bf

refactor: fixing coderabbit comments.

6a423bd

refactor: fixing typos

9b2008f

refactor: refactoring telemetry

c82a832

refactor: refactoring telemetry

2125830

refactor: enabling by default and setting correct url for observability

2aff93a

fix: comsunming from wrong place

ed6d039

fix: logger assign

999a8de

fix: wrong propagation call parameters call

a65f0c7

Update packages/core/src/instrumentation.ts

ffde67e

Co-authored-by: Fanny Chien <fanny.chien@vtex.com>

ommeirelles force-pushed the feat/diagnostics branch from 92e3cc5 to ffde67e Compare May 18, 2026 14:28

chore: sonarqube code quality error in installation from private regi…

131b588

…stry

chore: adding awsAccountId to sonarqube

99f2e32

hellofanny reviewed May 20, 2026

Comment thread docs/install.md

ommeirelles added 6 commits May 25, 2026 16:56

refactor:changing env var auth to match sonarqube AWS auth config

9f28d3f

docs: renaming doc

0d4a2d1

chore: testing sonarqube

a41d32f

chore: testing

98c0d63

chore: testing

352cc52

chore: testing

989a2b5

hellofanny reviewed May 27, 2026

Comment thread docs/observability.md Outdated

ommeirelles and others added 18 commits June 1, 2026 09:08

chore: testing sonarqube CI

af12992

chore: testing sonarqube CI

9ce5d77

Update docs/observability.md

8ebd338

Co-authored-by: Fanny Chien <fanny.chien@vtex.com>

refactor: making private packages vendored and bundled in final packa…

873487c

…ge bundle

chore: testing preinstall

96f266b

chore: fixing preinstall on CI flow

4177b58

chore: fix logs on preinstall CI flow

8566832

chore: fix installs on workspace to allow override of other deps

2d6ec34

chore: commiting lockfile

6d3b68d

test: fixing failing tests

dc8c30c

Merge remote-tracking branch 'origin/dev' into feat/diagnostics

5f5b5b8

refactor: simplifying ci script

d551039

feat: making CA_TOKEN maks on github

c58355a

doc: removing someway sensitive documentation and linking it in inter…

60af8e5

…nal docs.

chore: changing sonarqube scripts

0153b6a

fix: vendor script

80013a1

chore: privateRegistry: observability in sonarqube

e2c9ad0

chore: rollback privateRegistry to internal-pnpm

219085b

renatomaurovtex mentioned this pull request Jun 9, 2026

fix: paginate listEntries with scroll #3384

Merged

6 tasks
