
[Proposal] Fix breadcrumb race condition#8041

Open
jrodiz wants to merge 1 commit into firebase:main from jrodiz:fix/jrc--8034.Fix.log.breadcrumb.race.condition

Conversation

@jrodiz
Contributor

@jrodiz jrodiz commented Apr 16, 2026

PR: Fix breadcrumb race condition — log() entry dropped before logException() reads it

Summary

Fixes: #8034

  • Firebase.crashlytics.log() breadcrumbs were silently dropped when called from a background thread immediately before recordException().
  • Root cause: a race condition between the common and diskWrite Crashlytics workers caused the non-fatal event to snapshot the log before the breadcrumb was written to disk.
  • Fix: one-line change from common.submit() to common.submitTask() in CrashlyticsCore.log(), which suspends the common worker until the disk write completes.

Root Cause

CrashlyticsCore.log() used a double-dispatch pattern:

// BEFORE (buggy)
crashlyticsWorkers.common.submit(
    () -> {
        crashlyticsWorkers.diskWrite.submit(() -> controller.writeToLog(timestamp, msg));
    });

CrashlyticsWorker.submit(Runnable) chains the runnable onto the common queue and marks the task complete as soon as the runnable returns — not when the inner diskWrite task finishes. So the sequence was:

  1. log("breadcrumb") → adds task C1 to common: "enqueue writeToLog on diskWrite"
  2. logException(ex) → adds task C2 to common: "call writeNonFatalException"
  3. common runs C1: calls diskWrite.submit(writeToLog) → C1 completes immediately
  4. common runs C2: calls writeNonFatalException → calls logFileManager.getLogString()
    — the diskWrite task D1 has been queued but not yet run → log is empty → breadcrumb missing
  5. diskWrite eventually runs D1: writeToLog — but the event was already captured without it

The main-thread workaround happened to work because Handler.post {} batched both calls into a single posted block, accidentally serializing them in a way that masked the race.

Fix

// AFTER (fixed)
crashlyticsWorkers.common.submitTask(
    () -> crashlyticsWorkers.diskWrite.submit(() -> controller.writeToLog(timestamp, msg)));

submitTask(Callable<Task<T>>) sets the common worker's internal tail to the Task returned by diskWrite.submit(...). Subsequent common tasks (like logException) are chained after that Task, so they cannot start until the disk write has completed.
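A minimal sketch of the two queueing semantics, using `CompletableFuture` in place of the Play services `Task` type the real CrashlyticsWorker wraps (class and method names here are illustrative, not the SDK's API):

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Serial worker sketch: a single-thread executor plus a chained tail future.
public class SerialWorkerSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    // submit(): the tail advances as soon as the runnable returns.
    public synchronized CompletableFuture<Void> submit(Runnable r) {
        tail = tail.thenRunAsync(r, executor);
        return tail;
    }

    // submitTask(): the tail becomes the *returned* future, so later tasks
    // cannot start until that inner future completes.
    public synchronized CompletableFuture<Void> submitTask(Supplier<CompletableFuture<Void>> s) {
        tail = tail.thenComposeAsync(v -> s.get(), executor);
        return tail;
    }

    public void shutdown() { executor.shutdown(); }

    public static void main(String[] args) throws Exception {
        SerialWorkerSketch common = new SerialWorkerSketch();
        SerialWorkerSketch diskWrite = new SerialWorkerSketch();
        StringBuilder log = new StringBuilder();

        // log(): chain the common tail onto the disk write itself.
        common.submitTask(() -> diskWrite.submit(() -> log.append("breadcrumb")));
        // logException(): snapshots the log; guaranteed to run after the write.
        common.submit(() -> System.out.println("log seen: '" + log + "'"))
              .get(5, TimeUnit.SECONDS);

        common.shutdown();
        diskWrite.shutdown();
    }
}
```

Because `thenComposeAsync` makes the common tail resolve only when the diskWrite future resolves, the second common task always observes the breadcrumb.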

Test Plan

New test: CrashlyticsCoreTest#testLog_breadcrumbIsWrittenBeforeLogExceptionReadsIt

  • Calls log("test breadcrumb") immediately followed by logException(exception) — reproducing the exact pattern reported in the issue.
  • Awaits only crashlyticsWorkers.common (not diskWrite separately). With submitTask, awaiting common guarantees the diskWrite has also completed, so the log MUST be on disk.
  • Asserts logFileManager.getLogString() is non-null and contains the breadcrumb.

Without the fix, awaiting common would NOT guarantee that diskWrite finished, making this assertion unreliable (the log might be empty at assertion time).
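The await-only-common guarantee can be illustrated the same way (plain `java.util.concurrent` stand-ins, not the SDK's test helpers): with submitTask-style chaining, the common tail *is* the disk-write future, so it cannot complete while the write is still pending.

```java
import java.util.concurrent.*;

public class AwaitCommonGuarantee {
    public static void main(String[] args) throws Exception {
        ExecutorService common = Executors.newSingleThreadExecutor();
        ExecutorService diskWrite = Executors.newSingleThreadExecutor();
        StringBuilder log = new StringBuilder();
        CountDownLatch slowDisk = new CountDownLatch(1); // keeps the write pending

        // submitTask-style chaining: the common tail composes onto the disk write.
        CompletableFuture<Void> commonTail = CompletableFuture.runAsync(() -> {}, common)
            .thenCompose(v -> CompletableFuture.runAsync(() -> {
                try { slowDisk.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                log.append("breadcrumb");
            }, diskWrite));

        // Awaiting only "common" cannot finish before the disk write has.
        System.out.println("done early? " + commonTail.isDone());
        slowDisk.countDown();
        commonTail.get(5, TimeUnit.SECONDS);
        System.out.println("log after awaiting common: '" + log + "'");

        common.shutdown();
        diskWrite.shutdown();
    }
}
```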

Existing tests: All existing tests in :firebase-crashlytics compile and run clean.

Risks / Trade-offs

  • Slightly more serialized hot path: common now suspends until each writeToLog disk write completes before processing the next task. In practice, writeToLog is fast (a small QueueFile append), and the previous behavior (fire-and-forget) was already incorrect per the documented contract ("queuing up on common worker to maintain the order").
  • No deadlock risk: diskWrite never submits back to common, so no cycle.


@jrodiz
Contributor Author

jrodiz commented Apr 16, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a race condition where breadcrumbs logged immediately before a non-fatal exception might be missing from the crash report. The fix replaces a simple task submission with a task that ensures the common worker waits for the disk write to complete before proceeding. A regression test and a helper method for testing were also added. The review feedback suggests refining the terminology in the code comments to avoid potential confusion with Java's thread suspension mechanisms.

Comment on lines +333 to +334
// submitTask suspends the common worker until the diskWrite task completes, ensuring
// that subsequent tasks on the common worker (e.g. logException) see this log entry.
Contributor


Severity: low

The term 'suspends' might be slightly misleading in a Java context, as it is often associated with Thread.suspend() or Kotlin coroutines. Since this is a serial worker queue, 'waits for' or 'chains' might be more accurate to describe how submitTask prevents subsequent tasks from starting until the returned task completes.

Suggested change
- // submitTask suspends the common worker until the diskWrite task completes, ensuring
- // that subsequent tasks on the common worker (e.g. logException) see this log entry.
+ // submitTask ensures the common worker waits for the diskWrite task to complete, ensuring
+ // that subsequent tasks on the common worker (e.g. logException) see this log entry.

@lehcar09 lehcar09 requested a review from mrober April 16, 2026 15:14
@jrodiz jrodiz force-pushed the fix/jrc--8034.Fix.log.breadcrumb.race.condition branch from fdc0273 to 585f551 Compare April 16, 2026 15:45
mrober
mrober previously approved these changes Apr 16, 2026
Contributor

@mrober mrober left a comment


LGTM after the changelog

Comment thread on firebase-crashlytics/CHANGELOG.md (Outdated)
@@ -1,5 +1,7 @@
# Unreleased

- [fixed] Fixed more strict mode violations
Contributor


Copy/paste error?

Contributor


Maybe something like "Fixed race condition that caused logs from background threads to not be attached to reports in some cases [#8034]"

Contributor Author


Hey yep c/p error

…) reads it

CrashlyticsCore.log() used common.submit() which completed as soon as the
diskWrite task was enqueued, not when it finished. This allowed the subsequent
logException() common task to call logFileManager.getLogString() before
writeToLog() had run on the diskWrite worker, silently dropping the breadcrumb
from the non-fatal report.

Fix: change to common.submitTask() so the common worker suspends until the
diskWrite task resolves before dispatching the next item (e.g. logException).

Adds a regression test that calls log() immediately before logException(),
awaits only the common worker, and asserts the breadcrumb is present on disk.

Fixes firebase#8034
@jrodiz jrodiz force-pushed the fix/jrc--8034.Fix.log.breadcrumb.race.condition branch from 585f551 to 3f046e3 Compare April 16, 2026 19:10