[Proposal] Fix breadcrumb race condition #8041
Conversation
/gemini review
Code Review
This pull request addresses a race condition where breadcrumbs logged immediately before a non-fatal exception might be missing from the crash report. The fix replaces a simple task submission with a task that ensures the common worker waits for the disk write to complete before proceeding. A regression test and a helper method for testing were also added. The review feedback suggests refining the terminology in the code comments to avoid potential confusion with Java's thread suspension mechanisms.
```java
// submitTask suspends the common worker until the diskWrite task completes, ensuring
// that subsequent tasks on the common worker (e.g. logException) see this log entry.
```
The term 'suspends' might be slightly misleading in a Java context, as it is often associated with Thread.suspend() or Kotlin coroutines. Since this is a serial worker queue, 'waits for' or 'chains' might be more accurate to describe how submitTask prevents subsequent tasks from starting until the returned task completes.
Suggested change:

```java
// submitTask ensures the common worker waits for the diskWrite task to complete, ensuring
// that subsequent tasks on the common worker (e.g. logException) see this log entry.
```
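As a small illustration of the "waits for" / "chains" framing, here is a hypothetical sketch using plain `CompletableFuture` (not the real `CrashlyticsWorker` or Gms `Task` API): the next common task is chained after the disk-write future, so it cannot start before the write completes, yet no thread is blocked or suspended.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChainingSketch {
    public static void main(String[] args) throws Exception {
        // Daemon single-thread executor standing in for the "common" worker.
        Executor common = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

        // Stand-in for the Task returned by diskWrite.submit(writeToLog).
        CompletableFuture<Void> diskWrite =
                CompletableFuture.runAsync(() -> System.out.println("writeToLog (diskWrite)"));

        // "submitTask" semantics: the next common task is chained after the
        // disk-write future, so it cannot start before the write completes.
        CompletableFuture<Void> next =
                diskWrite.thenRunAsync(() -> System.out.println("logException (common)"), common);

        next.get(5, TimeUnit.SECONDS);
        // Always prints "writeToLog (diskWrite)" before "logException (common)".
    }
}
```

The chaining is what makes the ordering deterministic: the dependent stage is scheduled only when the upstream future resolves.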
Diff excerpt from `CHANGELOG.md`:

```
@@ -1,5 +1,7 @@
 # Unreleased

 - [fixed] Fixed more strict mode violations
```
Maybe something like "Fixed race condition that caused logs from background threads to not be attached to reports in some cases [#8034]"
…) reads it

`CrashlyticsCore.log()` used `common.submit()`, which completed as soon as the `diskWrite` task was enqueued, not when it finished. This allowed the subsequent `logException()` common task to call `logFileManager.getLogString()` before `writeToLog()` had run on the `diskWrite` worker, silently dropping the breadcrumb from the non-fatal report.

Fix: change to `common.submitTask()` so the common worker waits for the `diskWrite` task to resolve before dispatching the next item (e.g. `logException`).

Adds a regression test that calls `log()` immediately before `logException()`, awaits only the common worker, and asserts the breadcrumb is present on disk.

Fixes firebase#8034
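To make the race concrete, here is a minimal, self-contained reproduction of the buggy ordering. The single-thread executors are hypothetical stand-ins for the `common` and `diskWrite` workers (the real code uses `CrashlyticsWorker`); a latch makes the lost-breadcrumb outcome deterministic rather than timing-dependent.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class RaceDemo {
    static ExecutorService newDaemonWorker() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService common = newDaemonWorker();
        ExecutorService diskWrite = newDaemonWorker();
        StringBuilder logFile = new StringBuilder(); // stands in for LogFileManager
        CountDownLatch gate = new CountDownLatch(1); // holds the disk write back

        // Buggy pattern: the common task completes as soon as the enqueue returns.
        common.submit(() ->
                diskWrite.submit(() -> {
                    try { gate.await(); } catch (InterruptedException e) { /* ignore */ }
                    logFile.append("breadcrumb;");
                }));

        // logException() snapshots the log on the common worker.
        StringBuilder snapshot = new StringBuilder();
        Future<?> c2 = common.submit(() -> snapshot.append(logFile));

        c2.get(5, TimeUnit.SECONDS);                       // awaiting common "succeeds"...
        System.out.println("snapshot = [" + snapshot + "]"); // ...but the breadcrumb is missing
        gate.countDown();                                   // the disk write only happens now
    }
}
```

Awaiting `common` returns before the disk write runs, so the snapshot is empty — the same shape as the dropped breadcrumb in the report.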
PR: Fix breadcrumb race condition — log() entry dropped before logException() reads it
## Summary
Fixes: #8034
- `Firebase.crashlytics.log()` breadcrumbs were silently dropped when called from a background thread immediately before `recordException()`.
- A race between the `common` and `diskWrite` Crashlytics workers caused the non-fatal event to snapshot the log before the breadcrumb was written to disk.
- Fix: change `common.submit()` to `common.submitTask()` in `CrashlyticsCore.log()`, which makes the common worker wait until the disk write completes.

## Root Cause
`CrashlyticsCore.log()` used a double-dispatch pattern: `CrashlyticsWorker.submit(Runnable)` chains the runnable onto the common queue and marks the task complete as soon as the runnable returns, not when the inner `diskWrite` task finishes. So the sequence was:

1. `log("breadcrumb")` → adds task C1 to `common`: "enqueue `writeToLog` on `diskWrite`"
2. `logException(ex)` → adds task C2 to `common`: "call `writeNonFatalException`"
3. `common` runs C1: calls `diskWrite.submit(writeToLog)` → C1 completes immediately
4. `common` runs C2: calls `writeNonFatalException` → calls `logFileManager.getLogString()`; the `diskWrite` task D1 has been queued but not yet run → log is empty → breadcrumb missing
5. `diskWrite` eventually runs D1: `writeToLog`, but the event was already captured without it

The main-thread workaround happened to work because `Handler.post {}` batched both calls into a single posted block, accidentally serializing them in a way that masked the race.

## Fix
`submitTask(Callable<Task<T>>)` sets the common worker's internal tail to the `Task` returned by `diskWrite.submit(...)`. Subsequent common tasks (like `logException`) are chained after that `Task`, so they cannot start until the disk write has completed.

## Test Plan
New test: `CrashlyticsCoreTest#testLog_breadcrumbIsWrittenBeforeLogExceptionReadsIt`

- Calls `log("test breadcrumb")` immediately followed by `logException(exception)`, reproducing the exact pattern reported in the issue.
- Awaits only `crashlyticsWorkers.common` (not `diskWrite` separately). With `submitTask`, awaiting `common` guarantees the `diskWrite` task has also completed, so the log MUST be on disk.
- Asserts that `logFileManager.getLogString()` is non-null and contains the breadcrumb.

Without the fix, awaiting `common` would NOT guarantee that `diskWrite` finished, making this assertion unreliable (the log might be empty at assertion time).

Existing tests: All existing tests in `:firebase-crashlytics` compile and run clean.

## Risks / Trade-offs
- Throughput: `common` now waits until each `writeToLog` disk write completes before processing the next task. In practice `writeToLog` is fast (a small QueueFile append), and the previous fire-and-forget behavior was already incorrect per the documented contract ("queuing up on common worker to maintain the order").
- Deadlock: `diskWrite` never submits back to `common`, so there is no cycle.
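The fixed chaining can be sketched end to end. This is a hypothetical model of the worker queues using `CompletableFuture` (the real `CrashlyticsWorker` is built on Gms `Task`s, and `LogFileManager` is replaced by a plain `StringBuilder`), showing that awaiting `common` alone now observes the breadcrumb:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for CrashlyticsWorker: a serial queue built on
// CompletableFuture chaining.
final class SerialWorker {
    private final Executor executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    // submit: the queue advances as soon as the runnable returns.
    synchronized CompletableFuture<Void> submit(Runnable r) {
        tail = tail.thenRunAsync(r, executor);
        return tail;
    }

    // submitTask: the queue's tail becomes the future the supplier returns,
    // so later submissions also wait for that inner work to finish.
    synchronized CompletableFuture<Void> submitTask(Supplier<CompletableFuture<Void>> s) {
        tail = tail.thenComposeAsync(ignored -> s.get(), executor);
        return tail;
    }
}

public class FixDemo {
    public static void main(String[] args) throws Exception {
        SerialWorker common = new SerialWorker();
        SerialWorker diskWrite = new SerialWorker();
        StringBuilder logFile = new StringBuilder(); // stands in for LogFileManager

        // Fixed pattern: log() uses submitTask, so common's tail becomes the
        // disk-write future and later common tasks are chained after it.
        common.submitTask(() -> diskWrite.submit(() -> logFile.append("breadcrumb;")));

        // logException() snapshots the log on the common worker.
        StringBuilder snapshot = new StringBuilder();
        common.submit(() -> snapshot.append(logFile));

        // Awaiting common alone now guarantees the disk write has completed.
        common.submit(() -> { }).get(5, TimeUnit.SECONDS);
        System.out.println("snapshot = [" + snapshot + "]");
    }
}
```

Because `submitTask` composes the disk-write future into the common queue's tail, the snapshot task cannot run until the append has happened, which is the ordering guarantee the regression test relies on.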