
Fix RotateEvent and flaky tests #553

Merged
morgo merged 8 commits into block:main from morgo:mtocker-fix-racey-test
Jan 9, 2026

@morgo morgo commented Jan 6, 2026

A Pull Request should be associated with an Issue.

We wish to have discussions in Issues. A single issue may be targeted by multiple PRs.
If you're offering a new feature or fixing anything, we'd like to know beforehand in Issues,
and potentially we'll be able to point development in a particular direction.
Further notes in https://github.com/block/spirit/blob/main/.github/CONTRIBUTING.md

The first issue:
The cause is cancel() running too early and leaving a "context cancelled" error behind. Running it after Close() ensures a cleaner finish. Fixes #552

The second issue:
Spirit incorrectly handles RotateEvent, which can cause events to be skipped when resuming from a checkpoint. Fixes #548
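The description does not spell out the exact defect, so the following is only a sketch of the general idea: a checkpointed binlog position is a (file, offset) pair, and a rotate event has to update both parts together. The `binlogPos`, `event`, `apply`, and `replay` names are hypothetical simplifications, not Spirit's or go-mysql's actual types.

```go
package main

import "fmt"

// binlogPos is a hypothetical (file, offset) pair that a checkpoint would store.
type binlogPos struct {
	Name string
	Pos  uint32
}

// event is a simplified stand-in for a binlog event.
type event struct {
	IsRotate    bool
	NextLogName string // set on rotate events only
	Position    uint32 // rotate: start offset in the next file; otherwise: end offset of this event
}

// apply returns the position that is safe to checkpoint after seeing ev.
func apply(cur binlogPos, ev event) binlogPos {
	if ev.IsRotate {
		// A rotate switches files, so name and offset must change together.
		return binlogPos{Name: ev.NextLogName, Pos: ev.Position}
	}
	cur.Pos = ev.Position
	return cur
}

// replay folds a stream of events into a final checkpoint position.
func replay(start binlogPos, evs []event) binlogPos {
	cur := start
	for _, ev := range evs {
		cur = apply(cur, ev)
	}
	return cur
}

func main() {
	start := binlogPos{Name: "binlog.000001", Pos: 900}
	evs := []event{
		{Position: 1200}, // row event in the first file
		{IsRotate: true, NextLogName: "binlog.000002", Position: 4},
		{Position: 350}, // row event in the second file
	}
	fmt.Printf("%+v\n", replay(start, evs))
}
```

If the rotate step were ignored, the checkpoint would pair the old file name with an offset from the new file, and a resume from that position would not start where the reader left off.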

This can be a serious issue, but the steps required to reproduce it are quite specific. See #548 for the discussion. I will merge this, deploy it to our staging environment, and then likely cherry-pick a release-branch version of it.

@morgo morgo changed the title fix flaky test fix flaky tests Jan 6, 2026
@morgo morgo requested a review from Copilot January 6, 2026 16:09

Copilot AI left a comment

Pull request overview

This PR fixes two flaky test issues: (1) race conditions in tests caused by calling cancel() before Close(), and (2) timeout issues during the cutover phase due to an excessively high trivial threshold for unflushed changes.

  • Reduced binlogTrivialThreshold from 10,000 to 1,000 changes to prevent flush timeouts during the cutover lock phase
  • Reordered Close() to execute before cancel() in all test functions to avoid context cancellation race conditions
  • Added comprehensive documentation explaining the tradeoffs of the lower threshold value
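The tradeoff described in this summary can be sketched as follows. This is a simplified model under stated assumptions, not Spirit's implementation: changes are flushed in batches without a lock until the backlog is at or below the "trivial" threshold, and whatever remains is flushed while the cutover lock is held. The `flushBeforeLock` helper and the batch size are hypothetical; only the two threshold values come from the summary.

```go
package main

import "fmt"

// Threshold values quoted in the review summary.
const (
	oldTrivialThreshold = 10000
	newTrivialThreshold = 1000
)

// flushBeforeLock simulates flushing `batch` changes per round without a lock
// until the backlog is at or below `threshold`. It returns the number of
// pre-lock rounds and the backlog left to flush under the cutover lock.
func flushBeforeLock(pending, batch, threshold int) (rounds, leftForLock int) {
	if batch <= 0 {
		return 0, pending // nothing can be flushed; avoid looping forever
	}
	if threshold < 0 {
		threshold = 0
	}
	for pending > threshold {
		n := batch
		if n > pending {
			n = pending
		}
		pending -= n
		rounds++
	}
	return rounds, pending
}

func main() {
	for _, th := range []int{oldTrivialThreshold, newTrivialThreshold} {
		rounds, left := flushBeforeLock(12500, 3000, th)
		fmt.Printf("threshold=%d pre-lock rounds=%d left for lock=%d\n", th, rounds, left)
	}
}
```

In this model, a lower threshold means more flush rounds before the lock is taken but far fewer changes to apply while it is held.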

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/repl/client.go | Reduced binlogTrivialThreshold constant from 10,000 to 1,000 and added detailed documentation explaining the tradeoff between cutover responsiveness and lock duration |
| pkg/migration/runner_resume_test.go | Fixed test flakiness by calling Close() before cancel() in five test functions (TestCheckpointResumeDuringChecksum, TestResumeFromCheckpointE2E, TestResumeFromCheckpointCompositeVarcharPK, TestResumeFromCheckpointStrict, TestResumeFromCheckpointE2EWithManualSentinel) to prevent race conditions |

@morgo morgo force-pushed the mtocker-fix-racey-test branch from 58cc67d to 28240d6 Compare January 6, 2026 17:00
@morgo morgo force-pushed the mtocker-fix-racey-test branch from 28240d6 to 2027a09 Compare January 6, 2026 17:01
@morgo morgo marked this pull request as draft January 6, 2026 17:32
@morgo morgo force-pushed the mtocker-fix-racey-test branch from 5be46a2 to 1926f54 Compare January 7, 2026 18:39
@morgo morgo force-pushed the mtocker-fix-racey-test branch from 1926f54 to c97a0df Compare January 7, 2026 18:48
@morgo morgo force-pushed the mtocker-fix-racey-test branch from c1ee5c0 to e568300 Compare January 7, 2026 19:26
@morgo morgo changed the title fix flaky tests Fix RotateEvent and flaky tests Jan 9, 2026
@morgo morgo marked this pull request as ready for review January 9, 2026 17:22
@morgo morgo force-pushed the mtocker-fix-racey-test branch from 7f58d63 to fcd0a0f Compare January 9, 2026 17:22

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


@morgo morgo merged commit 7cb3809 into block:main Jan 9, 2026
13 checks passed
@morgo morgo deleted the mtocker-fix-racey-test branch January 9, 2026 17:57

Development

Successfully merging this pull request may close these issues.

flaky test: TestCheckpointResumeDuringChecksum
spirit incorrectly handles replication.RotateEvent
