test: split Online DDL vreplication stress suite shard #98
Draft
cursor[bot] wants to merge 1 commit into main
Signed-off-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mohamed Hamza <mhamza@fastmail.com>
Description
I inspected the latest successful `main` runs in `vitessio/vitess` with `gh`:

- `unit_test.yml`: run `23124032519`
- `cluster_endtoend.yml`: run `23246552588`

Across those two workflows, the slowest cluster shard in that successful run was `ers_prs_newfeatures_heavy` at `1434s`, but I found open performance PRs in `planetscale/vitess` already targeting that exact hotspot (#95, #96, #97). To avoid duplicating active work, I moved to the next unresolved outlier in the same inspected run: `Run endtoend tests on Cluster (onlineddl_vrepl_stress_suite)` at `1432s`.

For that unresolved shard, the upstream CI timings show:

- `1432s` for the shard as a whole
- Setup MySQL: `955s`
- Run cluster endtoend test: `406s`
- `go/test/endtoend/onlineddl/vrepl_stress_suite.TestVreplStressSchemaChanges` at `375.60s`

The core problem is that the shard runs one very large top-level e2e test with dozens of schema-change stress cases in a single matrix entry, so the whole package stays serialized inside one job.
This change keeps the exact same end-to-end assertions and package, but splits that long top-level test into two balanced top-level tests and maps them to two independent CI shards:

- `onlineddl_vrepl_stress_suite_group1`
- `onlineddl_vrepl_stress_suite_group2`

That preserves coverage while letting GitHub Actions schedule the two halves in parallel. Based on the upstream subtest timings from the inspected successful run, the two groups are balanced at roughly `188s` and `186s` of test body time, which should cut this hotspot below the current `~24m` outlier level and remove it as the slowest unresolved cluster shard.

Related Issue(s)
None.
Checklist
Deployment Notes
No deployment impact. This only changes CI scheduling for an existing e2e package.
Validation
I rebuilt the Vitess binaries and then ran each new shard through `test.go` with local e2e dependencies installed.

Results:

- `onlineddl_vrepl_stress_suite_group1`: `PASS Package ... (4m52.575s)`
- `onlineddl_vrepl_stress_suite_group2`: `PASS Package ... (4m25.16s)`

AI Disclosure
This PR was authored with GPT-5 assistance, using upstream CI log analysis plus local validation.