test: split Online DDL vreplication stress suite shard by cursor[bot] · Pull Request #98 · planetscale/vitess

cursor · 2026-03-18T19:41:10Z

Description

I inspected the latest successful main runs in vitessio/vitess with gh:

unit_test.yml: run 23124032519
cluster_endtoend.yml: run 23246552588

Across those two workflows, the slowest cluster shard was ers_prs_newfeatures_heavy at 1434s, but open performance PRs in planetscale/vitess already target that hotspot (#95, #96, #97). To avoid duplicating active work, I moved to the next unresolved outlier in the same run: Run endtoend tests on Cluster (onlineddl_vrepl_stress_suite) at 1432s.

For that unresolved shard, the upstream CI timings show:

total job time: 1432s
Setup MySQL: 955s
Run cluster endtoend test: 406s
slowest test in the shard: go/test/endtoend/onlineddl/vrepl_stress_suite.TestVreplStressSchemaChanges at 375.60s

The core problem is that the shard runs one very large top-level e2e test with dozens of schema-change stress cases in a single matrix entry, so the whole package stays serialized inside one job.

This change keeps the exact same end-to-end assertions and package, but splits that long top-level test into two balanced top-level tests and maps them to two independent CI shards:

onlineddl_vrepl_stress_suite_group1
onlineddl_vrepl_stress_suite_group2

That preserves coverage while letting GitHub Actions schedule the two halves in parallel. Based on the upstream subtest timings from the inspected successful run, the two groups are balanced at roughly 188s and 186s of test body time, each about half of the 375.60s measured for the combined test. That should cut this hotspot below the current ~24m outlier level and remove it as the slowest unresolved cluster shard.
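One way to arrive at two balanced groups from per-subtest timings is a greedy longest-first assignment. The sketch below is an assumption about how such a split could be computed, not a description of how the groups in this PR were chosen. `timedCase` and `balanceTwoGroups` are hypothetical names, and the durations are made-up placeholders, not values from the CI run.

```go
package main

import (
	"fmt"
	"sort"
)

// timedCase pairs a subtest name with its measured duration in seconds.
type timedCase struct {
	Name    string
	Seconds float64
}

// balanceTwoGroups sorts cases by descending duration and always places
// the next case in the currently lighter group (ties go to group 1).
// It returns both groups and their total durations.
func balanceTwoGroups(cases []timedCase) (g1, g2 []timedCase, t1, t2 float64) {
	// Copy so the caller's slice order is left untouched.
	sorted := append([]timedCase(nil), cases...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Seconds > sorted[j].Seconds
	})
	for _, c := range sorted {
		if t1 <= t2 {
			g1 = append(g1, c)
			t1 += c.Seconds
		} else {
			g2 = append(g2, c)
			t2 += c.Seconds
		}
	}
	return g1, g2, t1, t2
}

func main() {
	// Placeholder durations for illustration only.
	cases := []timedCase{
		{"case-a", 40}, {"case-b", 30}, {"case-c", 20}, {"case-d", 10},
	}
	g1, g2, t1, t2 := balanceTwoGroups(cases)
	fmt.Printf("group1: %d cases, %.0fs\n", len(g1), t1)
	fmt.Printf("group2: %d cases, %.0fs\n", len(g2), t2)
}
```

The heuristic is not guaranteed to find the best possible split, but for two groups it usually lands close, which is enough when the goal is two shards of similar length.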

Related Issue(s)

None.

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

Deployment Notes

No deployment impact. This only changes CI scheduling for an existing e2e package.

Validation

I rebuilt the Vitess binaries and then ran each new shard through test.go with local e2e dependencies installed:

source build.env
rm -rf "$VTDATAROOT"/vtroot_*
go run test.go -docker=false -skip-build -follow -shard onlineddl_vrepl_stress_suite_group1

source build.env
rm -rf "$VTDATAROOT"/vtroot_*
go run test.go -docker=false -skip-build -follow -shard onlineddl_vrepl_stress_suite_group2

Results:

onlineddl_vrepl_stress_suite_group1: PASS Package ... (4m52.575s)
onlineddl_vrepl_stress_suite_group2: PASS Package ... (4m25.16s)

AI Disclosure

This PR was authored with GPT-5 assistance, using upstream CI log analysis plus local validation.

Signed-off-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mohamed Hamza <mhamza@fastmail.com>

test: split onlineddl stress suite shard

0a7d44f

Signed-off-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mohamed Hamza <mhamza@fastmail.com>

cursor[bot] wants to merge 1 commit into main from cursor/vitess-ci-performance-eee7