test: split onlineddl vrepl suite shard by cursor[bot] · Pull Request #99 · planetscale/vitess

cursor · 2026-03-19T14:09:15Z

Description

I inspected the latest successful main runs in vitessio/vitess using gh:

unit_test.yml: run 23294293582
cluster_endtoend.yml: run 23294293585

Across those two workflows, the slowest job was Run endtoend tests on Cluster (onlineddl_vrepl_suite) at 34m22s. The slowest unit-test job in the same investigation window was only 15m38s, so this cluster shard was the real bottleneck.

From the upstream job log for onlineddl_vrepl_suite:

Setup MySQL: 15m27s
Run cluster endtoend test: 17m51s
package time: go/test/endtoend/onlineddl/vrepl_suite in 17m20.031s
slowest top-level test: TestVreplSuiteSchemaChanges in 1023.01s

The root cause is that go/test/endtoend/onlineddl/vrepl_suite puts 129 schema-change cases behind a single top-level e2e test, so all of that work is serialized inside one shard.

I also checked open PRs in both vitessio/vitess and planetscale/vitess before making this change. There is active draft work for onlineddl_vrepl_stress_suite, but nothing open for onlineddl_vrepl_suite itself.

This change keeps the same end-to-end coverage but splits TestVreplSuiteSchemaChanges into two balanced top-level tests and maps them to two independent CI shards:

onlineddl_vrepl_suite_group1
onlineddl_vrepl_suite_group2
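
One property worth checking for a split like this is that every case lands in exactly one group, so coverage is neither lost nor duplicated. The sketch below shows one possible check. It is not the code in this PR: `checkPartition` and the case names are hypothetical placeholders.

```go
package main

import "fmt"

// checkPartition returns an error unless every case in all appears in exactly
// one group and no group contains a case that is missing from all.
func checkPartition(all []string, groups ...[]string) error {
	count := make(map[string]int)
	for _, g := range groups {
		for _, name := range g {
			count[name]++
		}
	}
	known := make(map[string]bool, len(all))
	for _, name := range all {
		known[name] = true
		if c := count[name]; c != 1 {
			return fmt.Errorf("case %q appears in %d groups, want exactly 1", name, c)
		}
	}
	for name := range count {
		if !known[name] {
			return fmt.Errorf("case %q is not in the full case list", name)
		}
	}
	return nil
}

func main() {
	// Hypothetical case names; the real suite has 129 schema-change cases.
	all := []string{"case-a", "case-b", "case-c", "case-d"}
	group1 := []string{"case-a", "case-d"}
	group2 := []string{"case-b", "case-c"}

	if err := checkPartition(all, group1, group2); err != nil {
		fmt.Println("invalid split:", err)
		return
	}
	fmt.Println("every case is assigned to exactly one group")
}
```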

Using the upstream per-subtest timings from the inspected successful run, the two groups balance to within 0.1s of each other: 510.94s and 511.03s of test body time. That should reduce the current ~34m outlier shard to roughly 24m per shard in CI (about 15m27s of MySQL setup plus about 8m31s of test body, assuming setup time is unchanged), which brings it below the current vtorc hotspot while preserving the existing assertions.

Related Issue(s)

None.

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

Deployment Notes

No deployment impact. This only changes CI scheduling for an existing end-to-end suite.

AI Disclosure

This PR was authored with GPT-5 assistance. I used AI to inspect the upstream CI timings, prepare the shard split, run the local validation, and draft this PR description.

Local validation

After building Vitess and installing the same local e2e dependencies the cluster workflow expects, I ran both new shards through the Vitess e2e harness with a clean VTDATAROOT before each run:

source build.env
rm -rf "$VTDATAROOT"/vtroot_*
go run test.go -docker=false -skip-build -follow -shard onlineddl_vrepl_suite_group1

source build.env
rm -rf "$VTDATAROOT"/vtroot_*
go run test.go -docker=false -skip-build -follow -shard onlineddl_vrepl_suite_group2

Results:

onlineddl_vrepl_suite_group1: PASS Package ... (8m45.028s) / local.onlineddl_vrepl_suite_group1: PASSED in 8m46.9s
onlineddl_vrepl_suite_group2: PASS Package ... (8m49.361s) / local.onlineddl_vrepl_suite_group2: PASSED in 8m51.6s

Signed-off-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mohamed Hamza <mhamza@fastmail.com>

test: split onlineddl vrepl suite shard

e30ed47


cursor[bot] wants to merge 1 commit into main from cursor/vitess-test-optimization-3933
