[Accuracy Benchmark] Add VBench video accuracy benchmarks for t2v and i2v by david6666666 · Pull Request #2209 · vllm-project/vllm-omni

david6666666 · 2026-03-26T03:53:45Z

Purpose

Add video accuracy benchmark scaffolding for vLLM-Omni with:

- VBench runner for text_to_video
- VBench-I2V runner for image_to_video
- shared /v1/videos client and benchmark utilities
- partial-score summarization for L4 smoke coverage
- benchmark unit tests and L4 e2e smoke test entrypoints
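
As a rough illustration of the partial-score summarization idea, the sketch below averages only the dimensions that were actually evaluated and flags the result as non-official. The function name `summarize_partial_scores`, the result keys, and the dimension names in the usage note are hypothetical and are not taken from this PR's `summarize_vbench_results`.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence


def summarize_partial_scores(
    scores: Mapping[str, Optional[float]],
    expected_dimensions: Sequence[str],
) -> dict:
    """Summarize whichever dimensions were evaluated (hypothetical helper)."""
    # Keep only the expected dimensions that produced a numeric score.
    evaluated = {
        dim: scores[dim]
        for dim in expected_dimensions
        if scores.get(dim) is not None
    }
    # Anything expected but not scored is reported explicitly.
    missing = [dim for dim in expected_dimensions if dim not in evaluated]
    return {
        "evaluated": evaluated,
        "missing": missing,
        # Plain mean over the evaluated subset; this is not a weighted total.
        "partial_mean": mean(list(evaluated.values())) if evaluated else None,
        # A smoke run never claims an official total benchmark score.
        "is_official_total": False,
    }
```

Calling it with scores for `subject_consistency` and `motion_smoothness` while also expecting `aesthetic_quality` would list the third dimension under `missing` and average only the first two.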

Scope of this PR:

- add benchmarks/accuracy/common.py
- add benchmarks/accuracy/text_to_video/run_vbench.py
- add benchmarks/accuracy/image_to_video/run_vbench_i2v.py
- add benchmark utility tests
- add tests/e2e/accuracy/ smoke test scaffolding for VBench and VBench-I2V

Behavior notes:

- text_to_video uses VBench
- image_to_video uses VBench-I2V
- VBench is imported from a local repo checkout given via --vbench-root or VBENCH_REPO_ROOT; it is not vendored
- the L4 smoke tests report only partial benchmark summaries and do not claim official total benchmark scores
- this PR does not modify .buildkite/test-merge.yml or .buildkite/test-ready.yml
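
One way to make a repo checkout take precedence over a preinstalled package is a context manager that temporarily prepends the checkout to `sys.path` and hides any already-imported copy. This is a minimal sketch under that assumption; `prefer_repo_checkout` and its `package` parameter are hypothetical names, and the PR's actual `vbench_import_context` may work differently.

```python
import contextlib
import importlib
import os
import sys
from pathlib import Path
from typing import Iterator, Optional


def _matching_modules(package: str) -> list:
    """Names in sys.modules that belong to `package` or its submodules."""
    prefix = package + "."
    return [n for n in sys.modules if n == package or n.startswith(prefix)]


@contextlib.contextmanager
def prefer_repo_checkout(
    repo_root: Optional[str] = None, package: str = "vbench"
) -> Iterator[Path]:
    """Import `package` from a checkout instead of an installed copy (sketch)."""
    # An explicit argument wins; otherwise fall back to the environment variable.
    root = repo_root or os.environ.get("VBENCH_REPO_ROOT")
    if not root:
        raise RuntimeError("pass a repo root or set VBENCH_REPO_ROOT")
    root_path = Path(root).resolve()

    # Stash modules that were already imported so they cannot shadow the checkout.
    saved = {name: sys.modules.pop(name) for name in _matching_modules(package)}
    sys.path.insert(0, str(root_path))
    importlib.invalidate_caches()
    try:
        yield root_path
    finally:
        # Undo the path change, drop checkout modules, restore the originals.
        with contextlib.suppress(ValueError):
            sys.path.remove(str(root_path))
        for name in _matching_modules(package):
            del sys.modules[name]
        sys.modules.update(saved)
```

Inside the `with` block, `import vbench` would resolve against the checkout first; on exit the interpreter state is put back, which keeps the override from leaking into unrelated tests.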

Test Plan

Static validation:
- python -m py_compile benchmarks/accuracy/common.py benchmarks/accuracy/text_to_video/run_vbench.py benchmarks/accuracy/image_to_video/run_vbench_i2v.py tests/benchmarks/test_video_accuracy_bench_utils.py tests/e2e/accuracy/conftest.py tests/e2e/accuracy/test_vbench_t2v_l4_smoke.py tests/e2e/accuracy/test_vbench_i2v_l4_smoke.py
Targeted local assertions:
- validate summarize_vbench_results
- validate summarize_vbench_i2v_results
- validate balanced subset selection
- validate vbench_import_context prefers explicit repo checkout over preinstalled modules
Remote validation to run after merge or in a proper benchmark environment:
- pytest tests/benchmarks/test_video_accuracy_bench_utils.py
- pytest tests/e2e/accuracy/test_vbench_t2v_l4_smoke.py
- pytest tests/e2e/accuracy/test_vbench_i2v_l4_smoke.py
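
For context on the balanced subset selection check, one plausible approach is a round-robin pick across groups so that a small smoke subset still touches every group. The sketch below assumes that approach; `select_balanced_subset` and its parameters are hypothetical and may not match the helper added in this PR.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def select_balanced_subset(
    items: Sequence[T], key: Callable[[T], Hashable], limit: int
) -> list:
    """Pick up to `limit` items, cycling through groups in sorted key order."""
    # Group items by key while preserving their original order within a group.
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    queues = [buckets[k] for k in sorted(buckets)]

    selected = []
    index = 0
    while len(selected) < limit:
        progressed = False
        # Take the index-th item from every group that still has one.
        for queue in queues:
            if index < len(queue):
                selected.append(queue[index])
                progressed = True
                if len(selected) == limit:
                    break
        if not progressed:
            break  # every group is exhausted
        index += 1
    return selected
```

With prompts grouped by a dimension name, a limit equal to the number of groups yields exactly one prompt per group, which is the property a smoke subset usually wants.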

Test Result

Passed:
- python -m py_compile on all newly added benchmark and test files
- direct Python validation for partial-summary logic, balanced selection logic, parser construction, and VBench repo import override behavior
Not run locally:
- pytest for the new tests, because the local environment fails during existing repo test bootstrap with ModuleNotFoundError: No module named 'soundfile' from tests/conftest.py
Needs remote verification:
- end-to-end VBench / VBench-I2V execution against a real video serving environment
- L4 smoke runs with actual models and VBench repo checkout / dataset availability

Signed-off-by: David Chen <530634352@qq.com>

david6666666 added 4 commits March 26, 2026 11:49

Add VBench video accuracy benchmarks

9bd308f

Signed-off-by: David Chen <530634352@qq.com>

Switch VBench smoke tests to H100

2442e97

Signed-off-by: David Chen <530634352@qq.com>

Rename VBench smoke tests for H100

4090cf3

Signed-off-by: David Chen <530634352@qq.com>

update

e3eecc0

Signed-off-by: David Chen <530634352@qq.com>

david6666666 mentioned this pull request Mar 26, 2026

[RFC]: Qwen-Image、Qwen-Image-Layered、Qwen-Image-Edit-Plus、Wan2.2 Production-grade Feature Monitoring JiusiServe/vllm-omni#167

Open

16 tasks

david6666666 wants to merge 4 commits into vllm-project:main from david6666666:codex/accuracy-vbench-video