[Accuracy Benchmark] Add VBench video accuracy benchmarks t2v and i2v #2209

Draft
david6666666 wants to merge 4 commits into vllm-project:main from david6666666:codex/accuracy-vbench-video

@david6666666
Collaborator

Purpose

Add video accuracy benchmark scaffolding for vLLM-Omni with:

  • VBench runner for text_to_video
  • VBench-I2V runner for image_to_video
  • shared /v1/videos client and benchmark utilities
  • partial-score summarization for L4 smoke coverage
  • benchmark unit tests and L4 e2e smoke test entrypoints
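
A minimal sketch of what partial-score summarization could look like, assuming the runner ends up with a mapping of dimension name to score. The function name `summarize_partial`, its parameters, and the returned keys are hypothetical and are not the PR's `summarize_vbench_results` API; the point is only that a smoke run averages what it evaluated, lists what it skipped, and leaves the official total unset.

```python
from statistics import fmean


def summarize_partial(scores, expected_dimensions):
    """Summarize whichever dimensions were evaluated (hypothetical helper).

    scores: dict mapping dimension name to a numeric score.
    expected_dimensions: every dimension a full benchmark run would cover.
    """
    evaluated = {d: float(scores[d]) for d in expected_dimensions if d in scores}
    missing = [d for d in expected_dimensions if d not in scores]
    return {
        "evaluated_dimensions": sorted(evaluated),
        "missing_dimensions": missing,
        "is_partial": bool(missing),
        # Plain unweighted mean of the evaluated dimensions only.
        "mean_of_evaluated": fmean(evaluated.values()) if evaluated else None,
        # Deliberately never filled in by this sketch: a subset of
        # dimensions is not an official total benchmark score.
        "official_total": None,
    }


summary = summarize_partial(
    {"subject_consistency": 0.9, "motion_smoothness": 0.7},
    ["subject_consistency", "motion_smoothness", "aesthetic_quality"],
)
```

Here `summary["is_partial"]` is true and `summary["missing_dimensions"]` names the one dimension the smoke run skipped.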

Scope of this PR:

  • add benchmarks/accuracy/common.py
  • add benchmarks/accuracy/text_to_video/run_vbench.py
  • add benchmarks/accuracy/image_to_video/run_vbench_i2v.py
  • add benchmark utility tests
  • add tests/e2e/accuracy/ smoke test scaffolding for VBench and VBench-I2V

Behavior notes:

  • text_to_video uses VBench
  • image_to_video uses VBench-I2V
  • VBench is imported from a repo checkout given by --vbench-root or VBENCH_REPO_ROOT; it is not vendored
  • the L4 smoke run reports only partial benchmark summaries and does not claim official total benchmark scores
  • this PR does not modify .buildkite/test-merge.yml or .buildkite/test-ready.yml
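
One way the checkout-first import could work is sketched below. It is an illustration under stated assumptions, not the PR's `vbench_import_context`: the name `checkout_import_context`, its parameters, and the restore-on-exit behavior are hypothetical. Only the `VBENCH_REPO_ROOT` variable comes from the description above.

```python
import contextlib
import importlib
import os
import sys
from pathlib import Path


@contextlib.contextmanager
def checkout_import_context(repo_root=None, package="vbench"):
    """Temporarily make `package` resolve from a repo checkout (hypothetical)."""
    root = repo_root or os.environ.get("VBENCH_REPO_ROOT")
    if not root:
        raise ValueError("pass repo_root or set VBENCH_REPO_ROOT")
    root_path = str(Path(root).resolve())
    if not Path(root_path).is_dir():
        raise FileNotFoundError(root_path)

    def owned(name):
        return name == package or name.startswith(package + ".")

    # Set aside any already-imported copy so the checkout wins the lookup.
    stashed = {name: mod for name, mod in sys.modules.items() if owned(name)}
    for name in stashed:
        del sys.modules[name]
    sys.path.insert(0, root_path)
    importlib.invalidate_caches()
    try:
        yield root_path
    finally:
        with contextlib.suppress(ValueError):
            sys.path.remove(root_path)
        # Drop modules loaded from the checkout, then restore the originals.
        for name in [n for n in sys.modules if owned(n)]:
            del sys.modules[name]
        sys.modules.update(stashed)
```

Inside the `with` block, `importlib.import_module(package)` loads from the checkout even if another copy is installed; on exit the previous `sys.path` and `sys.modules` entries are put back.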

Test Plan

  • Static validation:
    • python -m py_compile benchmarks/accuracy/common.py benchmarks/accuracy/text_to_video/run_vbench.py benchmarks/accuracy/image_to_video/run_vbench_i2v.py tests/benchmarks/test_video_accuracy_bench_utils.py tests/e2e/accuracy/conftest.py tests/e2e/accuracy/test_vbench_t2v_l4_smoke.py tests/e2e/accuracy/test_vbench_i2v_l4_smoke.py
  • Targeted local assertions:
    • validate summarize_vbench_results
    • validate summarize_vbench_i2v_results
    • validate balanced subset selection
    • validate vbench_import_context prefers explicit repo checkout over preinstalled modules
  • Remote validation to run after merge or in a proper benchmark environment:
    • pytest tests/benchmarks/test_video_accuracy_bench_utils.py
    • pytest tests/e2e/accuracy/test_vbench_t2v_l4_smoke.py
    • pytest tests/e2e/accuracy/test_vbench_i2v_l4_smoke.py
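
For the balanced subset selection check, the behavior under test might resemble the round-robin sketch below. This is an assumption about what "balanced" means here: `select_balanced_subset`, its arguments, and the sorted-group ordering are hypothetical and not taken from the PR's code.

```python
from collections import defaultdict


def select_balanced_subset(items, key, limit):
    """Pick up to `limit` items, cycling through groups in turn (hypothetical).

    key: function returning a sortable group label for an item,
    for example a prompt's benchmark dimension.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    # Sorted group order keeps the selection deterministic across runs.
    ordered = [groups[label] for label in sorted(groups)]

    selected = []
    depth = 0
    # Take the depth-th item from every group before going one level deeper.
    while len(selected) < limit and any(depth < len(g) for g in ordered):
        for group in ordered:
            if depth < len(group) and len(selected) < limit:
                selected.append(group[depth])
        depth += 1
    return selected


prompts = [("a", "p1"), ("a", "p2"), ("a", "p3"), ("b", "p4"), ("c", "p5")]
subset = select_balanced_subset(prompts, key=lambda p: p[0], limit=4)
```

With these inputs every group contributes one prompt before group "a" contributes its second, so a small smoke subset still touches each group.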

Test Result

  • Passed:
    • python -m py_compile on all newly added benchmark and test files
    • direct Python validation for partial-summary logic, balanced selection logic, parser construction, and VBench repo import override behavior
  • Not run locally:
    • pytest for the new tests: the local environment fails while bootstrapping the repo's existing tests, where tests/conftest.py raises ModuleNotFoundError: No module named 'soundfile'
  • Needs remote verification:
    • end-to-end VBench / VBench-I2V execution against a real video serving environment
    • L4 smoke runs with actual models and VBench repo checkout / dataset availability

Signed-off-by: David Chen <530634352@qq.com>