
reduce unit test duration for mini test profiler#217

Open
chaojun-zhang wants to merge 1 commit into vllm-project:main from chaojun-zhang:cri_ut_fix

Conversation

@chaojun-zhang
Contributor

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)."
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Test Plan

export XPU_KERNEL_PYTEST_PROFILER=MINI
pytest -s -v tests/flash_attn/test_flash_attn_varlen_func.py
pytest -s -v tests/gdn_attn/test_gdn_attn.py

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Signed-off-by: chzhang <chaojun.zhang@intel.com>
@chaojun-zhang chaojun-zhang marked this pull request as ready for review March 24, 2026 05:11
Copilot AI review requested due to automatic review settings March 24, 2026 05:11

Copilot AI left a comment


Pull request overview

Reduces the runtime of “mini” profiler test configurations by shrinking parameter grids used in two attention-related test suites.

Changes:

  • Shrink mini-parameterization for GDN attention tests (token count and batch size).
  • Shrink mini-parameterization for FlashAttention varlen+PagedKV tests (seq lens/head sizes/etc.).
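The mechanism behind these changes can be sketched as an environment-variable-gated parameter grid. This is a hypothetical illustration, not the actual vllm test code: the real layout of `MINI_PYTEST_PARAMS`, the `FULL_PARAMS` name, and the `select_params` helper are all assumptions; only the `XPU_KERNEL_PYTEST_PROFILER=MINI` switch comes from the test plan above.

```python
import os

# Hypothetical sketch: a full parameter grid and a reduced "mini" grid,
# selected at collection time via the XPU_KERNEL_PYTEST_PROFILER env var.
FULL_PARAMS = {
    "seq_lens": [[(5, 18)], [(129, 463)]],
    "head_size": [64, 128],
    "num_heads": [(8, 2), (4, 4)],
}

MINI_PYTEST_PARAMS = {
    "seq_lens": [[(5, 18)]],
    "head_size": [64],
    "num_heads": [(2, 2)],
}

def select_params(name):
    """Return the reduced grid when the MINI profiler is requested,
    falling back to the full grid otherwise."""
    if os.environ.get("XPU_KERNEL_PYTEST_PROFILER") == "MINI":
        return MINI_PYTEST_PARAMS.get(name, FULL_PARAMS[name])
    return FULL_PARAMS[name]
```

Shrinking the mini grid this way cuts the number of parametrized test cases multiplicatively, since pytest takes the cross product of all parameter lists.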

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
tests/gdn_attn/test_gdn_attn.py Updates MINI_PYTEST_PARAMS to run a smaller (faster) configuration.
tests/flash_attn/test_flash_attn_varlen_func.py Narrows MINI_PYTEST_PARAMS for the varlen paged-KV test to reduce runtime.


"seq_lens": [[(5, 18)]],
"head_size": [64],
"block_size": [64],
"num_heads": [(2)],

Copilot AI Mar 24, 2026


num_heads previously used a 2-tuple ((8, 2)), which strongly suggests the test code expects to unpack (num_q_heads, num_kv_heads) (or similar). Changing it to an int ((2) is just 2 in Python) is likely to break parametrization consumers that expect a tuple. Keep the parameter type consistent (e.g., provide a 2-tuple with reduced values) or update the downstream test logic to accept both scalar and tuple forms.

Suggested change
"num_heads": [(2)],
"num_heads": [(2, 2)],
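The pitfall flagged in this comment can be shown in a few lines of plain Python (the variable names below are illustrative only): parentheses alone do not create a tuple; a comma does.

```python
# (2) is just the int 2 wrapped in grouping parentheses,
# while (2, 2) is a genuine 2-tuple.
scalar = (2)
pair = (2, 2)

assert scalar == 2 and not isinstance(scalar, tuple)
assert isinstance(pair, tuple)

# Unpacking works only for the tuple; doing the same with `scalar`
# would raise a TypeError at collection time.
num_q_heads, num_kv_heads = pair
```

This is why a consumer expecting to unpack `(num_q_heads, num_kv_heads)` breaks when the parameter is silently demoted to an int.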

Comment on lines +33 to +34
"num_actual_tokens": [1],
"batch_size": [4],

Copilot AI Mar 24, 2026


The PR description template is still unfilled (Purpose/Test Result sections are blank), so it’s unclear what runtime reduction is expected and what the before/after results are. Please update the PR description with the intended goal (e.g., target runtime), and paste the measured before/after test timings for the provided test plan commands.
