reduce unit test duration for mini test profiler #217
chaojun-zhang wants to merge 1 commit into vllm-project:main
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Pull request overview
Reduces the runtime of “mini” profiler test configurations by shrinking parameter grids used in two attention-related test suites.
Changes:
- Shrink mini-parameterization for GDN attention tests (token count and batch size).
- Shrink mini-parameterization for FlashAttention varlen+PagedKV tests (seq lens/head sizes/etc.).
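To illustrate why shrinking the grids shortens the run, here is a minimal sketch that counts test cases for a parameter dictionary. It assumes the dictionary is expanded as a full Cartesian product of its value lists, which this PR does not confirm. `grid_size` is a hypothetical helper, and the `before` values are made up for illustration; they are not the values this PR replaced.

```python
from math import prod


def grid_size(params):
    """Count test cases, assuming every value list is combined as a full
    Cartesian product (an assumption about how the suite expands the dict)."""
    return prod(len(values) for values in params.values())


# Hypothetical "before" grid: illustrative values only.
before = {
    "seq_lens": [[(5, 18)], [(1, 64)]],
    "head_size": [64, 128],
    "block_size": [64],
    "num_heads": [(8, 2)],
}

# Illustrative reduced grid with one value per parameter.
after = {
    "seq_lens": [[(5, 18)]],
    "head_size": [64],
    "block_size": [64],
    "num_heads": [(2, 2)],
}

print(grid_size(before), grid_size(after))
```

Under that assumption the case count is the product of the list lengths, so trimming any one list reduces the total multiplicatively.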
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/gdn_attn/test_gdn_attn.py | Updates MINI_PYTEST_PARAMS to run a smaller (faster) configuration. |
| tests/flash_attn/test_flash_attn_varlen_func.py | Narrows MINI_PYTEST_PARAMS for the varlen paged-KV test to reduce runtime. |
    "seq_lens": [[(5, 18)]],
    "head_size": [64],
    "block_size": [64],
    "num_heads": [(2)],
`num_heads` previously used a 2-tuple, `(8, 2)`, which strongly suggests the test code unpacks it as `(num_q_heads, num_kv_heads)` or similar. The new value `(2)` is a plain int, because parentheses without a comma do not create a tuple in Python, so this change is likely to break parametrization consumers that expect a tuple. Keep the parameter type consistent (e.g., provide a 2-tuple with reduced values) or update the downstream test logic to accept both scalar and tuple forms.
Suggested change:
    - "num_heads": [(2)],
    + "num_heads": [(2, 2)],
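A short sketch of the failure mode described in this comment. `split_heads` is a hypothetical stand-in for whatever test code consumes `num_heads`; the real consumer is not shown in this PR, so the unpacking is an assumption.

```python
def split_heads(num_heads):
    """Hypothetical consumer that unpacks (num_q_heads, num_kv_heads)."""
    num_q_heads, num_kv_heads = num_heads
    return num_q_heads, num_kv_heads


# Parentheses alone do not make a tuple; the comma does.
assert isinstance((2), int)
assert isinstance((2,), tuple)

# A 2-tuple unpacks as the consumer expects.
assert split_heads((2, 2)) == (2, 2)

# A bare int cannot be unpacked, so the consumer raises TypeError.
try:
    split_heads((2))
    unpack_failed = False
except TypeError:
    unpack_failed = True

print("unpacking (2) failed:", unpack_failed)
```

If the consumer does unpack the value this way, the parametrized test would raise before reaching the kernel under test.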
    "num_actual_tokens": [1],
    "batch_size": [4],
The PR description template is still unfilled (Purpose/Test Result sections are blank), so it’s unclear what runtime reduction is expected and what the before/after results are. Please update the PR description with the intended goal (e.g., target runtime), and paste the measured before/after test timings for the provided test plan commands.
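One possible way to collect the requested before/after timings is sketched below. `time_command` and `time_test_plan` are hypothetical helpers, not part of this repository. The sketch assumes `pytest` is installed in the current interpreter and that it is run from the repository root; it measures wall-clock time for each test-plan command with `XPU_KERNEL_PYTEST_PROFILER=MINI` set.

```python
import os
import subprocess
import sys
import time

# Test files taken from the PR's test plan.
TEST_FILES = (
    "tests/flash_attn/test_flash_attn_varlen_func.py",
    "tests/gdn_attn/test_gdn_attn.py",
)


def time_command(cmd, extra_env=None):
    """Run cmd and return (return code, elapsed wall-clock seconds)."""
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    completed = subprocess.run(cmd, env=env)
    return completed.returncode, time.perf_counter() - start


def time_test_plan():
    """Time each test file under the MINI profiler and print one line per file."""
    for test_file in TEST_FILES:
        cmd = [sys.executable, "-m", "pytest", "-s", "-v", test_file]
        code, seconds = time_command(cmd, {"XPU_KERNEL_PYTEST_PROFILER": "MINI"})
        print(f"{test_file}: exit code {code}, {seconds:.1f} s")
```

Calling `time_test_plan()` once on `main` and once on this branch would give the two sets of numbers to paste into the Test Result section.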
Purpose
Test Plan
export XPU_KERNEL_PYTEST_PROFILER=MINI
pytest -s -v tests/flash_attn/test_flash_attn_varlen_func.py
pytest -s -v tests/gdn_attn/test_gdn_attn.py
Test Result
(Optional) Documentation Update