[BugFix][Qwen3-Omni]Fixed the issue of incorrect answers for single words.#2239
amy-why-3459 wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18feaf0390
```diff
  keepalive_timeout=60,
  enable_cleanup_closed=True,
- force_close=False,
+ force_close=True,
```
There was a problem hiding this comment.
Keep benchmark HTTP connections reusable
In benchmark() this change sets aiohttp.TCPConnector(..., force_close=True), which closes every connection after each request. That disables connection reuse while this block is explicitly configured to benchmark request throughput/latency and even comments that connections are reused, so HTTPS/TCP handshakes get added to nearly every request and the measured model-serving metrics become systematically inflated/noisy under load.
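To make the tradeoff concrete, here is a minimal sketch of the two connector configurations under discussion (assuming `aiohttp` is installed; the function name `make_connectors` is illustrative, not from the PR). Note that recent aiohttp versions reject `keepalive_timeout` together with `force_close=True`, so the two options are shown separately:

```python
import asyncio

import aiohttp


async def make_connectors():
    # Reusable connections: keep-alive with an idle timeout, so a benchmark
    # pays the TCP/TLS handshake cost only once per connection.
    reuse = aiohttp.TCPConnector(keepalive_timeout=60, enable_cleanup_closed=True)

    # Close after every request: avoids stale-connection errors under high
    # concurrency, at the cost of a handshake on nearly every request.
    # aiohttp raises ValueError if keepalive_timeout is also passed here.
    no_reuse = aiohttp.TCPConnector(force_close=True)

    flags = (reuse.force_close, no_reuse.force_close)
    await reuse.close()
    await no_reuse.close()
    return flags


if __name__ == "__main__":
    print(asyncio.run(make_connectors()))
```

With `force_close=True`, every request includes connection-setup latency, which is what the review above flags as inflating the measured serving metrics.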
Out of curiosity, is there any explanation of the bug and its root cause?
Using the prompt "What is the capital of France? Answer in one word.", the talker's answer will exceed one word. The root cause is that when the answer is only one word,
LGTM. Adding some comments as a reminder would be better.
```diff
  keepalive_timeout=60,
  enable_cleanup_closed=True,
- force_close=False,
+ force_close=True,
```
And why do we need this change?
To avoid server connect errors during high-concurrency benchmark runs.
Besides, I would also recommend adding a simple unit test to keep others from changing the expected behavior by mistake, if possible. Hmm... maybe a comment is enough.
```diff
  keepalive_timeout=60,
  enable_cleanup_closed=True,
- force_close=False,
+ force_close=True,
```
When a benchmark sends a large number of concurrent requests, connections can be reused. Why remove that?
Reusing connections may result in a server disconnect error.
We added an E2E use case for the caregiving scenario in #2097.
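As a middle ground between `force_close=True` and unbounded reuse, a benchmark can cap the number of in-flight requests so the server is not overwhelmed while keep-alive connections stay usable. A stdlib-only sketch of that idea (the names `fetch` and `run_benchmark` are illustrative, not from this PR; the real version would issue HTTP requests where `asyncio.sleep` stands in for I/O):

```python
import asyncio


async def fetch(i: int, sem: asyncio.Semaphore) -> int:
    # Placeholder for an HTTP call; the semaphore bounds concurrency so the
    # server sees a limited number of simultaneous connections.
    async with sem:
        await asyncio.sleep(0)  # simulate network I/O
        return i


async def run_benchmark(n_requests: int, max_in_flight: int) -> list:
    sem = asyncio.Semaphore(max_in_flight)
    # gather preserves submission order, so results come back as 0..n-1.
    return await asyncio.gather(*(fetch(i, sem) for i in range(n_requests)))


if __name__ == "__main__":
    results = asyncio.run(run_benchmark(100, 8))
    print(len(results))
```

Whether a concurrency cap avoids the disconnect errors seen here would need to be verified against the actual server; this only sketches the design alternative.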
@yenuo26 PTAL |
Purpose
Test Plan
```shell
pytest -sv test_qwen3_omni_expansion.py::test_one_word_prompt_001 -m "advanced_model" --run-level "advanced_model"
```
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.