
[BugFix][Qwen3-Omni]Fixed the issue of incorrect answers for single words.#2239

Open
amy-why-3459 wants to merge 1 commit into vllm-project:main from amy-why-3459:bugfix

Conversation

@amy-why-3459
Contributor

@amy-why-3459 amy-why-3459 commented Mar 26, 2026


Purpose

  1. Fixed the issue of incorrect answers for single words.
  2. Fixed the benchmark server disconnect error.

Test Plan

pytest -sv test_qwen3_omni_expansion.py::test_one_word_prompt_001 -m "advanced_model" --run-level "advanced_model"

Test Result

--- Running test: test_one_word_prompt_001[async_chunk]
the avg e2e latency is: 0.8250952590024099
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.0022777009871788323
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.0022065509692765772
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.00216514099156484
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.0025836509885266423
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
PASSED
GPU cleanup disabled
INFO 03-27 02:31:51 [datasets.py:631] Sampling input_len from [100, 100] and output_len from [100, 100]
WARNING: vllm bench serve no longer sets temperature==0 (greedy) in requests by default. The default will be determined on the server side and can be model/API specific. For the old behavior, include --temperature=0.
Starting initial single prompt test run...
Skipping endpoint ready check.
Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 10
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [04:25<00:00,  2.66s/it]
tip: install termplotlib and gnuplot to plot the metrics
============ Serving Benchmark Result ============
Successful requests:                     100
Failed requests:                         0
Maximum request concurrency:             10
Benchmark duration (s):                  265.55
Request throughput (req/s):              0.38
Peak concurrent requests:                14.00

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18feaf0390


-    keepalive_timeout=60,
-    enable_cleanup_closed=True,
-    force_close=False,
+    force_close=True,


P2: Keep benchmark HTTP connections reusable

In benchmark() this change sets aiohttp.TCPConnector(..., force_close=True), which closes every connection after each request. That disables connection reuse while this block is explicitly configured to benchmark request throughput/latency and even comments that connections are reused, so HTTPS/TCP handshakes get added to nearly every request and the measured model-serving metrics become systematically inflated/noisy under load.

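For reference, the two connector configurations under discussion look roughly like this in `aiohttp` (a sketch with illustrative timeout values, not the benchmark's actual code):

```python
import asyncio

import aiohttp


async def build_connectors():
    # Pre-change: keep-alive reuse, so repeated requests to the same host
    # skip the TCP/TLS handshake and latency reflects serving time only.
    reuse = aiohttp.TCPConnector(
        keepalive_timeout=60,  # keep idle sockets open for 60 s
        force_close=False,     # allow connection reuse across requests
    )
    # Post-change: every request opens (and closes) its own connection,
    # avoiding stale-socket disconnects at the cost of a handshake per request.
    fresh = aiohttp.TCPConnector(force_close=True)
    flags = (reuse.force_close, fresh.force_close)
    await reuse.close()
    await fresh.close()
    return flags


reuse_fc, fresh_fc = asyncio.run(build_connectors())
print(reuse_fc, fresh_fc)  # False True
```

This is the tradeoff Codex flags: with `force_close=True`, handshake cost is folded into nearly every measured request.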

@amy-why-3459 amy-why-3459 changed the title 【BugFix】Fixed the issue of incorrect answers for single words. [BugFix]Fixed the issue of incorrect answers for single words. Mar 26, 2026
@gcanlin
Collaborator

gcanlin commented Mar 26, 2026

Out of curiosity, any explanation of the bug and its root cause?

@amy-why-3459 amy-why-3459 changed the title [BugFix]Fixed the issue of incorrect answers for single words. [BugFix][Qwen3-Omni]Fixed the issue of incorrect answers for single words. Mar 26, 2026
@amy-why-3459
Contributor Author

Out of curiosity, any explanation of the bug and its root cause?

With the prompt "What is the capital of France? Answer in one word.", the talker's answer exceeds one word. The root cause is that when the answer is only one word, trailing_text_hidden should be concatenated with tts_eos_embed instead of with torch.zeros. If the EOS is not concatenated correctly, the talker assumes the request has not stopped, so its output audio runs past one word.
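The fix described above can be sketched as follows; `trailing_text_hidden` and `tts_eos_embed` are taken from the comment, but the shapes and the helper function are illustrative assumptions, not the actual vLLM-Omni code:

```python
import torch


def append_tts_eos(trailing_text_hidden: torch.Tensor,
                   tts_eos_embed: torch.Tensor) -> torch.Tensor:
    """Append the TTS EOS embedding after the trailing text hidden states.

    Buggy variant (for contrast): appending zeros instead of the EOS
    embedding hides the stop signal, so the talker keeps emitting audio:
        torch.cat([trailing_text_hidden,
                   torch.zeros_like(tts_eos_embed)], dim=0)
    """
    return torch.cat([trailing_text_hidden, tts_eos_embed], dim=0)


# One-word answer -> a single trailing hidden state plus the EOS embedding.
trailing_text_hidden = torch.randn(1, 8)
tts_eos_embed = torch.randn(1, 8)
out = append_tts_eos(trailing_text_hidden, tts_eos_embed)
print(out.shape)  # torch.Size([2, 8])
```

With the EOS embedding in the last position, the talker sees an explicit stop signal even when only one text token precedes it.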

@amy-why-3459
Contributor Author

@Sy0307 @LJH-LBJ PTAL

@Sy0307
Contributor

Sy0307 commented Mar 26, 2026

LGTM. Adding some comments as a reminder would be better.

-    keepalive_timeout=60,
-    enable_cleanup_closed=True,
-    force_close=False,
+    force_close=True,
Collaborator


And why do we need this change?

Contributor Author


To avoid server disconnect errors during high-concurrency benchmark runs.

@gcanlin
Collaborator

gcanlin commented Mar 26, 2026

Besides, I would also recommend adding a simple UT to prevent others from mistakenly changing the expected behavior, if possible.

Hmm... maybe a comment is enough.

Comment on lines -379 to +380
-    keepalive_timeout=60,
-    enable_cleanup_closed=True,
-    force_close=False,
+    force_close=True,
Contributor


When a benchmark sends a large number of concurrent requests, connections can be reused. Why remove this?

Contributor Author


Reusing connections may result in a server disconnect error.

@amy-why-3459
Contributor Author

Besides, I would also recommend adding a simple UT to prevent others from mistakenly changing the expected behavior, if possible.

Hmm... maybe a comment is enough.

We added an E2E use case for the caregiving scenario. #2097

@amy-why-3459
Contributor Author

@yenuo26 PTAL

@gcanlin gcanlin added the ready label to trigger buildkite CI label Mar 27, 2026

Labels

ready label to trigger buildkite CI


4 participants