
[BugFix][Qwen3-Omni]Fixed the issue of incorrect answers for single words.#2239

Open
amy-why-3459 wants to merge 1 commit into vllm-project:main from amy-why-3459:bugfix

Conversation

@amy-why-3459
Contributor

@amy-why-3459 amy-why-3459 commented Mar 26, 2026


Purpose

  1. Fixed the issue of incorrect answers for single words.
  2. Fixed the benchmark server disconnect error.

Test Plan

pytest -sv test_qwen3_omni_expansion.py::test_one_word_prompt_001 -m "advanced_model" --run-level "advanced_model"

Test Result

--- Running test: test_one_word_prompt_001[async_chunk]
the avg e2e latency is: 0.8250952590024099
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.0022777009871788323
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.0022065509692765772
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.00216514099156484
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
the avg e2e latency is: 0.0025836509885266423
audio content is:  Paris
text content is: Paris
similarity is: 1.0000000000000002
PASSED
GPU cleanup disabled
INFO 03-27 02:31:51 [datasets.py:631] Sampling input_len from [100, 100] and output_len from [100, 100]
WARNING: vllm bench serve no longer sets temperature==0 (greedy) in requests by default. The default will be determined on the server side and can be model/API specific. For the old behavior, include --temperature=0.
Starting initial single prompt test run...
Skipping endpoint ready check.
Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 10
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [04:25<00:00,  2.66s/it]
tip: install termplotlib and gnuplot to plot the metrics
============ Serving Benchmark Result ============
Successful requests:                     100
Failed requests:                         0
Maximum request concurrency:             10
Benchmark duration (s):                  265.55
Request throughput (req/s):              0.38
Peak concurrent requests:                14.00

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18feaf0390


-    keepalive_timeout=60,
-    enable_cleanup_closed=True,
-    force_close=False,
+    force_close=True,


P2: Keep benchmark HTTP connections reusable

In benchmark() this change sets aiohttp.TCPConnector(..., force_close=True), which closes every connection after each request. That disables connection reuse while this block is explicitly configured to benchmark request throughput/latency and even comments that connections are reused, so HTTPS/TCP handshakes get added to nearly every request and the measured model-serving metrics become systematically inflated/noisy under load.

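For reference, the two connector configurations under discussion look roughly like this in `aiohttp` (a sketch with illustrative timeout values, not the benchmark's actual code):

```python
import asyncio

import aiohttp


async def build_connectors():
    # Pre-change: keep-alive reuse, so repeated requests to the same host
    # skip the TCP/TLS handshake and latency reflects serving time only.
    reuse = aiohttp.TCPConnector(
        keepalive_timeout=60,  # keep idle sockets open for 60 s
        force_close=False,     # allow connection reuse across requests
    )
    # Post-change: every request opens (and closes) its own connection,
    # avoiding stale-socket disconnects at the cost of a handshake per request.
    fresh = aiohttp.TCPConnector(force_close=True)
    flags = (reuse.force_close, fresh.force_close)
    await reuse.close()
    await fresh.close()
    return flags


reuse_fc, fresh_fc = asyncio.run(build_connectors())
print(reuse_fc, fresh_fc)  # False True
```

This is the tradeoff Codex flags: with `force_close=True`, handshake cost is folded into nearly every measured request.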

@amy-why-3459 amy-why-3459 changed the title 【BugFix】Fixed the issue of incorrect answers for single words. [BugFix]Fixed the issue of incorrect answers for single words. Mar 26, 2026
@gcanlin
Collaborator

gcanlin commented Mar 26, 2026

Out of curiosity, any explanation of the bug and its root cause?

@amy-why-3459 amy-why-3459 changed the title [BugFix]Fixed the issue of incorrect answers for single words. [BugFix][Qwen3-Omni]Fixed the issue of incorrect answers for single words. Mar 26, 2026
@amy-why-3459
Contributor Author

Out of curiosity, any explanation of the bug and its root cause?

With the prompt "What is the capital of France? Answer in one word.", the talker's answer exceeds one word. The root cause is that when the answer is only one word, trailing_text_hidden should be concatenated with tts_eos_embed instead of with torch.zeros. If the EOS is not concatenated correctly, the talker assumes the request has not stopped, so its output audio runs past one word.
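The fix described above can be sketched as follows; `trailing_text_hidden` and `tts_eos_embed` are taken from the comment, but the shapes and the helper function are illustrative assumptions, not the actual vLLM-Omni code:

```python
import torch


def append_tts_eos(trailing_text_hidden: torch.Tensor,
                   tts_eos_embed: torch.Tensor) -> torch.Tensor:
    """Append the TTS EOS embedding after the trailing text hidden states.

    Buggy variant (for contrast): appending zeros instead of the EOS
    embedding hides the stop signal, so the talker keeps emitting audio:
        torch.cat([trailing_text_hidden,
                   torch.zeros_like(tts_eos_embed)], dim=0)
    """
    return torch.cat([trailing_text_hidden, tts_eos_embed], dim=0)


# One-word answer -> a single trailing hidden state plus the EOS embedding.
trailing_text_hidden = torch.randn(1, 8)
tts_eos_embed = torch.randn(1, 8)
out = append_tts_eos(trailing_text_hidden, tts_eos_embed)
print(out.shape)  # torch.Size([2, 8])
```

With the EOS embedding in the last position, the talker sees an explicit stop signal even when only one text token precedes it.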

@amy-why-3459
Contributor Author

@Sy0307 @LJH-LBJ PTAL

@Sy0307
Contributor

Sy0307 commented Mar 26, 2026

LGTM. Adding some comments as a reminder would be better.

-    keepalive_timeout=60,
-    enable_cleanup_closed=True,
-    force_close=False,
+    force_close=True,
Collaborator


And why do we need this change?

Contributor Author


To avoid server disconnect errors during high-concurrency benchmark runs.

@gcanlin
Collaborator

gcanlin commented Mar 26, 2026

Besides, I would also recommend adding a simple UT to prevent others from mistakenly changing the expected behavior, if possible.

Hmm... maybe a comment is enough.

Comment on lines -379 to +380
-    keepalive_timeout=60,
-    enable_cleanup_closed=True,
-    force_close=False,
+    force_close=True,
Contributor


When a benchmark sends a large number of concurrent requests, connections can be reused. Why remove this?

Contributor Author


Reusing connections may result in a server disconnect error.

@amy-why-3459
Contributor Author

Besides, I would also recommend adding a simple UT to prevent others from mistakenly changing the expected behavior, if possible.

Hmm... maybe a comment is enough.

We added an E2E use case for the caregiving scenario. #2097

@amy-why-3459
Contributor Author

@yenuo26 PTAL

@gcanlin gcanlin added the ready label to trigger buildkite CI label Mar 27, 2026

Labels

ready label to trigger buildkite CI


4 participants