
fix(qwen 35): fix qwen35 thd attention mask dtype to bool#3314

Open
li126com wants to merge 1 commit into NVIDIA-NeMo:main from li126com:fix/qwen35_thd_attention_mask

Conversation


@li126com li126com commented Apr 14, 2026

What does this PR do ?

fix(qwen3-vl): use bool dtype for attention_mask

🐛 Bug Fix

Fix attention_mask dtype causing silent data corruption

  • Change attention_mask dtype from torch.int32 to torch.bool in
    Qwen3VLModel.forward() at two call sites where the mask is created
    via torch.ones_like
  • Add a defensive bool-cast guard in preprocess_packed_seqs() to
    ensure correct advanced indexing semantics (bool mask-select vs
    int fancy-index which silently corrupts data when values are 0/1)
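The difference between the two indexing semantics can be shown in a few lines (illustrative values, not taken from the model code). With an integer mask, PyTorch treats each 0/1 value as a row index and gathers rows, which scrambles the data; with a boolean mask, it selects the entries where the mask is `True`:

```python
import torch

tokens = torch.tensor([10, 20, 30, 40])
int_mask = torch.tensor([1, 1, 0, 1], dtype=torch.int32)

# Integer fancy-indexing: each 0/1 is treated as a *row index*,
# so this gathers tokens[1], tokens[1], tokens[0], tokens[1].
corrupted = tokens[int_mask]        # tensor([20, 20, 10, 20])

# Boolean mask-select: keeps only entries where the mask is True.
selected = tokens[int_mask.bool()]  # tensor([10, 20, 40])
```

Because an all-ones int mask still produces a shape-preserving result, the corruption is silent: no error is raised, the output merely contains the wrong tokens.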

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Bug Fixes
    • Corrected data type handling in packed sequence preprocessing to ensure consistent and reliable batch processing operations
    • Enhanced validation during sequence indexing to prevent token corruption and data inconsistencies
    • Improved overall model stability and correctness when processing sequences, ensuring more reliable performance


copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

li126com force-pushed the fix/qwen35_thd_attention_mask branch from 075d530 to c63350a on April 14, 2026 06:14

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Attention mask dtype handling is updated in Qwen3VL model's packed sequence preprocessing. The model now initializes attention masks with torch.bool dtype instead of torch.int32, and the preprocessing utility enforces boolean dtype before using the mask for indexing operations.
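Concretely, the change at the creation sites amounts to passing an explicit boolean dtype to `torch.ones_like` (the values below are illustrative, not copied from the model code):

```python
import torch

input_ids = torch.tensor([[101, 102, 103]])

# Before the fix: the mask carries an integer dtype, which later
# triggers fancy indexing instead of mask selection.
old_mask = torch.ones_like(input_ids, dtype=torch.int32)

# After the fix: the mask is boolean from the start.
new_mask = torch.ones_like(input_ids, dtype=torch.bool)
```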

Changes

  • Qwen3VL Packed Sequence Preprocessing — src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py
    Attention mask initialization changed from torch.int32 to torch.bool dtype in both the pre-process and MTP/THD conversion paths for packed sequence handling.
  • Qwen3VL Packed Sequence Utils — src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/utils.py
    Added explicit dtype enforcement requiring attention_mask to be boolean before computing sequence lengths and performing advanced indexing on input tokens.
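A minimal sketch of the defensive guard described above, assuming a THD-style packing step (the function name and signature here are hypothetical; the actual implementation lives in utils.py):

```python
import torch


def preprocess_packed_seqs_sketch(
    input_ids: torch.Tensor, attention_mask: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical sketch: pack valid tokens into a flat THD layout."""
    # Defensive cast: guarantee mask-select semantics for the advanced
    # indexing below, even if a caller passes an int 0/1 mask.
    if attention_mask.dtype != torch.bool:
        attention_mask = attention_mask.bool()
    seqlens = attention_mask.sum(dim=-1)  # valid length per sequence
    packed = input_ids[attention_mask]    # flatten only the valid tokens
    return packed, seqlens
```

With an int mask passed in, the cast ensures `input_ids[attention_mask]` still selects the unmasked tokens rather than gathering rows 0 and 1.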

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 passed

  • Title check — Passed. The title clearly and specifically identifies the main change: fixing the THD attention mask dtype to bool for Qwen3.5, which directly matches the code changes in both model.py and utils.py.
  • Docstring Coverage — Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Test Results For Major Changes — Passed. Changes are minor: only 7 lines modified across two files, fixing attention mask dtype initialization from torch.int32 to torch.bool.
  • Description Check — Passed. Check skipped because CodeRabbit's high-level summary is enabled.


# 🐛 Bug Fix

## Fix attention_mask dtype causing silent data corruption

- Change attention_mask dtype from `torch.int32` to `torch.bool` in
  `Qwen3VLModel.forward()` at two call sites where the mask is created
  via `torch.ones_like`
- Add a defensive bool-cast guard in `preprocess_packed_seqs()` to
  ensure correct advanced indexing semantics (bool mask-select vs
  int fancy-index which silently corrupts data when values are 0/1)

Signed-off-by: root <li126com2@126.com>
li126com force-pushed the fix/qwen35_thd_attention_mask branch from c63350a to 3212ba6 on April 14, 2026 06:25