
fix(qwen 35): fix qwen35 thd attention mask dtype to bool#3314

Open
li126com wants to merge 1 commit into NVIDIA-NeMo:main from li126com:fix/qwen35_thd_attention_mask

Conversation


@li126com li126com commented Apr 14, 2026

What does this PR do ?

fix(qwen3-vl): use bool dtype for attention_mask

🐛 Bug Fix

Fix attention_mask dtype causing silent data corruption

  • Change attention_mask dtype from torch.int32 to torch.bool in
    Qwen3VLModel.forward() at two call sites where the mask is created
    via torch.ones_like
  • Add a defensive bool-cast guard in preprocess_packed_seqs() to
    ensure correct advanced indexing semantics (bool mask-select vs
    int fancy-index which silently corrupts data when values are 0/1)
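The difference between the two indexing semantics can be shown in a few lines (illustrative values, not taken from the model code). With an integer mask, PyTorch treats each 0/1 value as a row index and gathers rows, which scrambles the data; with a boolean mask, it selects the entries where the mask is `True`:

```python
import torch

tokens = torch.tensor([10, 20, 30, 40])
int_mask = torch.tensor([1, 1, 0, 1], dtype=torch.int32)

# Integer fancy-indexing: each 0/1 is treated as a *row index*,
# so this gathers tokens[1], tokens[1], tokens[0], tokens[1].
corrupted = tokens[int_mask]        # tensor([20, 20, 10, 20])

# Boolean mask-select: keeps only entries where the mask is True.
selected = tokens[int_mask.bool()]  # tensor([10, 20, 40])
```

Because an all-ones int mask still produces a shape-preserving result, the corruption is silent: no error is raised, the output merely contains the wrong tokens.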

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Bug Fixes
    • Corrected data type handling in packed sequence preprocessing to ensure consistent and reliable batch processing operations
    • Enhanced validation during sequence indexing to prevent token corruption and data inconsistencies
    • Improved overall model stability and correctness when processing sequences, ensuring more reliable performance


copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

li126com force-pushed the fix/qwen35_thd_attention_mask branch from 075d530 to c63350a on April 14, 2026 06:14

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Attention mask dtype handling is updated in Qwen3VL model's packed sequence preprocessing. The model now initializes attention masks with torch.bool dtype instead of torch.int32, and the preprocessing utility enforces boolean dtype before using the mask for indexing operations.
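Concretely, the change at the creation sites amounts to passing an explicit boolean dtype to `torch.ones_like` (the values below are illustrative, not copied from the model code):

```python
import torch

input_ids = torch.tensor([[101, 102, 103]])

# Before the fix: the mask carries an integer dtype, which later
# triggers fancy indexing instead of mask selection.
old_mask = torch.ones_like(input_ids, dtype=torch.int32)

# After the fix: the mask is boolean from the start.
new_mask = torch.ones_like(input_ids, dtype=torch.bool)
```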

Changes

  • Qwen3VL Packed Sequence Preprocessing — src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py
    Attention mask initialization changed from torch.int32 to torch.bool dtype in both the pre-process and MTP/THD conversion paths for packed sequence handling.
  • Qwen3VL Packed Sequence Utils — src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/utils.py
    Added explicit dtype enforcement requiring attention_mask to be boolean before computing sequence lengths and performing advanced indexing on input tokens.
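A minimal sketch of the defensive guard described above, assuming a THD-style packing step (the function name and signature here are hypothetical; the actual implementation lives in utils.py):

```python
import torch


def preprocess_packed_seqs_sketch(
    input_ids: torch.Tensor, attention_mask: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical sketch: pack valid tokens into a flat THD layout."""
    # Defensive cast: guarantee mask-select semantics for the advanced
    # indexing below, even if a caller passes an int 0/1 mask.
    if attention_mask.dtype != torch.bool:
        attention_mask = attention_mask.bool()
    seqlens = attention_mask.sum(dim=-1)  # valid length per sequence
    packed = input_ids[attention_mask]    # flatten only the valid tokens
    return packed, seqlens
```

With an int mask passed in, the cast ensures `input_ids[attention_mask]` still selects the unmasked tokens rather than gathering rows 0 and 1.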

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 passed

  • Title check — Passed. The title clearly and specifically identifies the main change: fixing the THD attention mask dtype to bool for Qwen3.5, which directly matches the code changes in both model.py and utils.py.
  • Docstring Coverage — Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Test Results For Major Changes — Passed. Changes are minor: only 7 lines modified across two files, fixing attention mask dtype initialization from torch.int32 to torch.bool.
  • Description Check — Passed. Check skipped because CodeRabbit's high-level summary is enabled.


# 🐛 Bug Fix

## Fix attention_mask dtype causing silent data corruption

- Change attention_mask dtype from `torch.int32` to `torch.bool` in
  `Qwen3VLModel.forward()` at two call sites where the mask is created
  via `torch.ones_like`
- Add a defensive bool-cast guard in `preprocess_packed_seqs()` to
  ensure correct advanced indexing semantics (bool mask-select vs
  int fancy-index which silently corrupts data when values are 0/1)

Signed-off-by: root <li126com2@126.com>
li126com force-pushed the fix/qwen35_thd_attention_mask branch from c63350a to 3212ba6 on April 14, 2026 06:25