
[model] fix: Qwen3.5-VL MTP standard attn specs patch#3330

Open
HollowMan6 wants to merge 2 commits into NVIDIA-NeMo:main from HollowMan6:qwen35_mtp_patch

Conversation

Contributor

@HollowMan6 HollowMan6 commented Apr 14, 2026

What does this PR do ?

The main decoder was patched to use Qwen3VLSelfAttention, but the separately generated MTP spec still used the standard Megatron SelfAttention, so MTP took the wrong RoPE path and threw an error in the rotary embedding.

Changelog

  • Recursively patch standard attention specs inside MTP specs as well, and apply that patch before constructing both dense and MoE Qwen3.5-VL models.
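The recursive patch can be sketched roughly as follows. This is a simplified illustration, not the actual code from `qwen35_vl_provider.py`: the `SelfAttention`, `Qwen3VLSelfAttention`, and `ModuleSpec` definitions here are minimal stand-ins for the real Megatron classes.

```python
from dataclasses import dataclass, field

# Simplified stand-ins for Megatron's attention and spec classes.
class SelfAttention: ...
class Qwen3VLSelfAttention: ...

@dataclass
class ModuleSpec:
    module: type
    submodules: dict = field(default_factory=dict)

def patch_standard_attention_specs(spec):
    """Recursively swap standard SelfAttention for Qwen3VLSelfAttention.

    Walks ModuleSpec trees, including nested MTP layer specs, so the MTP
    block takes the same mRoPE rotary-embedding path as the main decoder.
    """
    if isinstance(spec, list):
        for item in spec:
            patch_standard_attention_specs(item)
    elif isinstance(spec, ModuleSpec):
        if spec.module is SelfAttention:
            spec.module = Qwen3VLSelfAttention
        for sub in spec.submodules.values():
            patch_standard_attention_specs(sub)
    return spec
```

The key point is the recursion into `submodules`: a patch that only inspects the top-level decoder spec misses the attention specs buried inside the separately built MTP block spec.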

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Refactor

    • Improved attention specification handling in model configuration to support more flexible and recursive patching patterns.
  • Tests

    • Added comprehensive unit and integration tests to verify attention specification updates across nested model components.

Copilot AI review requested due to automatic review settings April 14, 2026 23:16

copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Copilot AI left a comment


Pull request overview

Fixes Qwen3.5-VL MTP (next-token prediction) initialization by ensuring the MTP block spec’s self-attention specs are patched to use Qwen3VLSelfAttention, preventing incorrect RoPE/mRoPE rotary embedding behavior.

Changes:

  • Patch mtp_block_spec(...) output with _patch_standard_attention_specs(...) before constructing Qwen3.5-VL dense and MoE models.
  • Extend _patch_standard_attention_specs to recurse into nested MTP layer specs.
  • Add unit tests validating recursion into MTP specs and that provide() passes patched specs to the model constructor.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py Patches MTP block spec attention modules to Qwen3VLSelfAttention and adds recursive patching support.
tests/unit_tests/models/qwen_vl/test_qwen35_vl_provider.py Adds unit tests covering recursive patching into MTP specs and provide() behavior.


Comment thread src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py
Contributor

coderabbitai bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 9b38f65 and 0709738.

📒 Files selected for processing (2)
  • src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py
  • tests/unit_tests/models/qwen_vl/test_qwen35_vl_provider.py

📝 Walkthrough

The model provider's provide() method is refactored to construct and patch the multi-token prediction block spec once before passing it to the model constructor, rather than patching inline. The _patch_standard_attention_specs helper is generalized to recursively handle ModuleSpec instances and nested MTP layers in addition to hybrid specs.
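In pseudocode, the `provide()` ordering described above looks like the sketch below. This is a hypothetical illustration with the provider's dependencies injected as callables, not the actual code; the real method calls Megatron's spec builders directly.

```python
def provide(config, build_mtp_block_spec, patch_attention_specs, build_model):
    """Sketch of the refactored ordering: build the MTP block spec once,
    patch it, then hand it to the model constructor."""
    # Construct the MTP block spec a single time, up front (it may be
    # None when multi-token prediction is disabled).
    mtp_spec = build_mtp_block_spec(config)
    # Patch before construction, so the MTP block uses the same
    # self-attention (and hence the same mRoPE path) as the main decoder.
    if mtp_spec is not None:
        mtp_spec = patch_attention_specs(mtp_spec)
    return build_model(config, mtp_block_spec=mtp_spec)
```

Doing the patch once, before construction, avoids the previous inline approach where the main decoder was patched but the freshly generated MTP spec was not.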

Changes

Cohort / File(s) summary:

  • Model Provider Implementation — src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py: Refactored provide() to construct mtp_spec once and apply _patch_standard_attention_specs() before passing it to the model constructor. Generalized the _patch_standard_attention_specs() parameter type from TransformerBlockSubmodules to an Optional union that also accepts ModuleSpec instances and nested MTP layer specs.
  • Test Suite — tests/unit_tests/models/qwen_vl/test_qwen35_vl_provider.py: Added a unit test verifying that _patch_standard_attention_specs() patches ModuleSpec entries for self_attention and recurses through nested mtp_model_layer specs. Added an integration test for Qwen35VLMoEModelProvider.provide() that uses monkeypatching to assert both the language transformer and the nested MTP block spec self-attention modules are patched to Qwen3VLSelfAttention.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: the PR adds tests but lacks documentation of test execution status and results, and changes critical attention/RoPE numerics without confirming the fix resolves the issue. Resolution: document that the tests pass, provide evidence the RoPE error is resolved, and add CI test results once NVIDIA approval allows execution.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly identifies the specific fix, patching Qwen3.5-VL MTP standard attention specs to use the correct self-attention implementation, which aligns with the main objective of resolving the RoPE mismatch.
  • Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate, so the docstring coverage check was skipped.




The main decoder was patched to use `Qwen3VLSelfAttention`,
but the separately generated MTP spec was still using
standard Megatron SelfAttention, so MTP took the wrong RoPE
path and threw error in rotary embedding.

Fixed that by recursively patching standard attention
specs inside MTP specs as well, and by applying that
patch before constructing both dense and MoE Qwen3.5-VL models.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
…n inputs

Signed-off-by: Hollow Man <hollowman@opensuse.org>


3 participants