
[model] fix: Qwen3.5-VL MTP standard attn specs patch#3330

Open
HollowMan6 wants to merge 2 commits into NVIDIA-NeMo:main from HollowMan6:qwen35_mtp_patch

Conversation

Contributor

@HollowMan6 HollowMan6 commented Apr 14, 2026

What does this PR do ?

The main decoder was patched to use Qwen3VLSelfAttention, but the separately generated MTP spec still used the standard Megatron SelfAttention, so MTP took the wrong RoPE path and threw an error in the rotary embedding.

Changelog

  • Recursively patch standard attention specs inside MTP specs as well, and apply that patch before constructing both dense and MoE Qwen3.5-VL models.
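The recursive patch can be sketched roughly as follows. This is a simplified illustration, not the actual code from `qwen35_vl_provider.py`: the `SelfAttention`, `Qwen3VLSelfAttention`, and `ModuleSpec` definitions here are minimal stand-ins for the real Megatron classes.

```python
from dataclasses import dataclass, field

# Simplified stand-ins for Megatron's attention and spec classes.
class SelfAttention: ...
class Qwen3VLSelfAttention: ...

@dataclass
class ModuleSpec:
    module: type
    submodules: dict = field(default_factory=dict)

def patch_standard_attention_specs(spec):
    """Recursively swap standard SelfAttention for Qwen3VLSelfAttention.

    Walks ModuleSpec trees, including nested MTP layer specs, so the MTP
    block takes the same mRoPE rotary-embedding path as the main decoder.
    """
    if isinstance(spec, list):
        for item in spec:
            patch_standard_attention_specs(item)
    elif isinstance(spec, ModuleSpec):
        if spec.module is SelfAttention:
            spec.module = Qwen3VLSelfAttention
        for sub in spec.submodules.values():
            patch_standard_attention_specs(sub)
    return spec
```

The key point is the recursion into `submodules`: a patch that only inspects the top-level decoder spec misses the attention specs buried inside the separately built MTP block spec.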

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Refactor

    • Improved attention specification handling in model configuration to support more flexible and recursive patching patterns.
  • Tests

    • Added comprehensive unit and integration tests to verify attention specification updates across nested model components.

Copilot AI review requested due to automatic review settings April 14, 2026 23:16

copy-pr-bot bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Copilot AI left a comment


Pull request overview

Fixes Qwen3.5-VL MTP (next-token prediction) initialization by ensuring the MTP block spec’s self-attention specs are patched to use Qwen3VLSelfAttention, preventing incorrect RoPE/mRoPE rotary embedding behavior.

Changes:

  • Patch mtp_block_spec(...) output with _patch_standard_attention_specs(...) before constructing Qwen3.5-VL dense and MoE models.
  • Extend _patch_standard_attention_specs to recurse into nested MTP layer specs.
  • Add unit tests validating recursion into MTP specs and that provide() passes patched specs to the model constructor.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py Patches MTP block spec attention modules to Qwen3VLSelfAttention and adds recursive patching support.
tests/unit_tests/models/qwen_vl/test_qwen35_vl_provider.py Adds unit tests covering recursive patching into MTP specs and provide() behavior.


Comment thread src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py
Contributor

coderabbitai bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 9b38f65 and 0709738.

📒 Files selected for processing (2)
  • src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py
  • tests/unit_tests/models/qwen_vl/test_qwen35_vl_provider.py

📝 Walkthrough

The model provider's provide() method is refactored to construct and patch the multi-token prediction block spec once before passing it to the model constructor, rather than patching inline. The _patch_standard_attention_specs helper is generalized to recursively handle ModuleSpec instances and nested MTP layers in addition to hybrid specs.
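In pseudocode, the `provide()` ordering described above looks like the sketch below. This is a hypothetical illustration with the provider's dependencies injected as callables, not the actual code; the real method calls Megatron's spec builders directly.

```python
def provide(config, build_mtp_block_spec, patch_attention_specs, build_model):
    """Sketch of the refactored ordering: build the MTP block spec once,
    patch it, then hand it to the model constructor."""
    # Construct the MTP block spec a single time, up front (it may be
    # None when multi-token prediction is disabled).
    mtp_spec = build_mtp_block_spec(config)
    # Patch before construction, so the MTP block uses the same
    # self-attention (and hence the same mRoPE path) as the main decoder.
    if mtp_spec is not None:
        mtp_spec = patch_attention_specs(mtp_spec)
    return build_model(config, mtp_block_spec=mtp_spec)
```

Doing the patch once, before construction, avoids the previous inline approach where the main decoder was patched but the freshly generated MTP spec was not.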

Changes

Cohort / File(s) summary:

  • Model Provider Implementation — src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py: Refactored provide() to construct mtp_spec once and apply _patch_standard_attention_specs() before passing it to the model constructor. Generalized the _patch_standard_attention_specs() parameter type from TransformerBlockSubmodules to an Optional union that also accepts ModuleSpec instances and nested MTP layer specs.
  • Test Suite — tests/unit_tests/models/qwen_vl/test_qwen35_vl_provider.py: Added a unit test verifying that _patch_standard_attention_specs() patches ModuleSpec entries for self_attention and recurses through nested mtp_model_layer specs. Added an integration test for Qwen35VLMoEModelProvider.provide() that uses monkeypatching to assert both the language transformer and the nested MTP block spec self-attention modules are patched to Qwen3VLSelfAttention.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: the PR adds tests but lacks documentation of test execution status and results, and changes critical attention/RoPE numerics without confirming the fix resolves the issue. Resolution: document that the tests pass, provide evidence the RoPE error is resolved, and add CI test results once NVIDIA approval allows execution.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly identifies the specific fix, patching Qwen3.5-VL MTP standard attention specs to use the correct self-attention implementation, which aligns with the main objective of resolving the RoPE mismatch.
  • Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate, so the docstring coverage check was skipped.




The main decoder was patched to use `Qwen3VLSelfAttention`,
but the separately generated MTP spec was still using
standard Megatron SelfAttention, so MTP took the wrong RoPE
path and threw error in rotary embedding.

Fixed that by recursively patching standard attention
specs inside MTP specs as well, and by applying that
patch before constructing both dense and MoE Qwen3.5-VL models.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
…n inputs

Signed-off-by: Hollow Man <hollowman@opensuse.org>


3 participants