
[NPUW] Fixed Gemma2 4K sliding window work with short prompts on NPU #32891

Merged
dmatveev merged 4 commits into openvinotoolkit:master from AsyaPronina:gemma2_swa on Jan 21, 2026


@AsyaPronina
Contributor

Details:

  • Relaxed Phi3SlidingMask2 pattern in order to allow attention_mask and position_ids to be located more flexibly in the model without breaking the sliding window mask calculation
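For context on the mask this change protects, here is a minimal, self-contained sketch of a causal sliding window mask computed from position_ids and attention_mask. It is not the NPUW transformation, which rewrites OpenVINO graph operations; the function name sliding_window_mask and the assumption that key slot k holds the token at absolute position k are illustrative only. The sketch also shows why short prompts are a special case: when every position is smaller than the window (4K for Gemma2, per the title), the sliding window mask is identical to a plain causal mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an OpenVINO API): builds a [q_len x kv_len] boolean mask.
// The query token at absolute position position_ids[q] may attend to key slot k iff:
//   - the slot is valid (attention_mask[k] != 0),
//   - the key is not in the future (k <= pos_q), and
//   - the key lies inside the window (pos_q - k < window).
// Assumption: key slot k holds the token at absolute position k.
std::vector<std::vector<bool>> sliding_window_mask(const std::vector<int64_t>& position_ids,
                                                   const std::vector<int64_t>& attention_mask,
                                                   int64_t window) {
    assert(window > 0);
    std::vector<std::vector<bool>> mask(position_ids.size(),
                                        std::vector<bool>(attention_mask.size(), false));
    for (std::size_t q = 0; q < position_ids.size(); ++q) {
        const int64_t pos_q = position_ids[q];
        for (std::size_t k = 0; k < attention_mask.size(); ++k) {
            const int64_t pos_k = static_cast<int64_t>(k);
            mask[q][k] = attention_mask[k] != 0 && pos_k <= pos_q && pos_q - pos_k < window;
        }
    }
    return mask;
}
```

With a 5-token prompt and a 4096-token window, the window condition never excludes anything, so only causality and padding matter; the window starts to bite only once positions reach the window size.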

Tickets:

@AsyaPronina AsyaPronina requested review from a team as code owners November 18, 2025 00:27
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Nov 18, 2025
@dmatveev dmatveev added this to the 2026.0 milestone Nov 20, 2025
Contributor

@dmatveev dmatveev left a comment


This is probably a good opportunity to ask the transformations team how to do this properly.

OPENVINO_MATCHER_PASS_RTTI("npuw::LLMCompiledModel::Phi3SlidingMask2");

Phi3SlidingMask2() {
Phi3SlidingMask2(const std::shared_ptr<ov::Model>& model) {
Contributor


Normally we mark this constructor explicit, but why pass the model to a pass that can, in fact, be applied to any other model?

Contributor Author


Fixed, thanks!
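The explicit remark in this thread refers to a general C++ rule: a constructor that can be called with a single argument and is not marked explicit also acts as an implicit conversion. The sketch below uses a hypothetical stand-in Model type (not ov::Model) and two hypothetical pass classes to show the difference at compile time.

```cpp
#include <memory>
#include <type_traits>

struct Model {};  // hypothetical stand-in for ov::Model

// Non-explicit single-argument constructor: a shared_ptr<Model> converts to a pass silently.
struct ImplicitPass {
    ImplicitPass(const std::shared_ptr<Model>& model) : model_(model) {}
    std::shared_ptr<Model> model_;
};

// explicit constructor: the pass can only be created deliberately.
struct ExplicitPass {
    explicit ExplicitPass(const std::shared_ptr<Model>& model) : model_(model) {}
    std::shared_ptr<Model> model_;
};

// Compile-time checks of the rule described above.
static_assert(std::is_convertible<std::shared_ptr<Model>, ImplicitPass>::value,
              "non-explicit constructor allows implicit conversion");
static_assert(!std::is_convertible<std::shared_ptr<Model>, ExplicitPass>::value,
              "explicit constructor blocks implicit conversion");
static_assert(std::is_constructible<ExplicitPass, std::shared_ptr<Model>>::value,
              "direct construction still works");
```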

Comment on lines +671 to +682
std::shared_ptr<ov::Node> matched_position_ids = nullptr;
std::shared_ptr<ov::Node> matched_attention_mask = nullptr;
for (const auto& i : model->inputs()) {
if (i.get_any_name() == "position_ids") {
matched_position_ids = i.get_node_shared_ptr();
}
if (i.get_any_name() == "attention_mask") {
matched_attention_mask = i.get_node_shared_ptr();
}
}
OPENVINO_ASSERT(matched_position_ids, "position_ids input is not found!");
OPENVINO_ASSERT(matched_attention_mask, "attention_mask input is not found!");
Contributor


Why can't you do this through the pass itself? Or perhaps outside of the pass?

Contributor Author


Because there is a long chain of operations between the point where the pattern matches and the position_ids and attention_mask inputs, and this chain can differ between models.
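The difficulty described in this reply can be illustrated on a simplified graph. A fixed pattern matches a fixed shape of operations, whereas the path from the matched node back to the position_ids or attention_mask parameter may contain a different number and kind of ops in each model. The sketch below walks producers breadth-first until it reaches a Parameter with the wanted name. The Node struct and find_upstream_parameter are hypothetical and do not model the OpenVINO ov::Node API; the PR code quoted above takes a simpler route and looks the inputs up by name via model->inputs().

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified graph node (NOT the OpenVINO ov::Node API).
struct Node {
    std::string type;                           // e.g. "Parameter", "Convert", "Unsqueeze"
    std::string name;                           // tensor name for Parameters, e.g. "position_ids"
    std::vector<std::shared_ptr<Node>> inputs;  // producers feeding this node
};

// Breadth-first search upstream from `start` for a Parameter named `wanted`.
// Unlike a fixed pattern, this tolerates any number of intermediate ops.
std::shared_ptr<Node> find_upstream_parameter(const std::shared_ptr<Node>& start,
                                              const std::string& wanted) {
    std::queue<std::shared_ptr<Node>> frontier;
    std::set<const Node*> visited;
    frontier.push(start);
    while (!frontier.empty()) {
        auto node = frontier.front();
        frontier.pop();
        // Skip null edges and nodes already seen through another path.
        if (!node || !visited.insert(node.get()).second) {
            continue;
        }
        if (node->type == "Parameter" && node->name == wanted) {
            return node;
        }
        for (const auto& producer : node->inputs) {
            frontier.push(producer);
        }
    }
    return nullptr;  // no such Parameter upstream of `start`
}
```

The search cost grows with the size of the upstream subgraph, which is one reason a direct lookup of the model's named inputs can be the more practical choice.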

rewr.run_on_model(model);
ov::pass::Manager manager;
manager.register_pass<Phi3SlidingMask2>();
if (!manager.run_passes(model)) {
Contributor

@AlexanderKalistratov AlexanderKalistratov Jan 20, 2026


I think it would be better to move this logic into the Phi3SlidingMask2 pass so that it handles all the cases.
Also, consider renaming Phi3SlidingMask2 to Phi3SlidingMask, and the current Phi3SlidingMask to Phi3SlidingMask_pattern_transformers_451 or something similar.

Contributor Author


Fixed, thanks!

Contributor

@dmatveev dmatveev left a comment


Thanks!

@dmatveev dmatveev added this pull request to the merge queue Jan 21, 2026
Merged via the queue into openvinotoolkit:master with commit fcb8b1c Jan 21, 2026
187 checks passed
@dmatveev dmatveev deleted the gemma2_swa branch January 21, 2026 11:13
github-merge-queue bot pushed a commit that referenced this pull request Jan 26, 2026
…33819)

### Details:
- *Hot-fix after #32891*
- *Phi3 sliding window patching is applied before the attention mask is
added to Whisper models. Since Whisper models don't need SWA patching,
an early return was added for the case when neither `attention_mask` nor
`position_ids` is found*

### Tickets:
 - *N/A*
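A minimal sketch of the guard this hot-fix describes, assuming the decision can be made from the model's input names alone. decide_swa_patch and SwaPatchDecision are hypothetical names. How the case where only one of the two inputs exists is handled (raising an error, as the OPENVINO_ASSERT checks from #32891 do) is an assumption, since the commit message only describes the case where neither input is found.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical outcome of the input check (not an OpenVINO type).
enum class SwaPatchDecision { Skip, Apply };

// Sketch of the early-return guard:
//   - neither input present (e.g. a Whisper model before its attention_mask
//     is added): return early, no SWA patching is needed;
//   - both present: apply the patch;
//   - only one present: ASSUMED to be an error, mirroring the original asserts.
SwaPatchDecision decide_swa_patch(const std::vector<std::string>& input_names) {
    bool has_position_ids = false;
    bool has_attention_mask = false;
    for (const auto& name : input_names) {
        has_position_ids = has_position_ids || name == "position_ids";
        has_attention_mask = has_attention_mask || name == "attention_mask";
    }
    if (!has_position_ids && !has_attention_mask) {
        return SwaPatchDecision::Skip;  // early return: nothing to patch
    }
    if (!has_position_ids) {
        throw std::runtime_error("position_ids input is not found!");
    }
    if (!has_attention_mask) {
        throw std::runtime_error("attention_mask input is not found!");
    }
    return SwaPatchDecision::Apply;
}
```

Returning Skip instead of failing lets the same pipeline run unchanged on models that, like Whisper per the commit message, don't need SWA patching.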
Naseer-010 pushed a commit to Naseer-010/openvino that referenced this pull request Feb 18, 2026
…penvinotoolkit#32891)

### Details:
- *Relaxed Phi3SlidingMask2 pattern in order to allow `attention_mask`
and `position_ids` to be located more flexibly in the model without
breaking the sliding window mask calculation*

### Tickets:
 - *EISW-190610*
Naseer-010 pushed a commit to Naseer-010/openvino that referenced this pull request Feb 18, 2026
…penvinotoolkit#33819)

### Details:
- *Hot-fix after openvinotoolkit#32891*
- *Phi3 sliding window patching is applied before the attention mask is
added to Whisper models. Since Whisper models don't need SWA patching,
an early return was added for the case when neither `attention_mask` nor
`position_ids` is found*

### Tickets:
 - *N/A*

4 participants