[NPUW] Fixed Gemma2 4K sliding window work with short prompts on NPU#32891
[NPUW] Fixed Gemma2 4K sliding window work with short prompts on NPU#32891dmatveev merged 4 commits intoopenvinotoolkit:masterfrom
Conversation
77fe1c0 to
86896a6
Compare
dmatveev
left a comment
There was a problem hiding this comment.
Probably it is a good chance to ask the transformation team on how to make things right
| OPENVINO_MATCHER_PASS_RTTI("npuw::LLMCompiledModel::Phi3SlidingMask2"); | ||
|
|
||
| Phi3SlidingMask2() { | ||
| Phi3SlidingMask2(const std::shared_ptr<ov::Model>& model) { |
There was a problem hiding this comment.
Normally we have explicit here, but what's the reason to pass model to a pass that can, in fact, be applied to any other model?
| std::shared_ptr<ov::Node> matched_position_ids = nullptr; | ||
| std::shared_ptr<ov::Node> matched_attention_mask = nullptr; | ||
| for (const auto& i : model->inputs()) { | ||
| if (i.get_any_name() == "position_ids") { | ||
| matched_position_ids = i.get_node_shared_ptr(); | ||
| } | ||
| if (i.get_any_name() == "attention_mask") { | ||
| matched_attention_mask = i.get_node_shared_ptr(); | ||
| } | ||
| } | ||
| OPENVINO_ASSERT(matched_position_ids, "position_ids input is not found!"); | ||
| OPENVINO_ASSERT(matched_attention_mask, "attention_mask input is not found!"); |
There was a problem hiding this comment.
Why can't you make it through pass? Or, maybe, outside the pass?
There was a problem hiding this comment.
Because it is long chain of operations till the position_ids and attention_mask starting from point of the pattern and this chain can differ between models.
44b0932 to
569aaad
Compare
| rewr.run_on_model(model); | ||
| ov::pass::Manager manager; | ||
| manager.register_pass<Phi3SlidingMask2>(); | ||
| if (!manager.run_passes(model)) { |
There was a problem hiding this comment.
I think it would be better to move this logic into Phi3SlidingMask2 pass to make it handle all the cases.
And probably rename Phi3SlidingMask2 into Phi3SlidingMask and Phi3SlidingMask rename into Phi3SlidingMask_pattern_transformers_451 or something like that.
569aaad to
91a477e
Compare
…penvinotoolkit#32891) ### Details: - *Relaxed Phi3SlidingMask2 pattern in order to allow `attention_mask` and `position_ids` to be located more flexibly in the model without breaking the sliding window mask calculation* ### Tickets: - *EISW-190610*
…penvinotoolkit#33819) ### Details: - *Hot-fix after openvinotoolkit#32891 - *Phi3 Sliding window patching is applied before attention mask is added to Whisper models. But as Whisper model doesn't need SWA patching, then added an early return if neither `attention_mask` nor `position_ids` were found* ### Tickets: - *N/A*
Details:
attention_maskandposition_idsto be located more flexibly in the model without breaking the sliding window mask calculationTickets: