[NPUW] Moving passes from llm_compiled_model to separate files #34934
AlexanderKalistratov wants to merge 5 commits into openvinotoolkit:master
Pull request overview
This PR refactors the Intel NPUW plugin implementation by extracting previously inline LLM/Whisper/Embedding/MoE transformation passes and related logic from the large llm_compiled_model*.cpp/hpp and llm_compiled_model_utils.* files into dedicated translation units under clearer subfolders (npuw_transformations/, embedding/, whisper/, moe_transformations/). This primarily improves code organization and compilation-unit boundaries while aiming to preserve existing behavior.
Changes:
- Moved multiple OpenVINO `ModelPass` / matcher-pass implementations into new dedicated `.cpp`/`.hpp` files under `npuw_transformations/`, `embedding/`, `whisper/`, and `moe_transformations/`.
- Updated `LLMCompiledModel` and unit tests to include the new headers and compile the new sources.
- Introduced `KVAxesPosition` as a shared struct header (`kv_axes_position.hpp`) instead of an anonymous-namespace definition.
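
To illustrate why a shared header is preferable to an anonymous-namespace definition: names declared in an unnamed namespace have internal linkage, so a header that declares a struct that way gives every including translation unit its own distinct type. The sketch below shows the shared-header shape instead. The `sketch` namespace, the field names, and the `seq_axis` helper are assumptions for illustration, not the PR's actual code.

```cpp
#include <cstdint>

// Sketch of what a shared kv_axes_position.hpp could contain.
// A real header would also start with an include guard or #pragma once.
namespace sketch {

// Field names are assumed for illustration.
struct KVAxesPosition {
    std::uint32_t batch;
    std::uint32_t seq_len;
};

// Because the struct has one named definition, a function declared in one
// header and defined in another translation unit can accept it safely.
constexpr std::uint32_t seq_axis(const KVAxesPosition& axes) {
    return axes.seq_len;
}

}  // namespace sketch
```

Every pass header that needs the axes can then include this one file rather than redefining the struct locally.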
Reviewed changes
Copilot reviewed 39 out of 40 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/plugins/intel_npu/tests/unit/npuw/transpose_vt.cpp | Adjusts includes to use the extracted OptimizeValueTensors header. |
| src/plugins/intel_npu/tests/unit/npuw/llm_compiled_model_factory_options_test.cpp | Updates includes to use extracted Whisper preparation passes. |
| src/plugins/intel_npu/tests/unit/CMakeLists.txt | Adds newly extracted plugin sources to the unit test target build. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/whisper_infer_request.hpp | Fixes include paths after moving Whisper code into a subfolder. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/whisper_infer_request.cpp | Fixes include paths after moving Whisper code into a subfolder. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/prepare_whisper_model.hpp | New header for Whisper model preparation passes. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/prepare_whisper_model.cpp | New implementation of Whisper model preparation transformations. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp | New header for SliceOutEmbeds pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.cpp | New implementation for SliceOutEmbeds pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_to_static.hpp | New header for ReshapeToStatic pass and KVAxesPosition usage. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_to_static.cpp | New implementation for ReshapeToStatic logic. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.hpp | New header for ReshapeSlicedHeadToStatic pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.cpp | New implementation for reshaping sliced LM head inputs. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/phi3_sliding_mask.hpp | New header for Phi3SlidingMask transformation pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/phi3_sliding_mask.cpp | New implementation of Phi-3 sliding window attention mask patching. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/patch_phi3_sliding_mask.hpp | New wrapper pass header to apply Phi-3 sliding mask transformations. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/patch_phi3_sliding_mask.cpp | New wrapper pass implementation. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/optimize_value_tensors.hpp | New header for OptimizeValueTensors pass (SDPA decomposition + V tensor transpose). |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/optimize_value_tensors.cpp | New implementation for OptimizeValueTensors. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/lora_stateful_to_stateless.hpp | New header for LoRA stateful→stateless conversion pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/lora_stateful_to_stateless.cpp | New implementation for LoRA state conversion. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/kv_axes_position.hpp | New shared KVAxesPosition struct header. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/decompose_gqa.hpp | New header for DecomposeGQA pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/decompose_gqa.cpp | New implementation for GroupQueryAttention decomposition. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/convert_kvcache_to_precision.hpp | New header for KV-cache low-precision conversion and type selection helper. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/convert_kvcache_to_precision.cpp | New implementation for KV-cache precision conversion + FP8 optimization helper. |
| src/plugins/intel_npu/src/plugin/npuw/moe_transformations/apply_moe_device_routed_transforms.hpp | New header for applying DEVICE_ROUTED MoE transformations. |
| src/plugins/intel_npu/src/plugin/npuw/moe_transformations/apply_moe_device_routed_transforms.cpp | New implementation applying MoE transforms in sequence. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model_utils.hpp | Removes pass class declarations that were moved out. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model_utils.cpp | Removes moved transformation logic, leaving shared helpers like has_input. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.hpp | Replaces anonymous KVAxesPosition with shared header include. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp | Switches to includes of the extracted passes/transforms and removes inlined implementations. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/remove_empty_kv_inputs.hpp | New header for embedding-model pass to drop empty KV inputs. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/remove_empty_kv_inputs.cpp | New implementation for removing empty KV inputs in embedding models. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/redirect_new_kv_to_output.hpp | New header for embedding-model pass to redirect new KV tensors to outputs. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/redirect_new_kv_to_output.cpp | New implementation for redirecting new KV tensors to outputs. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/prepare_embedding_model.hpp | New header for embedding model preparation pass. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/prepare_embedding_model.cpp | Updated implementation moved from old embedding utils, with updated include paths. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/embedding_infer_request.hpp | Fixes include paths after moving embedding infer request into a subfolder. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/embedding_infer_request.cpp | Fixes include paths after moving embedding infer request into a subfolder. |

`src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp`

    #pragma once

    #include "openvino/pass/pass.hpp"

    class SliceOutEmbeds : public ov::pass::ModelPass {
        uint32_t m_batch_dim;
        std::size_t m_max_generation_token_len;

[BLOCKER] This header uses uint32_t and std::size_t but does not include the headers that define them. It currently relies on transitive includes from OpenVINO headers, which is brittle and can break with include-order changes. Add explicit includes, e.g. <cstdint> for uint32_t and <cstddef> for std::size_t.
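
A minimal sketch of the fix this comment asks for, assuming the class is reduced to its two data members. `SliceOutEmbedsSketch` and its accessors are hypothetical, and the `ov::pass::ModelPass` base is left out so the snippet compiles without OpenVINO. The only point is that the header names `<cstdint>` and `<cstddef>` itself instead of relying on what other headers happen to pull in.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint32_t

// Hypothetical stand-in for the pass: only the members that need the
// standard headers are kept; the OpenVINO base class is omitted.
class SliceOutEmbedsSketch {
    std::uint32_t m_batch_dim;
    std::size_t m_max_generation_token_len;

public:
    constexpr SliceOutEmbedsSketch(std::uint32_t batch_dim, std::size_t max_generation_token_len)
        : m_batch_dim(batch_dim),
          m_max_generation_token_len(max_generation_token_len) {}

    // Accessors exist only so the sketch can be checked; they are not part of the PR.
    constexpr std::uint32_t batch_dim() const { return m_batch_dim; }
    constexpr std::size_t max_generation_token_len() const { return m_max_generation_token_len; }
};
```

With the includes spelled out, the header keeps compiling even if `openvino/pass/pass.hpp` later stops including these standard headers transitively.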

`src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.hpp`

    #pragma once

    #include "openvino/pass/pass.hpp"

    class ReshapeSlicedHeadToStatic : public ov::pass::ModelPass {
        uint32_t m_batch_dim;
        std::size_t m_max_generation_token_len;

[BLOCKER] This header uses uint32_t / std::size_t but doesn't include <cstdint> / <cstddef> (or equivalent) directly. Depending on transitive includes can make builds fail on different compilers or after unrelated include changes; please add the needed standard includes here.

`src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp`

    class SliceOutEmbeds : public ov::pass::ModelPass {
        uint32_t m_batch_dim;
        std::size_t m_max_generation_token_len;

    public:
        OPENVINO_MODEL_PASS_RTTI("ov::npuw::SliceOutEmbeds");
        explicit SliceOutEmbeds(uint32_t batch_dim, std::size_t max_generation_token_len);
        bool run_on_model(const std::shared_ptr<ov::Model>& model) override;

[MEDIUM] Several newly extracted transformation passes are now declared in the global namespace in public headers (starting with this one, and similarly in reshape_to_static.hpp, reshape_sliced_head_to_static.hpp, phi3_sliding_mask.hpp, convert_kvcache_to_precision.hpp, moe_transformations/apply_moe_device_routed_transforms.hpp, etc.). This increases collision risk and is inconsistent with existing NPUW transformation headers that scope passes under ov::npuw::pass / ov::npuw::... (e.g., src/plugins/intel_npu/src/plugin/npuw/moe_transformations/gather_to_2d_gather.hpp:10-42). Consider wrapping these class declarations in the appropriate ov::npuw (and/or ov::npuw::pass / ov::npuw::util) namespace to keep symbols scoped to the plugin and match established conventions.
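
The collision risk described here can be sketched as follows. The global `SliceOutEmbeds` struct stands for a hypothetical unrelated symbol with the same name, and `ov::npuw::pass` is just one of the namespaces the comment suggests; the `ov::pass::ModelPass` base is omitted so the snippet builds without OpenVINO.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical unrelated type that happens to use the same name in the
// global namespace (for example, from another library in the same binary).
struct SliceOutEmbeds {
    int unrelated = 0;
};

namespace ov {
namespace npuw {
namespace pass {

// Stand-in for the extracted pass, scoped to the plugin's namespace.
class SliceOutEmbeds {
    std::uint32_t m_batch_dim;
    std::size_t m_max_generation_token_len;

public:
    constexpr SliceOutEmbeds(std::uint32_t batch_dim, std::size_t max_generation_token_len)
        : m_batch_dim(batch_dim),
          m_max_generation_token_len(max_generation_token_len) {}

    constexpr std::uint32_t batch_dim() const { return m_batch_dim; }
    constexpr std::size_t max_generation_token_len() const { return m_max_generation_token_len; }
};

}  // namespace pass
}  // namespace npuw
}  // namespace ov

// Both definitions coexist because they live in different namespaces.
static_assert(!std::is_same< ::SliceOutEmbeds, ov::npuw::pass::SliceOutEmbeds>::value,
              "the scoped pass is a distinct type from the global one");
```

If both classes were declared in the global namespace, the second definition would be a redefinition error within one translation unit, or an ODR violation across translation units.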
Extract all inline OpenVINO passes and transformation logic from `llm_compiled_model.cpp` and `llm_compiled_model_utils.cpp` into dedicated source files. This PR intentionally does not change names or namespaces and makes no other changes; it only moves code.

All common transformations are put into the `npuw_transformations/` folder:
- `ConvertKVCacheToPrecision`: `convert_kvcache_to_precision.cpp/.hpp`
- `DecomposeGQA`: `decompose_gqa.cpp/.hpp`
- `Phi3SlidingMask`: `phi3_sliding_mask.cpp/.hpp`
- `PatchPhi3SlidingMask`: `patch_phi3_sliding_mask.cpp/.hpp`
- `ReshapeToStatic`: `reshape_to_static.cpp/.hpp`
- `ReshapeSlicedHeadToStatic`: `reshape_sliced_head_to_static.cpp/.hpp`
- `SliceOutEmbeds`: `slice_out_embeds.cpp/.hpp`
- `LoraStatefulToStatelessPass`: `lora_stateful_to_stateless.cpp/.hpp`
- `OptimizeValueTensors`: `optimize_value_tensors.cpp/.hpp` (previously in `llm_compiled_model_utils.cpp`)
- `kv_axes_position.hpp`: `KVAxesPosition` struct (previously in an anonymous namespace in `llm_compiled_model.hpp`)

All code related to embedding models is moved to the `embedding/` folder:
- `RemoveEmptyKVInputs`: `remove_empty_kv_inputs.cpp/.hpp`
- `RedirectNewKvToOutput`: `redirect_new_kv_to_output.cpp/.hpp`
- `embedding_infer_request.cpp/.hpp` and `prepare_embedding_model.cpp/.hpp` are moved

MoE-related transformations are moved to the `moe_transformations/` folder:
- `ApplyMoEDeviceRoutedTransforms`: `apply_moe_device_routed_transforms.cpp/.hpp`

Whisper-related transformations are moved to the `whisper/` folder:
- `PrepareWhisperPrefillModel`, `PrepareWhisperKVCacheModel`: `prepare_whisper_model.cpp/.hpp` (previously in `llm_compiled_model_utils.cpp`)
- `whisper_infer_request.cpp/.hpp` were moved

AI Assistance:
- AI assistance used: yes
- AI moved most of the code; the result was reviewed and verified manually