[NPUW] Moving passes from llm_compiled_model to separate files by AlexanderKalistratov · Pull Request #34934 · openvinotoolkit/openvino

AlexanderKalistratov · 2026-03-25T21:57:02Z

Extract all inline OpenVINO passes and transformation logic from llm_compiled_model.cpp and llm_compiled_model_utils.cpp into dedicated source files. This PR intentionally does not change names or namespaces and makes no other changes. It only moves code.

All common transformations are put into the npuw_transformations/ folder:

ConvertKVCacheToPrecision: convert_kvcache_to_precision.cpp/.hpp
DecomposeGQA: decompose_gqa.cpp/.hpp
Phi3SlidingMask: phi3_sliding_mask.cpp/.hpp
PatchPhi3SlidingMask: patch_phi3_sliding_mask.cpp/.hpp
ReshapeToStatic: reshape_to_static.cpp/.hpp
ReshapeSlicedHeadToStatic: reshape_sliced_head_to_static.cpp/.hpp
SliceOutEmbeds: slice_out_embeds.cpp/.hpp
LoraStatefulToStatelessPass: lora_stateful_to_stateless.cpp/.hpp
OptimizeValueTensors: optimize_value_tensors.cpp/.hpp (previously in llm_compiled_model_utils.cpp)
KVAxesPosition (struct): kv_axes_position.hpp (previously in an anonymous namespace in llm_compiled_model.hpp)
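As a rough illustration of why a shared header helps here: a type defined in an anonymous namespace inside a header is a distinct type in every translation unit that includes it, so it cannot be used safely in interfaces shared between separately compiled passes. The sketch below shows a plain shared struct instead. The field names `batch` and `seq_len` and the helper `kv_seq_len` are assumptions made for illustration; this PR text does not show the actual contents of kv_axes_position.hpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: the field names are illustrative, not taken from the PR.
struct KVAxesPosition {
    std::uint32_t batch;    // index of the batch axis in a KV-cache shape
    std::uint32_t seq_len;  // index of the sequence-length axis
};

// Hypothetical helper: reads the sequence length out of a 4D KV-cache shape.
inline std::size_t kv_seq_len(const std::array<std::size_t, 4>& shape, const KVAxesPosition& axes) {
    assert(axes.seq_len < shape.size());  // the axis index must be within the shape rank
    return shape[axes.seq_len];
}
```

Because the struct now has a single definition with external linkage, every extracted pass that includes the header refers to the same type.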

All code related to embedding models is moved to the embedding/ folder:

RemoveEmptyKVInputs: remove_empty_kv_inputs.cpp/.hpp
RedirectNewKvToOutput: redirect_new_kv_to_output.cpp/.hpp
embedding_infer_request.cpp/.hpp and prepare_embedding_model.cpp/.hpp are moved

MoE-related transformations are moved to the moe_transformations/ folder:

ApplyMoEDeviceRoutedTransforms: apply_moe_device_routed_transforms.cpp/.hpp
The rest of MoE related code was already there

Whisper-related transformations are moved to the whisper/ folder:

PrepareWhisperPrefillModel, PrepareWhisperKVCacheModel: prepare_whisper_model.cpp/.hpp (previously in llm_compiled_model_utils.cpp)
whisper_infer_request.cpp/.hpp were moved

AI Assistance:

AI assistance used: yes
AI moved most of the code; the result was reviewed and verified manually

Copilot

Pull request overview

This PR refactors the Intel NPUW plugin implementation by extracting previously inline LLM/Whisper/Embedding/MoE transformation passes and related logic from the large llm_compiled_model*.cpp/hpp and llm_compiled_model_utils.* files into dedicated translation units under clearer subfolders (npuw_transformations/, embedding/, whisper/, moe_transformations/). This primarily improves code organization and compilation-unit boundaries while aiming to preserve existing behavior.

Changes:

Moved multiple OpenVINO ModelPass / matcher-pass implementations into new dedicated .cpp/.hpp files under npuw_transformations/, embedding/, whisper/, and moe_transformations/.
Updated LLMCompiledModel and unit tests to include the new headers and compile the new sources.
Introduced KVAxesPosition as a shared struct header (kv_axes_position.hpp) instead of an anonymous-namespace definition.

Reviewed changes

Copilot reviewed 39 out of 40 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/plugins/intel_npu/tests/unit/npuw/transpose_vt.cpp	Adjusts includes to use the extracted `OptimizeValueTensors` header.
src/plugins/intel_npu/tests/unit/npuw/llm_compiled_model_factory_options_test.cpp	Updates includes to use extracted Whisper preparation passes.
src/plugins/intel_npu/tests/unit/CMakeLists.txt	Adds newly extracted plugin sources to the unit test target build.
src/plugins/intel_npu/src/plugin/npuw/whisper/whisper_infer_request.hpp	Fixes include paths after moving Whisper code into a subfolder.
src/plugins/intel_npu/src/plugin/npuw/whisper/whisper_infer_request.cpp	Fixes include paths after moving Whisper code into a subfolder.
src/plugins/intel_npu/src/plugin/npuw/whisper/prepare_whisper_model.hpp	New header for Whisper model preparation passes.
src/plugins/intel_npu/src/plugin/npuw/whisper/prepare_whisper_model.cpp	New implementation of Whisper model preparation transformations.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp	New header for `SliceOutEmbeds` pass.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.cpp	New implementation for `SliceOutEmbeds` pass.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_to_static.hpp	New header for `ReshapeToStatic` pass and `KVAxesPosition` usage.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_to_static.cpp	New implementation for `ReshapeToStatic` logic.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.hpp	New header for `ReshapeSlicedHeadToStatic` pass.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.cpp	New implementation for reshaping sliced LM head inputs.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/phi3_sliding_mask.hpp	New header for `Phi3SlidingMask` transformation pass.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/phi3_sliding_mask.cpp	New implementation of Phi-3 sliding window attention mask patching.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/patch_phi3_sliding_mask.hpp	New wrapper pass header to apply Phi-3 sliding mask transformations.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/patch_phi3_sliding_mask.cpp	New wrapper pass implementation.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/optimize_value_tensors.hpp	New header for `OptimizeValueTensors` pass (SDPA decomposition + V tensor transpose).
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/optimize_value_tensors.cpp	New implementation for `OptimizeValueTensors`.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/lora_stateful_to_stateless.hpp	New header for LoRA stateful→stateless conversion pass.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/lora_stateful_to_stateless.cpp	New implementation for LoRA state conversion.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/kv_axes_position.hpp	New shared `KVAxesPosition` struct header.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/decompose_gqa.hpp	New header for `DecomposeGQA` pass.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/decompose_gqa.cpp	New implementation for GroupQueryAttention decomposition.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/convert_kvcache_to_precision.hpp	New header for KV-cache low-precision conversion and type selection helper.
src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/convert_kvcache_to_precision.cpp	New implementation for KV-cache precision conversion + FP8 optimization helper.
src/plugins/intel_npu/src/plugin/npuw/moe_transformations/apply_moe_device_routed_transforms.hpp	New header for applying DEVICE_ROUTED MoE transformations.
src/plugins/intel_npu/src/plugin/npuw/moe_transformations/apply_moe_device_routed_transforms.cpp	New implementation applying MoE transforms in sequence.
src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model_utils.hpp	Removes pass class declarations that were moved out.
src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model_utils.cpp	Removes moved transformation logic, leaving shared helpers like `has_input`.
src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.hpp	Replaces anonymous `KVAxesPosition` with shared header include.
src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp	Switches to includes of the extracted passes/transforms and removes inlined implementations.
src/plugins/intel_npu/src/plugin/npuw/embedding/remove_empty_kv_inputs.hpp	New header for embedding-model pass to drop empty KV inputs.
src/plugins/intel_npu/src/plugin/npuw/embedding/remove_empty_kv_inputs.cpp	New implementation for removing empty KV inputs in embedding models.
src/plugins/intel_npu/src/plugin/npuw/embedding/redirect_new_kv_to_output.hpp	New header for embedding-model pass to redirect new KV tensors to outputs.
src/plugins/intel_npu/src/plugin/npuw/embedding/redirect_new_kv_to_output.cpp	New implementation for redirecting new KV tensors to outputs.
src/plugins/intel_npu/src/plugin/npuw/embedding/prepare_embedding_model.hpp	New header for embedding model preparation pass.
src/plugins/intel_npu/src/plugin/npuw/embedding/prepare_embedding_model.cpp	Updated implementation moved from old embedding utils, with updated include paths.
src/plugins/intel_npu/src/plugin/npuw/embedding/embedding_infer_request.hpp	Fixes include paths after moving embedding infer request into a subfolder.
src/plugins/intel_npu/src/plugin/npuw/embedding/embedding_infer_request.cpp	Fixes include paths after moving embedding infer request into a subfolder.

Copilot · 2026-03-25T22:02:11Z

src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp

+#pragma once
+
+#include "openvino/pass/pass.hpp"
+
+class SliceOutEmbeds : public ov::pass::ModelPass {
+    uint32_t m_batch_dim;
+    std::size_t m_max_generation_token_len;
+


[BLOCKER] This header uses uint32_t and std::size_t but does not include the headers that define them. It currently relies on transitive includes from OpenVINO headers, which is brittle and can break with include-order changes. Add explicit includes (e.g., <cstdint> and <cstddef> or <cstdint> and <cstdlib> as appropriate).
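A minimal sketch of what the explicit includes could look like. To keep it buildable without OpenVINO, `ModelStandIn`, `ModelPassStandIn` and `SliceOutEmbedsSketch` are hypothetical stand-ins (the real header includes "openvino/pass/pass.hpp" and derives from ov::pass::ModelPass), and the accessors and the placeholder `run_on_model` body are added only for illustration.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint32_t (the unqualified ::uint32_t is allowed but not required)
#include <memory>   // std::shared_ptr

// Hypothetical stand-ins so the sketch builds without OpenVINO.
struct ModelStandIn {};
struct ModelPassStandIn {
    virtual ~ModelPassStandIn() = default;
    virtual bool run_on_model(const std::shared_ptr<ModelStandIn>& model) = 0;
};

class SliceOutEmbedsSketch : public ModelPassStandIn {
    std::uint32_t m_batch_dim;
    std::size_t m_max_generation_token_len;

public:
    explicit SliceOutEmbedsSketch(std::uint32_t batch_dim, std::size_t max_generation_token_len)
        : m_batch_dim(batch_dim),
          m_max_generation_token_len(max_generation_token_len) {}

    // Placeholder body: the real pass rewrites the graph; this one only reports "no change".
    bool run_on_model(const std::shared_ptr<ModelStandIn>& model) override {
        assert(model != nullptr);
        return false;
    }

    std::uint32_t batch_dim() const { return m_batch_dim; }
    std::size_t max_generation_token_len() const { return m_max_generation_token_len; }
};
```

With every standard type pulled in by a direct include, the header no longer depends on what other headers happen to include.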

Copilot · 2026-03-25T22:02:12Z

src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.hpp

+#pragma once
+
+#include "openvino/pass/pass.hpp"
+
+class ReshapeSlicedHeadToStatic : public ov::pass::ModelPass {
+    uint32_t m_batch_dim;
+    std::size_t m_max_generation_token_len;
+


[BLOCKER] This header uses uint32_t / std::size_t but doesn't include <cstdint> / <cstddef> (or equivalent) directly. Depending on transitive includes can make builds fail on different compilers or after unrelated include changes; please add the needed standard includes here.

Copilot · 2026-03-25T22:02:12Z

src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp

+class SliceOutEmbeds : public ov::pass::ModelPass {
+    uint32_t m_batch_dim;
+    std::size_t m_max_generation_token_len;
+
+public:
+    OPENVINO_MODEL_PASS_RTTI("ov::npuw::SliceOutEmbeds");
+    explicit SliceOutEmbeds(uint32_t batch_dim, std::size_t max_generation_token_len);
+    bool run_on_model(const std::shared_ptr<ov::Model>& model) override;


[MEDIUM] Several newly extracted transformation passes are now declared in the global namespace in public headers (starting with this one, and similarly in reshape_to_static.hpp, reshape_sliced_head_to_static.hpp, phi3_sliding_mask.hpp, convert_kvcache_to_precision.hpp, moe_transformations/apply_moe_device_routed_transforms.hpp, etc.). This increases collision risk and is inconsistent with existing NPUW transformation headers that scope passes under ov::npuw::pass / ov::npuw::... (e.g., src/plugins/intel_npu/src/plugin/npuw/moe_transformations/gather_to_2d_gather.hpp:10-42). Consider wrapping these class declarations in the appropriate ov::npuw (and/or ov::npuw::pass / ov::npuw::util) namespace to keep symbols scoped to the plugin and match established conventions.
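The collision risk can be sketched in a few lines. The namespace `ov_sketch::npuw::pass` below is a hypothetical stand-in for the plugin namespace suggested above, and both classes are toy types rather than the real passes. Note that the RTTI string in the excerpt already reads "ov::npuw::SliceOutEmbeds" even though the class itself is declared globally.

```cpp
#include <cassert>
#include <string>

// Hypothetical namespace layout mirroring the suggested ov::npuw (and/or ov::npuw::pass) scoping.
namespace ov_sketch {
namespace npuw {
namespace pass {
class SliceOutEmbeds {
public:
    std::string name() const { return "ov_sketch::npuw::pass::SliceOutEmbeds"; }
};
}  // namespace pass
}  // namespace npuw
}  // namespace ov_sketch

// An unrelated class with the same short name, e.g. from another component.
// Both definitions coexist in one program because the pass above is scoped.
class SliceOutEmbeds {
public:
    std::string name() const { return "::SliceOutEmbeds"; }
};
```

If both classes lived in the global namespace, including the two headers together would fail with a redefinition error, and defining them in separate translation units would violate the one-definition rule.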

AlexanderKalistratov added 5 commits March 20, 2026 00:29

Moving passes to separate files

aeed188

Merge branch 'master' into npuw_llm_compiled_model_refactoring_pt2

ed25744

Merge branch 'master' into npuw_llm_compiled_model_refactoring_pt2

98faa4b

Merge branch 'master' into npuw_llm_compiled_model_refactoring_pt2

b03bc03

Merge branch 'master' into npuw_llm_compiled_model_refactoring_pt2

2522610

AlexanderKalistratov requested review from a team as code owners March 25, 2026 21:57

AlexanderKalistratov requested a review from Copilot March 25, 2026 21:57

github-actions bot added labels category: build (OpenVINO cmake script / infra), category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin) Mar 25, 2026

Copilot started reviewing on behalf of AlexanderKalistratov March 25, 2026 21:58

dmatveev added this to the 2026.2 milestone Mar 25, 2026

Copilot AI reviewed Mar 25, 2026

AlexanderKalistratov wants to merge 5 commits into openvinotoolkit:master from AlexanderKalistratov:npuw_llm_compiled_model_refactoring_pt2

3 participants
