
[NPUW] Moving passes from llm_compiled_model to separate files #34934

Open
AlexanderKalistratov wants to merge 5 commits into openvinotoolkit:master from AlexanderKalistratov:npuw_llm_compiled_model_refactoring_pt2


@AlexanderKalistratov
Contributor

Extract all inline OpenVINO passes and transformation logic from llm_compiled_model.cpp and llm_compiled_model_utils.cpp into dedicated source files. This PR intentionally does not change names or namespaces and makes no other changes; it only moves code.

All common transformations are put into the npuw_transformations/ folder:

  • ConvertKVCacheToPrecision: convert_kvcache_to_precision.cpp/.hpp
  • DecomposeGQA: decompose_gqa.cpp/.hpp
  • Phi3SlidingMask: phi3_sliding_mask.cpp/.hpp
  • PatchPhi3SlidingMask: patch_phi3_sliding_mask.cpp/.hpp
  • ReshapeToStatic: reshape_to_static.cpp/.hpp
  • ReshapeSlicedHeadToStatic: reshape_sliced_head_to_static.cpp/.hpp
  • SliceOutEmbeds: slice_out_embeds.cpp/.hpp
  • LoraStatefulToStatelessPass: lora_stateful_to_stateless.cpp/.hpp
  • OptimizeValueTensors: optimize_value_tensors.cpp/.hpp (previously in llm_compiled_model_utils.cpp)
  • KVAxesPosition struct: kv_axes_position.hpp (previously in an anonymous namespace in llm_compiled_model.hpp)

All code related to embedding models is moved to the embedding/ folder:

  • RemoveEmptyKVInputs: remove_empty_kv_inputs.cpp/.hpp
  • RedirectNewKvToOutput: redirect_new_kv_to_output.cpp/.hpp
  • embedding_infer_request.cpp/.hpp and prepare_embedding_model.cpp/.hpp are moved

MoE-related transformations are moved to the moe_transformations/ folder:

  • ApplyMoEDeviceRoutedTransforms: apply_moe_device_routed_transforms.cpp/.hpp
  • The rest of MoE related code was already there

Whisper-related transformations are moved to the whisper/ folder:

  • PrepareWhisperPrefillModel, PrepareWhisperKVCacheModel: prepare_whisper_model.cpp/.hpp (previously in llm_compiled_model_utils.cpp)
  • whisper_infer_request.cpp/.hpp were moved

AI Assistance:

AI assistance used: yes
AI moved most of the code; the result was reviewed and verified manually.

@AlexanderKalistratov requested review from a team as code owners on March 25, 2026 21:57
@github-actions bot added labels on Mar 25, 2026: category: build (OpenVINO cmake script / infra), category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin)
@dmatveev added this to the 2026.2 milestone on Mar 25, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the Intel NPUW plugin implementation by extracting previously inline LLM/Whisper/Embedding/MoE transformation passes and related logic from the large llm_compiled_model*.cpp/hpp and llm_compiled_model_utils.* files into dedicated translation units under clearer subfolders (npuw_transformations/, embedding/, whisper/, moe_transformations/). This primarily improves code organization and compilation-unit boundaries while aiming to preserve existing behavior.

Changes:

  • Moved multiple OpenVINO ModelPass / matcher-pass implementations into new dedicated .cpp/.hpp files under npuw_transformations/, embedding/, whisper/, and moe_transformations/.
  • Updated LLMCompiledModel and unit tests to include the new headers and compile the new sources.
  • Introduced KVAxesPosition as a shared struct header (kv_axes_position.hpp) instead of an anonymous-namespace definition.
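
The last point can be illustrated with a minimal sketch of a shared struct header. A type declared in an anonymous namespace has internal linkage, so each translation unit that includes it gets its own distinct type; a plain header-level struct gives every includer the same type. The field names `batch` and `seq_len` and the helper `describe` are assumptions made for illustration and are not confirmed by this PR.

```cpp
// Sketch of a shared header in the spirit of kv_axes_position.hpp.
// Field names are ASSUMED for illustration; check the real header.
#include <cstdint>
#include <string>

struct KVAxesPosition {
    std::uint32_t batch;    // assumed: index of the batch axis in a KV tensor
    std::uint32_t seq_len;  // assumed: index of the sequence-length axis
};

// Hypothetical helper (not part of the PR): every translation unit that
// includes this header sees the same KVAxesPosition type, so the struct
// can appear in function signatures shared across files.
inline std::string describe(const KVAxesPosition& pos) {
    return "batch=" + std::to_string(pos.batch) +
           ", seq_len=" + std::to_string(pos.seq_len);
}
```

With this layout, a pass header such as reshape_to_static.hpp would only need to include the shared header rather than llm_compiled_model.hpp.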

Reviewed changes

Copilot reviewed 39 out of 40 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/plugins/intel_npu/tests/unit/npuw/transpose_vt.cpp | Adjusts includes to use the extracted OptimizeValueTensors header. |
| src/plugins/intel_npu/tests/unit/npuw/llm_compiled_model_factory_options_test.cpp | Updates includes to use extracted Whisper preparation passes. |
| src/plugins/intel_npu/tests/unit/CMakeLists.txt | Adds newly extracted plugin sources to the unit test target build. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/whisper_infer_request.hpp | Fixes include paths after moving Whisper code into a subfolder. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/whisper_infer_request.cpp | Fixes include paths after moving Whisper code into a subfolder. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/prepare_whisper_model.hpp | New header for Whisper model preparation passes. |
| src/plugins/intel_npu/src/plugin/npuw/whisper/prepare_whisper_model.cpp | New implementation of Whisper model preparation transformations. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.hpp | New header for SliceOutEmbeds pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/slice_out_embeds.cpp | New implementation for SliceOutEmbeds pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_to_static.hpp | New header for ReshapeToStatic pass and KVAxesPosition usage. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_to_static.cpp | New implementation for ReshapeToStatic logic. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.hpp | New header for ReshapeSlicedHeadToStatic pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/reshape_sliced_head_to_static.cpp | New implementation for reshaping sliced LM head inputs. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/phi3_sliding_mask.hpp | New header for Phi3SlidingMask transformation pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/phi3_sliding_mask.cpp | New implementation of Phi-3 sliding window attention mask patching. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/patch_phi3_sliding_mask.hpp | New wrapper pass header to apply Phi-3 sliding mask transformations. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/patch_phi3_sliding_mask.cpp | New wrapper pass implementation. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/optimize_value_tensors.hpp | New header for OptimizeValueTensors pass (SDPA decomposition + V tensor transpose). |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/optimize_value_tensors.cpp | New implementation for OptimizeValueTensors. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/lora_stateful_to_stateless.hpp | New header for LoRA stateful→stateless conversion pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/lora_stateful_to_stateless.cpp | New implementation for LoRA state conversion. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/kv_axes_position.hpp | New shared KVAxesPosition struct header. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/decompose_gqa.hpp | New header for DecomposeGQA pass. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/decompose_gqa.cpp | New implementation for GroupQueryAttention decomposition. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/convert_kvcache_to_precision.hpp | New header for KV-cache low-precision conversion and type selection helper. |
| src/plugins/intel_npu/src/plugin/npuw/npuw_transformations/convert_kvcache_to_precision.cpp | New implementation for KV-cache precision conversion + FP8 optimization helper. |
| src/plugins/intel_npu/src/plugin/npuw/moe_transformations/apply_moe_device_routed_transforms.hpp | New header for applying DEVICE_ROUTED MoE transformations. |
| src/plugins/intel_npu/src/plugin/npuw/moe_transformations/apply_moe_device_routed_transforms.cpp | New implementation applying MoE transforms in sequence. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model_utils.hpp | Removes pass class declarations that were moved out. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model_utils.cpp | Removes moved transformation logic, leaving shared helpers like has_input. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.hpp | Replaces anonymous KVAxesPosition with shared header include. |
| src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp | Switches to includes of the extracted passes/transforms and removes inlined implementations. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/remove_empty_kv_inputs.hpp | New header for embedding-model pass to drop empty KV inputs. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/remove_empty_kv_inputs.cpp | New implementation for removing empty KV inputs in embedding models. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/redirect_new_kv_to_output.hpp | New header for embedding-model pass to redirect new KV tensors to outputs. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/redirect_new_kv_to_output.cpp | New implementation for redirecting new KV tensors to outputs. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/prepare_embedding_model.hpp | New header for embedding model preparation pass. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/prepare_embedding_model.cpp | Updated implementation moved from old embedding utils, with updated include paths. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/embedding_infer_request.hpp | Fixes include paths after moving embedding infer request into a subfolder. |
| src/plugins/intel_npu/src/plugin/npuw/embedding/embedding_infer_request.cpp | Fixes include paths after moving embedding infer request into a subfolder. |

Comment on lines +5 to +12
#pragma once

#include "openvino/pass/pass.hpp"

class SliceOutEmbeds : public ov::pass::ModelPass {
uint32_t m_batch_dim;
std::size_t m_max_generation_token_len;


Copilot AI Mar 25, 2026


[BLOCKER] This header uses uint32_t and std::size_t but does not include the headers that define them. It currently relies on transitive includes from OpenVINO headers, which is brittle and can break with include-order changes. Add explicit includes (e.g., <cstdint> and <cstddef> or <cstdint> and <cstdlib> as appropriate).
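
A minimal sketch of the suggested fix follows. So that it compiles without OpenVINO, it drops the `ov::pass::ModelPass` base class and uses a hypothetical class name, `SliceOutEmbedsSketch`; the accessors are also illustrative and not part of the PR. Only the two standard includes are the point here.

```cpp
// Sketch only: a stand-in for slice_out_embeds.hpp without the OpenVINO
// base class, so that it compiles on its own.
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint32_t

class SliceOutEmbedsSketch {  // hypothetical name, not the real pass
    std::uint32_t m_batch_dim;
    std::size_t m_max_generation_token_len;

public:
    SliceOutEmbedsSketch(std::uint32_t batch_dim, std::size_t max_generation_token_len)
        : m_batch_dim(batch_dim),
          m_max_generation_token_len(max_generation_token_len) {}

    // Illustrative accessors, added so the members are observable.
    std::uint32_t batch_dim() const { return m_batch_dim; }
    std::size_t max_generation_token_len() const { return m_max_generation_token_len; }
};
```

Because the header names its own dependencies, it keeps compiling even if an upstream header stops pulling in `<cstdint>` or `<cstddef>` transitively.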

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +12
#pragma once

#include "openvino/pass/pass.hpp"

class ReshapeSlicedHeadToStatic : public ov::pass::ModelPass {
uint32_t m_batch_dim;
std::size_t m_max_generation_token_len;


Copilot AI Mar 25, 2026


[BLOCKER] This header uses uint32_t / std::size_t but doesn't include <cstdint> / <cstddef> (or equivalent) directly. Depending on transitive includes can make builds fail on different compilers or after unrelated include changes; please add the needed standard includes here.

Comment on lines +9 to +16
class SliceOutEmbeds : public ov::pass::ModelPass {
uint32_t m_batch_dim;
std::size_t m_max_generation_token_len;

public:
OPENVINO_MODEL_PASS_RTTI("ov::npuw::SliceOutEmbeds");
explicit SliceOutEmbeds(uint32_t batch_dim, std::size_t max_generation_token_len);
bool run_on_model(const std::shared_ptr<ov::Model>& model) override;

Copilot AI Mar 25, 2026


[MEDIUM] Several newly extracted transformation passes are now declared in the global namespace in public headers (starting with this one, and similarly in reshape_to_static.hpp, reshape_sliced_head_to_static.hpp, phi3_sliding_mask.hpp, convert_kvcache_to_precision.hpp, moe_transformations/apply_moe_device_routed_transforms.hpp, etc.). This increases collision risk and is inconsistent with existing NPUW transformation headers that scope passes under ov::npuw::pass / ov::npuw::... (e.g., src/plugins/intel_npu/src/plugin/npuw/moe_transformations/gather_to_2d_gather.hpp:10-42). Consider wrapping these class declarations in the appropriate ov::npuw (and/or ov::npuw::pass / ov::npuw::util) namespace to keep symbols scoped to the plugin and match established conventions.
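
As a rough sketch of the suggested scoping, the declaration could be wrapped as shown below. The `sketch::Model` and `sketch::ModelPass` types are hypothetical stand-ins for the OpenVINO classes so that the example compiles on its own, and the choice of `ov::npuw` over `ov::npuw::pass` or `ov::npuw::util` is an assumption; the placeholder `run_on_model` body and the accessors are not from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins so the sketch compiles without OpenVINO.
namespace sketch {
struct Model {};
struct ModelPass {
    virtual ~ModelPass() = default;
    virtual bool run_on_model(const std::shared_ptr<Model>& model) = 0;
};
}  // namespace sketch

namespace ov {
namespace npuw {

// Same class name as in the PR, now scoped to the plugin namespace.
class SliceOutEmbeds : public sketch::ModelPass {
    std::uint32_t m_batch_dim;
    std::size_t m_max_generation_token_len;

public:
    SliceOutEmbeds(std::uint32_t batch_dim, std::size_t max_generation_token_len)
        : m_batch_dim(batch_dim),
          m_max_generation_token_len(max_generation_token_len) {}

    // Placeholder body: reports that the model was not modified.
    bool run_on_model(const std::shared_ptr<sketch::Model>&) override { return false; }

    std::uint32_t batch_dim() const { return m_batch_dim; }
    std::size_t max_generation_token_len() const { return m_max_generation_token_len; }
};

}  // namespace npuw
}  // namespace ov
```

Note that the RTTI string in the quoted hunk already reads "ov::npuw::SliceOutEmbeds", so scoping the class this way would also make the declared name match it. Since the PR states that it intentionally leaves names and namespaces unchanged, such a change might fit better in a follow-up.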
