[NPUW] Fix multiple MatMul matching issue in NPUW LM head cutting (#32475)

GuoliangShiIntel · web-flow · commit ca968e4d115a · 2025-12-10T18:52:31.000Z
### Details:

**Background:** The Eagle 3 pipeline adds a new output to the target model to expose the intermediate feature embeddings. The `cut_lm_head` function separates the vocabulary matrix (LM head) from LLM models for efficient inference. It needs to identify the correct `MatMul` operation among multiple candidates in the model graph.

**Problem:** When multiple `MatMul` operations match the pattern (common in LLMs), the callback executes once per match, and each execution overwrites the previous result. Only the last matched `MatMul` is used, which is often not the actual vocabulary matrix.

**Solution:** Replaced `MatcherPass` with direct traversal and size-based selection:

1. Collect all candidates instead of using the last match
2. Select the `MatMul` with the largest matrix size (vocabulary size heuristic)
3. Optimize traversal: iterate over `Result` nodes directly instead of all nodes

### Tickets:

- [*CVS-175198*](https://jira.devtools.intel.com/browse/CVS-175198)
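The selection heuristic from the Details section can be sketched in isolation as follows. This is a minimal illustration, not the code from this PR: `MatMulCandidate` and `select_lm_head` are hypothetical names, the candidates are plain structs rather than OpenVINO nodes, and combining the size heuristic with the `manually_added_output` skip in one function is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a MatMul found upstream of a Result node.
// In the real pass these would be ov::op::v0::MatMul nodes.
struct MatMulCandidate {
    std::string name;
    std::size_t weight_rows;           // first dimension of the weight matrix
    std::size_t weight_cols;           // second dimension of the weight matrix
    bool feeds_manually_added_output;  // mirrors the "manually_added_output" RT_INFO mark
};

// Collect-then-select: consider every candidate and keep the one with the
// largest weight matrix, assuming the vocabulary projection is the biggest.
std::optional<MatMulCandidate> select_lm_head(const std::vector<MatMulCandidate>& candidates) {
    std::optional<MatMulCandidate> best;
    std::size_t best_size = 0;
    for (const auto& candidate : candidates) {
        // Skip outputs that were added by hand (e.g. "last_hidden_state").
        if (candidate.feeds_manually_added_output) {
            continue;
        }
        const std::size_t size = candidate.weight_rows * candidate.weight_cols;
        if (!best.has_value() || size > best_size) {
            best = candidate;
            best_size = size;
        }
    }
    return best;  // empty when no eligible candidate exists
}
```

Unlike a callback that overwrites its result on every match, the outcome here does not depend on the order in which candidates are visited, except when two eligible candidates have equal size, in which case the first one seen is kept.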
diff --git a/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp b/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp
@@ -881,6 +881,14 @@ class CutLMHead : public ov::pass::MatcherPass {
             auto matched_matmul = std::static_pointer_cast<ov::op::v0::MatMul>(matched_node_matmul);
             auto matched_result = std::static_pointer_cast<ov::op::v0::Result>(matched_node_result);
 
+            // Some LLMs add intermediate hidden state outputs that can interfere with LM head detection.
+            // Skip Result nodes that were manually added (marked with "manually_added_output" in RT_INFO).
+            // For example, Eagle-3 target/draft models add "last_hidden_state" output which should be skipped.
+            const auto& rt_info = matched_result->get_rt_info();
+            if (rt_info.count("manually_added_output")) {
+                return false;
+            }
+
             // Cut point:
             auto matmul_first_source = matched_matmul->input(0).get_source_output();