Commit ca968e4
[NPUW] Fix multiple MatMul matching issue in NPUW LM head cutting (#32475)
### Details:
**Background:**
The Eagle 3 pipeline adds a new output to the target model to expose the
intermediate feature embeddings.
The `cut_lm_head` function separates the vocabulary projection matrix (LM head)
from LLMs for efficient inference. To do so, it must identify the
correct `MatMul` operation among multiple candidates in the model graph.
**Problem:**
When multiple `MatMul` operations match the pattern (common in LLMs),
the matcher callback executes once per match, and each execution overwrites
the previous result. Only the last matched `MatMul` is used, and it is often
not the actual vocabulary matrix.
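The overwrite behaviour can be illustrated with a minimal, self-contained C++ sketch. It does not use the OpenVINO API: the `Node` struct, the `run_matcher` helper, and the node names are hypothetical stand-ins for a pattern-based pass whose callback fires once per match and stores the match in a single variable.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified node type (not ov::Node).
struct Node {
    std::string name;
    bool is_matmul;
};

// Hypothetical matcher: invokes the callback once per matching node,
// in the order the nodes are listed, like a pass that fires once per match.
void run_matcher(const std::vector<Node>& nodes,
                 const std::function<void(const Node&)>& callback) {
    for (const auto& n : nodes) {
        if (n.is_matmul) {
            callback(n);
        }
    }
}

// The callback stores its match in a single variable, so every invocation
// overwrites the previous one and only the last matched MatMul survives.
std::string last_match_wins(const std::vector<Node>& nodes) {
    std::string selected;
    run_matcher(nodes, [&](const Node& n) { selected = n.name; });
    return selected;
}
```

With the nodes `{"lm_head_matmul", true}` followed by `{"extra_matmul", true}`, `last_match_wins` returns `"extra_matmul"`: the LM head is silently discarded because a later match replaced it.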
**Solution:**
Replaced `MatcherPass` with direct graph traversal and explicit candidate selection:
1. Collect all candidates instead of keeping only the last match.
2. Select the `MatMul` with the largest weight matrix (heuristic: the
vocabulary projection is typically the largest matrix).
3. Optimize traversal: iterate over `Result` nodes directly instead of over
all nodes.
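The three steps above can be sketched in self-contained C++. This is a minimal illustration under stated assumptions, not the actual NPUW code: the `Node` and `Model` types are hypothetical simplifications of an OpenVINO graph, each `MatMul` is assumed to feed a `Result` directly, and its constant weight shape is assumed to be stored on the node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified graph types (not the OpenVINO ov::Node / ov::Model API).
struct Node {
    std::string name;
    std::string type;                        // e.g. "MatMul", "Result"
    std::vector<std::size_t> weight_shape;   // shape of the weight input, if any
    std::shared_ptr<Node> input;             // single producer, enough for this sketch
};

struct Model {
    std::vector<std::shared_ptr<Node>> results;  // only the Result nodes
};

// Number of elements in the weight matrix (0 if the node has no weight).
std::size_t weight_size(const Node& n) {
    if (n.weight_shape.empty()) return 0;
    std::size_t size = 1;
    for (std::size_t d : n.weight_shape) size *= d;
    return size;
}

// Step 3: visit Result nodes only. Step 1: collect every MatMul feeding one.
// Step 2: pick the candidate with the largest weight matrix.
std::shared_ptr<Node> find_lm_head(const Model& model) {
    std::vector<std::shared_ptr<Node>> candidates;
    for (const auto& result : model.results) {
        const auto& producer = result->input;
        if (producer && producer->type == "MatMul") {
            candidates.push_back(producer);
        }
    }
    std::shared_ptr<Node> best;
    for (const auto& c : candidates) {
        if (!best || weight_size(*c) > weight_size(*best)) {
            best = c;
        }
    }
    return best;  // nullptr if no candidate was found
}
```

Because the choice depends on weight size rather than on the order in which matches are found, an extra output such as the Eagle 3 feature output no longer displaces the LM head, provided the vocabulary matrix is indeed the largest candidate.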
### Tickets:
- [*CVS-175198*](https://jira.devtools.intel.com/browse/CVS-175198)

1 parent: 4832243
1 file changed: 8 additions, 0 deletions.
Diff (file name and content not captured): 8 lines added after original line 883, at new lines 884-891.