[CPU] Cache FullyConnected weights in scope of compile model by EgorDuplensky · Pull Request #34492 · openvinotoolkit/openvino

EgorDuplensky · 2026-03-04T13:56:43Z

This moves FullyConnected weight caching out of the first inference and into
the compile_model call, which significantly improves first-token latency on LLMs.

It also reduces peak memory usage a lot, since most of the weights
were transposed in scope of the compile_model call anyway.
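
As a rough illustration of the idea, and not the actual plugin code, the sketch below contrasts packing weights lazily on the first inference with packing them eagerly at construction time, which stands in for compile_model here. All class and function names are hypothetical, and a plain transpose stands in for the real weight repacking.

```python
def transpose(matrix):
    """Stand-in for the weight repacking step (hypothetical, a plain transpose)."""
    return [list(column) for column in zip(*matrix)]


def apply_packed(packed, x):
    """Compute y[j] = sum_i x[i] * packed[i][j] for packed weights of shape [in][out]."""
    outputs = len(packed[0])
    return [sum(x[i] * packed[i][j] for i in range(len(x))) for j in range(outputs)]


class LazyFullyConnected:
    """Packs the weights on the first infer() call, so that call pays the cost."""

    def __init__(self, weights):
        self.weights = weights
        self.packed = None
        self.packs_during_infer = 0

    def infer(self, x):
        if self.packed is None:
            self.packed = transpose(self.weights)
            self.packs_during_infer += 1
        return apply_packed(self.packed, x)


class EagerFullyConnected:
    """Packs the weights in the constructor, which plays the role of compile_model."""

    def __init__(self, weights):
        self.packed = transpose(weights)
        self.packs_during_infer = 0

    def infer(self, x):
        return apply_packed(self.packed, x)


weights = [[1, 2, 3], [4, 5, 6]]  # shape [out][in]
x = [1, 0, -1]

lazy = LazyFullyConnected(weights)
eager = EagerFullyConnected(weights)
lazy_out = lazy.infer(x)
eager_out = eager.infer(x)

print("outputs match:", lazy_out == eager_out)
print("packs during inference (lazy, eager):",
      lazy.packs_during_infer, eager.packs_during_infer)
```

Both variants produce the same result; the only difference is where the one-time packing cost lands, which is what affects the latency of the first inference.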

The current state of the test infrastructure does not allow this logic to be tested.


EgorDuplensky · 2026-03-04T13:57:21Z

@maxnick Could you please take a look?

[CPU] Cache FullyConnected weights in scope of compile model

3a9ce3c


EgorDuplensky requested review from a team as code owners March 4, 2026 13:56

github-actions bot added the category: CPU OpenVINO CPU plugin label Mar 4, 2026

EgorDuplensky assigned maxnick Mar 4, 2026

maxnick added this to the 2026.1 milestone Mar 4, 2026

maxnick approved these changes Mar 4, 2026


EgorDuplensky added 2 commits March 5, 2026 23:02

Use actual dst memory in case of tensor parallelism

9518d7a

Correct makeDummyOutputShape shape inference

50e46a1

praasz modified the milestones: 2026.1, 2026.2 Mar 20, 2026

EgorDuplensky wants to merge 3 commits into openvinotoolkit:master from EgorDuplensky:cache_weights_in_scope_of_compile_model
