
[CPU] Cache FullyConnected weights in scope of compile model #34492

Open
EgorDuplensky wants to merge 3 commits into openvinotoolkit:master from EgorDuplensky:cache_weights_in_scope_of_compile_model


Conversation

@EgorDuplensky
Contributor

@EgorDuplensky EgorDuplensky commented Mar 4, 2026

This moves FullyConnected weight preparation and caching out of the first inference and into compile_model, which significantly improves first-token latency on LLMs.

It also substantially reduces peak memory usage, since most of the weights were already being transposed during the compile_model call anyway.

The current test infrastructure does not make it possible to add a test for this logic.


@EgorDuplensky EgorDuplensky requested review from a team as code owners March 4, 2026 13:56
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Mar 4, 2026
@EgorDuplensky
Contributor Author

@maxnick Could you please take a look?

@maxnick maxnick added this to the 2026.1 milestone Mar 4, 2026
@praasz praasz modified the milestones: 2026.1, 2026.2 Mar 20, 2026

Labels

category: CPU OpenVINO CPU plugin


3 participants