[ONNX FE][Transformation] GroupQueryAttention refine work for NPU static shape design #34980

Draft
bopeng1234 wants to merge 2 commits into openvinotoolkit:master from bopeng1234:onnx_gqa_refine

@bopeng1234
Contributor

Initial refinement of ONNX GQA: insert the new KV into the past KV buffer, keeping the valid data at the front of the buffer, as MLAS does.

Details:

In the old logic for the NPU static-shape KV design, the valid KV data sits at the end of the buffer:

image

However, in the MLAS implementation (see onnxruntime gqa), the valid data of a static-shape KV buffer sits at the front.

This PR aligns the ONNX GQA handling with that layout.

ScatterUpdate writes the current KV into the buffer, keeping the valid data at the front, as shown below.
image
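
To make the front-aligned layout concrete, here is a minimal pure-Python sketch of the scatter step, not the code from this PR. It models one axis of a fixed-size KV buffer as a list; the helper names `scatter_update` and `append_kv_front_aligned`, the `PAD` marker, and the `past_len` parameter are hypothetical names chosen for illustration. The only behavior assumed is that a ScatterUpdate-style op overwrites the slots named by an index list with the corresponding update entries.

```python
from typing import List, Sequence

PAD = "_"  # hypothetical marker for an unused slot in the static buffer


def scatter_update(data: Sequence[str], indices: Sequence[int],
                   updates: Sequence[str]) -> List[str]:
    """Return a copy of `data` where data[indices[i]] is replaced by updates[i]."""
    if len(indices) != len(updates):
        raise ValueError("indices and updates must have the same length")
    out = list(data)
    for idx, value in zip(indices, updates):
        out[idx] = value
    return out


def append_kv_front_aligned(buffer: Sequence[str], past_len: int,
                            current: Sequence[str]) -> List[str]:
    """Write `current` directly after the `past_len` valid slots at the front."""
    if past_len + len(current) > len(buffer):
        raise ValueError("static KV buffer is too small for the new tokens")
    # Target slots are past_len, past_len + 1, ..., one per new token.
    indices = list(range(past_len, past_len + len(current)))
    return scatter_update(buffer, indices, current)


# A buffer of 8 slots holding 3 valid entries at the front.
buffer = ["k0", "k1", "k2", PAD, PAD, PAD, PAD, PAD]
updated = append_kv_front_aligned(buffer, past_len=3, current=["k3", "k4"])
print(updated)
```

The buffer length never changes, which is the point of a static-shape design: only the count of valid entries at the front grows, and the padding stays at the tail.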

Tickets:

AI Assistance:

  • AI assistance used: yes
  • AI was used to select the new op; a human validated it with a build and tests

…id data is in front of the buffer as MLAS does
@github-actions bot added the "category: transformations" label (OpenVINO Runtime library - Transformations) on Mar 27, 2026

Labels

category: transformations (OpenVINO Runtime library - Transformations), do not merge
