Minor improvements to token_type_ids extension for PA #34661

p-wysocki wants to merge 36 commits into openvinotoolkit:master

Conversation
Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
```cpp
// Shared flag to track whether the model is Gemma3, set when any layer matches
// the gptoss_gemma3 sliding window pattern. Combined with the token_type_ids check,
// this uniquely identifies Gemma3 (gpt-oss shares the pattern but lacks token_type_ids).
auto is_gptoss_gemma3 = std::make_shared<bool>(false);
```
Can we define this variable inside the callback?
Agreed, it does look strange that it has to be defined outside.
Gemma3 has a repeating sequence of attention layers: 5x sliding window attention, 1x full attention. The pattern we currently have detects sliding window attention, but token_type_ids has to be passed to the full attention layers as well.

has_token_type_ids is defined outside of the callback as a shared_ptr because it has to stay consistent across all lambda callbacks: since the lambdas capture by value (=), each one gets its own copy of the shared pointer, but all copies point at the same object. Without it, token_type_ids would be routed only to the sliding window PAs, and not to the full attention PAs.

Technically we could detect the full attention pattern to avoid the gpt-oss/Gemma3 mixup and do the same trick, but then the first 5x sliding window attention layers would not receive the token_type_ids input, because only the first full attention layer (6th in line) would set the variable to true.

Summing up, it may not be as clean as I'd like it to be, but it works. If you insist that this piece of code will cause issues, I can keep looking for a universal pattern which would:
- separate gpt-oss and gemma3
- work for both sliding window and full attention layers
Yeah, it's a bit of a dirty solution, but if there's no other option, I believe we can live with it.
> I can keep looking for a universal pattern

Any ideas of what this could be?
...mon/transformations/src/transformations/sdpa_to_paged_attention/state_management_pattern.cpp
```cpp
    sliding_window = std::make_shared<v1::Subtract>(v0::Constant::create(element::i32, Shape{}, {2}), offset);
} else if (pattern_map.count(gptoss_gemma3_offset)) {
    *is_gptoss_gemma3 = true;
    is_gemma3 = optional_model_wide_params.count("token_type_ids");
```
In fact, any model with token_type_ids and a matching sliding window pattern will set this is_gemma3 flag to true, so why not simply name this variable has_token_type_ids?
Or set has_sliding_window here instead and use it below.
Also, is_gemma3 will currently be false for the causal mask case (no sliding window) within the same model.
I renamed the variable, and regarding the no sliding window case, the explanation is provided in #34661 (comment).
```cpp
if (is_gemma3) {
    pa_arguments.insert(pa_arguments.begin() + 25, handle_gemma3_token_type_ids(optional_model_wide_params));
} else {
    pa_arguments.insert(pa_arguments.begin() + 25, v0::Constant::create(element::i32, Shape{0}, {}));
```
The variable naming is tied to Gemma3, but it can be generic for any model having both has_token_type_ids and has_sliding_window true.
It is currently applied for the sliding_window case only, but as a next step it could be extended to the causal case as well; then this if/else would reduce to a single call:

```cpp
pa_arguments.insert(pa_arguments.begin() + 25, handle_token_type_ids(optional_model_wide_params));
```
Suggested change:

```diff
-if (is_gemma3) {
-    pa_arguments.insert(pa_arguments.begin() + 25, handle_gemma3_token_type_ids(optional_model_wide_params));
-} else {
-    pa_arguments.insert(pa_arguments.begin() + 25, v0::Constant::create(element::i32, Shape{0}, {}));
+if (has_sliding_window) {
+    pa_arguments.insert(pa_arguments.begin() + 25, handle_token_type_ids(optional_model_wide_params));
+} else {
+    pa_arguments.insert(pa_arguments.begin() + 25, v0::Constant::create(element::i32, Shape{0}, {}));
```
I changed the variable name. token_type_ids currently also works for the causal case, see #34661 (comment).
Pull request overview
This PR refines Gemma3 token_type_ids handling for the SDPA→PagedAttention transformation and strengthens PagedAttentionExtension type-propagation coverage around the newly-supported token_type_ids ranks.
Changes:
- Add type-prop tests validating token_type_ids acceptance for rank-1/rank-2, dynamic shape, and invalid type/rank cases.
- Simplify token_type_ids retrieval/conversion in the Gemma3 path by assuming presence when the Gemma3 condition is met and avoiding an internal fallback.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/core/tests/type_prop/paged_attention.cpp | Adds dedicated type-prop tests for token_type_ids rank/type validation. |
| src/common/transformations/src/transformations/sdpa_to_paged_attention/state_management_pattern.cpp | Adjusts Gemma3 detection flagging and streamlines token_type_ids handling (convert-to-i32 when needed). |
```cpp
// Set to true once a sliding_attention layer matching the gptoss_gemma3 pattern is found
// alongside a token_type_ids model input - the combination that uniquely identifies Gemma3,
// since the pattern for the full attention mask in Gemma3 differs from the sliding window one.
// It has to persist across callback invocations, so a shared_ptr is used.
auto has_token_type_ids = std::make_shared<bool>(false);
```
[LOW] has_token_type_ids is used as a persisted "Gemma3 detected / enable token_type_ids wiring" flag (it's only updated when the gptoss_gemma3 sliding-window pattern matches), so the name is misleading: there are cases where the model has a token_type_ids input but this flag stays false until that pattern is seen. Consider renaming it to something like is_gemma3 / enable_gemma3_token_type_ids to reflect the actual semantics and reduce the chance of future misuse.