Conversation
...documentation/openvino-ir-format/operation-sets/operation-specs/internal/paged-attention.rst
Updated the spec to explain the reference plugin implementation and compare its behavior with the CPU plugin. I also added some notes I'd find useful if anyone ever dares to explore the topic further.
Pull request overview

Adds an internal OpenVINO IR operation specification document for the PagedAttentionExtension op (25 inputs / 3 outputs), describing semantics, shapes, platform-specific behavior, and an IR XML example.

Changes:
- Introduces a new RST spec for PagedAttentionExtension under internal operation specs.
- Documents inputs/outputs/types, shape inference, known limitations, and CPU vs reference behavior.
- Provides a detailed algorithm walkthrough and an IR XML example.
    * Allowed for ``key_cache`` (input 4): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
    * Allowed for ``value_cache`` (input 5): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.

In the Types section, the cache input indices are off by one: key_cache is input 3 and value_cache is input 4 (as listed in the Inputs section/pseudocode), but this text labels them as inputs 4 and 5. Please align the input numbers to avoid confusion for spec consumers.

Suggested change:

    * Allowed for ``key_cache`` (input 3): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
    * Allowed for ``value_cache`` (input 4): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.
      input. **Optional.**

    * **24**: ``adaptive_rkv_diversity_block_set_indices_begins`` - 1D tensor of type ``i32``,
      shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 24. **Optional.**

For input 24 (adaptive_rkv_diversity_block_set_indices_begins), the description says it provides offsets into "input 24", but these offsets should index into the indices list input (23). Please fix the referenced input number so the block-set table definition is correct.

Suggested change:

      shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 23. **Optional.**
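To illustrate the begins-into-indices relationship the comment describes, here is a minimal sketch with made-up values (the tensor contents are hypothetical, not taken from the spec): input 23 is a flat list of block indices, and input 24 partitions it per sequence, CSR-style.

```python
# Hypothetical example values; only the slicing pattern matters.
diversity_block_set_indices = [7, 2, 9, 4, 4, 1]  # input 23: flat i32 list
indices_begins = [0, 2, 2, 6]                     # input 24: shape [B_seq+1]

B_seq = len(indices_begins) - 1
# Each sequence s owns the slice [begins[s], begins[s+1]) of input 23,
# which is why the begins tensor must reference input 23, not itself.
per_sequence_sets = [
    diversity_block_set_indices[indices_begins[s]:indices_begins[s + 1]]
    for s in range(B_seq)
]
print(per_sequence_sets)  # [[7, 2], [], [9, 4, 4, 1]]
```

An empty slice (sequence 1 above) is valid and simply means that sequence has no diversity block set.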
    for s in range(B_seq):
        evict_size = adaptive_rkv_evictable_sizes[s]
        total_toks = past_lens[s] + new_len_s
        start_size = int(adaptive_rkv_start_size)
        if total_toks < start_size + evict_size:

The pseudocode for output 2 computation uses new_len_s (e.g., total_toks = past_lens[s] + new_len_s), but new_len_s is not defined in the pseudocode (earlier it defines new_len inside the per-sequence loop). Please either carry per-sequence new_len forward (e.g., store it in an array) or rename the variable consistently so the pseudocode is self-contained.
    **Versioned name**: *PagedAttentionExtension*

    **Category**: *Internal*

    **Short description**: *PagedAttentionExtension* implements causal multi-head attention
This new spec file doesn’t appear to be included in the docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs.rst toctree (that file currently lists only a subset of operation-specs/internal/*). If the intent is for PagedAttentionExtension to be discoverable in the generated docs, please add it to the toctree; otherwise readers won’t be able to navigate to it from the operation specs index.
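For reference, the addition would look something like the fragment below. This is a sketch assuming the index file uses a standard Sphinx toctree; the exact relative path and directive options should be copied from the entries already present in operation-specs.rst.

```rst
.. toctree::
   :maxdepth: 1
   :hidden:

   operation-specs/internal/paged-attention
```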
Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
This PR will be closed in a week due to 2 weeks of inactivity.
Extend spec with token_type_ids
Details:
Tickets: