
[SPEC] PagedAttention #30879

Open
PiotrKrzem wants to merge 10 commits into openvinotoolkit:master from PiotrKrzem:spec/paged_attention



@PiotrKrzem
Contributor

@PiotrKrzem PiotrKrzem commented Jun 5, 2025

Details:

  • Add Paged Attention spec (25 inputs, 3 outputs version)

Tickets:

  • None

@github-actions github-actions bot added the category: docs OpenVINO documentation label Jun 5, 2025
@rkazants rkazants self-requested a review June 8, 2025 18:08
@github-actions github-actions bot added the Stale label Aug 31, 2025
@github-actions github-actions bot closed this Sep 8, 2025
@PiotrKrzem PiotrKrzem reopened this Mar 5, 2026
@PiotrKrzem
Contributor Author

Updated the spec to explain the reference plugin implementation and how its behavior compares with the CPU plugin. I also added some notes that I would have found useful, in case anyone ever dares to explore the topic further.
#28815

@openvinotoolkit openvinotoolkit deleted a comment from github-actions bot Mar 5, 2026
@openvinotoolkit openvinotoolkit deleted a comment from github-actions bot Mar 5, 2026
@PiotrKrzem PiotrKrzem marked this pull request as ready for review March 5, 2026 11:58
@PiotrKrzem PiotrKrzem requested a review from a team as a code owner March 5, 2026 11:58
@PiotrKrzem PiotrKrzem requested review from zKulesza and removed request for a team, itikhono and vshampor March 5, 2026 11:58
Contributor

Copilot AI left a comment


Pull request overview

Adds an internal OpenVINO IR operation specification document for the PagedAttentionExtension op (25 inputs / 3 outputs), describing semantics, shapes, platform-specific behavior, and an IR XML example.

Changes:

  • Introduces a new RST spec for PagedAttentionExtension under internal operation specs.
  • Documents inputs/outputs/types, shape inference, known limitations, and CPU vs reference behavior.
  • Provides a detailed algorithm walkthrough and an IR XML example.

Comment on lines +478 to +479
* Allowed for ``key_cache`` (input 4): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
* Allowed for ``value_cache`` (input 5): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.

Copilot AI Mar 5, 2026


In the Types section, the cache input indices are off by one: key_cache is input 3 and value_cache is input 4 (as listed in the Inputs section/pseudocode), but this text labels them as inputs 4 and 5. Please align the input numbers to avoid confusion for spec consumers.

Suggested change
* Allowed for ``key_cache`` (input 4): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
* Allowed for ``value_cache`` (input 5): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.
* Allowed for ``key_cache`` (input 3): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
* Allowed for ``value_cache`` (input 4): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.

Copilot uses AI. Check for mistakes.
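The allowed cache precisions from the quoted Types section can be read as a lookup keyed by the corrected input indices (3 for ``key_cache``, 4 for ``value_cache``). The sketch below is a hypothetical validation helper for illustration only: ``ALLOWED_CACHE_TYPES`` and ``check_cache_type`` are not part of the OpenVINO API, and element types are plain strings rather than OpenVINO element types.

```python
# Hypothetical helper (not an OpenVINO API): allowed element types per
# cache input, keyed by the corrected input indices.
ALLOWED_CACHE_TYPES = {
    3: {"u4", "i8", "u8", "f16", "f32", "bf16"},  # key_cache
    4: {"u4", "u8", "f16", "f32", "bf16"},        # value_cache (no i8)
}


def check_cache_type(input_index: int, element_type: str) -> bool:
    """Return True if element_type is allowed for the given cache input."""
    if input_index not in ALLOWED_CACHE_TYPES:
        raise ValueError(f"input {input_index} is not a cache input")
    return element_type in ALLOWED_CACHE_TYPES[input_index]


print(check_cache_type(3, "i8"))  # True
print(check_cache_type(4, "i8"))  # False
```

Keying by index rather than by name makes an off-by-one like the one flagged above fail loudly (``check_cache_type(5, ...)`` raises) instead of silently matching the wrong input.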
input. **Optional.**

* **24**: ``adaptive_rkv_diversity_block_set_indices_begins`` - 1D tensor of type ``i32``,
shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 24. **Optional.**

Copilot AI Mar 5, 2026


For input 24 (adaptive_rkv_diversity_block_set_indices_begins), the description says it provides offsets into “input 24”, but these offsets should index into the indices list input (23). Please fix the referenced input number so the block-set table definition is correct.

Suggested change
shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 24. **Optional.**
shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 23. **Optional.**
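With the corrected reference, inputs 23 and 24 behave like a CSR-style table: the block set for sequence ``s`` is the slice of input 23 between ``begins[s]`` and ``begins[s+1]``, which is why input 24 has ``B_seq+1`` entries. The sketch below assumes that reading and the usual CSR convention that offsets start at 0 and end at the length of input 23; ``block_sets_per_sequence`` is a hypothetical helper that works on plain Python lists, not OpenVINO tensors.

```python
from typing import List


def block_sets_per_sequence(indices: List[int], begins: List[int]) -> List[List[int]]:
    """Split the flat indices list (input 23) into per-sequence block sets
    using the offsets in begins (input 24, shape [B_seq+1])."""
    if not begins:
        # Input 24 is optional and may be empty: no per-sequence block sets.
        return []
    # Assumed CSR convention: first offset is 0, last offset is len(indices).
    if begins[0] != 0 or begins[-1] != len(indices):
        raise ValueError("begins must start at 0 and end at len(indices)")
    b_seq = len(begins) - 1
    return [indices[begins[s]:begins[s + 1]] for s in range(b_seq)]


# Two sequences: sequence 0 owns entries 0..2, sequence 1 owns entries 3..4.
print(block_sets_per_sequence([7, 8, 9, 2, 5], [0, 3, 5]))
# [[7, 8, 9], [2, 5]]
```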

Comment on lines +282 to +286
for s in range(B_seq):
evict_size = adaptive_rkv_evictable_sizes[s]
total_toks = past_lens[s] + new_len_s
start_size = int(adaptive_rkv_start_size)
if total_toks < start_size + evict_size:

Copilot AI Mar 5, 2026


The pseudocode for output 2 computation uses new_len_s (e.g., total_toks = past_lens[s] + new_len_s), but new_len_s is not defined in the pseudocode (earlier it defines new_len inside the per-sequence loop). Please either carry per-sequence new_len forward (e.g., store it in an array) or rename the variable consistently so the pseudocode is self-contained.
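One way to make that loop self-contained, following the comment's first option, is to compute the per-sequence new-token counts once into an array and index it in the later loop. The sketch below assumes the new-token counts come from a ``subsequence_begins`` offsets array with ``B_seq+1`` entries (an assumption: the quoted hunk does not show where ``new_len`` is derived). Because the body of the ``if`` branch is cut off in the hunk, the hypothetical ``below_eviction_threshold`` function only records whether the condition holds.

```python
from typing import List


def per_sequence_new_lens(subsequence_begins: List[int]) -> List[int]:
    """Number of new tokens for each sequence, from B_seq+1 offsets."""
    return [subsequence_begins[s + 1] - subsequence_begins[s]
            for s in range(len(subsequence_begins) - 1)]


def below_eviction_threshold(past_lens: List[int],
                             subsequence_begins: List[int],
                             adaptive_rkv_start_size: int,
                             adaptive_rkv_evictable_sizes: List[int]) -> List[bool]:
    # Compute new_len once per sequence and keep it in an array, so the
    # loop uses new_lens[s] instead of an undefined new_len_s.
    new_lens = per_sequence_new_lens(subsequence_begins)
    start_size = int(adaptive_rkv_start_size)
    flags = []
    for s in range(len(past_lens)):
        evict_size = adaptive_rkv_evictable_sizes[s]
        total_toks = past_lens[s] + new_lens[s]
        # Hypothetical placeholder for the elided branch body: record
        # whether the condition from the pseudocode holds.
        flags.append(total_toks < start_size + evict_size)
    return flags


print(below_eviction_threshold([10, 100], [0, 4, 6], 8, [16, 16]))
# [True, False]
```

Storing ``new_lens`` as an array keeps the two loops decoupled; the alternative the comment mentions, renaming the variable consistently, only works if both computations share one loop.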

Comment on lines +9 to +13
**Versioned name**: *PagedAttentionExtension*

**Category**: *Internal*

**Short description**: *PagedAttentionExtension* implements causal multi-head attention

Copilot AI Mar 5, 2026


This new spec file doesn’t appear to be included in the docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs.rst toctree (that file currently lists only a subset of operation-specs/internal/*). If the intent is for PagedAttentionExtension to be discoverable in the generated docs, please add it to the toctree; otherwise readers won’t be able to navigate to it from the operation specs index.

@github-actions github-actions bot removed the Stale label Mar 6, 2026
Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
@github-actions
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Mar 26, 2026
Extend spec with token_type_ids

Labels

category: docs (OpenVINO documentation), Stale


4 participants