Conversation
...documentation/openvino-ir-format/operation-sets/operation-specs/internal/paged-attention.rst
Updated the spec to explain the reference plugin implementation and compare its behavior with the CPU plugin. I also added some notes I'd find useful if anyone ever dares to explore the topic further.
Pull request overview

Adds an internal OpenVINO IR operation specification document for the PagedAttentionExtension op (25 inputs / 3 outputs), describing semantics, shapes, platform-specific behavior, and an IR XML example.

Changes:
- Introduces a new RST spec for PagedAttentionExtension under internal operation specs.
- Documents inputs/outputs/types, shape inference, known limitations, and CPU vs reference behavior.
- Provides a detailed algorithm walkthrough and an IR XML example.
    * Allowed for ``key_cache`` (input 4): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
    * Allowed for ``value_cache`` (input 5): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.

In the Types section, the cache input indices are off by one: key_cache is input 3 and value_cache is input 4 (as listed in the Inputs section/pseudocode), but this text labels them as inputs 4 and 5. Please align the input numbers to avoid confusion for spec consumers.

Suggested change:

    * Allowed for ``key_cache`` (input 3): ``u4``, ``i8``, ``u8``, ``f16``, ``f32``, ``bf16``.
    * Allowed for ``value_cache`` (input 4): ``u4``, ``u8``, ``f16``, ``f32``, ``bf16``.
      input. **Optional.**

    * **24**: ``adaptive_rkv_diversity_block_set_indices_begins`` - 1D tensor of type ``i32``,
      shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 24. **Optional.**

For input 24 (adaptive_rkv_diversity_block_set_indices_begins), the description says it provides offsets into "input 24", but these offsets should index into the indices list input (23). Please fix the referenced input number so the block-set table definition is correct.

Suggested change:

      shape ``[B_seq+1]``, or empty. Per-sequence offsets into input 23. **Optional.**
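To illustrate the begins-into-indices relationship the comment describes, here is a minimal sketch with made-up values (the tensor contents are hypothetical, not taken from the spec): input 23 is a flat list of block indices, and input 24 partitions it per sequence, CSR-style.

```python
# Hypothetical example values; only the slicing pattern matters.
diversity_block_set_indices = [7, 2, 9, 4, 4, 1]  # input 23: flat i32 list
indices_begins = [0, 2, 2, 6]                     # input 24: shape [B_seq+1]

B_seq = len(indices_begins) - 1
# Each sequence s owns the slice [begins[s], begins[s+1]) of input 23,
# which is why the begins tensor must reference input 23, not itself.
per_sequence_sets = [
    diversity_block_set_indices[indices_begins[s]:indices_begins[s + 1]]
    for s in range(B_seq)
]
print(per_sequence_sets)  # [[7, 2], [], [9, 4, 4, 1]]
```

An empty slice (sequence 1 above) is valid and simply means that sequence has no diversity block set.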
    for s in range(B_seq):
        evict_size = adaptive_rkv_evictable_sizes[s]
        total_toks = past_lens[s] + new_len_s
        start_size = int(adaptive_rkv_start_size)
        if total_toks < start_size + evict_size:

The pseudocode for output 2 computation uses new_len_s (e.g., total_toks = past_lens[s] + new_len_s), but new_len_s is not defined in the pseudocode (earlier it defines new_len inside the per-sequence loop). Please either carry per-sequence new_len forward (e.g., store it in an array) or rename the variable consistently so the pseudocode is self-contained.
    **Versioned name**: *PagedAttentionExtension*

    **Category**: *Internal*

    **Short description**: *PagedAttentionExtension* implements causal multi-head attention
This new spec file doesn’t appear to be included in the docs/articles_en/documentation/openvino-ir-format/operation-sets/operation-specs.rst toctree (that file currently lists only a subset of operation-specs/internal/*). If the intent is for PagedAttentionExtension to be discoverable in the generated docs, please add it to the toctree; otherwise readers won’t be able to navigate to it from the operation specs index.
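For reference, the addition would look something like the fragment below. This is a sketch assuming the index file uses a standard Sphinx toctree; the exact relative path and directive options should be copied from the entries already present in operation-specs.rst.

```rst
.. toctree::
   :maxdepth: 1
   :hidden:

   operation-specs/internal/paged-attention
```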
Signed-off-by: p-wysocki <przemyslaw.wysocki@intel.com>
This PR will be closed in a week due to 2 weeks of inactivity.
Extend spec with token_type_ids
Details:
Tickets: