[GPU] Add optimized permute kernel for B↔F axis swap (order [1,0,2,3])#34905

Open
andrew-k-park wants to merge 1 commit into openvinotoolkit:master from andrew-k-park:permute_b_f_opt
@andrew-k-park
Contributor

Details:

  • Add optimized GPU permute kernel (permute_b_f_axes) for B↔F axis swap pattern (order [1,0,2,3] and higher-dim equivalents)
  • Vectorize memory access along the contiguous X dimension using vload/vstore (16-byte transactions per work-item)
  • Add 43 unit tests covering various shapes, data types, and edge cases; each test verifies that the new kernel produces results identical to the reference implementation
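
For reviewers who want to see the access pattern, here is a minimal host-side sketch in plain Python. It is not the kernel code from this PR. It assumes a row-major bfyx buffer, treats the whole inner block after B and F as contiguous (the PR vectorizes along X only), and models a 16-byte transaction as a fixed-size slice copy. The names `permute_reference`, `permute_b_f_swap`, and `VEC_BYTES` are hypothetical and exist only for this illustration.

```python
from itertools import product

VEC_BYTES = 16  # transaction size quoted in the PR description


def permute_reference(src, shape, order):
    """Generic element-by-element permute over a flat row-major buffer."""
    out_shape = [shape[a] for a in order]
    # Row-major strides of the input tensor.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    dst = []
    # product() advances the last index fastest, i.e. row-major output order.
    for out_idx in product(*(range(n) for n in out_shape)):
        # Output axis k reads from input axis order[k].
        src_off = sum(out_idx[k] * strides[order[k]] for k in range(len(order)))
        dst.append(src[src_off])
    return dst, out_shape


def permute_b_f_swap(src, shape, elem_bytes=4):
    """Swap axes 0 and 1, copying the contiguous inner block in chunks."""
    b, f = shape[0], shape[1]
    inner = 1
    for n in shape[2:]:
        inner *= n
    # Elements per chunk, e.g. 4 for a 4-byte type (assumed stand-in for vload/vstore).
    vec = max(1, VEC_BYTES // elem_bytes)
    dst = [None] * len(src)
    for bi in range(b):
        for fi in range(f):
            src_base = (bi * f + fi) * inner  # input is [B, F, inner]
            dst_base = (fi * b + bi) * inner  # output is [F, B, inner]
            x = 0
            while x + vec <= inner:  # chunked body
                dst[dst_base + x:dst_base + x + vec] = src[src_base + x:src_base + x + vec]
                x += vec
            while x < inner:  # scalar tail when inner is not a multiple of vec
                dst[dst_base + x] = src[src_base + x]
                x += 1
    return dst, [f, b] + list(shape[2:])


# Same check the unit tests describe: optimized path vs. reference path.
shape = [2, 3, 2, 5]
src = list(range(60))
ref, ref_shape = permute_reference(src, shape, [1, 0, 2, 3])
opt, opt_shape = permute_b_f_swap(src, shape)
assert opt == ref and opt_shape == ref_shape
```

The point of the sketch is that for order [1,0,2,3] only the outer two indices change, so each (b, f) pair moves one contiguous block and no per-element index arithmetic is needed inside it.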

Tickets:

AI Assistance:

  • AI assistance used: yes
  • AI assisted with kernel implementation and test case generation

@andrew-k-park andrew-k-park requested review from a team as code owners March 25, 2026 05:46
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Mar 25, 2026
@andrew-k-park andrew-k-park force-pushed the permute_b_f_opt branch 2 times, most recently from 1e30ee0 to 8cd4d82 March 25, 2026 23:59
Signed-off-by: Andrew Park <andrew.park@intel.com>