SwizzleHintOp blocks double-buffering in ROCDLPrefetchSharedMemoryPass #23919

@Yu-Zhewen

When XOR swizzle is enabled alongside DMA, ROCDLPrefetchSharedMemoryPass currently fails to apply double-buffering.

With swizzle enabled, the alloc becomes flat 1D and swizzle_hint + expand_shape sit between the alloc and the K-loop:

%alloc = memref.alloc() : memref<4096xbf16, #gpu.address_space<workgroup>>
// swizzle_hint + expand_shape sit OUTSIDE the K-loop
%hint = iree_codegen.swizzle_hint %alloc[#iree_codegen.xor_shuffle<128, 8>]
  : memref<4096xbf16, #gpu.address_space<workgroup>>
%expand = memref.expand_shape %hint [[0, 1]] output_shape [128, 32]
  : memref<4096xbf16, ...> into memref<128x32xbf16, ...>
scf.for %iv = %c0 to %c16 step %c4 {           // K-loop
  %sub = memref.subview %expand[%off, 0] ...
  amdgpu.gather_to_lds ..., %sub               // DMA write
  vector.transfer_read %expand ...              // MMA read
}

memref::multiBuffer produces memref<2x4096xbf16> and gives each iteration a subview of type memref<4096xbf16, strided<[1], offset: ?>>. swizzle_hint, however, requires a 1D non-strided memref, so the per-iteration subview is not a valid operand for it. As a result, double-buffering silently fails.
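
To make the `offset: ?` in the subview type concrete, here is a small Python model of how a per-iteration slot could be selected for the loop above. It assumes the slot index is `((iv - lb) floordiv step) mod factor`, which is my understanding of what upstream `memref::multiBuffer` emits. The helper name `slot_offset` and the constants are illustrative only and do not come from the pass.

```python
# Model of per-iteration buffer selection under double-buffering.
# Assumption: slot = ((iv - lb) // step) % factor, as in upstream multi-buffering.

BUFFER_ELEMS = 4096  # elements in the original memref<4096xbf16>
FACTOR = 2           # double-buffering => memref<2x4096xbf16>


def slot_offset(iv, lb=0, step=4, factor=FACTOR, buffer_elems=BUFFER_ELEMS):
    """Element offset of the buffer slot used at induction value `iv`."""
    slot = ((iv - lb) // step) % factor
    return slot * buffer_elems


# K-loop from the snippet: scf.for %iv = %c0 to %c16 step %c4
offsets = [slot_offset(iv) for iv in range(0, 16, 4)]
print(offsets)  # [0, 4096, 0, 4096]
```

Because the offset alternates with the induction variable, it is not a compile-time constant, which is why the subview type carries a dynamic offset rather than the identity layout that swizzle_hint expects.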

Should we:
(a) Clone SwizzleHintOp into the K-loop so it operates on per-iteration subviews, or
(b) Change multi-buffering to produce a flat 1D alloc (memref<8192xbf16> instead of memref<2x4096xbf16>) so swizzle can stay outside the loop?
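
One thing worth checking for option (b) is whether swizzling an index in the flat 8192-element alloc gives the same result as swizzling within a 4096-element buffer and then adding the slot offset. The sketch below tests this for a generic XOR swizzle. The formula in `xor_swizzle` is an assumption for illustration (XOR the access-group index with the row index modulo the number of groups per row) and is not taken from the `#iree_codegen.xor_shuffle` implementation, so the conclusion only holds if the real attribute behaves similarly.

```python
# Generic XOR swizzle sketch. The formula is assumed, not IREE's actual one.

ROW_WIDTH = 128     # first parameter of xor_shuffle<128, 8>
ACCESS_WIDTH = 8    # second parameter of xor_shuffle<128, 8>
BUFFER_ELEMS = 4096


def xor_swizzle(idx, row_width=ROW_WIDTH, access_width=ACCESS_WIDTH):
    """Permute access groups within a row by XOR-ing with the row index."""
    groups_per_row = row_width // access_width  # 16 for <128, 8>
    row, col = divmod(idx, row_width)
    group, lane = divmod(col, access_width)
    swizzled_group = group ^ (row % groups_per_row)
    return row * row_width + swizzled_group * access_width + lane


def commutes_with_slot_offset(slot, buffer_elems=BUFFER_ELEMS):
    """True if swizzle(base + i) == base + swizzle(i) for every i in the slot."""
    base = slot * buffer_elems
    return all(
        xor_swizzle(base + i) == base + xor_swizzle(i)
        for i in range(buffer_elems)
    )


print(commutes_with_slot_offset(1))                     # True
print(commutes_with_slot_offset(1, buffer_elems=1024))  # False
```

Under this assumed formula, a 4096-element buffer spans 32 rows, a multiple of the 16 groups per row, so the row phase is the same in both slots and the swizzle can stay on the flat alloc. A buffer of 1024 elements (8 rows) breaks that property, so option (b) would need a divisibility check or a fallback.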
