SwizzleHintOp blocks double-buffering in ROCDLPrefetchSharedMemoryPass

When XOR swizzle is enabled alongside DMA, ROCDLPrefetchSharedMemoryPass currently fails to apply double-buffering.

With swizzle enabled, the alloc becomes a flat 1D buffer, and `swizzle_hint` + `expand_shape` sit between the alloc and the K-loop:
```
%alloc = memref.alloc() : memref<4096xbf16, #gpu.address_space<workgroup>>
// swizzle_hint + expand_shape sit OUTSIDE the K-loop
%hint = iree_codegen.swizzle_hint %alloc[#iree_codegen.xor_shuffle<128, 8>]
  : memref<4096xbf16, #gpu.address_space<workgroup>>
%expand = memref.expand_shape %hint [[0, 1]] output_shape [128, 32]
  : memref<4096xbf16, ...> into memref<128x32xbf16, ...>
scf.for %iv = %c0 to %c16 step %c4 {           // K-loop
  %sub = memref.subview %expand[%off, 0] ...
  amdgpu.gather_to_lds ..., %sub               // DMA write
  vector.transfer_read %expand ...              // MMA read
}
```

`memref::multiBuffer` produces `memref<2x4096xbf16>` with per-iteration subviews (`memref<4096xbf16, strided<[1], offset: ?>>`), but `swizzle_hint` requires a 1D non-strided memref. As a result, double-buffering silently fails.

Should we:
(a) Clone `SwizzleHintOp` into the K-loop so it operates on per-iteration subviews, or
(b) Change multi-buffering to produce a flat 1D alloc (`memref<8192xbf16>` instead of `memref<2x4096xbf16>`) so swizzle can stay outside the loop?
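
To make option (b) more concrete, here is a small Python sketch of the index arithmetic involved. It is a toy model, not IREE code: `xor_swizzle` is an assumed, simplified reading of `xor_shuffle<128, 8>` (XOR the 8-element access group with the row number), and `slot_for_iteration` is an assumed slot rotation for the K-loop above. The real semantics of both may differ. Under those assumptions, it checks whether swizzling an index in a flat 8192-element buffer gives the same result as swizzling the slot-local index and then adding the slot base.

```python
# Toy model only: the real iree_codegen.xor_shuffle semantics may differ.
ROW_WIDTH = 128      # first xor_shuffle parameter in the issue
ACCESS_WIDTH = 8     # second xor_shuffle parameter in the issue
SLOT_ELEMS = 4096    # elements per buffer (memref<4096xbf16>)
NUM_SLOTS = 2        # double-buffering


def xor_swizzle(idx):
    """Assumed swizzle: XOR the access-group column with the row number."""
    groups_per_row = ROW_WIDTH // ACCESS_WIDTH
    row, col = divmod(idx, ROW_WIDTH)
    group, lane = divmod(col, ACCESS_WIDTH)
    swizzled_group = group ^ (row % groups_per_row)
    return row * ROW_WIDTH + swizzled_group * ACCESS_WIDTH + lane


def slot_for_iteration(iv, lower=0, step=4):
    """Assumed slot rotation: alternate slots on successive K-loop iterations."""
    return ((iv - lower) // step) % NUM_SLOTS


def flat_matches_per_slot():
    """Check that swizzling the flat index equals
    slot base + swizzle of the slot-local index, for every element."""
    for slot in range(NUM_SLOTS):
        base = slot * SLOT_ELEMS
        for local in range(SLOT_ELEMS):
            if xor_swizzle(base + local) != base + xor_swizzle(local):
                return False
    return True


if __name__ == "__main__":
    # Slot used by each K-loop iteration (iv = 0, 4, 8, 12).
    print([slot_for_iteration(iv) for iv in range(0, 16, 4)])
    print(flat_matches_per_slot())
```

In this model the check passes because each 4096-element slot spans a whole number of swizzle periods (16 rows of 128 elements, i.e. 2048 elements), so the row-dependent XOR term is identical in both slots. A slot size that is not a multiple of that period would break the equivalence. This only illustrates why a flat alloc could keep the swizzle outside the loop; it does not settle which option fits the pass better.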


SwizzleHintOp blocks double-buffering in ROCDLPrefetchSharedMemoryPass #23919
