[LLVMGPU] Outer Reductions should use TileAndFuse Pipeline for better performance#23988

Open
YashDeshpande25 wants to merge 1 commit into iree-org:main from YashDeshpande25:set_TAF_for_outerReductions

@YashDeshpande25 YashDeshpande25 commented Apr 1, 2026

This PR adds a check that selects the LLVMGPUTileAndFuse pipeline for outer reductions. The change is motivated by measured performance improvements when outer reductions are compiled with TileAndFuse instead of VectorDistribute.
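To make the idea concrete, here is a minimal, self-contained sketch of such a selection check. It is not the code in this PR and does not use IREE's APIs: `IteratorKind`, `Pipeline`, `isOuterReduction`, and `selectPipeline` are hypothetical names, and the definition of "outer reduction" used here (at least one reduction loop, with the innermost loop parallel) is an assumption made for illustration.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration; these are not IREE's actual types.
enum class IteratorKind { Parallel, Reduction };
enum class Pipeline { TileAndFuse, VectorDistribute };

// Assumed definition: an "outer" reduction has at least one reduction loop
// while the innermost (fastest-varying) loop stays parallel.
// `loops` lists iterator kinds from outermost to innermost.
bool isOuterReduction(const std::vector<IteratorKind> &loops) {
  if (loops.empty() || loops.back() != IteratorKind::Parallel)
    return false;
  for (IteratorKind kind : loops)
    if (kind == IteratorKind::Reduction)
      return true;
  return false;
}

// Route outer reductions to TileAndFuse. All other cases are modeled here as
// VectorDistribute, which simplifies whatever the real selection logic does.
Pipeline selectPipeline(const std::vector<IteratorKind> &loops) {
  return isOuterReduction(loops) ? Pipeline::TileAndFuse
                                 : Pipeline::VectorDistribute;
}
```

Under these assumptions, a loop nest of `{Reduction, Parallel}` (for example, reducing a 2-D tensor along its outer dimension) would be routed to TileAndFuse, while `{Parallel, Reduction}` would not.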

Summary of Stats for Context

On MI300X

  • 21 shapes improved (by 105 us on average)
  • 0 regressions
  • 10 shapes showed minimal change

On MI355

  • 20 shapes improved (by 101 us on average, ~43%)
  • 0 regressions
  • 11 shapes showed minimal change

All comparisons were made against the VectorDistribute timings on the respective machines.

@Groverkss

I think we need to talk about this a bit more

Groverkss commented Apr 1, 2026

> On MI355
> 20 shapes improved (by avg 101us ~43%)
> 0 regressions
> 11 shapes showed minimal change
> All comparisons were made against the default TileAndFuse timings on the respective machines.

Can you post specific inputs?

@YashDeshpande25 YashDeshpande25 force-pushed the set_TAF_for_outerReductions branch 3 times, most recently from c1e0653 to 9d4dc1b Compare April 2, 2026 21:09
@YashDeshpande25

> > On MI355
> > 20 shapes improved (by avg 101us ~43%)
> > 0 regressions
> > 11 shapes showed minimal change
> > All comparisons were made against the default TileAndFuse timings on the respective machines.
>
> Can you post specific inputs?

Discussed in meeting.

Signed-off-by: Yash Deshpande <ydeshpan@amd.com>
@YashDeshpande25 YashDeshpande25 force-pushed the set_TAF_for_outerReductions branch from 9d4dc1b to 91ef99c Compare April 2, 2026 21:23
@YashDeshpande25 YashDeshpande25 marked this pull request as ready for review April 2, 2026 23:54