@Yancey0623 Yancey0623 commented Mar 1, 2024

Add a scalar-reduction codegen template. The algorithm comes from https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
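
For readers who have not seen the linked slides, the core idea there is a tree reduction inside each thread block, followed by further passes over the per-block partial results until a single scalar remains. The sketch below is a CPU emulation of that idea in Python, not the codegen template added in this PR; `BLOCK_SIZE`, `block_reduce`, and `scalar_reduce` are illustrative names, and summation is assumed as the reduction operator.

```python
# Illustrative CPU emulation of a multi-pass tree reduction to a scalar.
# BLOCK_SIZE and the helper names are hypothetical, not taken from this PR.
BLOCK_SIZE = 8  # threads per emulated block; must be a power of two


def block_reduce(values):
    """Reduce up to BLOCK_SIZE values using sequential addressing."""
    # Emulated shared memory, zero-padded so every "thread" has a slot.
    shared = list(values) + [0.0] * (BLOCK_SIZE - len(values))
    stride = BLOCK_SIZE // 2
    while stride > 0:
        # On a GPU these additions run in parallel, followed by a barrier.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]


def scalar_reduce(data):
    """Sum `data` to one scalar: per-block partial sums, repeated per pass."""
    if not data:
        return 0.0
    partial = list(data)
    while len(partial) > 1:
        # Each pass stands in for one kernel launch over the partial results.
        partial = [
            block_reduce(partial[i:i + BLOCK_SIZE])
            for i in range(0, len(partial), BLOCK_SIZE)
        ]
    return partial[0]
```

For example, `scalar_reduce(list(range(1, 101)))` takes three passes (13 blocks, then 2, then 1) to reach the final sum.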

Benchmark against PyTorch:

| $bsx$seqlenx151936 | disc | PyTorch |
| --- | --- | --- |
| 2x768x151936xf32 | 0.53 ms | 0.55 ms |
| 2x1024x151936xf32 | 0.67 ms | 0.7 ms |
| 2x2048x151936xf32 | 1.38 ms | 1.4 ms |

@Yancey0623 Yancey0623 changed the title [WIP]support scalar reduction support scalar reduction Mar 8, 2024
eedalong previously approved these changes Mar 12, 2024

@eedalong eedalong left a comment

LGTM

@Yancey0623 Yancey0623 changed the title support scalar reduction Add scalar reduction codegen schedule Mar 20, 2024
@eedalong eedalong self-requested a review March 22, 2024 02:08
eedalong previously approved these changes Mar 22, 2024
4 participants