Sorry, I just found #692. But it would be good to hear an update on this issue, given the recent interest in small-M problems.
I just learned that TRT-LLM has implemented split-K grouped GEMM in https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/splitk_gemm_grouped.h
Is grouped GEMM with split-K supported? I'm interested in both the 2.x and 3.x APIs.
The motivation is the blog post by Meta, https://pytorch.org/blog/accelerating-moe-model/#30-work-decomposition---splitk, which found that split-K works really well for the small-M problems typical of LLM inference. @hwu36
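For context, here is a minimal pure-Python sketch of what split-K means for a grouped GEMM. It is only an illustration of the idea, not how CUTLASS or TRT-LLM implement it, and every function name in it is hypothetical. Each group's K dimension is cut into `split_k` slices, each (group, slice) pair produces an independent partial product (on a GPU, one CTA each), and a second phase sums the partials per group. `work_units` counts how many independent pieces of work exist, assuming hypothetical `tile_m x tile_n` output tiles.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def work_units(problems: Sequence[Tuple[int, int, int]],
               tile_m: int, tile_n: int, split_k: int) -> int:
    """Count independent (output tile, K-slice) work units over all groups.

    problems holds (M, N, K) per group. With small M, each group has few
    output tiles, so multiplying by split_k is what exposes more parallelism.
    """
    return sum(ceil_div(m, tile_m) * ceil_div(n, tile_n) * split_k
               for m, n, _ in problems)


def k_ranges(k: int, split_k: int) -> List[Tuple[int, int]]:
    """Split [0, k) into split_k contiguous, nearly equal slices."""
    base, rem = divmod(k, split_k)
    ranges, start = [], 0
    for s in range(split_k):
        end = start + base + (1 if s < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


def partial_gemm(a: Matrix, b: Matrix, k0: int, k1: int) -> Matrix:
    """Return A[:, k0:k1] @ B[k0:k1, :] (one split-K partial product)."""
    m, n = len(a), len(b[0])
    return [[sum(a[i][kk] * b[kk][j] for kk in range(k0, k1)) for j in range(n)]
            for i in range(m)]


def splitk_grouped_gemm(groups: Sequence[Tuple[Matrix, Matrix]],
                        split_k: int) -> List[Matrix]:
    """Compute C_g = A_g @ B_g for every group using split-K.

    Phase 1: every (group, slice) partial is independent work.
    Phase 2: partials are reduced (summed) per group.
    """
    partials = [[partial_gemm(a, b, k0, k1) for k0, k1 in k_ranges(len(b), split_k)]
                for a, b in groups]
    results = []
    for parts in partials:
        m, n = len(parts[0]), len(parts[0][0])
        c = [[0.0] * n for _ in range(m)]
        for p in parts:
            for i in range(m):
                for j in range(n):
                    c[i][j] += p[i][j]
        results.append(c)
    return results
```

With hypothetical shapes of four groups with M=16, N=1024, K=4096 and 64x128 output tiles, `work_units` returns 32 for `split_k=1` but 128 for `split_k=4`. That is the kind of gain in concurrently runnable CTAs the blog attributes to split-K, paid for with the extra reduction phase.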