fuse dynamic per group padding into cutedsl 2d MXFP8 quantization kernel #4184

@danielvegamyhre

Description

For the non-EP case, we need to fuse per-group padding to the nearest multiple of 32/128 into the MXFP8 quantization kernel added in #4156, to avoid the expensive extra copy incurred by the standalone padding kernel.
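The padding arithmetic the fused kernel needs per group can be sketched as below. This is a minimal illustration in plain Python, not code from #4156; the helper names (`pad_to_multiple`, `padded_group_offsets`) are hypothetical, and the actual kernel would compute these offsets on-device rather than on the host.

```python
def pad_to_multiple(size: int, multiple: int) -> int:
    """Round a group size up to the nearest multiple (e.g. 32 or 128).

    Hypothetical helper illustrating the per-group padding rule; not
    part of the actual CuTe DSL kernel.
    """
    return ((size + multiple - 1) // multiple) * multiple


def padded_group_offsets(group_sizes: list[int], multiple: int = 32) -> list[int]:
    """Cumulative end offsets of each group after per-group padding.

    In a fused kernel, each group's rows would be written directly at
    these padded offsets, skipping the separate padding/copy pass.
    """
    offsets = []
    total = 0
    for size in group_sizes:
        total += pad_to_multiple(size, multiple)
        offsets.append(total)
    return offsets


# Example: group sizes 40, 100, 7 padded to multiples of 32
# become 64, 128, 32, giving end offsets [64, 192, 224].
print(padded_group_offsets([40, 100, 7], multiple=32))
```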
