1 parent 6462a41 commit aa61c38
torchao/kernel/intmm.py
@@ -136,9 +136,9 @@ def _int_scaled_matmul_cpu(
 ) -> torch.Tensor:
     """
     CPU-optimized path for scaled integer matrix multiplication.
-    It goes to u8s8 or s8s8 path based on ISA support for
-    hardware. The selection is for performance only and both paths
-    should work regardless of ISA support.
+    CPU prefers decomposed version to leverage the fusion capability of Inductor.
+    It goes to u8s8 or s8s8 path based on ISA support for hardware. The selection
+    is for performance only and both paths should work regardless of ISA support.
 
     Args:
         a (torch.Tensor): The first matrix to multiply (int8).
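
The updated docstring says the u8s8 and s8s8 paths are interchangeable and that the choice only affects performance. The sketch below illustrates why that can hold, using plain Python integers rather than torch tensors. It is not the torchao kernel: the helper names, the per-row `scales` layout, and the +128 shift with a column-sum compensation are assumptions chosen for illustration.

```python
# Illustrative arithmetic only; this is NOT the torchao implementation.
# All function names and the per-row scale layout are hypothetical.

def int_matmul(a, b):
    """Integer matmul (M x K) @ (K x N) with exact Python int accumulation."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scaled_matmul_s8s8(a, b, scales):
    """Decomposed form: int8 x int8 matmul, then a separate per-row scale multiply."""
    acc = int_matmul(a, b)
    return [[v * s for v in row] for row, s in zip(acc, scales)]

def scaled_matmul_u8s8(a, b, scales):
    """Same result through a uint8 x int8 product plus a compensation term."""
    # Shift int8 values in [-128, 127] into the uint8 range [0, 255].
    a_u8 = [[x + 128 for x in row] for row in a]
    acc = int_matmul(a_u8, b)
    # (a + 128) @ b == a @ b + 128 * colsum(b), so subtract the extra term.
    col_sums = [sum(col) for col in zip(*b)]
    return [
        [(v - 128 * c) * s for v, c in zip(row, col_sums)]
        for row, s in zip(acc, scales)
    ]

a = [[-128, 5, 127], [3, -7, 0]]   # int8 activations, shape 2 x 3
b = [[1, -2], [4, 0], [-3, 6]]     # int8 weights, shape 3 x 2
scales = [0.5, 0.25]               # one scale per row of a (assumed layout)

result_s8s8 = scaled_matmul_s8s8(a, b, scales)
result_u8s8 = scaled_matmul_u8s8(a, b, scales)
print(result_s8s8 == result_u8s8)
```

Keeping the integer matmul and the scale multiply as separate steps mirrors the "decomposed version" the docstring mentions, which is the form the commit says Inductor can fuse.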