- GPU MODE Lecture 14: Practitioners Guide to Triton
- Flash-Decoding for long-context inference
- Deep Dive on the Hopper TMA Unit for FP8 GEMMs
- Persistent Matmul
- Matrix Multiplication Background User's Guide
- Deep Dive on CUTLASS Ping-Pong GEMM Kernel
- Accelerating 2D Dynamic Block Quantized Float8 GEMMs in Triton
- SmoothQuant paper
- SmoothQuant repo