This developer guide covers debugging and optimization topics not covered in our other documentation: the user guide, tutorials, and FAQ. In this guide, developers are users of the Hexagon-MLIR compiler who want to debug at the IR level and improve the performance of their Triton kernels or PyTorch models.
Prerequisites: You have built the Hexagon-MLIR compiler and successfully run the basic vector addition tutorial.
You can enable several IR dumps to trace the compilation lowering process:
TRITON_ALWAYS_COMPILE=1 MLIR_ENABLE_DUMP=1 LLVM_IR_ENABLE_DUMP=1 \
TRITON_ENABLE_LLVM_DEBUG=1 pytest -sv test/python/triton/test_vec_add.py
We provide ways to dump the final assembly code along with the object binary code:
- HEXAGON_ASM_DUMP: Enable assembly code dump. By default, outputs to a random filepath.
- HEXAGON_ASM_DUMP_FILE: Control the output location for assembly dumps.
- HEXAGON_ASM_TO_OBJ: Enable object binary code dump.
Note: Set the filepath environment variable (HEXAGON_ASM_DUMP_FILE) without a file extension; the system appends the appropriate extension for each output type.
For additional debugging options, see the Triton documentation.
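The dump settings above can also be assembled from a Python test driver. The following is a minimal sketch, not part of Hexagon-MLIR: `debug_env` and `pytest_command` are hypothetical helper names, and it assumes the HEXAGON_* switches are enabled with the value `1` in the same way as the Triton variables. It strips any extension from the assembly dump path, as the note above requires.

```python
import os
from pathlib import PurePosixPath

# Variables taken from the IR-dump command shown in this guide.
IR_DUMP_VARS = (
    "TRITON_ALWAYS_COMPILE",
    "MLIR_ENABLE_DUMP",
    "LLVM_IR_ENABLE_DUMP",
    "TRITON_ENABLE_LLVM_DEBUG",
)


def debug_env(asm_dump_file=None, emit_object=False, base_env=None):
    """Return a copy of the environment with the dump variables set.

    Assumption: the HEXAGON_* switches accept "1", like the Triton ones.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in IR_DUMP_VARS:
        env[name] = "1"
    if asm_dump_file is not None:
        env["HEXAGON_ASM_DUMP"] = "1"
        # The path must not carry an extension, so drop any suffix.
        stem_path = PurePosixPath(asm_dump_file).with_suffix("")
        env["HEXAGON_ASM_DUMP_FILE"] = str(stem_path)
    if emit_object:
        env["HEXAGON_ASM_TO_OBJ"] = "1"
    return env


def pytest_command(test_path="test/python/triton/test_vec_add.py"):
    """Build the pytest invocation used in this guide."""
    return ["pytest", "-sv", test_path]
```

To launch the run, pass both results to the standard library, for example `subprocess.run(pytest_command(), env=debug_env("/tmp/dumps/vec_add.s"), check=True)`. Building the environment as a copy keeps the dump flags out of the parent shell.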
One of the executables built is linalg-hexagon-opt, which is similar to mlir-opt and also includes the Hexagon-MLIR passes. To list those passes:
linalg-hexagon-opt --help | grep hexagon
USAGE: linalg-hexagon-opt [options] <input file>
...
--hexagon-add-fastmath - Add fast math flag to selected ops.
--hexagon-fusion - Pass to fuse linalg generic ops
--hexagon-lwp-instrumentation - Pass to instrument and profile with hexagon instrinsic
--hexagon-punt-buffer - erase copies and forward buffer directly to user
...
--hexagonmem-to-llvm - Convert HexagonMem dialect to LLVM
...
You can take a valid MLIR file and lower it to LLVM IR, printing the IR after each pass that runs:
linalg-hexagon-opt test.mlir -linalg-to-llvm --mlir-print-ir-after-all
The following Triton language features are currently not supported:
- Data Types:
  - Scalar/boolean return values
  - Complex number operations
- Operations:
  - Shape manipulation operations
    - tl.interleave() function
    - tl.join() function
  - Dynamic shape operations
- Memory:
  - Some advanced memory access patterns
  - Certain atomic operations
- Control Flow:
  - Complex nested control structures
  - Some loop constructs
- MLIR Documentation - MLIR project documentation
- MLIR Dialects - Available MLIR dialects
- OpenAI Triton - Main Triton repository
- Triton Language Reference - Language documentation
- Triton Tutorials - Learning resources
- Triton-Shared Middle Layer - Triton to Linalg toolchain