[Codegen] Remove ROCDL index==i32; add indexIsI64 to OptimizeIntArithmetic by krzysz00 · Pull Request #23948 · iree-org/iree

krzysz00 · 2026-03-27T16:09:41Z

Integer range analysis now handles narrowing to i32 where safe, making the --iree-rocm-index-bits option (which lowered all ROCDL indices to 32-bit) obsolete. Remove it so the ROCDL path matches NVVM (which always has 64-bit indices at the LLVM conversion level).

Add an indexIsI64 option to OptimizeIntArithmeticPass that relaxes the SAFE_INDEX_UNSIGNED_MAX_VALUE guard on signed-to-unsigned conversions for index values. On LLVMGPU targets, where index is always 64-bit, this guard is unnecessarily conservative and blocks valid optimizations. For-loop IV narrowing (NarrowSCFForIvToI32) retains its own range checks unconditionally.
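To illustrate why a guard of this kind exists, here is a minimal Python model of the reasoning, not the pass's actual code. The helper names and the exact bound checked in `can_use_unsigned` are hypothetical and chosen only for illustration; the real SAFE_INDEX_UNSIGNED_MAX_VALUE constant and its use in OptimizeIntArithmeticPass may differ. The sketch assumes that, when the width of index is not known to be 64 bits, the analysis has to stay correct if index is later lowered to 32 bits.

```python
# Illustrative model only: none of these helpers exist in IREE or MLIR.

def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def to_unsigned(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as an unsigned integer."""
    return value & ((1 << bits) - 1)


def signed_div(a: int, b: int, bits: int) -> int:
    """Signed division that truncates toward zero, at the given bit width."""
    a, b = to_signed(a, bits), to_signed(b, bits)
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def unsigned_div(a: int, b: int, bits: int) -> int:
    """Unsigned division at the given bit width."""
    return to_unsigned(a, bits) // to_unsigned(b, bits)


def can_use_unsigned(lo: int, hi: int, index_is_i64: bool) -> bool:
    """Hypothetical guard: allow signed-to-unsigned rewriting only when the
    value range [lo, hi] is non-negative at the narrowest width that index
    might be lowered to (32 bits unless index is known to be 64-bit)."""
    assumed_bits = 64 if index_is_i64 else 32
    signed_max = (1 << (assumed_bits - 1)) - 1
    return 0 <= lo and hi <= signed_max


big = 3_000_000_000  # fits in i64 as a positive value, but not in a signed i32

# With a 64-bit index, signed and unsigned division agree.
print(signed_div(big, 2, 64), unsigned_div(big, 2, 64))
# With a 32-bit index, the same bit pattern is negative, so they disagree.
print(signed_div(big, 2, 32), unsigned_div(big, 2, 32))
# The guard rejects the range when the width is unknown, accepts it for i64.
print(can_use_unsigned(0, big, index_is_i64=False),
      can_use_unsigned(0, big, index_is_i64=True))
```

Under these assumptions, a range whose upper bound exceeds the 32-bit signed maximum is rejected when the index width is unknown but accepted once index is known to be 64-bit, which is the kind of relaxation the description attributes to indexIsI64.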

Performance impact: on whole models, the change is within the noise floor. That is expected, since it only kills off a few instructions. There is still a consistent minor trend on the torch_models CI that gives a 1.01x geometric mean speedup, so there's not much reason not to do this. Table below.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

krzysz00 · 2026-03-27T16:10:45Z

Benchmark	Baseline (ms)	Test (ms)	Speedup
llama_8b_fp16/decode_benchmark_seq128_mi325	7.638	7.468	1.02x
llama_8b_fp16/decode_benchmark_seq2048_mi325	9.076	8.915	1.02x
llama_8b_fp16/prefill_benchmark_seq128_mi325	31.835	31.821	1.00x
llama_8b_fp16/prefill_benchmark_seq2048_mi325	279.081	277.750	1.00x
llama_8b_fp8/decode_benchmark_seq128_mi325	8.219	7.986	1.03x
llama_8b_fp8/decode_benchmark_seq128_mi325_data_tiling	17.252	17.244	1.00x
llama_8b_fp8/decode_benchmark_seq2048_mi325	11.054	11.034	1.00x
llama_8b_fp8/decode_benchmark_seq2048_mi325_data_tiling	19.907	20.085	0.99x
llama_8b_fp8/prefill_benchmark_seq128_mi325	25.691	25.748	1.00x
llama_8b_fp8/prefill_benchmark_seq128_mi325_data_tiling	24.886	24.987	1.00x
llama_8b_fp8/prefill_benchmark_seq2048_mi325	180.207	180.133	1.00x
llama_8b_fp8/prefill_benchmark_seq2048_mi325_data_tiling	197.122	196.607	1.00x
sdxl/clip_benchmark_mi325	7.266	7.215	1.01x
sdxl/punet_benchmark_mi325	46.146	46.054	1.00x
sdxl/punet_benchmark_mi325_v2	43.660	43.507	1.00x
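As a sanity check on the geometric mean quoted in the description, the following Python sketch recomputes it from the rows above. It assumes this table is the torch_models CI data the description refers to and that speedup is defined as baseline time divided by test time; the variable names are illustrative.

```python
from statistics import geometric_mean

# (baseline_ms, test_ms) pairs copied from the benchmark table, in row order.
timings_ms = [
    (7.638, 7.468),
    (9.076, 8.915),
    (31.835, 31.821),
    (279.081, 277.750),
    (8.219, 7.986),
    (17.252, 17.244),
    (11.054, 11.034),
    (19.907, 20.085),
    (25.691, 25.748),
    (24.886, 24.987),
    (180.207, 180.133),
    (197.122, 196.607),
    (7.266, 7.215),
    (46.146, 46.054),
    (43.660, 43.507),
]

# Speedup > 1 means the test build is faster than the baseline.
speedups = [baseline / test for baseline, test in timings_ms]
geomean_speedup = geometric_mean(speedups)

print(f"geometric mean speedup: {geomean_speedup:.2f}x")
```

The result is only slightly above 1.00 and rounds to 1.01x at two decimal places, matching the figure in the description.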

krzysz00 requested review from Groverkss, Max191, benvanik, kuhar, nirvedhmeshram and qedawkins as code owners March 27, 2026 16:09

krzysz00 wants to merge 1 commit into iree-org:main from krzysz00:index-i64-rocm
