[AIE] Add TTI overrides to prevent sub-512-bit vector arithmetic by FIM43-Redeye · Pull Request #839 · Xilinx/llvm-aie

FIM43-Redeye · 2026-03-11T01:17:23Z

Summary

Fixes #480. The GlobalISel Legalizer only has rules for 512-bit vector
arithmetic (V64S8, V32S16, V16S32), but middle-end vectorizers create
sub-512-bit vector ops that crash during legalization at -O2:

fatal error: unable to legalize instruction:
  %14:_(<64 x s8>) = G_SHL %4:_, %12:_(<64 x s8>)

Three TTI overrides in AIEBaseTargetTransformInfo.h prevent this:

- getRegisterBitWidth(RGK_FixedWidthVector) = 512 -- stops the loop
  vectorizer from choosing sub-512-bit VF (e.g., VF=4 for i8)
- getMinVectorRegisterBitWidth() = 512 -- prevents SLP/VectorCombine
  from creating sub-512-bit vectors
- getArithmeticInstrCost() returns Invalid for sub-512-bit vectors --
  stops SLP's cost model from treating e.g. <8 x i8> add as profitable
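
As a rough illustration of the first override, the lane count that fills one 512-bit register follows directly from the element width. The sketch below is a standalone model, not LLVM code: `kVectorRegisterBits`, `lanesForElement`, and `checkLanes` are hypothetical names, and the real loop vectorizer also consults its cost model when choosing a VF.

```cpp
#include <cassert>

// Hypothetical stand-in for the value the PR returns from
// getRegisterBitWidth(RGK_FixedWidthVector); not the LLVM API itself.
constexpr unsigned kVectorRegisterBits = 512;

// Number of lanes that fill one full-width register for a given element size.
constexpr unsigned lanesForElement(unsigned ElementBits) {
  return kVectorRegisterBits / ElementBits;
}

// Illustrative checks matching the vector types named in the description.
inline void checkLanes() {
  assert(lanesForElement(8) == 64);   // <64 x i8>
  assert(lanesForElement(16) == 32);  // <32 x i16>
  assert(lanesForElement(32) == 16);  // <16 x i32>
}
```

Under this simplified view, an i8 loop maps to 64 lanes rather than the VF=4 mentioned above.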

Test plan

- All 1824 AIE CodeGen lit tests pass (no regressions)
- New regression test covers all three vectorizer paths:
  - Large trip count (LoopVectorize creates <64 x i8> not <4 x i8>)
  - Small trip count after unrolling (SLP does not create <8 x i8>)
  - i16 variant (LoopVectorize creates <32 x i16> not <2 x i16>)
- The MWE from #480 ("error in backend: unable to legalize instruction") compiles cleanly at -O2

Copilot

Pull request overview

This PR addresses llvm-aie#480 by updating AIE TargetTransformInfo (TTI) to steer LLVM’s middle-end vectorizers away from producing sub-512-bit vector arithmetic ops that AIE2’s GlobalISel Legalizer cannot legalize, avoiding backend crashes at -O2.

Changes:

- Add AIE TTI overrides to force fixed-width vector register width/min vector register bit width to 512 bits.
- Make arithmetic cost modeling return “Invalid” for sub-512-bit fixed vectors to discourage SLP from forming illegal vector ops.
- Add a new opt-level regression test to ensure vectorizers don’t form sub-512-bit vector adds on AIE2 and that the full pipeline compiles.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
llvm/lib/Target/AIE/AIEBaseTargetTransformInfo.h	Adds TTI overrides to bias vectorization toward 512-bit vectors and block sub-512-bit arithmetic vectorization.
llvm/test/CodeGen/AIE/opt/no-subregister-vectorize-arithmetic.ll	New regression test covering loop vectorization and SLP/unroll scenarios to ensure no sub-512-bit vector arithmetic is formed.

Copilot · 2026-03-11T01:24:23Z

+    if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+      if (VTy->getPrimitiveSizeInBits() < 512)
+        return InstructionCost::getInvalid();
+    }


getArithmeticInstrCost only returns Invalid for fixed vectors with total width < 512. But AIE2 GlobalISel legalization for G_ADD/G_SUB is limited to specific 512-bit element types (<64 x i8>, <32 x i16>, <16 x i32>); other 512-bit vectors (e.g. <8 x i64>) are still illegal and can trigger the same legalization failure. Consider returning Invalid unless the fixed vector type is one of the actually-legal arithmetic vector types, not just based on total bit width.

Good catch -- I checked all four AIE legalizer implementations to make sure the bit-width threshold is correct across the board:

Target	Legal G_ADD/G_SUB vector types	Width
AIE1	None (scalar S32 only)	-
AIE2	V16S32, V32S16, V64S8	All 512-bit
AIE2P	V16S32, V32S16, V64S8	All 512-bit
AIE2PS	V16S32, V32S16, V64S8	All 512-bit

AIE2P and AIE2PS even have the comment "AIE ISA supports only 512-bit vector add/sub/xor" in their legalizer source. So the >= 512 check is correct for every current target -- no AIE variant legalizes <8 x i64> or any other non-standard 512-bit type.

Querying LegalizerInfo directly from TTI would be the ideal solution, but it is a cross-layer violation (TTI operates on LLVM IR types during mid-level optimization; LegalizerInfo uses LLT machine-level types and is only initialized when GlobalISel is active). No upstream LLVM target queries LegalizerInfo from TTI -- they all duplicate the type knowledge.

Happy to add an inline comment referencing AIE2LegalizerInfo.cpp (and the other three) as the source of truth, so future maintainers know where to look if a new target adds different legal widths.
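
To make the two positions in this thread concrete, here is a small standalone model comparing the width threshold from the diff with an allow-list of the three types the table lists as legal. It is only a sketch: `FixedVec`, `passesWidthCheck`, and `passesAllowList` are hypothetical names and do not exist in LLVM or in this PR.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR fixed vector type such as <64 x i8>.
struct FixedVec {
  unsigned NumElts;
  unsigned EltBits;
  constexpr unsigned totalBits() const { return NumElts * EltBits; }
};

// Width-only check, mirroring the `< 512` test in the diff: anything
// narrower than 512 bits is rejected, everything else is accepted.
constexpr bool passesWidthCheck(FixedVec V) { return V.totalBits() >= 512; }

// Allow-list check along the lines of the review suggestion: only 512-bit
// vectors of 8-, 16-, or 32-bit elements (V64S8, V32S16, V16S32) pass.
constexpr bool passesAllowList(FixedVec V) {
  return V.totalBits() == 512 &&
         (V.EltBits == 8 || V.EltBits == 16 || V.EltBits == 32);
}

// <8 x i64> is 512 bits wide, so the two checks disagree on it.
static_assert(passesWidthCheck(FixedVec{8, 64}), "width check accepts it");
static_assert(!passesAllowList(FixedVec{8, 64}), "allow-list rejects it");
```

In this model the two checks agree on <64 x i8> and on <8 x i8>; they differ only for 512-bit-or-wider types outside the three listed in the table.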

martien-de-jong · 2026-03-11T08:49:15Z

-  call void @llvm.aie2.release(i32 48, i32 1)
-  call void @llvm.aie2.release(i32 51, i32 1)
+  call void @llvm.aie2.release(i32 49, i32 1)
+  call void @llvm.aie2.release(i32 50, i32 1)


Why is this a simplification?

The original lock IDs (acq 49/50 and rel 48/51) were simulating a double-buffer producer/consumer pattern, but the test uses the lock intrinsics only as optimization barriers: they create the scheduling fences that give SLP something to vectorize. The actual lock semantics should be irrelevant to what the test validates.

Copilot flagged the mismatch and suggested matching the release and acquire IDs for readability. Since the test passes either way and the lock behavior isn't under test, that seemed reasonable. "Simplification" was the wrong word; "readability" would have been more accurate.

I can revert this if the original IDs should be preserved.

Not on my behalf. I just didn't see a connection between lock IDs and vectorization.

…inx#480)

The GlobalISel Legalizer only has rules for 512-bit vector arithmetic (V64S8, V32S16, V16S32). Without TTI guidance, middle-end vectorizers create sub-512-bit vector ops that crash during legalization:

fatal error: unable to legalize instruction:
  %14:_(<64 x s8>) = G_SHL %4:_, %12:_(<64 x s8>)

Three TTI overrides in AIEBaseTargetTransformInfo prevent this:

- getRegisterBitWidth(RGK_FixedWidthVector) = 512: stops the loop vectorizer from choosing sub-512-bit VF (e.g., VF=4 for i8).
- getMinVectorRegisterBitWidth() = 512: prevents SLP/VectorCombine from creating sub-512-bit vectors.
- getArithmeticInstrCost() returns Invalid for sub-512-bit vectors: stops SLP's cost model from treating e.g. <8 x i8> add as profitable (vector cost 1 vs scalar cost 8).

Fixes Xilinx#480.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
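
The "vector cost 1 vs scalar cost 8" remark in the commit message can be pictured with a toy profitability check. This is a simplified sketch under stated assumptions, not SLP's actual algorithm: `Cost`, `vectorArithCost`, and `isProfitable` are hypothetical names, and `Cost` only imitates the valid/invalid distinction of `InstructionCost`.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::InstructionCost: Valid == false plays the
// role of InstructionCost::getInvalid().
struct Cost {
  bool Valid;
  int Value;
};

// Assumed vector cost after the change: Invalid below 512 bits, 1 otherwise.
inline Cost vectorArithCost(unsigned TotalBits) {
  if (TotalBits < 512)
    return Cost{false, 0};
  return Cost{true, 1};
}

// A candidate is accepted only if its cost is valid and beats the scalar cost.
inline bool isProfitable(Cost VectorCost, int ScalarCost) {
  return VectorCost.Valid && VectorCost.Value < ScalarCost;
}
```

With a plain cost of 1, an <8 x i8> add beats eight scalar adds (`isProfitable(Cost{true, 1}, 8)` is true); once the cost is Invalid, the same candidate is rejected regardless of the scalar cost.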

FIM43-Redeye · 2026-04-07T18:40:21Z

Rebased onto latest aie-public (clean, no conflicts).

Just checking in -- the technical discussion from March seems resolved (martien-de-jong's question about lock IDs was answered). This fix prevents real crashes at -O2 when the loop vectorizer creates sub-512-bit vectors that GlobalISel can't legalize. We've been carrying this patch locally and it's been solid.

Happy to make any changes if there's feedback.

FIM43-Redeye requested review from F-Stuckmann, SagarMaheshwari99, abhinay-anubola, abnikant, andcarminati, katerynamuts, khallouh, konstantinschwarz, martien-de-jong, mludevid, niwinanto and stephenneuendorffer as code owners March 11, 2026 01:17

Copilot AI review requested due to automatic review settings March 11, 2026 01:17

Copilot started reviewing on behalf of FIM43-Redeye March 11, 2026 01:20

Copilot AI reviewed Mar 11, 2026

martien-de-jong reviewed Mar 11, 2026

FIM43-Redeye and others added 4 commits April 7, 2026 14:36

Simplify lock IDs for minimal regression test

fdcab1f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Swap from getFixed to getScalable TypeSize to avoid confusing callers

c23b335

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[NFC] Fix clang-format violations in AIEBaseTargetTransformInfo.h

eeedf33

FIM43-Redeye force-pushed the fix/tti-vector-register-width branch from 1fb92ff to eeedf33 Compare April 7, 2026 18:36

FIM43-Redeye wants to merge 4 commits into Xilinx:aie-public from FIM43-Redeye:fix/tti-vector-register-width

FIM43-Redeye commented Mar 11, 2026

Copilot AI left a comment

Copilot AI Mar 11, 2026

FIM43-Redeye Mar 11, 2026

martien-de-jong Mar 11, 2026

FIM43-Redeye Mar 11, 2026

martien-de-jong Mar 13, 2026

FIM43-Redeye commented Apr 7, 2026

3 participants
