Skip to content

[AIE] Add TTI overrides to prevent sub-512-bit vector arithmetic#839

Open
FIM43-Redeye wants to merge 4 commits intoXilinx:aie-publicfrom
FIM43-Redeye:fix/tti-vector-register-width
Open

[AIE] Add TTI overrides to prevent sub-512-bit vector arithmetic#839
FIM43-Redeye wants to merge 4 commits intoXilinx:aie-publicfrom
FIM43-Redeye:fix/tti-vector-register-width

Conversation

@FIM43-Redeye
Copy link
Copy Markdown
Contributor

Summary

Fixes #480. The GlobalISel Legalizer only has rules for 512-bit vector
arithmetic (V64S8, V32S16, V16S32), but middle-end vectorizers create
sub-512-bit vector ops that crash during legalization at -O2:

fatal error: unable to legalize instruction:
  %14:_(<64 x s8>) = G_SHL %4:_, %12:_(<64 x s8>)

Three TTI overrides in AIEBaseTargetTransformInfo.h prevent this:

  • getRegisterBitWidth(RGK_FixedWidthVector) = 512 -- stops the loop
    vectorizer from choosing sub-512-bit VF (e.g., VF=4 for i8)
  • getMinVectorRegisterBitWidth() = 512 -- prevents SLP/VectorCombine
    from creating sub-512-bit vectors
  • getArithmeticInstrCost() returns Invalid for sub-512-bit vectors --
    stops SLP's cost model from treating e.g. <8 x i8> add as profitable

Test plan

  • All 1824 AIE CodeGen lit tests pass (no regressions)
  • New regression test covers all three vectorizer paths:
    • Large trip count (LoopVectorize creates <64 x i8> not <4 x i8>)
    • Small trip count after unrolling (SLP does not create <8 x i8>)
    • i16 variant (LoopVectorize creates <32 x i16> not <2 x i16>)
  • MWE from error in backend: unable to legalize instruction #480 compiles cleanly at -O2

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses llvm-aie#480 by updating AIE TargetTransformInfo (TTI) to steer LLVM’s middle-end vectorizers away from producing sub-512-bit vector arithmetic ops that AIE2’s GlobalISel Legalizer cannot legalize, avoiding backend crashes at -O2.

Changes:

  • Add AIE TTI overrides to force fixed-width vector register width/min vector register bit width to 512 bits.
  • Make arithmetic cost modeling return “Invalid” for sub-512-bit fixed vectors to discourage SLP from forming illegal vector ops.
  • Add a new opt-level regression test to ensure vectorizers don’t form sub-512-bit vector adds on AIE2 and that the full pipeline compiles.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
llvm/lib/Target/AIE/AIEBaseTargetTransformInfo.h Adds TTI overrides to bias vectorization toward 512-bit vectors and block sub-512-bit arithmetic vectorization.
llvm/test/CodeGen/AIE/opt/no-subregister-vectorize-arithmetic.ll New regression test covering loop vectorization and SLP/unroll scenarios to ensure no sub-512-bit vector arithmetic is formed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread llvm/lib/Target/AIE/AIEBaseTargetTransformInfo.h Outdated
Comment on lines +126 to +129
if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
if (VTy->getPrimitiveSizeInBits() < 512)
return InstructionCost::getInvalid();
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getArithmeticInstrCost only returns Invalid for fixed vectors with total width < 512. But AIE2 GlobalISel legalization for G_ADD/G_SUB is limited to specific 512-bit element types (<64 x i8>, <32 x i16>, <16 x i32>); other 512-bit vectors (e.g. <8 x i64>) are still illegal and can trigger the same legalization failure. Consider returning Invalid unless the fixed vector type is one of the actually-legal arithmetic vector types, not just based on total bit width.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- I checked all four AIE legalizer implementations to make sure the bit-width threshold is correct across the board:

Target Legal G_ADD/G_SUB vector types Width
AIE1 None (scalar S32 only) -
AIE2 V16S32, V32S16, V64S8 All 512-bit
AIE2P V16S32, V32S16, V64S8 All 512-bit
AIE2PS V16S32, V32S16, V64S8 All 512-bit

AIE2P and AIE2PS even have the comment "AIE ISA supports only 512-bit vector add/sub/xor" in their legalizer source. So the >= 512 check is correct for every current target -- no AIE variant legalizes <8 x i64> or any other non-standard 512-bit type.

Querying LegalizerInfo directly from TTI would be the ideal solution, but it is a cross-layer violation (TTI operates on LLVM IR types during mid-level optimization; LegalizerInfo uses LLT machine-level types and is only initialized when GlobalISel is active). No upstream LLVM target queries LegalizerInfo from TTI -- they all duplicate the type knowledge.

Happy to add an inline comment referencing AIE2LegalizerInfo.cpp (and the other three) as the source of truth, so future maintainers know where to look if a new target adds different legal widths.

Comment thread llvm/test/CodeGen/AIE/opt/no-subregister-vectorize-arithmetic.ll Outdated
call void @llvm.aie2.release(i32 48, i32 1)
call void @llvm.aie2.release(i32 51, i32 1)
call void @llvm.aie2.release(i32 49, i32 1)
call void @llvm.aie2.release(i32 50, i32 1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a simplification?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original lock IDs (acq 49/50 and rel 48/51) were simulating a double-buffer producer/consumer pattern, but the test only uses the lock intrinsics as optimization barriers to make the scheduling fences that give SLP something to vectorize. Actual lock semantics should be irrelevant to what teh test validates.

Copilot flagged the mismatch and suggested matching the release and acquire IDs for readability, and since the test passes either way and the lock behavior isn't under test, it seemed reasonable. Simplification was the wrong word, readability would have been more accurate.

I can revert this if the original IDs should be preserved.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not on my behalf. I just didn't see a connection between lock IDs and vectorization.

FIM43-Redeye and others added 4 commits April 7, 2026 14:36
…inx#480)

The GlobalISel Legalizer only has rules for 512-bit vector arithmetic
(V64S8, V32S16, V16S32). Without TTI guidance, middle-end vectorizers
create sub-512-bit vector ops that crash during legalization:

  fatal error: unable to legalize instruction:
    %14:_(<64 x s8>) = G_SHL %4:_, %12:_(<64 x s8>)

Three TTI overrides in AIEBaseTargetTransformInfo prevent this:

- getRegisterBitWidth(RGK_FixedWidthVector) = 512: stops the loop
  vectorizer from choosing sub-512-bit VF (e.g., VF=4 for i8).
- getMinVectorRegisterBitWidth() = 512: prevents SLP/VectorCombine
  from creating sub-512-bit vectors.
- getArithmeticInstrCost() returns Invalid for sub-512-bit vectors:
  stops SLP's cost model from treating e.g. <8 x i8> add as
  profitable (vector cost 1 vs scalar cost 8).

Fixes Xilinx#480.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@FIM43-Redeye FIM43-Redeye force-pushed the fix/tti-vector-register-width branch from 1fb92ff to eeedf33 Compare April 7, 2026 18:36
@FIM43-Redeye
Copy link
Copy Markdown
Contributor Author

Rebased onto latest aie-public (clean, no conflicts).

Just checking in -- the technical discussion from March seems resolved (martien-de-jong's question about lock IDs was answered). This fix prevents real crashes at -O2 when the loop vectorizer creates sub-512-bit vectors that GlobalISel can't legalize. We've been carrying this patch locally and it's been solid.

Happy to make any changes if there's feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

error in backend: unable to legalize instruction

3 participants