
gemm x86 support out_elemtype, multiheadattention and sdpa x86 support bf16 storage, skip mha bf16 tests #6623

Open

nihui wants to merge 18 commits into Tencent:master from nihui:sdpa-x86-bf16s

Conversation

@nihui (Member) commented Mar 30, 2026

No description provided.

codecov-commenter commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.10%. Comparing base (18a7ad1) to head (c87a5e1).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6623      +/-   ##
==========================================
+ Coverage   93.45%   94.10%   +0.65%     
==========================================
  Files         874      667     -207     
  Lines      280098   238244   -41854     
==========================================
- Hits       261758   224199   -37559     
+ Misses      18340    14045    -4295     

☔ View full report in Codecov by Sentry.

@tencent-adm (Member)

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot AI (Contributor) left a comment

Pull request overview

This PR extends x86 compute paths to better support bf16 storage and Gemm output element type selection, and updates the test suite accordingly (including temporarily skipping MultiHeadAttention bf16 variants).

Changes:

  • Add output_elemtype handling to the x86 bf16 Gemm implementation so bf16 inputs can produce fp32 outputs.
  • Enable bf16 storage support flags for x86 MultiHeadAttention and SDPA, adjusting internal execution to accommodate bf16 storage.
  • Add a new Gemm test (test_gemm_5.cpp) and update test utilities to skip MultiHeadAttention bf16 testing.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

Summary per file:

  • tests/testutil.cpp — Skips MultiHeadAttention bf16 tests; adds missing delete op on Vulkan skip paths (but early-return cleanup is still incomplete).
  • tests/test_gemm_5.cpp — New Gemm test covering output_elemtype=fp32 across shapes/transposes.
  • src/layer/x86/sdpa_x86.cpp — Enables bf16 storage and updates intermediate/output allocations and memcpy sizes to respect bf16 elemsize.
  • src/layer/x86/multiheadattention_x86.cpp — Enables bf16 storage; forces certain sublayers to fp32 and adds a bf16→fp32 cast for V before the qkv gemm.
  • src/layer/x86/gemm_x86.cpp — Threads output_elemtype through the bf16 Gemm path and allocates/stores fp32 when requested.


Copilot AI (Contributor) left a comment
Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.




4 participants