gemm x86 support out_elemtype, multiheadattention and sdpa x86 support bf16 storage, skip mha bf16 tests #6623
nihui wants to merge 18 commits into Tencent:master from
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | master | #6623 | +/- |
|---|---|---|---|
| Coverage | 93.45% | 94.10% | +0.65% |
| Files | 874 | 667 | -207 |
| Lines | 280098 | 238244 | -41854 |
| Hits | 261758 | 224199 | -37559 |
| Misses | 18340 | 14045 | -4295 |
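As a sanity check on the Codecov figures: in this report hits plus misses equal the line count on both branches (261758 + 18340 = 280098 and 224199 + 14045 = 238244), so the coverage percentage is simply hits divided by lines. This worked check uses only the numbers reported above.

$$
\text{coverage} = \frac{\text{hits}}{\text{lines}}, \qquad
\frac{261758}{280098} \approx 93.45\%, \qquad
\frac{224199}{238244} \approx 94.10\%, \qquad
94.10\% - 93.45\% = +0.65\%
$$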
Pull request overview
This PR extends x86 compute paths to better support bf16 storage and Gemm output element type selection, and updates the test suite accordingly (including temporarily skipping MultiHeadAttention bf16 variants).
Changes:
- Add `output_elemtype` handling to the x86 bf16 Gemm implementation so bf16 inputs can produce fp32 outputs.
- Enable bf16 storage support flags for x86 MultiHeadAttention and SDPA, adjusting internal execution to accommodate bf16 storage.
- Add a new Gemm test (`test_gemm_5.cpp`) and update test utilities to skip MultiHeadAttention bf16 testing.
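To illustrate what "bf16 inputs can produce fp32 outputs" means: bf16 is the upper 16 bits of an IEEE-754 fp32 value, so a bf16 Gemm can accumulate in fp32 and then either round the result back to bf16 or store the fp32 accumulator directly. The sketch below is a minimal, naive illustration under stated assumptions: round-to-nearest-even on the down-conversion (NaN handling omitted), row-major layouts, and hypothetical names (`OutElemType`, `gemm_bf16`, `bf16_to_fp32`, `fp32_to_bf16`) that are not ncnn's API and not the SIMD-tiled code in `gemm_x86.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical output element type selector (not ncnn's actual output_elemtype values).
enum class OutElemType { Bf16, Fp32 };

// bf16 -> fp32: the 16 bf16 bits become the high half of a 32-bit float.
static float bf16_to_fp32(uint16_t v)
{
    uint32_t bits = static_cast<uint32_t>(v) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// fp32 -> bf16 with round-to-nearest-even (NaN handling omitted for brevity).
static uint16_t fp32_to_bf16(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>((bits + rounding) >> 16);
}

// Naive C = A * B with bf16 inputs (A: M x K, B: K x N, row-major) and fp32 accumulation.
// `out` must point to M*N elements of the type selected by out_type.
static void gemm_bf16(const uint16_t* A, const uint16_t* B, int M, int N, int K,
                      OutElemType out_type, void* out)
{
    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < N; j++)
        {
            float sum = 0.f;
            for (int k = 0; k < K; k++)
                sum += bf16_to_fp32(A[i * K + k]) * bf16_to_fp32(B[k * N + j]);

            if (out_type == OutElemType::Fp32)
                static_cast<float*>(out)[i * N + j] = sum; // keep full fp32 precision
            else
                static_cast<uint16_t*>(out)[i * N + j] = fp32_to_bf16(sum); // round back to bf16
        }
    }
}
```

With A = [1, 1] and B = [1, 2^-9], the fp32 output keeps 1 + 2^-9 exactly, while the bf16 output rounds to 1.0 because bf16 carries only 7 explicit mantissa bits. That precision gap is the practical reason to let the caller request fp32 output.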
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/testutil.cpp | Skips MultiHeadAttention bf16 tests; adds missing delete op on Vulkan skip paths (but early-return cleanup still incomplete). |
| tests/test_gemm_5.cpp | New Gemm test covering output_elemtype=fp32 across shapes/transposes. |
| src/layer/x86/sdpa_x86.cpp | Enables bf16 storage and updates intermediate/output allocations and memcpy sizes to respect bf16 elemsize. |
| src/layer/x86/multiheadattention_x86.cpp | Enables bf16 storage; forces certain sublayers to fp32 and adds a bf16→fp32 cast for V before qkv gemm. |
| src/layer/x86/gemm_x86.cpp | Threads output_elemtype through bf16 Gemm path and allocates/stores fp32 when requested. |
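The `sdpa_x86.cpp` row notes that allocations and memcpy sizes now respect the bf16 element size. The underlying pitfall is that a byte count written as `n * sizeof(float)` copies twice the intended data, or overruns the buffer, once the storage holds 2-byte bf16 values. A minimal sketch, assuming a hypothetical `Buffer` struct that records its own element size (loosely inspired by the idea of a per-tensor elemsize, not ncnn's real `Mat` layout) and a hypothetical `copy_row` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical buffer that records its element size in bytes:
// 4 for fp32 storage, 2 for bf16 storage.
struct Buffer
{
    std::vector<unsigned char> data;
    size_t elemsize = 4;
    size_t count = 0;

    Buffer(size_t n, size_t esize) : data(n * esize), elemsize(esize), count(n) {}

    size_t total_bytes() const { return count * elemsize; }
};

// Copy one row of `row_len` elements from src row `src_row` into dst row `dst_row`.
// Byte counts and offsets come from the buffer's elemsize, never a hard-coded sizeof(float).
static bool copy_row(Buffer& dst, size_t dst_row, const Buffer& src, size_t src_row, size_t row_len)
{
    if (dst.elemsize != src.elemsize)
        return false; // fp32 <-> bf16 needs a conversion, not a raw byte copy

    const size_t esize = src.elemsize;
    const size_t bytes = row_len * esize;
    const size_t src_off = src_row * row_len * esize;
    const size_t dst_off = dst_row * row_len * esize;

    if (src_off + bytes > src.total_bytes() || dst_off + bytes > dst.total_bytes())
        return false; // would read or write past the end of a buffer

    std::memcpy(dst.data.data() + dst_off, src.data.data() + src_off, bytes);
    return true;
}
```

For a 2 x 4 bf16 buffer, one row is 8 bytes; the same copy sized with `sizeof(float)` would request 16 bytes per row and, for the second row, run past the 16-byte allocation.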
Pull request overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.
No description provided.