Conversation
Pull request overview
Adds bfloat16 (bf16) storage support for the x86 InnerProduct layer, including bf16 kernel/weight transforms and AVX512-BF16 runtime dispatch stubs.
Changes:
- Add bf16 pipeline + forward entrypoints to `InnerProduct_x86` and enable `support_bf16_storage`.
- Introduce bf16 innerproduct and GEMM implementations (`innerproduct_bf16s.h`, `innerproduct_gemm_bf16s.h`).
- Add an AVX512-BF16 runtime dispatch wrapper TU (`innerproduct_x86_avx512bf16.cpp`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/layer/x86/innerproduct_x86.h | Declares bf16 pipeline/forward methods behind NCNN_BF16. |
| src/layer/x86/innerproduct_x86.cpp | Wires bf16 create/forward dispatch and adds bf16 implementations. |
| src/layer/x86/innerproduct_x86_avx512bf16.cpp | Adds AVX512-BF16 runtime dispatch wrapper symbols. |
| src/layer/x86/innerproduct_bf16s.h | Adds bf16-packed innerproduct implementation + weight transform. |
| src/layer/x86/innerproduct_gemm_bf16s.h | Adds bf16 GEMM innerproduct implementation. |
```cpp
if (opt.use_int8_inference && int8_scale_term)
{
#if NCNN_BF16
    if (bottom_blob.elembits() == 16)
    {
        Mat bottom_blob_fp32;
        cast_bfloat16_to_float32(bottom_blob, bottom_blob_fp32, opt);
        return forward_int8_x86(bottom_blob_fp32, top_blob, opt);
    }
```
In the int8 inference path, the `bottom_blob.elembits() == 16` check is ambiguous: fp16 and bf16 are both 16-bit formats. When `NCNN_BF16` is enabled, this will cast any 16-bit tensor with `cast_bfloat16_to_float32()`, which produces incorrect results if the input is actually fp16 (or any other non-bf16 16-bit format).

Consider gating this cast on `opt.use_bf16_storage` (and/or the actual storage-type discriminator used elsewhere in the x86 code) so that only bf16 inputs go through the bf16->fp32 conversion, leaving other 16-bit inputs to the existing path (or a proper fp16->fp32 cast if needed).
src/layer/x86/innerproduct_x86.cpp
Outdated
```cpp
#endif

#if NCNN_BF16
    if (opt.use_bf16_storage)
```
`forward()` dispatches to `forward_bf16s()` based solely on `opt.use_bf16_storage`, without confirming that the input blob is actually bf16 (`elembits() == 16`). Most other x86 bf16 paths gate on both `opt.use_bf16_storage` and `bottom_blob.elembits() == 16`, which avoids accidentally interpreting fp32/fp16 data as bf16 when options are mixed or the input was not cast as expected.

Please add an `elembits() == 16` guard here (and fall back to the fp32 path otherwise) to prevent misinterpreting the input storage type.
```diff
-    if (opt.use_bf16_storage)
+    if (opt.use_bf16_storage && bottom_blob.elembits() == 16)
```
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #6625      +/-   ##
==========================================
- Coverage   93.53%   93.21%   -0.32%
==========================================
  Files         874      877       +3
  Lines      281162   281758     +596
==========================================
- Hits       262977   262648     -329
- Misses      18185    19110     +925
```

☔ View full report in Codecov by Sentry.
No description provided.