
innerproduct x86 support bf16 storage#6625

Open
nihui wants to merge 2 commits into Tencent:master from nihui:innerproduct-x86-bf16s

Conversation

@nihui
Member

@nihui nihui commented Mar 31, 2026

No description provided.

@github-actions github-actions bot added the x86 label Mar 31, 2026
@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@nihui nihui requested a review from Copilot March 31, 2026 10:54
Contributor

Copilot AI left a comment


Pull request overview

Adds bfloat16 (bf16) storage support for the x86 InnerProduct layer, including bf16 kernel/weight transforms and AVX512-BF16 runtime dispatch stubs.

Changes:

  • Add bf16 pipeline + forward entrypoints to InnerProduct_x86 and enable support_bf16_storage.
  • Introduce bf16 innerproduct and GEMM implementations (innerproduct_bf16s.h, innerproduct_gemm_bf16s.h).
  • Add AVX512-BF16 dispatch wrapper TU (innerproduct_x86_avx512bf16.cpp).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/layer/x86/innerproduct_x86.h: Declares bf16 pipeline/forward methods behind NCNN_BF16.
  • src/layer/x86/innerproduct_x86.cpp: Wires bf16 create/forward dispatch and adds bf16 implementations.
  • src/layer/x86/innerproduct_x86_avx512bf16.cpp: Adds AVX512-BF16 runtime dispatch wrapper symbols.
  • src/layer/x86/innerproduct_bf16s.h: Adds bf16-packed innerproduct implementation + weight transform.
  • src/layer/x86/innerproduct_gemm_bf16s.h: Adds bf16 GEMM innerproduct implementation.


Comment on lines 109 to +117
if (opt.use_int8_inference && int8_scale_term)
{
#if NCNN_BF16
    if (bottom_blob.elembits() == 16)
    {
        Mat bottom_blob_fp32;
        cast_bfloat16_to_float32(bottom_blob, bottom_blob_fp32, opt);
        return forward_int8_x86(bottom_blob_fp32, top_blob, opt);
    }

Copilot AI Mar 31, 2026


In the int8 inference path, the bottom_blob.elembits() == 16 check is ambiguous (fp16 and bf16 are both 16-bit). When NCNN_BF16 is enabled this will cast any 16-bit tensor with cast_bfloat16_to_float32(), which will produce incorrect results if the input is fp16 (or any non-bf16 16-bit format).

Consider gating this cast on opt.use_bf16_storage (and/or the actual storage type discriminator used elsewhere in x86 code) so only bf16 inputs go through the bf16->fp32 conversion, and leave other 16-bit inputs to the existing path (or a proper fp16->fp32 cast if needed).

#endif

#if NCNN_BF16
if (opt.use_bf16_storage)

Copilot AI Mar 31, 2026


forward() dispatches to forward_bf16s() solely based on opt.use_bf16_storage, without confirming the input blob is actually bf16 (elembits()==16). Most other x86 bf16 paths gate on both opt.use_bf16_storage and bottom_blob.elembits()==16, which avoids accidentally interpreting fp32/fp16 data as bf16 if options are mixed or the input wasn’t cast as expected.

Please add an elembits()==16 guard here (and fall back to the fp32 path otherwise) to prevent misinterpreting the input storage type.

Suggested change
- if (opt.use_bf16_storage)
+ if (opt.use_bf16_storage && bottom_blob.elembits() == 16)

@codecov-commenter

codecov-commenter commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 99.12458% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.21%. Comparing base (371bbad) to head (8d6cf75).
⚠️ Report is 2 commits behind head on master.

Files with missing lines:

  • src/layer/x86/innerproduct_bf16s.h: 98.60% patch coverage, 10 lines missing ⚠️
  • src/layer/x86/innerproduct_x86.cpp: 93.18% patch coverage, 3 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6625      +/-   ##
==========================================
- Coverage   93.53%   93.21%   -0.32%     
==========================================
  Files         874      877       +3     
  Lines      281162   281758     +596     
==========================================
- Hits       262977   262648     -329     
- Misses      18185    19110     +925     

☔ View full report in Codecov by Sentry.


4 participants