Conversation
Pull request overview
Adds bfloat16 (bf16) storage support for the x86 InnerProduct layer, including bf16 kernel/weight transforms and AVX512-BF16 runtime dispatch stubs.
Changes:
- Add bf16 pipeline + forward entrypoints to `InnerProduct_x86` and enable `support_bf16_storage`.
- Introduce bf16 innerproduct and GEMM implementations (`innerproduct_bf16s.h`, `innerproduct_gemm_bf16s.h`).
- Add an AVX512-BF16 runtime dispatch wrapper TU (`innerproduct_x86_avx512bf16.cpp`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/layer/x86/innerproduct_x86.h | Declares bf16 pipeline/forward methods behind NCNN_BF16. |
| src/layer/x86/innerproduct_x86.cpp | Wires bf16 create/forward dispatch and adds bf16 implementations. |
| src/layer/x86/innerproduct_x86_avx512bf16.cpp | Adds AVX512-BF16 runtime dispatch wrapper symbols. |
| src/layer/x86/innerproduct_bf16s.h | Adds bf16-packed innerproduct implementation + weight transform. |
| src/layer/x86/innerproduct_gemm_bf16s.h | Adds bf16 GEMM innerproduct implementation. |
```cpp
if (opt.use_int8_inference && int8_scale_term)
{
#if NCNN_BF16
    if (bottom_blob.elembits() == 16)
    {
        Mat bottom_blob_fp32;
        cast_bfloat16_to_float32(bottom_blob, bottom_blob_fp32, opt);
        return forward_int8_x86(bottom_blob_fp32, top_blob, opt);
    }
```
In the int8 inference path, the `bottom_blob.elembits() == 16` check is ambiguous: fp16 and bf16 are both 16-bit formats. When `NCNN_BF16` is enabled, this will cast any 16-bit tensor with `cast_bfloat16_to_float32()`, which produces incorrect results if the input is actually fp16 (or any other non-bf16 16-bit format).

Consider gating this cast on `opt.use_bf16_storage` (and/or the actual storage-type discriminator used elsewhere in the x86 code) so that only bf16 inputs go through the bf16->fp32 conversion, leaving other 16-bit inputs to the existing path (or a proper fp16->fp32 cast if needed).
src/layer/x86/innerproduct_x86.cpp
Outdated
```cpp
#endif

#if NCNN_BF16
    if (opt.use_bf16_storage)
```
`forward()` dispatches to `forward_bf16s()` based solely on `opt.use_bf16_storage`, without confirming that the input blob is actually bf16 (`elembits() == 16`). Most other x86 bf16 paths gate on both `opt.use_bf16_storage` and `bottom_blob.elembits() == 16`, which avoids accidentally interpreting fp32/fp16 data as bf16 when options are mixed or the input was not cast as expected.

Please add an `elembits() == 16` guard here (and fall back to the fp32 path otherwise) to prevent misinterpreting the input storage type.
```diff
-    if (opt.use_bf16_storage)
+    if (opt.use_bf16_storage && bottom_blob.elembits() == 16)
```
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #6625      +/-   ##
==========================================
- Coverage   93.53%   93.21%   -0.32%
==========================================
  Files         874      877       +3
  Lines      281162   281758     +596
==========================================
- Hits       262977   262648     -329
- Misses      18185    19110     +925
```

☔ View full report in Codecov by Sentry.
No description provided.