rotaryembed/tanh/selu/mish/hardswish/hardsigmoid/gelu/erf/elu/eltwise/dropout/quantize/dequantize/bnll x86 support bf16 storage#6624
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| Metric | master | #6624 | +/- |
|---|---|---|---|
| Coverage | 93.53% | 93.17% | -0.36% |
| Files | 874 | 902 | +28 |
| Lines | 281162 | 281968 | +806 |
| Hits | 262977 | 262720 | -257 |
| Misses | 18185 | 19248 | +1063 |
Pull request overview
Adds x86-side bf16 storage support for the RotaryEmbed layer, including a bf16 implementation and an AVX512BF16 runtime-dispatched wrapper.
Changes:
- Enable `support_bf16_storage` for `RotaryEmbed_x86` and route bf16 inputs to a dedicated `forward_bf16s()` path.
- Introduce `rotaryembed_bf16s.h` implementing the bf16 rotary-embedding kernel (with SIMD paths and optional AVX512BF16 runtime dispatch).
- Add `rotaryembed_x86_avx512bf16.cpp` wrapper entrypoint for the AVX512BF16-dispatched bf16 kernel.
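
To make "bf16 storage" concrete, here is a minimal scalar sketch of what a bf16 path generally does: widen each bf16 value to float32, compute in float32, then narrow the result back to bf16. This is not the code from this PR. The helper names (`float_to_bf16`, `bf16_to_float`, `rotary_embed_bf16_ref`) are hypothetical, the narrowing uses plain truncation (an assumption; a real kernel may round), and the cos/sin caches are assumed to be float32 with a non-interleaved pair layout.

```cpp
#include <cstdint>
#include <cstring>

// bf16 keeps the upper 16 bits of an IEEE-754 float32:
// 1 sign bit, 8 exponent bits, 7 mantissa bits.
// Assumption: narrowing by truncation; a production kernel may round instead.
static inline uint16_t float_to_bf16(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return static_cast<uint16_t>(u >> 16);
}

// Widening is exact: place the 16 stored bits in the upper half of a float32.
static inline float bf16_to_float(uint16_t h)
{
    uint32_t u = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Hypothetical scalar reference for a rotary embedding on bf16 storage.
// Each pair (x0[i], x1[i]) is rotated by the angle whose cosine and sine
// are given in cos_cache[i] and sin_cache[i] (assumed float32 here).
static void rotary_embed_bf16_ref(const uint16_t* x0, const uint16_t* x1,
                                  const float* cos_cache, const float* sin_cache,
                                  uint16_t* y0, uint16_t* y1, int n)
{
    for (int i = 0; i < n; i++)
    {
        // widen to float32, do the arithmetic there, narrow on store
        const float a = bf16_to_float(x0[i]);
        const float b = bf16_to_float(x1[i]);
        const float c = cos_cache[i];
        const float s = sin_cache[i];

        y0[i] = float_to_bf16(a * c - b * s);
        y1[i] = float_to_bf16(a * s + b * c);
    }
}
```

A SIMD path would follow the same widen, compute, narrow pattern on several elements at once; AVX512BF16 adds hardware instructions for the conversion step, which is presumably why that variant is built as a separate translation unit and selected at runtime.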
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/layer/x86/rotaryembed_x86.h | Declares bf16 forward helper behind NCNN_BF16. |
| src/layer/x86/rotaryembed_x86.cpp | Enables bf16 storage and dispatches to bf16 kernel when applicable. |
| src/layer/x86/rotaryembed_bf16s.h | Adds bf16 kernel implementation + runtime AVX512BF16 dispatch hook. |
| src/layer/x86/rotaryembed_x86_avx512bf16.cpp | Adds AVX512BF16 wrapper function for the runtime-dispatched bf16 path. |
Pull request overview
Copilot reviewed 56 out of 56 changed files in this pull request and generated 1 comment.
…/dropout/quantize/dequantize/bnll x86 support bf16 storage (Tencent#6624)
No description provided.