Add RVV support for sdpa operator #6557
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##           master    #6557      +/-   ##
==========================================
+ Coverage   93.18%   93.42%   +0.24%
==========================================
  Files         832      764      -68
  Lines      266714   257359    -9355
==========================================
- Hits       248545   240448    -8097
+ Misses      18169    16911    -1258
```
Pull request overview
This PR adds a RISC-V architecture-specific implementation of the Scaled Dot-Product Attention (SDPA) operator. The implementation follows the existing architectural pattern used in ncnn, where arch-specific SDPA layers delegate computational work to optimized Gemm and Softmax layers rather than implementing SIMD intrinsics directly in the SDPA layer itself. The claimed 5.9x speedup comes from leveraging existing RVV-optimized Gemm and Softmax implementations.
Changes:
- Adds SDPA_riscv class that extends the base SDPA layer
- Implements elempack > 1 handling not present in the x86 version
- Delegates QK^T and AttnV matrix multiplications to Gemm layers and softmax to Softmax layer
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/layer/riscv/sdpa_riscv.h | Header file defining the SDPA_riscv class with pipeline management and forward methods |
| src/layer/riscv/sdpa_riscv.cpp | Implementation that creates and manages Gemm/Softmax sub-layers, includes elempack fallback logic |
src/layer/riscv/sdpa_riscv.cpp (Outdated)

```cpp
#if __riscv_vector
#include <riscv_vector.h>
#endif
```
The RVV header is included but never used in this file. The actual RVV optimizations come from delegating to RVV-optimized Gemm and Softmax layers. Consider removing this unused include or adding a comment explaining that RVV acceleration is achieved through the underlying layers (Gemm, Softmax) rather than direct intrinsics in this file.
Suggested change: delete the unused `#if __riscv_vector` / `#include <riscv_vector.h>` block.
src/layer/riscv/sdpa_riscv.cpp (Outdated)

```cpp
packing->forward(bottom_blobs[i], unpacked_bottom_blobs[i], opt);
delete packing;
```
The return value from packing->forward is not checked. If the packing operation fails, the error will go undetected. Consider checking the return value and propagating the error appropriately.
src/layer/riscv/sdpa_riscv.cpp (Outdated)

```cpp
packing->forward(unpacked_top_blobs[i], top_blobs[i], _opt); // use original allocator for output
delete packing;
```
The return value from packing->forward is not checked. If the repacking operation fails, the error will go undetected. Consider checking the return value and propagating the error appropriately.
```cpp
if (elempack > 1)
{
    // fallback: unpack -> forward -> repack
    Option opt = _opt;
    opt.blob_allocator = _opt.workspace_allocator;

    std::vector<Mat> unpacked_bottom_blobs(bottom_blobs.size());
    for (size_t i = 0; i < bottom_blobs.size(); i++)
    {
        if (bottom_blobs[i].empty()) continue;

        if (bottom_blobs[i].elempack == 1)
        {
            unpacked_bottom_blobs[i] = bottom_blobs[i];
        }
        else
        {
            ncnn::Layer* packing = ncnn::create_layer_cpu(ncnn::LayerType::Packing);
            ncnn::ParamDict pd;
            pd.set(0, 1); // out_elempack
            packing->load_param(pd);
            packing->forward(bottom_blobs[i], unpacked_bottom_blobs[i], opt);
            delete packing;
        }
    }

    std::vector<Mat> unpacked_top_blobs(top_blobs.size());

    // call forward with elempack=1
    int ret = forward(unpacked_bottom_blobs, unpacked_top_blobs, _opt);
    if (ret != 0) return ret;

    // repack outputs
    for (size_t i = 0; i < top_blobs.size(); i++)
    {
        if (unpacked_top_blobs[i].empty()) continue;

        ncnn::Layer* packing = ncnn::create_layer_cpu(ncnn::LayerType::Packing);
        ncnn::ParamDict pd;
        pd.set(0, elempack); // out_elempack
        packing->load_param(pd);
        packing->forward(unpacked_top_blobs[i], top_blobs[i], _opt); // use original allocator for output
        delete packing;
    }

    return 0;
}
```
dead code block: elempack will always be 1 if the layer does not support packing
the code has been changed.
* drop out pad
Signed-off-by: ihb2032 <hebome@foxmail.com>
* Downgrade actions/cache from v5 to v4
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…t bf16 storage, skip mha bf16 tests (Tencent#6623)
…/dropout/quantize/dequantize/bnll x86 support bf16 storage (Tencent#6624)
This PR implements the sdpa operator for the RISC-V backend using RISC-V Vector (RVV) intrinsics.
Performance:
The RVV implementation provides up to a 5.9x speedup over the existing C++ scalar implementation.
Performance test environment: BananaPi (VLEN=256bit)
Correctness: verified on BananaPi, MusePi, and K230 (VLEN=128bit).