Add RVV support for sdpa operator #6557

Open
chenglimin wants to merge 37 commits into Tencent:master from chenglimin:patch-sdpa

Conversation

@chenglimin
Contributor

This PR implements the sdpa operator for the RISC-V backend using RISC-V Vector (RVV) intrinsics.

Performance:
The RVV implementation provides up to a 5.9x speedup over the existing C++ scalar implementation.
Performance test environment: BananaPi (VLEN=256bit)
Correctness: verified on BananaPi, MusePi, and K230 (VLEN=128bit).
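
For reference, the operator being implemented computes scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

where $d_k$ is the per-head embedding dimension. The two matrix products and the row-wise softmax are the compute-heavy stages that benefit from vectorization.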

@github-actions github-actions bot added the riscv label Feb 26, 2026
@tencent-adm
Member

tencent-adm commented Feb 26, 2026

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 7 committers have signed the CLA.

✅ ihb2032
✅ NKID00
✅ chenglimin
✅ futz12
✅ MouriNaruto
❌ nihui
❌ dependabot[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter

codecov-commenter commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.42%. Comparing base (081f2b8) to head (ec9ef0e).
⚠️ Report is 28 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6557      +/-   ##
==========================================
+ Coverage   93.18%   93.42%   +0.24%     
==========================================
  Files         832      764      -68     
  Lines      266714   257359    -9355     
==========================================
- Hits       248545   240448    -8097     
+ Misses      18169    16911    -1258     

☔ View full report in Codecov by Sentry.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a RISC-V architecture-specific implementation of the Scaled Dot-Product Attention (SDPA) operator. The implementation follows the existing architectural pattern used in ncnn, where arch-specific SDPA layers delegate computational work to optimized Gemm and Softmax layers rather than implementing SIMD intrinsics directly in the SDPA layer itself. The claimed 5.9x speedup comes from leveraging existing RVV-optimized Gemm and Softmax implementations.

Changes:

  • Adds SDPA_riscv class that extends the base SDPA layer
  • Implements elempack > 1 handling not present in the x86 version
  • Delegates QK^T and AttnV matrix multiplications to Gemm layers and softmax to Softmax layer
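
The three delegated stages in the last bullet can be sketched in scalar form. This is an illustrative reference under assumed row-major 2D shapes, not the PR's code; the helper names `gemm_qkt`, `softmax_rows`, and `gemm_av` are hypothetical, and ncnn dispatches each stage to its own (RVV-optimized) Gemm/Softmax layer instead.

```cpp
// Scalar reference for the SDPA pipeline: QK^T Gemm -> row softmax -> AttnV Gemm.
// Q: LxD, K: SxD, V: SxD, output: LxD. Illustrative sketch only.
#include <cmath>
#include <vector>

typedef std::vector<std::vector<float> > Mat2D; // row-major 2D matrix stand-in

// scores = (Q * K^T) * scale, Q: LxD, K: SxD -> scores: LxS
static Mat2D gemm_qkt(const Mat2D& Q, const Mat2D& K, float scale)
{
    Mat2D scores(Q.size(), std::vector<float>(K.size(), 0.f));
    for (size_t i = 0; i < Q.size(); i++)
        for (size_t j = 0; j < K.size(); j++)
        {
            float s = 0.f;
            for (size_t d = 0; d < Q[i].size(); d++)
                s += Q[i][d] * K[j][d];
            scores[i][j] = s * scale;
        }
    return scores;
}

// numerically stable softmax over each row, in place
static void softmax_rows(Mat2D& M)
{
    for (size_t i = 0; i < M.size(); i++)
    {
        float maxv = M[i][0];
        for (size_t j = 1; j < M[i].size(); j++)
            if (M[i][j] > maxv) maxv = M[i][j];
        float sum = 0.f;
        for (size_t j = 0; j < M[i].size(); j++)
        {
            M[i][j] = std::exp(M[i][j] - maxv);
            sum += M[i][j];
        }
        for (size_t j = 0; j < M[i].size(); j++)
            M[i][j] /= sum;
    }
}

// out = P * V, P: LxS, V: SxD -> out: LxD
static Mat2D gemm_av(const Mat2D& P, const Mat2D& V)
{
    Mat2D out(P.size(), std::vector<float>(V[0].size(), 0.f));
    for (size_t i = 0; i < P.size(); i++)
        for (size_t j = 0; j < V.size(); j++)
            for (size_t d = 0; d < V[j].size(); d++)
                out[i][d] += P[i][j] * V[j][d];
    return out;
}

static Mat2D sdpa_ref(const Mat2D& Q, const Mat2D& K, const Mat2D& V)
{
    Mat2D scores = gemm_qkt(Q, K, 1.0f / std::sqrt((float)Q[0].size()));
    softmax_rows(scores);
    return gemm_av(scores, V);
}
```

Because each stage is a standalone matrix routine, an arch-specific SDPA layer gets vectorization "for free" by routing these calls to already-optimized Gemm/Softmax layers.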

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File descriptions:
  • src/layer/riscv/sdpa_riscv.h — header defining the SDPA_riscv class with pipeline management and forward methods
  • src/layer/riscv/sdpa_riscv.cpp — implementation that creates and manages Gemm/Softmax sub-layers, includes elempack fallback logic


Comment on lines +6 to +9
#if __riscv_vector
#include <riscv_vector.h>
#endif


Copilot AI Feb 26, 2026


The RVV header is included but never used in this file. The actual RVV optimizations come from delegating to RVV-optimized Gemm and Softmax layers. Consider removing this unused include or adding a comment explaining that RVV acceleration is achieved through the underlying layers (Gemm, Softmax) rather than direct intrinsics in this file.

Suggested change
#if __riscv_vector
#include <riscv_vector.h>
#endif

Comment on lines +157 to +158
packing->forward(bottom_blobs[i], unpacked_bottom_blobs[i], opt);
delete packing;

Copilot AI Feb 26, 2026


The return value from packing->forward is not checked. If the packing operation fails, the error will go undetected. Consider checking the return value and propagating the error appropriately.
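
The suggested pattern can be sketched as follows. The types here are simplified stand-ins, not ncnn's real `Layer`/`Mat` classes; the point is to capture the status of `forward()`, release the layer on both paths, and propagate any failure.

```cpp
// Sketch of the suggested fix: capture forward()'s status before deleting the
// layer, then propagate a non-zero status to the caller instead of dropping it.
struct FakePacking
{
    int status;                   // status that forward() will report

    int forward(int in, int& out) // stand-in for ncnn's Layer::forward
    {
        out = in;                 // pretend to repack the blob
        return status;
    }
};

int repack_checked(int in, int& out, int fail_status)
{
    FakePacking* packing = new FakePacking();
    packing->status = fail_status;

    // before: packing->forward(in, out); delete packing;   (status dropped)
    // after:  check and propagate
    int ret = packing->forward(in, out);
    delete packing;               // free the layer whether or not forward failed
    if (ret != 0)
        return ret;               // propagate the failure

    return 0;
}
```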

Comment on lines +177 to +178
packing->forward(unpacked_top_blobs[i], top_blobs[i], _opt); // use original allocator for output
delete packing;

Copilot AI Feb 26, 2026


The return value from packing->forward is not checked. If the repacking operation fails, the error will go undetected. Consider checking the return value and propagating the error appropriately.

Comment on lines +136 to +182
if (elempack > 1)
{
    // fallback: unpack -> forward -> repack
    Option opt = _opt;
    opt.blob_allocator = _opt.workspace_allocator;

    std::vector<Mat> unpacked_bottom_blobs(bottom_blobs.size());
    for (size_t i = 0; i < bottom_blobs.size(); i++)
    {
        if (bottom_blobs[i].empty()) continue;

        if (bottom_blobs[i].elempack == 1)
        {
            unpacked_bottom_blobs[i] = bottom_blobs[i];
        }
        else
        {
            ncnn::Layer* packing = ncnn::create_layer_cpu(ncnn::LayerType::Packing);
            ncnn::ParamDict pd;
            pd.set(0, 1); // out_elempack
            packing->load_param(pd);
            packing->forward(bottom_blobs[i], unpacked_bottom_blobs[i], opt);
            delete packing;
        }
    }

    std::vector<Mat> unpacked_top_blobs(top_blobs.size());

    // call forward with elempack=1
    int ret = forward(unpacked_bottom_blobs, unpacked_top_blobs, _opt);
    if (ret != 0) return ret;

    // repack outputs
    for (size_t i = 0; i < top_blobs.size(); i++)
    {
        if (unpacked_top_blobs[i].empty()) continue;

        ncnn::Layer* packing = ncnn::create_layer_cpu(ncnn::LayerType::Packing);
        ncnn::ParamDict pd;
        pd.set(0, elempack); // out_elempack
        packing->load_param(pd);
        packing->forward(unpacked_top_blobs[i], top_blobs[i], _opt); // use original allocator for output
        delete packing;
    }

    return 0;
}
Member


Dead code block: elempack will always be 1 if the layer does not support packing.

Contributor Author


The code has been changed.

@chenglimin chenglimin requested a review from nihui March 3, 2026 11:11
@nihui nihui closed this Apr 1, 2026
@nihui nihui reopened this Apr 1, 2026
nihui and others added 16 commits April 1, 2026 15:54
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…/dropout/quantize/dequantize/bnll x86 support bf16 storage (Tencent#6624)


9 participants