Add support for QNN stateful models#2012

Open
qti-ashimaj wants to merge 5 commits into microsoft:main from CodeLinaro:dev/ashimaj/qnn-stateful

@qti-ashimaj qti-ashimaj commented Mar 6, 2026

This PR enhances support for QNN stateful models that execute entirely on Qualcomm's Neural Processing Unit (NPU), including attention operations and internal KV cache management. This significant improvement eliminates previous limitations where certain operations would fall back to CPU execution, resulting in performance bottlenecks.

The implementation identifies QNN stateful models through the genai_model=True provider option. For these models:

  • The ONNX Runtime GenAI framework delegates KV cache management entirely to the QNN backend
  • The model architecture is optimized to return logits only for the last token in the sequence

@qti-ashimaj qti-ashimaj force-pushed the dev/ashimaj/qnn-stateful branch from 9d44197 to ff71f6a Compare March 13, 2026 06:36
@qti-ashimaj qti-ashimaj marked this pull request as ready for review March 13, 2026 06:36
@qti-ashimaj
Contributor Author

@gnedanur @gtonpe
@jambayk Please review!

@baijumeswani
Collaborator

@qti-ashimaj Thanks for submitting this pull request. Could you please add more details to the description to explain why these changes are necessary and what value they bring?

Copilot AI left a comment

Pull request overview

Adds detection and handling for QNN “GenAI” stateful models so KV cache can be managed internally by the backend (avoiding exposed past/present tensors and CPU fallbacks).

Changes:

  • Introduces IsQNNStatefulModel(const Model&) helper to detect QNN stateful models via provider options.
  • Updates KV cache creation logic to use ModelManagedKeyValueCache for QNN stateful models (in addition to OpenVINO).
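The cache selection described above can be sketched roughly as follows. This is an illustration only, not the PR's code: the class bodies, the `Name()` method, `DefaultKeyValueCache` as the fallback, and the function `CreateKeyValueCacheSketch` are hypothetical stand-ins, and the two boolean parameters are assumed to carry the results of the OpenVINO and QNN stateful-model checks.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the cache classes; the real ones hold tensors and state.
struct KeyValueCache {
  virtual ~KeyValueCache() = default;
  virtual std::string Name() const = 0;
};

// Placeholder for a cache that exposes past/present tensors to the framework.
struct DefaultKeyValueCache : KeyValueCache {
  std::string Name() const override { return "DefaultKeyValueCache"; }
};

// Placeholder for a cache whose KV tensors stay inside the backend session.
struct ModelManagedKeyValueCache : KeyValueCache {
  std::string Name() const override { return "ModelManagedKeyValueCache"; }
};

// Assumed inputs: results of the OpenVINO and QNN stateful-model checks.
std::unique_ptr<KeyValueCache> CreateKeyValueCacheSketch(bool is_openvino_stateful,
                                                         bool is_qnn_stateful) {
  if (is_openvino_stateful || is_qnn_stateful) {
    return std::make_unique<ModelManagedKeyValueCache>();
  }
  return std::make_unique<DefaultKeyValueCache>();
}
```

With this shape, a stateful QNN model takes the same branch as a stateful OpenVINO model, and every other model keeps the framework-managed cache.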

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/qnn/interface.h | Declares IsQNNStatefulModel for QNN stateful model detection. |
| src/qnn/interface.cpp | Implements provider-option based detection for QNN stateful models. |
| src/models/kv_cache.cpp | Creates ModelManagedKeyValueCache when QNN stateful models are detected. |

Comment on lines +88 to +101
if (model.p_device_->GetType() == DeviceType::QNN || model.p_device_->GetType() == DeviceType::CPU) {
  const auto& provider_options = model.config_->model.decoder.session_options.provider_options;
  for (const auto& po : provider_options) {
    if (po.name == "QNN") {
      for (const auto& option : po.options) {
        // For QNN, if session option 'genai_model' is set, the session will encapsulate
        // a stateful model, so KVCache will be managed internally.
        if (option.first == "genai_model" && option.second == "True") {
          return true;
        }
      }
    }
  }
}

Copilot AI Mar 20, 2026

The device-type gate contradicts the comment that the model is stateful ‘regardless of device type’ when genai_model=True. As written, the function will return false for any non-CPU/non-QNN DeviceType, even if the QNN provider options indicate a stateful GenAI model. Consider removing the device-type check (or basing the decision solely on presence of the QNN provider + genai_model flag) so the detection matches the intended behavior.

Contributor Author

This check is only relevant for QNN models, so it should return false for any device type other than QNN or CPU. The check is valid.
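To make the two positions concrete, here is a minimal sketch of both variants. It assumes simplified, hypothetical stand-ins for `DeviceType` and the provider-options structure; the function names `HasQnnStatefulOption`, `IsStatefulGated`, and `IsStatefulUngated` are invented for illustration and do not appear in the PR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the GenAI types used in the snippet.
enum class DeviceType { CPU, CUDA, QNN };

struct ProviderOptions {
  std::string name;
  std::vector<std::pair<std::string, std::string>> options;
};

// Provider-option scan shared by both variants.
bool HasQnnStatefulOption(const std::vector<ProviderOptions>& provider_options) {
  for (const auto& po : provider_options) {
    if (po.name != "QNN") continue;
    for (const auto& option : po.options) {
      if (option.first == "genai_model" && option.second == "True") return true;
    }
  }
  return false;
}

// Variant kept by the author: only QNN or CPU devices can report a stateful model.
bool IsStatefulGated(DeviceType device, const std::vector<ProviderOptions>& po) {
  const bool device_ok = device == DeviceType::QNN || device == DeviceType::CPU;
  return device_ok && HasQnnStatefulOption(po);
}

// Variant suggested in review: decide from the provider options alone.
bool IsStatefulUngated(const std::vector<ProviderOptions>& po) {
  return HasQnnStatefulOption(po);
}
```

The two variants differ only when the QNN provider options are present but the device type is neither QNN nor CPU: the gated form returns false there, the ungated form returns true.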

for (const auto& option : po.options) {
  // For QNN, if session option 'genai_model' is set, the session will encapsulate
  // a stateful model, so KVCache will be managed internally.
  if (option.first == "genai_model" && option.second == "True") {

Copilot AI Mar 20, 2026

Provider options are commonly encoded as lowercase ("true"), numeric ("1"), or otherwise case-insensitive values. Comparing strictly to "True" is brittle and can cause false negatives. Prefer normalizing the value (e.g., case-insensitive compare) and/or accepting common truthy encodings (true/1/yes) to make detection robust.

Contributor Author

Changed it to first convert the value to lower case and then compare with "true".
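A minimal sketch of the lower-case-then-compare approach described in this reply, assuming a hypothetical helper named `IsTrueOption` (the updated PR code is not shown in this thread, so the actual implementation may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: lower-cases a copy of the value and compares it with "true".
bool IsTrueOption(std::string value) {
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return value == "true";
}
```

This accepts "True", "TRUE", and "true", but not the other truthy encodings mentioned in the review ("1", "yes"); those would need explicit extra comparisons.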


DeviceInterface* GetQNNInterface();

struct Model;

Copilot AI Mar 20, 2026

Please ensure the forward declaration uses the same class-key as the actual Model declaration (i.e., class Model; vs struct Model;). If Model is defined with a different class-key in model.h, some toolchains can treat this as an incompatible redeclaration and fail to compile. Align the forward declaration with model.h.

Suggested change
- struct Model;
+ class Model;

Contributor Author

It is declared as struct Model in model.h
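For context, a small sketch of a forward declaration whose class-key matches the definition. The `stateful` member and the function `IsStatefulSketch` are hypothetical and only serve to make the example compile. As far as I know, standard C++ allows `struct` and `class` to be mixed across redeclarations of the same type, but some compilers warn about the mismatch, so keeping the keys aligned, as the PR already does, avoids the issue.

```cpp
#include <cassert>

struct Model;  // forward declaration, same class-key as the definition below

// A reference to the incomplete type is enough to declare the function in a header.
bool IsStatefulSketch(const Model& model);

// Hypothetical definition; the real Model in model.h has many more members.
struct Model {
  bool stateful = false;
};

bool IsStatefulSketch(const Model& model) { return model.stateful; }
```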

@baijumeswani
Collaborator

@qti-ashimaj Could you please address Copilot's feedback?

@qti-ashimaj qti-ashimaj force-pushed the dev/ashimaj/qnn-stateful branch from 4dd590d to 7911437 Compare March 26, 2026 06:39