Add support for QNN stateful models by qti-ashimaj · Pull Request #2012 · microsoft/onnxruntime-genai

qti-ashimaj · 2026-03-06T09:51:38Z

This PR enhances support for QNN stateful models that execute entirely on Qualcomm's Neural Processing Unit (NPU), including attention operations and internal KV cache management. This significant improvement eliminates previous limitations where certain operations would fall back to CPU execution, resulting in performance bottlenecks.

The implementation identifies QNN stateful models through the genie_model=True provider option. For these models:

The ONNX Runtime GenAI framework delegates KV cache management entirely to the QNN backend
The model architecture is optimized to return logits only for the last token in the sequence
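
As a rough illustration of the second point, the sketch below shows what "logits only for the last token" means for a caller. It assumes a flat row-major `[seq_len, vocab_size]` buffer. The `LastTokenLogits` helper and the layout are hypothetical and are not taken from this PR, which, according to a later commit message, reuses an existing logits-trimming API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not from the PR: returns the logits row of the last
// token from a flat row-major [seq_len, vocab_size] buffer.
// A model that already emits only the last token has seq_len == 1, so the
// result is simply a copy of the input in that case.
std::vector<float> LastTokenLogits(const std::vector<float>& logits,
                                   std::size_t seq_len,
                                   std::size_t vocab_size) {
  if (seq_len == 0 || vocab_size == 0 || logits.size() != seq_len * vocab_size) {
    throw std::invalid_argument("logits size does not match seq_len * vocab_size");
  }
  const auto first =
      logits.begin() + static_cast<std::ptrdiff_t>((seq_len - 1) * vocab_size);
  return std::vector<float>(first, logits.end());
}
```

With a stateful model that returns a single row, the helper degenerates to a copy, which is why no trimming work is needed on the framework side in that case.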

qti-ashimaj · 2026-03-13T06:40:58Z

@gnedanur @gtonpe
@jambayk Please review!

src/models/logits.cpp

src/qnn/interface.cpp

baijumeswani · 2026-03-18T01:02:59Z

@qti-ashimaj Thanks for submitting this pull request. Could you please add more details to the description to explain why these changes are necessary and what value they bring?

Copilot

Pull request overview

Adds detection and handling for QNN “GenAI” stateful models so KV cache can be managed internally by the backend (avoiding exposed past/present tensors and CPU fallbacks).

Changes:

Introduces IsQNNStatefulModel(const Model&) helper to detect QNN stateful models via provider options.
Updates KV cache creation logic to use ModelManagedKeyValueCache for QNN stateful models (in addition to OpenVINO).
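
The cache selection described above can be sketched as follows. Every type here is a simplified stand-in with a hypothetical name (`ModelSketch`, `CreateKeyValueCacheSketch`, and so on), not the real onnxruntime-genai classes. The sketch only shows the branching: a model-managed cache for OpenVINO or QNN stateful models, and a framework-managed cache otherwise.

```cpp
#include <memory>
#include <string>

// Simplified stand-ins, not the real onnxruntime-genai types.
struct KeyValueCacheSketch {
  virtual ~KeyValueCacheSketch() = default;
  virtual std::string Name() const = 0;
};

// Cache whose past/present tensors are owned by the framework.
struct DefaultKeyValueCacheSketch : KeyValueCacheSketch {
  std::string Name() const override { return "framework-managed"; }
};

// Placeholder cache: the backend keeps the KV state internally.
struct ModelManagedKeyValueCacheSketch : KeyValueCacheSketch {
  std::string Name() const override { return "model-managed"; }
};

// Hypothetical flags standing in for the per-backend detection helpers.
struct ModelSketch {
  bool is_openvino_stateful = false;
  bool is_qnn_stateful = false;
};

std::unique_ptr<KeyValueCacheSketch> CreateKeyValueCacheSketch(const ModelSketch& model) {
  if (model.is_openvino_stateful || model.is_qnn_stateful) {
    return std::make_unique<ModelManagedKeyValueCacheSketch>();
  }
  return std::make_unique<DefaultKeyValueCacheSketch>();
}
```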

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
src/qnn/interface.h	Declares `IsQNNStatefulModel` for QNN stateful model detection.
src/qnn/interface.cpp	Implements provider-option based detection for QNN stateful models.
src/models/kv_cache.cpp	Creates `ModelManagedKeyValueCache` when QNN stateful models are detected.

Copilot · 2026-03-20T05:29:34Z

src/qnn/interface.cpp

+  if (model.p_device_->GetType() == DeviceType::QNN || model.p_device_->GetType() == DeviceType::CPU) {
+    const auto& provider_options = model.config_->model.decoder.session_options.provider_options;
+    for (const auto& po : provider_options) {
+      if (po.name == "QNN") {
+        for (const auto& option : po.options) {
+          // For QNN, if session option 'genai_model' is set, the session will encapsulate
+          // a stateful model, so KVCache will be managed internally.
+          if (option.first == "genai_model" && option.second == "True") {
+            return true;
+          }
+        }
+      }
+    }
+  }


The device-type gate contradicts the comment that the model is stateful ‘regardless of device type’ when genai_model=True. As written, the function will return false for any non-CPU/non-QNN DeviceType, even if the QNN provider options indicate a stateful GenAI model. Consider removing the device-type check (or basing the decision solely on presence of the QNN provider + genai_model flag) so the detection matches the intended behavior.

This check is only relevant for QNN models, hence it should rightly return false for non-CPU/non-QNN device type. The check is valid.
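
To make the disagreement concrete, here is a minimal sketch of the gate's effect. The enum and function names are hypothetical, and `has_flag` stands for "the QNN provider options set `genai_model`". With the gate in place, any device other than QNN or CPU yields false even when the flag is set.

```cpp
// Hypothetical enum for illustration; the real DeviceType has more members.
enum class DeviceTypeSketch { CPU, CUDA, QNN };

// has_flag is assumed to be the result of scanning the QNN provider options.
bool IsQnnStatefulSketch(DeviceTypeSketch device, bool has_flag) {
  const bool device_ok =
      device == DeviceTypeSketch::QNN || device == DeviceTypeSketch::CPU;
  return device_ok && has_flag;
}
```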

Copilot · 2026-03-20T05:29:35Z

src/qnn/interface.cpp

+        for (const auto& option : po.options) {
+          // For QNN, if session option 'genai_model' is set, the session will encapsulate
+          // a stateful model, so KVCache will be managed internally.
+          if (option.first == "genai_model" && option.second == "True") {


Provider options are commonly encoded as lowercase ("true"), numeric ("1"), or otherwise case-insensitive values. Comparing strictly to "True" is brittle and can cause false negatives. Prefer normalizing the value (e.g., case-insensitive compare) and/or accepting common truthy encodings (true/1/yes) to make detection robust.

Changed it to first convert the value to lowercase and then compare it with "true".
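
A minimal sketch of that approach, assuming the `genai_model` key shown in the diff. `ProviderOptionsSketch`, `ToLowerAscii`, and `HasGenAiModelFlag` are hypothetical names, and the device-type gate is left out to keep the example short. Only "true" in any letter case is accepted here; numeric or "yes" encodings would need extra handling.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a provider-options entry; not the real config type.
struct ProviderOptionsSketch {
  std::string name;
  std::vector<std::pair<std::string, std::string>> options;
};

// Returns a lower-cased copy of the input (ASCII only).
std::string ToLowerAscii(std::string value) {
  std::transform(value.begin(), value.end(), value.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return value;
}

// True if a "QNN" entry sets genai_model to "true" in any letter case.
bool HasGenAiModelFlag(const std::vector<ProviderOptionsSketch>& provider_options) {
  for (const auto& po : provider_options) {
    if (po.name != "QNN") continue;
    for (const auto& option : po.options) {
      if (option.first == "genai_model" && ToLowerAscii(option.second) == "true") {
        return true;
      }
    }
  }
  return false;
}
```

The lambda takes `unsigned char` because passing a negative `char` value to `std::tolower` is undefined behavior.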

Copilot · 2026-03-20T05:29:35Z

src/qnn/interface.h


 DeviceInterface* GetQNNInterface();

+struct Model;


Please ensure the forward declaration uses the same class-key as the actual Model declaration (i.e., class Model; vs struct Model;). If Model is defined with a different class-key in model.h, some toolchains can treat this as an incompatible redeclaration and fail to compile. Align the forward declaration with model.h.

Suggested change

-struct Model;
+class Model;

It is declared as struct Model in model.h
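
For context, standard C++ treats `struct` and `class` as interchangeable class-keys when redeclaring the same type, so a mismatch still compiles. Some compilers warn about it, though (MSVC C4099, Clang `-Wmismatched-tags`), so matching the definition is the safer choice. A minimal sketch with a hypothetical `ModelStub` standing in for the real `Model`:

```cpp
// Forward declaration using the same class-key as the definition below.
struct ModelStub;

// A declaration that takes a reference only needs the incomplete type.
bool IsStatefulStub(const ModelStub& model);

// Definition (stand-in for the real Model declared in model.h).
struct ModelStub {
  bool stateful = false;
};

bool IsStatefulStub(const ModelStub& model) { return model.stateful; }
```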

baijumeswani · 2026-03-26T00:57:07Z

@qti-ashimaj could you please address copilot's feedback?

qti-ashimaj force-pushed the dev/ashimaj/qnn-stateful branch from 9d44197 to ff71f6a Compare March 13, 2026 06:36

qti-ashimaj marked this pull request as ready for review March 13, 2026 06:36

baijumeswani reviewed Mar 17, 2026

src/models/logits.cpp (outdated, resolved)

baijumeswani reviewed Mar 18, 2026

src/qnn/interface.cpp (resolved)

baijumeswani reviewed Mar 18, 2026

src/qnn/interface.cpp (outdated, resolved)

baijumeswani reviewed Mar 18, 2026

src/qnn/interface.cpp (resolved)

baijumeswani reviewed Mar 18, 2026

src/qnn/interface.cpp (outdated, resolved)

qti-ashimaj force-pushed the dev/ashimaj/qnn-stateful branch from ff71f6a to 4796bd8 Compare March 18, 2026 06:57

baijumeswani requested a review from Copilot March 20, 2026 05:27

Copilot started reviewing on behalf of baijumeswani March 20, 2026 05:28

Copilot AI reviewed Mar 20, 2026

qti-ashimaj added 4 commits March 26, 2026 12:08

add support for QNN stateful models

99766fd

use existing api for logits trimming

75001ae

minor formatting

bdcff15

update flag name

7911437

qti-ashimaj force-pushed the dev/ashimaj/qnn-stateful branch from 4dd590d to 7911437 Compare March 26, 2026 06:39

fix lint error

748c496

qti-ashimaj wants to merge 5 commits into microsoft:main from CodeLinaro:dev/ashimaj/qnn-stateful


3 participants
