Add support for QNN stateful models#2012
Conversation
9d44197 to
ff71f6a
Compare
|
@qti-ashimaj Thanks for submitting this pull-request. Could you please add more details in the description to explain why this changes are necessary and what value they bring? |
ff71f6a to
4796bd8
Compare
There was a problem hiding this comment.
Pull request overview
Adds detection and handling for QNN “GenAI” stateful models so KV cache can be managed internally by the backend (avoiding exposed past/present tensors and CPU fallbacks).
Changes:
- Introduces
IsQNNStatefulModel(const Model&)helper to detect QNN stateful models via provider options. - Updates KV cache creation logic to use
ModelManagedKeyValueCachefor QNN stateful models (in addition to OpenVINO).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/qnn/interface.h | Declares IsQNNStatefulModel for QNN stateful model detection. |
| src/qnn/interface.cpp | Implements provider-option based detection for QNN stateful models. |
| src/models/kv_cache.cpp | Creates ModelManagedKeyValueCache when QNN stateful models are detected. |
| if (model.p_device_->GetType() == DeviceType::QNN || model.p_device_->GetType() == DeviceType::CPU) { | ||
| const auto& provider_options = model.config_->model.decoder.session_options.provider_options; | ||
| for (const auto& po : provider_options) { | ||
| if (po.name == "QNN") { | ||
| for (const auto& option : po.options) { | ||
| // For QNN, if session option 'genai_model' is set, the session will encapsulate | ||
| // a stateful model, so KVCache will be managed internally. | ||
| if (option.first == "genai_model" && option.second == "True") { | ||
| return true; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The device-type gate contradicts the comment that the model is stateful ‘regardless of device type’ when genai_model=True. As written, the function will return false for any non-CPU/non-QNN DeviceType, even if the QNN provider options indicate a stateful GenAI model. Consider removing the device-type check (or basing the decision solely on presence of the QNN provider + genai_model flag) so the detection matches the intended behavior.
There was a problem hiding this comment.
This check is only relevant for QNN models, hence it should rightly return false for non-CPU/non-QNN device type. The check is valid.
src/qnn/interface.cpp
Outdated
| for (const auto& option : po.options) { | ||
| // For QNN, if session option 'genai_model' is set, the session will encapsulate | ||
| // a stateful model, so KVCache will be managed internally. | ||
| if (option.first == "genai_model" && option.second == "True") { |
There was a problem hiding this comment.
Provider options are commonly encoded as lowercase (\"true\"), numeric (\"1\"), or otherwise case-insensitive values. Comparing strictly to \"True\" is brittle and can cause false negatives. Prefer normalizing the value (e.g., case-insensitive compare) and/or accepting common truthy encodings (true/1/yes) to make detection robust.
There was a problem hiding this comment.
Changed it to first convert the value to lower case and then compare with \"true\"
|
|
||
| DeviceInterface* GetQNNInterface(); | ||
|
|
||
| struct Model; |
There was a problem hiding this comment.
Please ensure the forward declaration uses the same class-key as the actual Model declaration (i.e., class Model; vs struct Model;). If Model is defined with a different class-key in model.h, some toolchains can treat this as an incompatible redeclaration and fail to compile. Align the forward declaration with model.h.
| struct Model; | |
| class Model; |
There was a problem hiding this comment.
It is declared as struct Model in model.h
|
@qti-ashimaj could you please address copilot's feedback? |
4dd590d to
7911437
Compare
This PR enhances support for QNN stateful models that execute entirely on Qualcomm's Neural Processing Unit (NPU), including attention operations and internal KV cache management. This significant improvement eliminates previous limitations where certain operations would fall back to CPU execution, resulting in performance bottlenecks.
The implementation identifies QNN stateful models through the
genie_model=Trueprovider option. For these models: