5 changes: 3 additions & 2 deletions src/models/kv_cache.cpp
@@ -6,6 +6,7 @@
#include "kv_cache.h"
#include "windowed_kv_cache.h"
#include "../openvino/interface.h"
#include "../qnn/interface.h"
#include <algorithm>

namespace Generators {
@@ -562,10 +563,10 @@ bool IsCacheNeeded(const Model& model) {
} // namespace

std::unique_ptr<KeyValueCache> CreateKeyValueCache(State& state) {
- // For OpenVINO Stateful models, they do not contain exposed past/present KV tensors.
+ // For OpenVINO and QNN Stateful models, they do not contain exposed past/present KV tensors.
  // In this case, 'IsCacheNeeded' below will return false. But in this case we need to create a
  // special 'ModelManagedKeyValueCache' object, and so we check this condition first.
- if (IsOpenVINOStatefulModel(state.model_)) {
+ if (IsOpenVINOStatefulModel(state.model_) || IsQNNStatefulModel(state.model_)) {
if (g_log.enabled)
Log("info", "CreateKeyValueCache: Creating ModelManagedKeyValueCache");
return std::make_unique<ModelManagedKeyValueCache>(state);
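
The ordering explained in the comment above can be sketched in isolation. This is a minimal illustration rather than the project's code: `FakeModel`, `CacheKind`, and `ChooseCacheKind` are hypothetical names, the two booleans stand in for the results of the stateful-model checks and of `IsCacheNeeded`, and the `None`/`Default` outcomes are assumptions about the part of the function this hunk does not show.

```cpp
// Hypothetical stand-in for the facts the factory consults.
struct FakeModel {
  bool stateful = false;       // stands in for IsOpenVINOStatefulModel(...) || IsQNNStatefulModel(...)
  bool exposes_kv_io = false;  // stands in for IsCacheNeeded(...)
};

// Assumed outcomes; only ModelManaged is confirmed by the diff.
enum class CacheKind { ModelManaged, Default, None };

CacheKind ChooseCacheKind(const FakeModel& m) {
  // Stateful check first: a stateful model exposes no past/present KV tensors,
  // so the "is a cache needed" test alone would report that no cache is required.
  if (m.stateful) return CacheKind::ModelManaged;
  if (!m.exposes_kv_io) return CacheKind::None;
  return CacheKind::Default;
}
```

Swapping the first two tests would send a stateful model down the `None` path, which is the case the comment warns about.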
28 changes: 28 additions & 0 deletions src/qnn/interface.cpp
@@ -3,6 +3,7 @@

#include "../generators.h"
#include "../search.h"
#include "../models/model.h"
#include "interface.h"

namespace Generators {
@@ -78,4 +79,31 @@ DeviceInterface* GetQNNInterface() {
return g_device.get();
}

bool IsQNNStatefulModel(const Model& model) {
// Check for both QNN and CPU device types
// When using QNN EP with genai_model=True, the model is stateful regardless of device type (QNN/CPU)
// For QNN models with enable_htp_shared_memory_allocator=1, p_device_ will be QNN type
// For QNN models without shared memory allocator, p_device_ will be CPU type
// Both cases need to be handled the same way for stateful models where KV cache is managed internally
if (model.p_device_->GetType() == DeviceType::QNN || model.p_device_->GetType() == DeviceType::CPU) {
const auto& provider_options = model.config_->model.decoder.session_options.provider_options;
for (const auto& po : provider_options) {
if (po.name == "QNN") {
for (const auto& option : po.options) {
// For QNN, if session option 'genie_model' is set to true, the session will encapsulate
// a stateful model, so KVCache will be managed internally.
if (option.first == "genie_model") {
std::string lower_value(option.second);
std::transform(lower_value.begin(), lower_value.end(), lower_value.begin(),
[](unsigned char c) { return static_cast<unsigned char>(std::tolower(c)); });
return lower_value == "true";
}
}
}
}
}
Comment on lines +88 to +104
Copilot AI Mar 20, 2026

The device-type gate contradicts the comment that the model is stateful ‘regardless of device type’ when genai_model=True. As written, the function will return false for any non-CPU/non-QNN DeviceType, even if the QNN provider options indicate a stateful GenAI model. Consider removing the device-type check (or basing the decision solely on presence of the QNN provider + genai_model flag) so the detection matches the intended behavior.

Contributor Author

This check is relevant only to QNN models, so it should correctly return false for non-CPU/non-QNN device types. The check is valid.

Collaborator

for (const auto& po : provider_options) {
  if (po.name == "QNN") {
    for (const auto& option : po.options) {
      // For QNN, if session option 'genai_model' is set, the session will encapsulate
      // a stateful model, so KVCache will be managed internally.
      if (option.first == "genai_model" && option.second == "True") {
        return true;
      }
    }
  }
}

This loop already checks for the provider option. Do we need a check at the top as well? That check will still fail if p_device = CPU, which is the case for multiple providers (such as QNN, VitisAI...).
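
One way to picture the option-only variant discussed in this thread is the sketch below. It is an illustration under stated assumptions, not the project's implementation: `ProviderOptions` here is a simplified, hypothetical mirror of the config entries, `ToLowerAscii` and `HasTrueProviderOption` are made-up helper names, and the option key is passed in as a parameter because the diff and the comments spell it differently.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified mirror of one provider entry in the session options.
struct ProviderOptions {
  std::string name;
  std::vector<std::pair<std::string, std::string>> options;
};

// Lower-cases ASCII letters so "True", "TRUE" and "true" compare equal.
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Returns true when an entry named `provider` carries `key` with a value that is
// "true" ignoring case. There is no device-type gate: the decision rests on the
// provider options alone.
bool HasTrueProviderOption(const std::vector<ProviderOptions>& providers,
                           const std::string& provider, const std::string& key) {
  for (const auto& po : providers) {
    if (po.name != provider) continue;
    for (const auto& option : po.options) {
      if (option.first == key) return ToLowerAscii(option.second) == "true";
    }
  }
  return false;
}
```

Whether dropping the device-type gate is right for this codebase depends on how `p_device_` is assigned for each provider, which the thread leaves open; the sketch only shows what the simpler check would look like.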


return false;
}

} // namespace Generators
3 changes: 3 additions & 0 deletions src/qnn/interface.h
@@ -5,4 +5,7 @@ namespace Generators {

DeviceInterface* GetQNNInterface();

struct Model;

Copilot AI Mar 20, 2026

Please ensure the forward declaration uses the same class-key as the actual Model declaration (i.e., class Model; vs struct Model;). If Model is defined with a different class-key in model.h, some toolchains can treat this as an incompatible redeclaration and fail to compile. Align the forward declaration with model.h.

Suggested change
- struct Model;
+ class Model;

Contributor Author

It is declared as struct Model in model.h.
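
For context on this thread, here is a small sketch of the forward-declaration pattern the header relies on, using a hypothetical `Gadget` type and `IsGadgetReady` function rather than the project's `Model`. Standard C++ accepts a `class` forward declaration for a type defined with `struct` (and vice versa), but some compilers warn about the mismatch (Clang's `-Wmismatched-tags`, MSVC's C4099), so matching the class-key, as the sketch does, keeps builds that treat warnings as errors clean.

```cpp
// What a lightweight header would contain:
struct Gadget;                        // forward declaration; the type is incomplete here
bool IsGadgetReady(const Gadget& g);  // a reference parameter only needs the incomplete type

// What the defining header would contain:
struct Gadget {                       // same class-key as the forward declaration
  bool ready = false;
};

// What the .cpp would contain, after including the defining header:
bool IsGadgetReady(const Gadget& g) { return g.ready; }
```

The forward declaration keeps the lightweight header from pulling in the full definition; only the translation unit that dereferences the object needs it.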

bool IsQNNStatefulModel(const Model& model);

} // namespace Generators