Restore NPUW_DQ fallback for older drivers by Maxim-Doronin · Pull Request #33621 · openvinotoolkit/openvino

Maxim-Doronin · 2026-01-15T14:52:40Z

Details:

Restoring the logic added in "[NPUW] Update compiler DQ query in LLMCompiledModel" (#28343), which was removed by mistake in "NPUW: Three-model pipeline" (#30554).

Tickets:

E#198339

AsyaPronina

Thanks a lot! Great catch!!

dmatveev · 2026-01-15T20:53:39Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

+    // Specify NPUW DQ if Compiler DQ is not enabled
+    if (!npudesc.has_value() || !npudesc->compiler_dq) {
+        config.emplace("NPUW_DQ", "YES");
+    }


This change certainly brings back the missed behavior, but after reviewing the history thoroughly I am not quite sure if the OLD behavior was correct.

The OLD behavior was first introduced here: #28343

The logic we bring back is: "use NPUW_DQ if the compiler DQ is not present". But, if I remember correctly, NPUW_DQ is the compiler DQ. They come together, so one can't substitute for the other; they come as a pair. The idea here is that to make the compiler DQ work, we sometimes need to transform a model in a certain way. If the compiler DQ isn't available, as in older drivers, we need to transform the model even more (the FULL NPUW-side DQ).

UPD: NPUW_DQ_FULL is on by default, so enabling NPUW_DQ here gives us NPUW_DQ_FULL automatically. It is obscure but seems to work (see below).

Mnemonics in the property description confirm this:

NPUW_DQ: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp#L184

NPUW_DQ_FULL: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp#L192

Looking at the default values - https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/al/include/intel_npu/config/npuw.hpp#L111

NPUW_DQ is false (probably a leftover)

NPUW_DQ_FULL is true

Now looking into the configuration building:

We start with the "worst case" configuration: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L1133

IF we've got a compiler capability, we enable our light transformations (NPUW_DQ) along with the NPU_COMPILER_DYNAMIC_QUANTIZATION, and erase the DQ_FULL and DCOFF things to avoid run-time unpacks: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L1143

This logic seems to be good for the moment. Remember, this is the baseline common configuration that is used as the basis for the prefill & generate stages.
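The baseline step described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in llm_compiled_model.cpp: `NpuDesc`, `Config`, and `make_baseline_config` are hypothetical names, a plain `std::map` of strings stands in for the real config type, and the DCOFF key names and values are assumed for the sake of the example.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the NPU descriptor mentioned in the review.
struct NpuDesc {
    bool compiler_dq = false;
};

// Simplified config type (assumption: the real one is a map-like container).
using Config = std::map<std::string, std::string>;

// Builds the common baseline used for both prefill and generate.
Config make_baseline_config(const std::optional<NpuDesc>& npudesc) {
    // "Worst case" start: rely on DCOFF (key names and values are assumed).
    Config config = {
        {"NPUW_DCOFF_TYPE", "f16"},
        {"NPUW_DCOFF_SCALE", "YES"},
    };

    if (npudesc.has_value() && npudesc->compiler_dq) {
        // Compiler DQ is available: light NPUW-side transformation only.
        config.emplace("NPUW_DQ", "YES");
        config.emplace("NPUW_DQ_FULL", "NO");
        config.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES");
        // Drop DCOFF to avoid run-time unpacks.
        config.erase("NPUW_DCOFF_TYPE");
        config.erase("NPUW_DCOFF_SCALE");
    }
    return config;
}
```

Calling `make_baseline_config(NpuDesc{true})` yields the compiler-DQ variant, while `make_baseline_config(std::nullopt)` keeps the worst-case DCOFF settings and leaves both DQ options at their defaults.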

But later, when we refine the PREFILL model config, we do something obscure: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L1171 - strangely enough this change is introduced by the same original commit 57025dc

UPD2: The obscurity is deciphered above.

The old logic (in red) seems to make more sense than the new one (in green):

Previously (red), we set DQ_FULL to NO (to avoid the full transformation) if and only if the compiler supported DQ, which made sense. Now (green) this condition is reversed, but in the case when compiler DQ is not present, we set NPUW_DQ ~~(instead of NPUW_DQ_FULL that is supposed to handle this case). That's clearly a miss.~~ that also includes NPUW_DQ_FULL as that one wasn't disabled.

~~Same thing happened for the GENERATE model - we didn't find the capability but we still set _DQ (not _DQ_FULL that is supposed to be there).~~

Initially, we only had NPUW_DQ, which did the full transformation. Later, when compiler-side DQ came in, we provided the past behavior under NPUW_DQ_FULL, and used NPUW_DQ to do the compiler-friendly transformation (only impacting group-quantized models). I believe the combination of this rename & some "refactoring" in the original commit caused the ~~issue~~ confusion (UPD2).

TL;DR: the old behavior is restored, but the old behavior is sus

UPD: More archeology

NPUW_DQ was the die hard one in the beginning: NPUW: Introduce DQ #26362 (did the full transformation to the model)

NPUW_DQ_FULL was introduced later as an early return in the die-hard NPUW_DQ path: [NPUW] Introduce DQ_FULL property #27678

DQ and DQ_FULL are not inverses of each other. DQ_FULL only takes effect if it is ON while DQ is also ON.
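To make the interplay of the two options explicit, here is a minimal sketch of how they resolve into an effective behavior, as described in this thread. `DqMode` and `effective_dq_mode` are hypothetical names for illustration; the real partitioning code is structured differently.

```cpp
// Hypothetical summary of what the NPUW-side DQ pass ends up doing.
enum class DqMode {
    None,              // no DQ transformations applied to the model
    CompilerFriendly,  // light transformation, to be paired with compiler DQ
    Full               // full NPUW-side DQ transformation
};

// NPUW_DQ gates everything; NPUW_DQ_FULL only matters when NPUW_DQ is ON.
DqMode effective_dq_mode(bool npuw_dq, bool npuw_dq_full) {
    if (!npuw_dq) {
        return DqMode::None;  // NPUW_DQ_FULL is ignored here
    }
    if (!npuw_dq_full) {
        return DqMode::CompilerFriendly;  // the "early return" path
    }
    return DqMode::Full;
}
```

With the default values (NPUW_DQ false, NPUW_DQ_FULL true) this resolves to `DqMode::None`, and flipping only NPUW_DQ to true resolves to `DqMode::Full`.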

So here comes UPD2.

dmatveev · 2026-01-15T21:19:32Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

    if (npudesc.has_value() && npudesc->compiler_dq) {
        config.emplace("NPUW_DQ", "YES");
        config.emplace("NPUW_DQ_FULL", "NO");


Later thought:

So.. if we DON'T hit this condition (say, we DON'T have compiler DQ), we stay with the default values:

NPUW_DQ false

NPUW_DQ_FULL true

With the way these options are handled, no DQ transformations will be applied to the model - https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/partitioning/partitioning.cpp#L2193

If we leave it this way and don't do any later refinements, DCOFF will kick in as it did (the issue with lower NPU performance and higher CPU load).

..and later we enable NPUW_DQ to get NPUW_DQ_FULL enabled along with it, so the past behavior seems to be correct, and the fix seems to be correct too.
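The conclusion can be checked with a small sketch of the restored fallback. Assumptions: `NpuDesc`, `Config`, `apply_npuw_dq_fallback`, `flag`, and `full_dq_applies` are hypothetical names, the config is modeled as a `std::map` of strings, and the defaults (NPUW_DQ false, NPUW_DQ_FULL true) are the ones quoted in this review. One detail worth noting is that `std::map::emplace` does not overwrite an existing key, so a value already present in the config is kept.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the NPU descriptor mentioned in the review.
struct NpuDesc {
    bool compiler_dq = false;
};

using Config = std::map<std::string, std::string>;

// The restored fallback: request NPUW DQ when compiler DQ is not available.
void apply_npuw_dq_fallback(Config& config, const std::optional<NpuDesc>& npudesc) {
    if (!npudesc.has_value() || !npudesc->compiler_dq) {
        // emplace keeps any value that is already present under this key.
        config.emplace("NPUW_DQ", "YES");
    }
}

// Reads a YES/NO option, falling back to the given default when it is absent.
bool flag(const Config& config, const std::string& key, bool default_value) {
    const auto it = config.find(key);
    return it == config.end() ? default_value : it->second == "YES";
}

// Full NPUW-side DQ applies only when both options resolve to ON.
bool full_dq_applies(const Config& config) {
    return flag(config, "NPUW_DQ", false) && flag(config, "NPUW_DQ_FULL", true);
}
```

Starting from an empty config on an older driver, the fallback sets only NPUW_DQ, and because NPUW_DQ_FULL stays at its default, `full_dq_applies` returns true. Without the fallback both options stay at their defaults and it returns false, which matches the case where DCOFF kicks in.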

Thanks for the comments! Follow-up task has been created: E#199512

Restore NPUW_DQ fallback for older drivers

9b46cb2

Maxim-Doronin requested review from a team as code owners January 15, 2026 14:52

Maxim-Doronin requested a review from AsyaPronina January 15, 2026 14:52

Merge branch 'master' into md/restore_npuw_dq_fallback

7b9daf1

github-actions bot added labels "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) Jan 15, 2026

AsyaPronina approved these changes Jan 15, 2026

dmatveev assigned AsyaPronina Jan 15, 2026

dmatveev added this to the 2026.0 milestone Jan 15, 2026

dmatveev reviewed Jan 15, 2026

dmatveev approved these changes Jan 15, 2026

Maxim-Doronin added this pull request to the merge queue Jan 16, 2026

Merged via the queue into openvinotoolkit:master with commit 1cc90c2 Jan 16, 2026
182 checks passed

Maxim-Doronin deleted the md/restore_npuw_dq_fallback branch January 16, 2026 12:51

Maxim-Doronin merged 2 commits into openvinotoolkit:master from Maxim-Doronin:md/restore_npuw_dq_fallback
