Restore NPUW_DQ fallback for older drivers #33621
Maxim-Doronin merged 2 commits into openvinotoolkit:master
AsyaPronina left a comment:
Thanks a lot! Great catch!!
    // Specify NPUW DQ if Compiler DQ is not enabled
    if (!npudesc.has_value() || !npudesc->compiler_dq) {
        config.emplace("NPUW_DQ", "YES");
    }
This change certainly brings back the missed behavior, but after reviewing the history thoroughly I am not quite sure if the OLD behavior was correct.
The OLD behavior was first introduced here: #28343
The logic we bring back is: "use NPUW_DQ if the compiler DQ is not present". But, if I remember correctly, NPUW_DQ is the compiler DQ. They come together, as a pair, so one can't substitute for the other. The idea is that, to make the compiler DQ work, we sometimes need to transform a model in a certain way. If the compiler DQ isn't available, as with older drivers, we need to transform the model even more (the FULL NPUW-side DQ).
UPD: NPUW_DQ_FULL is on by default, so enabling NPUW_DQ here gives us NPUW_DQ_FULL automatically. It is obscure but seems to work (see below).
Mnemonics in the property description confirm this:
- NPUW_DQ: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp#L184
- NPUW_DQ_FULL: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp#L192
Looking at the default values - https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/al/include/intel_npu/config/npuw.hpp#L111
- NPUW_DQ is false (probably a rudiment)
- NPUW_DQ_FULL is true
Now looking into the configuration building:
- We start with the "worst case" configuration: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L1133
- IF we've got a compiler capability, we enable our light transformations (NPUW_DQ) along with the NPU_COMPILER_DYNAMIC_QUANTIZATION, and erase the DQ_FULL and DCOFF things to avoid run-time unpacks: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L1143
This logic seems to be good for the moment. Remember this is the baseline common configuration that is used as a basis for the prefill & generate stages.
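The baseline step described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `make_baseline_config` is a hypothetical helper, and the `NPUW_DCOFF_TYPE` key with its `f16` value is an assumed placeholder for the "DCOFF things" mentioned in the comment. Only `NPUW_DQ`, `NPUW_DQ_FULL`, and `NPU_COMPILER_DYNAMIC_QUANTIZATION` come from the discussion itself.

```cpp
#include <map>
#include <string>

using Config = std::map<std::string, std::string>;

// Hypothetical helper, not the plugin's actual function.
Config make_baseline_config(bool compiler_dq) {
    // "Worst case" starting point. The DCOFF key and value are assumed
    // placeholders for the run-time unpack settings.
    Config config = {{"NPUW_DCOFF_TYPE", "f16"}};

    if (compiler_dq) {
        // Light NPUW-side transformation plus the compiler-side DQ.
        config.emplace("NPUW_DQ", "YES");
        config.emplace("NPUW_DQ_FULL", "NO");
        config.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES");
        // Drop DCOFF to avoid run-time unpacks.
        config.erase("NPUW_DCOFF_TYPE");
    }
    return config;
}
```

Note that `std::map::emplace` does not overwrite an existing key, so the order in which such a config is built matters if a key may already be present.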
But later, when we refine the PREFILL model config, we do something obscure: https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp#L1171 - strangely enough this change is introduced by the same original commit 57025dc
UPD2: The obscurity is deciphered above.
The old logic (in red) seems to make more sense than the new one (in green):
Previously (red), we set DQ_FULL to NO (to avoid the full transformation) if and only if the compiler supported DQ, which made sense. Now (green) this condition is reversed, and in the case when the compiler DQ is not present, we set NPUW_DQ (instead of NPUW_DQ_FULL, which is supposed to handle this case). At first this looked like a miss, but setting NPUW_DQ also brings in NPUW_DQ_FULL, as that one wasn't disabled.
Same thing happened for the GENERATE model - we didn't find the capability but we still set _DQ (not _DQ_FULL that is supposed to be there).
Initially, we only had NPUW_DQ, which did the full transformation. Later, when compiler-side DQ came in, we provided the past behavior under NPUW_DQ_FULL, and used NPUW_DQ to do the compiler-friendly transformation (only impacting group-quantized models). I believe the combination of this rename & some "refactoring" in the original commit caused the confusion (UPD2).
TL;DR: the old behavior is restored, but the old behavior is sus
UPD: More archeology
- NPUW_DQ was the die hard one in the beginning: NPUW: Introduce DQ #26362 (did the full transformation to the model)
- NPUW_DQ_FULL was introduced later as an early return in the die-hard NPUW_DQ path: [NPUW] Introduce DQ_FULL property #27678
DQ and DQ_FULL aren't inverses of each other. DQ_FULL only takes effect if it is ON while DQ is ON.
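To make the interaction concrete, here is a small sketch of how the two flags combine, as I read the description above. The `DqTransform` enum and `resolve_dq` function are hypothetical names for illustration, not identifiers from the plugin.

```cpp
// Hypothetical labels for the three outcomes described in the thread.
enum class DqTransform {
    None,              // no NPUW-side DQ transformation at all
    CompilerFriendly,  // light transformation that pairs with compiler DQ
    Full               // the original full NPUW-side transformation
};

// Hypothetical helper: NPUW_DQ gates everything, and NPUW_DQ_FULL only
// selects between the two transformations once NPUW_DQ is ON.
DqTransform resolve_dq(bool npuw_dq, bool npuw_dq_full) {
    if (!npuw_dq) {
        return DqTransform::None;  // NPUW_DQ_FULL is ignored here
    }
    return npuw_dq_full ? DqTransform::Full : DqTransform::CompilerFriendly;
}
```

With the defaults (NPUW_DQ off, NPUW_DQ_FULL on) this yields `None`, which is why turning on NPUW_DQ alone is enough to reach the full transformation.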
So here comes UPD2:
    if (npudesc.has_value() && npudesc->compiler_dq) {
        config.emplace("NPUW_DQ", "YES");
        config.emplace("NPUW_DQ_FULL", "NO");
Later thought:
So.. if we DONT hit this condition (say, we DONT have compiler DQ), we stay with the default values:
- NPUW_DQ false
- NPUW_DQ_FULL true
With the way these options are handled, no DQ transformations will be applied to the model - https://github.com/openvinotoolkit/openvino/blob/2025.4.0/src/plugins/intel_npu/src/plugin/npuw/partitioning/partitioning.cpp#L2193
If we leave it this way and don't do any later refinements, DCOFF will kick in as it did (the issue with lower NPU performance and higher CPU load).
..and later we enable NPUW_DQ to get NPUW_DQ_FULL enabled along with that, so the past behavior seems to be correct, and the fix seems to be correct too.
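The conclusion can be checked with a small sketch that resolves the options against the defaults quoted earlier (NPUW_DQ false, NPUW_DQ_FULL true). The helpers `option_on`, `apply_fallback`, and `full_dq_active` are hypothetical and only model the behavior discussed here. The real option handling lives in the plugin.

```cpp
#include <map>
#include <string>

using Config = std::map<std::string, std::string>;

// Hypothetical lookup: an option absent from the config takes its default.
bool option_on(const Config& cfg, const std::string& key, bool dflt) {
    auto it = cfg.find(key);
    return it == cfg.end() ? dflt : it->second == "YES";
}

// The restored fallback: request NPUW_DQ when compiler DQ is unavailable.
void apply_fallback(Config& cfg, bool compiler_dq) {
    if (!compiler_dq) {
        cfg.emplace("NPUW_DQ", "YES");
    }
}

// The full transformation runs only when both options resolve to ON.
bool full_dq_active(const Config& cfg) {
    return option_on(cfg, "NPUW_DQ", false) &&
           option_on(cfg, "NPUW_DQ_FULL", true);
}
```

Starting from an empty config, `full_dq_active` is false. After `apply_fallback(cfg, false)` it becomes true, because NPUW_DQ_FULL was never disabled and keeps its default.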
Thanks for the comments! Follow-up task has been created: E#199512
### Details:
- Restoring the logic added in openvinotoolkit#28343 that was mistakenly removed in openvinotoolkit#30554

### Tickets:
- E#198339