Improve accuracy for models using shuffle, unshuffle, cat ops (#19159)#19159

Open
abhinaykukkadapu wants to merge 1 commit into pytorch:main from abhinaykukkadapu:export-D102626539

Conversation

@abhinaykukkadapu
Contributor

@abhinaykukkadapu abhinaykukkadapu commented Apr 27, 2026

Summary:

Replace the Qualcomm concat observer path with an explicit same-domain-or-requantize model for `aten.cat`. Preserve shared qparams for `pixel_shuffle` and `pixel_unshuffle`, extend `split_with_sizes_copy` coverage, and add regressions for mismatched `cat` branches plus value-preserving ops that must use `SharedQuantizationSpec`.

Differential Revision: D102626539
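The shared-qparams requirement for value-preserving ops can be illustrated with a small sketch (plain Python, not the ExecuTorch or torchao API): an op like `pixel_shuffle` only rearranges elements, so if its output reuses the input's scale and zero-point, the quantized output is exactly a permutation of the quantized input and no extra quantization error is introduced.

```python
# Sketch: why value-preserving ops should share their input's qparams.
# Affine uint8 quantization, simplified for illustration.

def quantize(x, scale, zp, qmin=0, qmax=255):
    """Affine-quantize a list of floats to uint8 codes."""
    return [min(qmax, max(qmin, round(v / scale) + zp)) for v in x]

def dequantize(q, scale, zp):
    return [(v - zp) * scale for v in q]

# pixel_shuffle only rearranges elements, so its output holds exactly the
# same set of values as its input.
inp = [-1.0, 0.5, 2.0, 3.0]
out = [2.0, -1.0, 3.0, 0.5]  # a permutation of inp

# qparams observed on the input range [-1, 3]
scale, zp = (3.0 - (-1.0)) / 255, 64

# With shared qparams, quantize(out) is just a permutation of quantize(inp):
q_in = quantize(inp, scale, zp)
q_out = quantize(out, scale, zp)
assert sorted(q_in) == sorted(q_out)
```

Observing a fresh range on the output could at best reproduce these parameters and at worst pick slightly different ones, which is why `SharedQuantizationSpec` is the right annotation for such ops.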

@pytorch-bot

pytorch-bot Bot commented Apr 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19159

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 7fdec04 with merge base ddd8ac6:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 27, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync
Contributor

meta-codesync Bot commented Apr 27, 2026

@abhinaykukkadapu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102626539.

@meta-codesync meta-codesync Bot changed the title Improve accuracy for models using shuffle, unshuffle, cat ops Improve accuracy for models using shuffle, unshuffle, cat ops (#19159) Apr 27, 2026
abhinaykukkadapu added a commit to abhinaykukkadapu/executorch that referenced this pull request Apr 27, 2026
…h#19159)

Summary:

Replace the Qualcomm concat observer path with an explicit same-domain-or-requantize model for `aten.cat`. Preserve shared qparams for `pixel_shuffle` and `pixel_unshuffle`, extend `split_with_sizes_copy` coverage, and add regressions for mismatched `cat` branches plus value-preserving ops that must use `SharedQuantizationSpec`.

Differential Revision: D102626539
@abhinaykukkadapu abhinaykukkadapu linked an issue Apr 28, 2026 that may be closed by this pull request
Collaborator

@winskuo-quic winskuo-quic left a comment


Hi @abhinaykukkadapu,
Thanks a lot for this PR and for all the changes to improve QNN backend accuracy.
I think the shuffle & unshuffle changes LGTM.
However, I have some concerns about the concat operation modification.
I have a PR that reproduces them: #19182. It also has the outputs and graphs compared in its summary section.
Please have a look.

If the change to concat behavior is required to fix accuracy issues for the model you mentioned in #19159, could we possibly reuse the custom_annotation feature under https://github.com/pytorch/executorch/blob/main/backends/qualcomm/quantizer/custom_annotation.py for that model?

Thanks

qscheme=quantization_config.output_activation.qscheme,
quant_max=quantization_config.output_activation.quant_max,
quant_min=quantization_config.output_activation.quant_min,
observer_or_fake_quant_ctr=ConcatObserver.with_args(
Collaborator


I believe this change is reverting what #15162 did. #15162 was introduced because input[0] could not cover the entire range of values of the concat output, so a lot of output values were clipped.

If you have 2 input tensors like:
sample_input = (
    torch.tensor([[[[-10.0, 2.0], [3.0, 4.0]]]]),
    torch.tensor([[[[1.0, 3.0], [8.0, 10.0]]]]),
)
then after they go through the cat operation, you will get the wrong values with this PR:
[tensor([[[[-9.9798, 1.9849], [2.9774, 4.0250], [1.0476, 3.0325], [4.0802, 4.0802]]]])]
I have a demo PR to reproduce this error, please have a look:
#19182
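The clipping the reviewer describes can be reproduced with a minimal sketch (simplified affine quantization, not the QNN backend code): if the concat output reuses qparams observed only on input[0] (range [-10, 4]), the values 8.0 and 10.0 from input[1] both saturate at the top of that range and collapse to the same dequantized value.

```python
# Minimal repro of the cat clipping issue: qparams derived from input[0]
# alone cannot represent input[1]'s larger values.

def make_qparams(lo, hi, qmin=0, qmax=255):
    """Derive affine scale/zero-point covering [lo, hi]."""
    scale = (hi - lo) / (qmax - qmin)
    zp = qmin - round(lo / scale)
    return scale, zp

def qdq(x, scale, zp, qmin=0, qmax=255):
    """Quantize-dequantize round trip with clamping."""
    q = [min(qmax, max(qmin, round(v / scale) + zp)) for v in x]
    return [(v - zp) * scale for v in q]

in0 = [-10.0, 2.0, 3.0, 4.0]
in1 = [1.0, 3.0, 8.0, 10.0]

# qparams observed on input[0] only: they cover [-10, 4]
scale, zp = make_qparams(min(in0), max(in0))
cat_out = qdq(in0 + in1, scale, zp)

# 8.0 and 10.0 are both clamped to roughly 4.0 (the top of input[0]'s
# range), matching the duplicated ~4.08 values reported above.
assert cat_out[-1] == cat_out[-2]
assert cat_out[-1] < 4.2
```

With qparams derived from the union of both input ranges (as the ConcatObserver from #15162 arranges), 8.0 and 10.0 would remain distinct.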

Contributor Author


Thanks @winskuo-quic for the detailed review. I agree that it might've worked for this model but may not work when the ranges are skewed, as in your example. Let me revert cat to ConcatObserver and test the accuracy.

# node1 -> q_ui8 (n) -> dq_ui8 -> q_int32 -> dq_int32 -> node2 -> ....
# We store {node2: quant_attr in dq_int32} in node1.meta
if n.target in q_ops and n.args[0].target not in dq_ops:
self._annotate_cat_requant(n)
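The q_ui8 -> dq_ui8 -> q_int32 -> dq_int32 chain in the comment above amounts to requantization: re-expressing a value quantized in one domain in a different domain via an intermediate float. A small sketch (illustrative values, not the actual pass):

```python
# Requantization sketch: uint8 domain -> float -> int32 domain.

def dequant(q, scale, zp):
    """Map a quantized code back to float."""
    return (q - zp) * scale

def quant(x, scale, zp, qmin, qmax):
    """Map a float to a clamped quantized code."""
    return min(qmax, max(qmin, round(x / scale) + zp))

u8_scale, u8_zp = 0.05, 128        # node1's uint8 quantization domain
i32_scale, i32_zp = 0.0001, 0      # the int32 domain node2 expects

q_u8 = 200                          # some uint8-quantized activation
x = dequant(q_u8, u8_scale, u8_zp)  # dq_ui8: back to float (3.6)
q_i32 = quant(x, i32_scale, i32_zp, -(2**31), 2**31 - 1)  # q_int32

assert q_i32 == 36000               # 3.6 / 0.0001
# dq_int32 recovers (almost) the same float value in the new domain:
assert abs(dequant(q_i32, i32_scale, i32_zp) - x) < 1e-9
```

Storing the dq_int32 quant attributes in node1's metadata, as the comment describes, lets a later pass materialize exactly this chain between node1 and node2.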
Collaborator


I think adding a specific op in a general pass would not be the best option. Do you think we can possibly move the logic to htp_rules.py if this is necessary?

Contributor Author


Yeah, I was also going back and forth on this; I added a pass instead of shoving it in here.

abhinaykukkadapu added a commit to abhinaykukkadapu/executorch that referenced this pull request Apr 29, 2026
…h#19159)

Summary:

Replace the Qualcomm concat observer path with an explicit same-domain-or-requantize model for `aten.cat`. Preserve shared qparams for `pixel_shuffle` and `pixel_unshuffle`, extend `split_with_sizes_copy` coverage, and add regressions for mismatched `cat` branches plus value-preserving ops that must use `SharedQuantizationSpec`.

Differential Revision: D102626539
@abhinaykukkadapu
Contributor Author

@winskuo-quic here is the latest patch; it uses the same ConcatObserver approach as before. Additionally, the requant pass is now introduced after the generic AnnotateQuantAttr pass. Please review when you get a chance.

FYI, the SQNR stayed above the target (>30 dB) with this changeset.
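For reference, the SQNR metric cited above can be computed as the ratio of signal power to quantization-noise power in dB. A small sketch (illustrative data, not the ExecuTorch test harness):

```python
# SQNR (signal-to-quantization-noise ratio) sketch, in dB.
import math

def sqnr_db(reference, actual):
    """10*log10(signal power / noise power) between float reference
    and quantized-dequantized output."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, actual))
    return 10 * math.log10(signal / noise)

ref = [0.1 * i for i in range(1, 101)]
# Simulate a small, bounded quantization error on each element.
quantized = [v + 0.001 * ((i % 3) - 1) for i, v in enumerate(ref)]

db = sqnr_db(ref, quantized)
assert db > 30  # clears the >30 dB accuracy target mentioned above
```

Higher SQNR means the quantized model's outputs track the float reference more closely; 30 dB is the pass threshold referenced in this thread.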

@github-project-automation github-project-automation Bot moved this to To triage in ExecuTorch Core Apr 29, 2026
@abhinaykukkadapu abhinaykukkadapu moved this from To triage to In progress in ExecuTorch Core Apr 29, 2026

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

Accuracy issue in QCOM backend for DSR models

2 participants