
[GPU] Add squeeze/unsqueeze to the compressed conv1x1 transformation #34957

Open
mdvoretc-intel wants to merge 4 commits into openvinotoolkit:master from mdvoretc-intel:squeeze_matmul_unsqueeze

Conversation

@mdvoretc-intel (Contributor)

Details:

  • Adds an activation squeeze/unsqueeze node pair to the ConvertWeightCompressedConv1x1ToMatmul transformation when the outermost dimension is statically 1
  • This is intended to reduce the activation to 3D, allowing compile-time weight reordering on non-systolic devices
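The intended shape flow can be sketched with plain shape arithmetic. The helpers below are hypothetical and for illustration only (the actual pass emits Reshape nodes in the graph rather than computing shapes like this): a 4D activation with a static leading 1 is collapsed to 3D before the MatMul, and the result is expanded back to 4D afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// [1, D1, D2, D3] -> [1, D1*D2, D3]: the static leading 1 is kept, the
// middle dimensions are merged, the innermost (feature) dim is preserved.
std::vector<int64_t> squeeze_to_3d(const std::vector<int64_t>& shape4d) {
    assert(shape4d.size() == 4 && shape4d[0] == 1);
    return {1, shape4d[1] * shape4d[2], shape4d[3]};
}

// [1, D1*D2, C_out] -> [1, D1, D2, C_out]: restore the original 4D layout
// after the MatMul, given the original D1.
std::vector<int64_t> unsqueeze_to_4d(const std::vector<int64_t>& shape3d,
                                     int64_t d1) {
    assert(shape3d.size() == 3 && shape3d[0] == 1);
    return {1, d1, shape3d[1] / d1, shape3d[2]};
}
```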

Tickets:

AI Assistance:

  • AI assistance used: no

@mdvoretc-intel mdvoretc-intel requested a review from a team as a code owner March 26, 2026 13:19
@github-actions github-actions bot added the "category: transformations" (OpenVINO Runtime library - Transformations) label Mar 26, 2026
@sys-openvino-ci sys-openvino-ci added the "ExternalIntelPR" (External contributor from Intel) label Mar 26, 2026
@mryzhov mryzhov self-assigned this Mar 27, 2026
Copilot AI left a comment

Pull request overview

Adds an activation squeeze/unsqueeze pair around the MatMul emitted by ConvertWeightCompressedConv1x1ToMatmul when the activation’s leading dimension is statically 1, aiming to reduce activation rank to 3D to enable compile-time weight reordering on non-systolic devices.

Changes:

  • Add optional activation squeeze before MatMul and matching unsqueeze after MatMul in the conversion pass.
  • Extend the existing transformation unit test to cover batched vs non-batched inputs and update the reference graph accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • src/common/transformations/src/transformations/op_conversions/convert_weight_compressed_conv1x1_to_matmul.cpp: implements conditional squeeze/unsqueeze reshapes around the produced MatMul.
  • src/common/transformations/tests/op_conversions/convert_weight_compressed_conv1x1_to_matmul_test.cpp: expands parameterized coverage (adds with_batched_input) and updates the reference model to match the new reshape behavior.

Comment on lines +241 to +247

```cpp
auto shape_out = activation->get_output_partial_shape(0);
auto squeeze_const =
    std::make_shared<ov::op::v0::Constant>(ov::element::i64,
                                           ov::Shape{3},
                                           std::vector<int64_t>{1, -1, shape_out[-1].get_length()});
MatcherPass::register_new_node(squeeze_const);
auto squeeze = std::make_shared<ov::op::v1::Reshape>(activation, squeeze_const, false);
```
Copilot AI Mar 27, 2026
[BLOCKER] The new squeeze/unsqueeze reshape constants rely on shape_out[-1].get_length(). If the last dimension is dynamic (e.g., input has dynamic channel/feature dim), get_length() throws and the transformation will crash during pass execution. Consider building the reshape pattern without requiring static last dim (e.g., use special_zero=true with a 0 in the last position to copy the input’s last dimension), and apply the same fix to the unsqueeze block as well.

Suggested change

```diff
-auto shape_out = activation->get_output_partial_shape(0);
-auto squeeze_const =
-    std::make_shared<ov::op::v0::Constant>(ov::element::i64,
-                                           ov::Shape{3},
-                                           std::vector<int64_t>{1, -1, shape_out[-1].get_length()});
-MatcherPass::register_new_node(squeeze_const);
-auto squeeze = std::make_shared<ov::op::v1::Reshape>(activation, squeeze_const, false);
+// Reshape [1, D1, ..., Dn] -> [1, -1, Dn] while safely copying the last dimension
+// without requiring it to be statically known.
+auto squeeze_const =
+    std::make_shared<ov::op::v0::Constant>(ov::element::i64,
+                                           ov::Shape{3},
+                                           std::vector<int64_t>{1, -1, 0});
+MatcherPass::register_new_node(squeeze_const);
+auto squeeze = std::make_shared<ov::op::v1::Reshape>(activation, squeeze_const, true);
```

Comment on lines 178 to 197

```diff
 std::shared_ptr<ov::Node> act_node = input;
 if (p.activation_op_type == "Reshape" && p.with_act_new_reshape) {
-    auto reshape_const = ov::opset1::Constant::create(ov::element::i64, ov::Shape{4}, {1, 1, 1, 10});
+    auto reshape_const = ov::opset1::Constant::create(ov::element::i64, ov::Shape{4}, {1, 1, input_batch, 10});
     act_node = std::make_shared<ov::opset1::Reshape>(input, reshape_const, false);
 }
 if (input_batch == 1 || (p.activation_op_type == "Reshape" && p.with_act_new_reshape)) {
     auto squeeze_const = ov::opset1::Constant::create(ov::element::i64, ov::Shape{3}, {1, input_batch, 10});
     act_node = std::make_shared<ov::opset1::Reshape>(act_node, squeeze_const, false);
 }
 auto matmul = std::make_shared<ov::op::v0::MatMul>(act_node, mul, false, true);
 current_node = matmul;

 if (p.with_bias) {
     auto bias_const = ov::opset1::Constant::create(ov::element::f16, ov::Shape{1, 1, 1, 15}, {1});
-    current_node = std::make_shared<ov::opset1::Add>(matmul, bias_const);
+    current_node = std::make_shared<ov::opset1::Add>(current_node, bias_const);
 }
 if (input_batch == 1 || (p.activation_op_type == "Reshape" && p.with_act_new_reshape)) {
     auto unsqueeze_const = ov::opset1::Constant::create(ov::element::i64, ov::Shape{4}, {1, 1, input_batch, 15});
     current_node = std::make_shared<ov::opset1::Reshape>(current_node, unsqueeze_const, false);
 }
```
Copilot AI Mar 27, 2026

[HIGH] The new squeeze/unsqueeze behavior is only validated with fully static shapes. Since the implementation currently depends on shape introspection (and should support dynamic channel dims once the get_length() issue is addressed), please add a regression test case where the activation’s last dimension is dynamic (or weights/output dim is dynamic) to ensure the transformation does not throw and preserves the expected output rank/shape.

Copilot generated this review using guidance from repository custom instructions.
```cpp
auto unsqueeze_const =
    std::make_shared<ov::op::v0::Constant>(ov::element::i64,
                                           ov::Shape{4},
                                           std::vector<int64_t>{1, 1, -1, shape_out[-1].get_length()});
MatcherPass::register_new_node(unsqueeze_const);
auto unsqueeze = std::make_shared<ov::op::v1::Reshape>(matmul_out, unsqueeze_const, false);
```
Contributor
You're inserting a Reshape, but calling it unsqueeze. Is this how it should be?

Contributor Author

The squeeze/unsqueeze operations are represented by reshapes here to avoid adding new node types at this stage in the pipeline.
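The substitution is sound because, for row-major data, removing or adding a size-1 leading axis never changes element order, so a Reshape to the corresponding target shape is equivalent. A minimal illustration of that shape relationship (hypothetical helpers, not graph ops):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Squeeze axis 0: only legal when that axis has extent 1; the remaining
// dims, and the element order, are unchanged.
std::vector<int64_t> squeeze_axis0(const std::vector<int64_t>& shape) {
    assert(!shape.empty() && shape[0] == 1);
    return std::vector<int64_t>(shape.begin() + 1, shape.end());
}

// Unsqueeze axis 0: prepend a size-1 axis, the exact inverse of the above.
std::vector<int64_t> unsqueeze_axis0(const std::vector<int64_t>& shape) {
    std::vector<int64_t> out{1};
    out.insert(out.end(), shape.begin(), shape.end());
    return out;
}
```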

Labels

category: transformations OpenVINO Runtime library - Transformations ExternalIntelPR External contributor from Intel

5 participants