[GPU] Fix ambiguous fmin/fmax calls for unsigned integer types in eltwise kernel by Lagmator22 · Pull Request #33667 · openvinotoolkit/openvino

Lagmator22 · 2026-01-17T20:25:43Z

Details:

The eltwise kernel checks the input types to choose between min/max and fmin/fmax, but the check only covered signed types (INT8, INT32, INT64). Unsigned types such as UINT8 were missing, so they went down the fmin/fmax path. OpenCL defines fmin/fmax only for floating-point types, so the call is ambiguous for integer arguments.
This PR adds the unsigned types UINT8, UINT16, and UINT32 to the check, plus INT16, which was also missing.
This fixes the ONNX Clip issue, since Clip is lowered to Maximum and Minimum ops.
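A minimal host-side sketch of that lowering, assuming Clip(x, lo, hi) is expressed as Minimum(Maximum(x, lo), hi) on UINT8 data. The function name clip_u8 is hypothetical and this is not OpenVINO code. On the host, std::max/std::min are templates that deduce uint8_t without ambiguity; the generated OpenCL kernel has to select the integer min/max builtins to get the same behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration: Clip as Maximum followed by Minimum on uint8 data.
std::vector<std::uint8_t> clip_u8(const std::vector<std::uint8_t>& x,
                                  std::uint8_t lo, std::uint8_t hi) {
    std::vector<std::uint8_t> out;
    out.reserve(x.size());
    for (std::uint8_t v : x) {
        std::uint8_t lower_bounded = std::max(v, lo);   // Maximum op
        out.push_back(std::min(lower_bounded, hi));     // Minimum op
    }
    return out;
}
```

For example, clipping {0, 10, 200, 255} to [5, 100] gives {5, 10, 100, 100}.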

Tickets:

[Bug]: GPU compilation fails for Clip operation with UINT8 inputs: "call to fmax/fmin is ambiguous" #33618

Lagmator22 · 2026-01-20T18:10:35Z

Hi @michal-miotk, I've rebased this PR onto the latest master and cleaned up the branch (it previously had an unrelated commit, the TopK NaN fix, mixed in; that is now removed).
To sum up: this PR adds UINT8, UINT16, UINT32, and INT16 to the integer type check for eltwise min/max operations. Without this, unsigned integer inputs incorrectly used fmin/fmax, which caused OpenCL compilation errors.

Changes:

Added an is_integer_type lambda that checks all integer types (signed and unsigned)
Updated the conditionals to use this helper instead of hardcoded checks
Fixes issue #33618 ([Bug]: GPU compilation fails for Clip operation with UINT8 inputs: "call to fmax/fmin is ambiguous"), where ONNX Clip with UINT8 inputs failed on GPU.
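A sketch of what such a helper and conditional could look like. The Datatype enum, make_minmax_expr, and the rule that both inputs must be integers are assumptions for illustration; the actual code in eltwise_kernel_base.cpp may be structured differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the kernel selector's datatype enum.
enum class Datatype { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, F16, F32 };

// Every signed and unsigned integer type counts, not only INT8/INT32/INT64.
const auto is_integer_type = [](Datatype t) -> bool {
    switch (t) {
        case Datatype::INT8:  case Datatype::UINT8:
        case Datatype::INT16: case Datatype::UINT16:
        case Datatype::INT32: case Datatype::UINT32:
        case Datatype::INT64:
            return true;
        default:
            return false;
    }
};

// Build the OpenCL expression for an eltwise MIN/MAX: integer inputs get the
// integer builtins, everything else falls back to fmin/fmax.
std::string make_minmax_expr(bool is_max, Datatype in0, Datatype in1,
                             const std::string& a, const std::string& b) {
    const bool use_int = is_integer_type(in0) && is_integer_type(in1);
    const std::string fn = use_int ? (is_max ? "max" : "min")
                                   : (is_max ? "fmax" : "fmin");
    return fn + "(" + a + ", " + b + ")";
}
```

With this helper, two UINT8 inputs yield max(a, b), while F32 inputs still yield fmax(a, b).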
Ready for review when you have a chance. Let me know if tests are needed.

Thank you

michal-miotk · 2026-01-20T18:19:05Z

LGTM

p-durandin · 2026-01-21T06:40:07Z

build_jenkins

Lyamin-Roman

LGTM, but for confidence and full coverage, you can expand the values of int data types in this test:
src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2554
TEST(eltwise_gpu_int, basic_in4x4x4x4)

Lagmator22 · 2026-01-21T19:06:21Z

Hi @Lyamin-Roman, I added test coverage for all integer types. Ready for merge once CI passes. Thank you.

p-durandin · 2026-01-22T08:43:07Z

build_jenkins

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

p-durandin · 2026-01-27T16:15:03Z

build_jenkins

michal-miotk · 2026-01-28T09:55:51Z

On DG2, it_tests eltwise_gpu_f32_int.basic_in4x4x4x4 fails with: error: invalid operands to binary expression ('float' and 'float')

Lagmator22 · 2026-01-28T18:35:26Z

@michal-miotk Thank you for finding this. The issue was that in mixed-precision cases (float accumulator), the code generator cast the inputs to float before applying the % operator, which produced float % float; OpenCL C only defines % for integer operands.
I've pushed a fix that uses the raw integer input names for the MODULU operation only when both inputs are integers. This should bypass the cast and fix the DG2 error.
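A sketch of the MODULU change, using the INPUT_<n>_0 % INPUT_<n>_1 form from the diff quoted in review. The function name make_mod_expr and the fmod fallback for non-integer inputs are assumptions of this sketch. It does not handle vector types or inputs whose type differs from ACCUMULATOR_TYPE.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator for the eltwise MODULU expression.
std::string make_mod_expr(bool both_integer, const std::string& op_num_str,
                          const std::string& input0_str, const std::string& input1_str) {
    if (both_integer) {
        // Use the raw kernel inputs so a float cast cannot turn this into float % float.
        return "INPUT_" + op_num_str + "_0 % INPUT_" + op_num_str + "_1";
    }
    // Assumed fallback for floating-point operands: % is not defined for them in OpenCL C.
    return "fmod(" + input0_str + ", " + input1_str + ")";
}
```

The cast operand strings are ignored on the integer path, which is exactly what avoids the invalid float % float expression.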

…wise kernel

Fixes openvinotoolkit#33618

When MIN/MAX eltwise operations are used with UINT8/UINT16/UINT32 inputs, the generated OpenCL kernel incorrectly used fmin/fmax functions, which are only defined for floating-point types, causing compilation errors: 'call to fmin/fmax is ambiguous'. This fix adds UINT8, UINT16, UINT32, and INT16 to the integer type checks, ensuring that OpenCL's min/max functions (which support integer types) are used instead of fmin/fmax. This bug affects ONNX Clip operations (opset 11+), which are lowered to Maximum + Minimum operations in OpenVINO.

@Lyamin-Roman

Add i16, u8, u16, u32 to data_types_to_test in eltwise integer tests for full coverage of the fmin/fmax fix for all integer types. Addresses review feedback from @Lyamin-Roman

Lagmator22 · 2026-02-19T21:49:51Z

Rebased onto the latest master to bring the branch up to date. All changes are the same as before; the rebase only resolves the stale branch status.

@Lyamin-Roman @p-durandin Since this already has your approvals, could you trigger the CI / merge when convenient? Happy to address any further feedback. Thanks!

Lagmator22 · 2026-02-25T18:58:57Z

Hi @p-durandin, could you trigger the CI when you get a chance? The branch merges cleanly, and the changes are the same as before: the integer type fix, the modulo fix, and the expanded tests. Thank you.

…ests

The expanded integer type coverage included unsigned types (u8, u16, u32), but the test data contained negative values (-1.f), which wrap around when cast to unsigned types, causing incorrect expected results. Use separate non-negative test vectors for unsigned types, where input1 >= input2, to avoid both negative wrapping and subtraction underflow.

Lagmator22 · 2026-02-25T21:19:13Z

Fixed the CI failure: the unsigned-type test data contained negative values (-1.f) that wrapped around when cast to u8/u16/u32, producing incorrect expected results. I added separate non-negative test vectors for the unsigned type iterations, with input1 >= input2, to avoid both wrapping and subtraction underflow.
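A small host-side illustration of both problems, assuming the test value passes through an integer type before the unsigned conversion (how the test harness actually converts values is not shown in this thread). Converting a negative float directly to an unsigned type is undefined behavior in C++, so the sketch converts through int, where the wrap modulo 2^8 is well defined. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// -1.f -> int -1 -> uint8_t 255 (integral conversion to unsigned wraps modulo 2^8).
std::uint8_t to_u8_via_int(float v) {
    return static_cast<std::uint8_t>(static_cast<int>(v));
}

// a - b is computed in int after promotion; narrowing back to uint8_t wraps
// when a < b, which is why the unsigned vectors keep input1 >= input2.
std::uint8_t sub_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);
}
```

So a -1.f test value becomes 255 in a u8 iteration, and 1 - 2 becomes 255 rather than -1.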

@Lyamin-Roman @p-durandin ready for CI when you get a chance.

michal-miotk · 2026-02-26T14:42:57Z

2026-02-26T11:16:58.8987881Z [1/24414] eltwise_gpu_f32_int.basic_in4x4x4x4 (1849 ms)
2026-02-26T11:16:58.8988372Z Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
2026-02-26T11:16:58.8988839Z WARNING: cl_cache_dir is not set. Test will take longer than expected
2026-02-26T11:16:58.8989524Z Note: Google Test filter = eltwise_gpu_f32_int.basic_in4x4x4x4
2026-02-26T11:16:58.8990003Z [==========] Running 1 test from 1 test suite.
2026-02-26T11:16:58.8990417Z [----------] Global test environment set-up.
2026-02-26T11:16:58.8990798Z [----------] 1 test from eltwise_gpu_f32_int
2026-02-26T11:16:58.8991354Z [ RUN ] eltwise_gpu_f32_int.basic_in4x4x4x4
2026-02-26T11:16:58.8991668Z Ref val: 2 Second val: 0
2026-02-26T11:16:58.8992018Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2809: Failure
2026-02-26T11:16:58.8992527Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-02-26T11:16:58.8992827Z Actual: false
2026-02-26T11:16:58.8993013Z Expected: true
2026-02-26T11:16:58.8993351Z [ FAILED ] eltwise_gpu_f32_int.basic_in4x4x4x4 (733 ms)
2026-02-26T11:16:58.8993814Z [----------] 1 test from eltwise_gpu_f32_int (734 ms total)
2026-02-26T11:16:58.8994065Z
2026-02-26T11:16:58.8994246Z [----------] Global test environment tear-down
2026-02-26T11:16:58.8995027Z [==========] 1 test from 1 test suite ran. (735 ms total)
2026-02-26T11:16:58.8995413Z [ PASSED ] 0 tests.
2026-02-26T11:16:58.8995715Z [ FAILED ] 1 test, listed below:
2026-02-26T11:16:58.8996075Z [ FAILED ] eltwise_gpu_f32_int.basic_in4x4x4x4
2026-02-26T11:16:58.8996299Z
2026-02-26T11:16:58.8996373Z 1 FAILED TEST
2026-02-26T11:16:58.8996688Z [1/24414] eltwise_gpu_f32_int.basic_in4x4x4x4 returned with exit code 1 (1849 ms)
2026-02-26T11:16:59.1199232Z [2/24414] eltwise_gpu_int.basic_in4x4x4x4 (2071 ms)
2026-02-26T11:16:59.1199773Z Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
2026-02-26T11:16:59.1200272Z WARNING: cl_cache_dir is not set. Test will take longer than expected
2026-02-26T11:16:59.1201011Z Note: Google Test filter = eltwise_gpu_int.basic_in4x4x4x4
2026-02-26T11:16:59.1201634Z [==========] Running 1 test from 1 test suite.
2026-02-26T11:16:59.1202072Z [----------] Global test environment set-up.
2026-02-26T11:16:59.1202479Z [----------] 1 test from eltwise_gpu_int
2026-02-26T11:16:59.1202867Z [ RUN ] eltwise_gpu_int.basic_in4x4x4x4
2026-02-26T11:16:59.1203170Z Ref val: 2 Second val: 0
2026-02-26T11:16:59.1203536Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2648: Failure
2026-02-26T11:16:59.1204078Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-02-26T11:16:59.1204395Z Actual: false
2026-02-26T11:16:59.1204592Z Expected: true
2026-02-26T11:16:59.1204926Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4 (1003 ms)
2026-02-26T11:16:59.1205384Z [----------] 1 test from eltwise_gpu_int (1003 ms total)
2026-02-26T11:16:59.1205637Z
2026-02-26T11:16:59.1205822Z [----------] Global test environment tear-down
2026-02-26T11:16:59.1206246Z [==========] 1 test from 1 test suite ran. (1006 ms total)
2026-02-26T11:16:59.1206621Z [ PASSED ] 0 tests.
2026-02-26T11:16:59.1206941Z [ FAILED ] 1 test, listed below:
2026-02-26T11:16:59.1207317Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4
2026-02-26T11:16:59.1207542Z
2026-02-26T11:16:59.1207622Z 1 FAILED TEST
2026-02-26T11:16:59.1207944Z [2/24414] eltwise_gpu_

Replaced zero-divisors with non-zero values (1.f, 3.f, 2.f) in signed integer test vectors to safely evaluate mod and div operators without triggering hardware UB.

Lagmator22 · 2026-02-27T09:58:04Z

@michal-miotk Thanks for pointing it out; I finally figured out why CI was failing with those modulo errors.

It turns out the issue was in the test vectors themselves, not the kernel logic. Looking at the input_2_vec array for the signed types in both the eltwise_gpu_int and eltwise_gpu_f32_int tests, I realized there were literal zeros at indices 0, 2, and 14 from my previous commits.

When the test loop ran the mod and div ops with those vectors, it was performing a straight division by zero, which is undefined behavior. That's why the CPU reference and the GPU gave mismatched results and failed on the CI runners.

I pushed a quick fix replacing those three zeros with 1.f, 3.f, and 2.f, so the div and mod operations are now well defined. Thanks again.
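A hypothetical guard that would catch this class of problem before CI: it lists the indices where a divisor vector holds zero. The function name is made up for illustration; the PR itself fixed the vectors by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the positions of zero divisors so div/mod test cases can be rejected early.
std::vector<std::size_t> zero_divisor_indices(const std::vector<float>& divisors) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < divisors.size(); ++i) {
        if (divisors[i] == 0.f) {
            idx.push_back(i);
        }
    }
    return idx;
}
```

A test could assert that this returns an empty vector for every divisor input before running the mod and div cases.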

Lagmator22 · 2026-02-27T09:59:52Z

Also, a quick question: since I don't have Intel iGPU hardware locally, I haven't been able to catch these issues before CI runs, which has caused more round trips than either of us would like. Is there a way I could get access to the Intel Tiber Developer Cloud or similar hardware, or a way to trigger CI on this PR myself rather than waiting each time? I'm participating in GSoC 2026 and trying to be as efficient a contributor as possible; I'm happy to set anything up on my end if there's a path forward.

Lagmator22 · 2026-03-11T18:41:04Z

Hi @michal-miotk @p-durandin @Lyamin-Roman,

Hope you're doing well. Just checking in on this PR: it has two approvals, and the latest push (fixing modulo UB in the test vectors) should be ready for a CI run.

I should mention that I'm currently in my mid-semester exams (ending March 20), so my responses might be slightly delayed this week, but I'm committed to getting this across the finish line. I'm applying for GSoC 2026 on the #2 knowledge-based deep learning OpenVINO project and would love to land this contribution.

Could someone trigger CI when convenient? Happy to address anything that comes up.

Thanks!

Lyamin-Roman · 2026-03-17T15:40:22Z

src/plugins/intel_gpu/src/kernel_selector/kernels/eltwise/eltwise_kernel_base.cpp

+                    if (is_integer_type(input_1_type)) {
                        if (ew.mode == EltwiseMode::MODULU)
-                            op += input0_str + " % " + input1_str;
+                            op += "INPUT_" + op_num_str + "_0 % INPUT_" + op_num_str + "_1";


How are cases that need casting handled now? I mean vector types, and cases where the inputs are of a different type than ACCUMULATOR_TYPE.

michal-miotk · 2026-03-19T12:35:50Z

2026-03-12T21:11:15.4504067Z [ RUN ] eltwise_gpu_f32_int.basic_in4x4x4x4
2026-03-12T21:11:15.4504392Z Ref val: 2 Second val: 1
2026-03-12T21:11:15.4504774Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2809: Failure
2026-03-12T21:11:15.4505281Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-03-12T21:11:15.4505597Z Actual: false
2026-03-12T21:11:15.4505797Z Expected: true
2026-03-12T21:11:15.4506139Z [ FAILED ] eltwise_gpu_f32_int.basic_in4x4x4x4 (876 ms)
[ RUN ] eltwise_gpu_int.basic_in4x4x4x4
2026-03-12T21:11:27.2833557Z Ref val: 2 Second val: 1
2026-03-12T21:11:27.2833909Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2648: Failure
2026-03-12T21:11:27.2834422Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-03-12T21:11:27.2834722Z Actual: false
2026-03-12T21:11:27.2834910Z Expected: true
2026-03-12T21:11:27.2835231Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4 (732 ms)
2026-03-12T21:11:27.2835653Z [----------] 1 test from eltwise_gpu_int (732 ms total)
2026-03-12T21:11:27.2835886Z
2026-03-12T21:11:27.2836063Z [----------] Global test environment tear-down
2026-03-12T21:11:27.2836462Z [==========] 1 test from 1 test suite ran. (733 ms total)
2026-03-12T21:11:27.2836807Z [ PASSED ] 0 tests.
2026-03-12T21:11:27.2837136Z [ FAILED ] 1 test, listed below:
2026-03-12T21:11:27.2837491Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4
2026-03-12T21:11:27.2837700Z
2026-03-12T21:11:27.2837776Z 1 FAILED TEST

Lagmator22 · 2026-03-19T19:03:24Z

@Lyamin-Roman @michal-miotk I traced through GetAccumulatorType() and found that INT8/INT16/UINT8/UINT16 fall through to F32 by default. That means cast_type becomes (float), so the integer min/max path would be applied to float operands, while the kernel uses fmin/fmax for floating-point types. So I added a check: if the accumulator is floating point, use fmin/fmax; otherwise keep the integer min/max. This preserves the original behavior for INT32/INT64/UINT32, where the accumulator is already an integer type.

Also replaced remaining zero values in test vectors to prevent division/modulo by zero.
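A simplified sketch of the accumulator-aware selection described above. accumulator_for is a hypothetical stand-in for GetAccumulatorType(): it only models the fall-through for single input types mentioned in this comment, and treating F16/F32 as their own accumulators is an assumption. The real function uses a priority table that is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the kernel selector's datatype enum.
enum class Datatype { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, F16, F32 };

// INT32/INT64/UINT32 keep an integer accumulator; INT8/INT16/UINT8/UINT16
// are not listed and fall through to F32.
Datatype accumulator_for(Datatype t) {
    switch (t) {
        case Datatype::INT32: case Datatype::INT64: case Datatype::UINT32:
        case Datatype::F16:   case Datatype::F32:
            return t;
        default:
            return Datatype::F32;
    }
}

bool is_fp(Datatype t) { return t == Datatype::F16 || t == Datatype::F32; }

// Operands are cast to the accumulator type, so pick the builtin from it.
std::string minmax_builtin(bool is_max, Datatype input_type) {
    if (is_fp(accumulator_for(input_type))) {
        return is_max ? "fmax" : "fmin";
    }
    return is_max ? "max" : "min";
}
```

Under these assumptions a UINT8 input ends up with fmax (float accumulator), while UINT32 and INT64 keep the integer builtins.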

…y-zero test vectors

- Use fmin/fmax instead of min/max when ACCUMULATOR_TYPE is F32/F16 but both inputs are integer types (INT8/INT16/UINT8/UINT16). The operands are cast to the floating-point accumulator, so they should take the fmin/fmax path used for floating-point types.
- Replace remaining zero values in signed input_1_vec to prevent division-by-zero and modulo-by-zero undefined behavior in test assertions.

…of-2 exact inverses

michal-miotk · 2026-03-24T15:47:14Z

Note: Google Test filter = eltwise_gpu_int.basic_in4x4x4x4
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from eltwise_gpu_int
[ RUN ] eltwise_gpu_int.basic_in4x4x4x4
Ref val: 2 Second val: 1
src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2648: Failure
Value of: are_equal(std::floor(expected), output_ptr[i])
Actual: false
Expected: true

Lagmator22 · 2026-03-26T11:00:25Z

Hi @Lyamin-Roman @michal-miotk, I have restored the missing set_values calls in the eltwise GPU tests. This should resolve the failures Michal reported. Ready for a new CI run.

…t accumulator INT8/INT16/UINT8/UINT16 fall through to F32 in GetAccumulatorType(), so division and modulo produce float results (e.g. 2/5 = 0.4f), not integer truncation. Test expectations assumed integer truncation which caused CI failures. Skip these operations for affected types instead of testing behavior the GPU does not guarantee for these accumulator types.

Lagmator22 · 2026-03-27T08:10:06Z

@michal-miotk @Lyamin-Roman
Thanks again for running the tests and for your patience with the back and forth here.

After digging through the latest failure (Ref val: 2 Second val: 1, are_equal(std::floor(expected), output_ptr[i]) returning false), I traced the root cause more carefully:

GetAccumulatorType() doesn't list INT8, INT16, UINT8, or UINT16 in its priority table, so they fall through to F32 as the accumulator type. This means for division and modulo, the GPU correctly performs float arithmetic (e.g. 2.f / 5.f = 0.4f) rather than integer truncation. The test expectations were written assuming integer truncation so that's the mismatch, not a kernel bug.
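A short illustration of the difference between integer truncation and F32 arithmetic, using the 2/5 example above. Because the failing assertion compares against std::floor(expected), the sketch also shows that flooring and C++ integer truncation disagree for negative quotients; this is a general arithmetic point, not a claim about which values failed in CI.

```cpp
#include <cassert>
#include <cmath>

// Integer division truncates toward zero.
int int_div(int a, int b) { return a / b; }

// An F32 accumulator computes the real quotient.
float float_div(float a, float b) { return a / b; }

// Flooring rounds toward negative infinity, unlike truncation.
float floored_quotient(float a, float b) { return std::floor(a / b); }
```

For instance, int_div(2, 5) is 0 while float_div(2.f, 5.f) is 0.4f, and int_div(-7, 2) is -3 while floored_quotient(-7.f, 2.f) is -4.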

This push adds a uses_float_acc guard that skips div, mod, and floor_mod for those small integer types in both test functions, since the GPU's behavior (float arithmetic via F32 accumulator) is actually correct and consistent with how eltwise_kernel_base.cpp generates the OpenCL code.
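A sketch of how such a guard could filter the operations under test. uses_float_acc matches the name in this comment, but its body, the EltOp and Datatype enums, and ops_to_test are assumptions for illustration, not the test file's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical enums for the sketch.
enum class Datatype { INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64 };
enum class EltOp { add, sub, mul, div, mod, floor_mod, min, max };

// True for the small integer types that get an F32 accumulator.
bool uses_float_acc(Datatype t) {
    return t == Datatype::INT8 || t == Datatype::INT16 ||
           t == Datatype::UINT8 || t == Datatype::UINT16;
}

// Skip div, mod and floor_mod for float-accumulator types, since the test
// expectations assume integer truncation.
std::vector<EltOp> ops_to_test(Datatype t, const std::vector<EltOp>& all) {
    std::vector<EltOp> out;
    for (EltOp op : all) {
        const bool truncating_op =
            op == EltOp::div || op == EltOp::mod || op == EltOp::floor_mod;
        if (uses_float_acc(t) && truncating_op) {
            continue;
        }
        out.push_back(op);
    }
    return out;
}
```

The design keeps the remaining operations covered for the small types instead of dropping those types from the test entirely.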

Test vectors are also cleaned up: there are no zeros in divisor positions, and separate unsigned-safe vectors avoid wrapping on u8/u16/u32 subtraction.

Could you please trigger a CI run when convenient? Happy to address any further feedback. Thank you!

github-actions bot added the category: GPU and category: CPU labels Jan 17, 2026

sys-openvino-ci added the ExternalPR label Jan 17, 2026

michal-miotk mentioned this pull request Jan 20, 2026

[GPU] Fix ambiguous fmin/fmax generation in eltwise kernels #33521

Open

Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from 6ef2d3e to dbbd41a January 20, 2026 18:00

github-actions bot removed the category: CPU label Jan 20, 2026

michal-miotk marked this pull request as ready for review January 20, 2026 18:16

michal-miotk requested review from a team as code owners January 20, 2026 18:16

p-durandin added this to the 2026.0 milestone Jan 21, 2026

Lyamin-Roman approved these changes Jan 21, 2026

p-durandin approved these changes Jan 27, 2026

Copilot AI review requested due to automatic review settings January 27, 2026 16:14

Copilot AI reviewed Jan 27, 2026

Lagmator22 added 3 commits February 20, 2026 03:18

test: expand integer type coverage in eltwise GPU tests

9b07018

gpu: fix modulo operator on mixed types and float accumulator

ba2e979

Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from f54687a to ba2e979 February 19, 2026 21:48

Fix GPU eltwise tests modulo by zero undefined behavior

38b56ae

Lyamin-Roman reviewed Mar 17, 2026

Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from 8c7da39 to 6c53a18 March 19, 2026 19:09

Fix GPU division truncation for unsigned testing vectors using power-of-2 exact inverses

b100ccb

Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from 9837f16 to b100ccb March 26, 2026 10:50

fix(gpu): restore missing set_values in eltwise benchmarks

776e683

7 participants
