
[GPU] Fix ambiguous fmin/fmax calls for unsigned integer types in eltwise kernel #33667

Open
Lagmator22 wants to merge 9 commits into openvinotoolkit:master from Lagmator22:fix/33618-gpu-clip-uint8-ambiguity

Conversation

@Lagmator22

Details:

  • The eltwise kernel checks for integer types to choose between min/max and fmin/fmax, but the check listed only signed types (INT8, INT32, INT64). Unsigned types such as UINT8 were missing, so they were routed to fmin/fmax, which are not defined for integers in OpenCL.
    Added the unsigned types UINT8, UINT16, and UINT32, plus INT16, which was also missing.
    This fixes the ONNX Clip issue, since Clip is lowered to Maximum and Minimum ops.
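
A minimal sketch of the type check described above, not the actual kernel-selector code. `DType`, `is_integer_type`, and `min_builtin_for` are hypothetical names chosen for illustration; the sketch only shows why a type missing from the integer list ends up on the `fmin` path.

```cpp
#include <string>

// Hypothetical stand-in for the plugin's data type enum.
enum class DType { F16, F32, INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32 };

// Integer check with the unsigned types and INT16 included.
// Before the change, only INT8, INT32, and INT64 would have returned true.
bool is_integer_type(DType t) {
    switch (t) {
        case DType::INT8:
        case DType::INT16:
        case DType::INT32:
        case DType::INT64:
        case DType::UINT8:
        case DType::UINT16:
        case DType::UINT32:
            return true;
        default:
            return false;
    }
}

// fmin/fmax are defined only for floating-point types; min/max have
// integer overloads, so integer inputs must be routed to them.
std::string min_builtin_for(DType t) {
    return is_integer_type(t) ? "min" : "fmin";
}
```

With this check, `min_builtin_for(DType::UINT8)` yields `"min"`, so the generated kernel would no longer contain an `fmin` call on `uchar` operands.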

Tickets:

@github-actions github-actions bot added category: GPU OpenVINO GPU plugin category: CPU OpenVINO CPU plugin labels Jan 17, 2026
@sys-openvino-ci sys-openvino-ci added the ExternalPR External contributor label Jan 17, 2026
@Lagmator22 Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from 6ef2d3e to dbbd41a Compare January 20, 2026 18:00
@github-actions github-actions bot removed the category: CPU OpenVINO CPU plugin label Jan 20, 2026
@Lagmator22
Author

Hi @michal-miotk, I've rebased this PR onto the latest master and cleaned up the branch (it previously had an unrelated commit, the topk NaN fix, mixed in; that is now removed).
To sum up: this adds UINT8, UINT16, UINT32, and INT16 to the integer type check for eltwise min/max operations. Without this, unsigned integer inputs incorrectly used fmin/fmax, which caused OpenCL compilation errors.

Changes:

Thank you

@michal-miotk michal-miotk marked this pull request as ready for review January 20, 2026 18:16
@michal-miotk michal-miotk requested review from a team as code owners January 20, 2026 18:16
@michal-miotk
Contributor

LGTM

@p-durandin p-durandin added this to the 2026.0 milestone Jan 21, 2026
@p-durandin
Contributor

build_jenkins

Contributor

@Lyamin-Roman Lyamin-Roman left a comment


LGTM, but for confidence and full coverage, you can extend the set of integer data types covered by this test:
src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2554
TEST(eltwise_gpu_int, basic_in4x4x4x4)

@Lagmator22
Author

Hi @Lyamin-Roman, I've added test coverage for all integer types. Ready for merge when CI passes. Thank you.

@p-durandin
Contributor

build_jenkins

Copilot AI review requested due to automatic review settings January 27, 2026 16:14
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@p-durandin
Contributor

build_jenkins

@michal-miotk
Contributor

on dg2 it_tests eltwise_gpu_f32_int.basic_in4x4x4x4 error: invalid operands to binary expression ('float' and 'float')

@Lagmator22
Author

@michal-miotk Thank you for finding this. In the mixed-precision cases (float accumulator), the code generator was casting inputs to float before applying the % operator, which produced float % float.
I've pushed a fix that uses the raw integer input names for the MODULU operation when both inputs are integers; this should bypass the cast and fix the DG2 error.
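
A rough sketch of the expression generation described in this comment, under stated assumptions. `make_modulo_expr` and its parameters are hypothetical; only the `INPUT_<n>_0` / `INPUT_<n>_1` naming comes from the diff under review. The `fmod` fallback for floating-point operands is an assumption about the non-integer path, not something confirmed in this thread.

```cpp
#include <string>

// Hypothetical helper: builds the OpenCL expression text for a modulo op.
// `op_num` is the index of the eltwise operation; `cast` is the accumulator
// cast prefix (for example "(float)").
std::string make_modulo_expr(const std::string& op_num,
                             const std::string& cast,
                             bool both_inputs_integer) {
    const std::string in0 = "INPUT_" + op_num + "_0";
    const std::string in1 = "INPUT_" + op_num + "_1";
    if (both_inputs_integer) {
        // Use the raw input names: applying the float cast here would emit
        // "float % float", which OpenCL C rejects.
        return in0 + " % " + in1;
    }
    // Assumed fallback: floating-point operands go through fmod.
    return "fmod(" + cast + in0 + ", " + cast + in1 + ")";
}
```

For example, `make_modulo_expr("0", "(float)", true)` produces `INPUT_0_0 % INPUT_0_1`, with no cast in front of either operand.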

…wise kernel

Fixes openvinotoolkit#33618

When MIN/MAX eltwise operations are used with UINT8/UINT16/UINT32 inputs,
the generated OpenCL kernel incorrectly used fmin/fmax functions which
are only defined for floating-point types, causing compilation errors:
'call to fmin/fmax is ambiguous'.

This fix adds UINT8, UINT16, UINT32, and INT16 to the integer type
checks, ensuring that OpenCL's min/max functions (which support integer
types) are used instead of fmin/fmax.

This bug affects ONNX Clip operations (opset 11+) which are lowered to
Maximum + Minimum operations in OpenVINO.
Add i16, u8, u16, u32 to data_types_to_test in eltwise integer tests
for full coverage of the fmin/fmax fix for all integer types.

Addresses review feedback from @Lyamin-Roman
@Lagmator22 Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from f54687a to ba2e979 Compare February 19, 2026 21:48
@Lagmator22
Author

Rebased onto latest master to bring the branch up to date. All changes are the same as before - just updated to resolve the stale branch status.

@Lyamin-Roman @p-durandin Since this already has your approvals, could you trigger the CI / merge when convenient? Happy to address any further feedback. Thanks!

@Lagmator22
Author

Hi @p-durandin, could you trigger the CI when you get a chance? The branch merges cleanly, all changes are the same as before just the integer type fix + modulo fix + expanded tests. Thank you.

…ests

The expanded integer type coverage included unsigned types (u8, u16, u32)
but the test data contained negative values (-1.f) which wrap around when
cast to unsigned types, causing incorrect expected results. Use separate
non-negative test vectors for unsigned types where input1 >= input2 to
avoid both negative wrapping and subtraction underflow.
@Lagmator22
Author

Fixed the CI failure: the unsigned type test data contained negative values (-1.f) that wrapped around when cast to u8/u16/u32, causing incorrect expected results. Added separate non-negative test vectors for the unsigned type iterations, with input1 >= input2, to avoid both wrapping and subtraction underflow.
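
The two failure modes named here can be illustrated with a small sketch. The helper names `to_u8` and `sub_u8` are hypothetical, and the conversion goes through `int` on purpose: in C++, converting a negative floating-point value directly to an unsigned type is undefined behavior, whereas the int-to-unsigned conversion is well defined and wraps modulo 2^N. How the test helper itself converts the values is not shown in this thread.

```cpp
#include <cstdint>

// Converting a negative int to an unsigned type wraps modulo 2^8 for u8.
constexpr std::uint8_t to_u8(int v) {
    return static_cast<std::uint8_t>(v);
}

// Subtraction on u8 values: the operands promote to int, and the result
// is narrowed back to u8, so a negative difference wraps around.
constexpr std::uint8_t sub_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);
}
```

Here `to_u8(-1)` is 255 and `sub_u8(2, 5)` is 253, which is why the unsigned test vectors need non-negative values with input1 >= input2.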

@Lyamin-Roman @p-durandin ready for CI when you get a chance.

@michal-miotk
Contributor

2026-02-26T11:16:58.8987881Z [1/24414] eltwise_gpu_f32_int.basic_in4x4x4x4 (1849 ms)
2026-02-26T11:16:58.8988372Z Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
2026-02-26T11:16:58.8988839Z WARNING: cl_cache_dir is not set. Test will take longer than expected
2026-02-26T11:16:58.8989524Z Note: Google Test filter = eltwise_gpu_f32_int.basic_in4x4x4x4
2026-02-26T11:16:58.8990003Z [==========] Running 1 test from 1 test suite.
2026-02-26T11:16:58.8990417Z [----------] Global test environment set-up.
2026-02-26T11:16:58.8990798Z [----------] 1 test from eltwise_gpu_f32_int
2026-02-26T11:16:58.8991354Z [ RUN ] eltwise_gpu_f32_int.basic_in4x4x4x4
2026-02-26T11:16:58.8991668Z Ref val: 2 Second val: 0
2026-02-26T11:16:58.8992018Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2809: Failure
2026-02-26T11:16:58.8992527Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-02-26T11:16:58.8992827Z Actual: false
2026-02-26T11:16:58.8993013Z Expected: true
2026-02-26T11:16:58.8993351Z [ FAILED ] eltwise_gpu_f32_int.basic_in4x4x4x4 (733 ms)
2026-02-26T11:16:58.8993814Z [----------] 1 test from eltwise_gpu_f32_int (734 ms total)
2026-02-26T11:16:58.8994065Z
2026-02-26T11:16:58.8994246Z [----------] Global test environment tear-down
2026-02-26T11:16:58.8995027Z [==========] 1 test from 1 test suite ran. (735 ms total)
2026-02-26T11:16:58.8995413Z [ PASSED ] 0 tests.
2026-02-26T11:16:58.8995715Z [ FAILED ] 1 test, listed below:
2026-02-26T11:16:58.8996075Z [ FAILED ] eltwise_gpu_f32_int.basic_in4x4x4x4
2026-02-26T11:16:58.8996299Z
2026-02-26T11:16:58.8996373Z 1 FAILED TEST
2026-02-26T11:16:58.8996688Z [1/24414] eltwise_gpu_f32_int.basic_in4x4x4x4 returned with exit code 1 (1849 ms)
2026-02-26T11:16:59.1199232Z [2/24414] eltwise_gpu_int.basic_in4x4x4x4 (2071 ms)
2026-02-26T11:16:59.1199773Z Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
2026-02-26T11:16:59.1200272Z WARNING: cl_cache_dir is not set. Test will take longer than expected
2026-02-26T11:16:59.1201011Z Note: Google Test filter = eltwise_gpu_int.basic_in4x4x4x4
2026-02-26T11:16:59.1201634Z [==========] Running 1 test from 1 test suite.
2026-02-26T11:16:59.1202072Z [----------] Global test environment set-up.
2026-02-26T11:16:59.1202479Z [----------] 1 test from eltwise_gpu_int
2026-02-26T11:16:59.1202867Z [ RUN ] eltwise_gpu_int.basic_in4x4x4x4
2026-02-26T11:16:59.1203170Z Ref val: 2 Second val: 0
2026-02-26T11:16:59.1203536Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2648: Failure
2026-02-26T11:16:59.1204078Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-02-26T11:16:59.1204395Z Actual: false
2026-02-26T11:16:59.1204592Z Expected: true
2026-02-26T11:16:59.1204926Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4 (1003 ms)
2026-02-26T11:16:59.1205384Z [----------] 1 test from eltwise_gpu_int (1003 ms total)
2026-02-26T11:16:59.1205637Z
2026-02-26T11:16:59.1205822Z [----------] Global test environment tear-down
2026-02-26T11:16:59.1206246Z [==========] 1 test from 1 test suite ran. (1006 ms total)
2026-02-26T11:16:59.1206621Z [ PASSED ] 0 tests.
2026-02-26T11:16:59.1206941Z [ FAILED ] 1 test, listed below:
2026-02-26T11:16:59.1207317Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4
2026-02-26T11:16:59.1207542Z
2026-02-26T11:16:59.1207622Z 1 FAILED TEST
2026-02-26T11:16:59.1207944Z [2/24414] eltwise_gpu_

Replaced zero-divisors with non-zero values (1.f, 3.f, 2.f) in signed integer test vectors to safely evaluate mod and div operators without triggering hardware UB.
@Lagmator22
Author

@michal-miotk Thanks for pointing it out. I finally figured out why the CI was failing with those modulo errors.

It turns out the issue was in the test vectors themselves, not the kernel logic. Looking at the input_2_vec array for the signed types in both the eltwise_gpu_int and eltwise_gpu_f32_int tests, I found literal zeros at indices 0, 2, and 14 left over from my previous commits.

When the test loop ran the mod and div ops with those vectors, it divided by zero, which is undefined behavior at the hardware level. That's why the CPU reference and the GPU produced mismatched results and failed on the CI runners.

I just pushed a fix replacing those three zeros with 1.f, 3.f, and 2.f so that the operations are well defined. Thanks again.
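
One way to catch this class of problem before the test loop runs is a small precondition check on the divisor vector. This is only a sketch; `has_no_zero_divisors` is a hypothetical helper and is not part of the test file discussed here.

```cpp
#include <algorithm>
#include <vector>

// Returns true when no element of the divisor vector is zero, so that
// div and mod can be evaluated for every element without dividing by zero.
bool has_no_zero_divisors(const std::vector<float>& divisors) {
    return std::none_of(divisors.begin(), divisors.end(),
                        [](float v) { return v == 0.0f; });
}
```

A test could assert this on the second input vector before iterating over the div and mod modes.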

@Lagmator22
Author

Lagmator22 commented Feb 27, 2026

Also a quick question, since I don't have Intel iGPU hardware locally, I've been unable to catch these issues before CI runs, which has caused more round trips than either of us would like. Is there a way I could get access to the Intel Tiber Developer Cloud or similar hardware, or a way to self trigger CI on this PR rather than waiting each time? I'm participating in GSoC 2026 and trying to be as efficient a contributor as possible, happy to set anything up on my end if there's a path forward.

@Lagmator22
Author

Hi @michal-miotk @p-durandin @Lyamin-Roman,

Hope you're doing well. Just checking in on this PR, it has two approvals and the latest push (fixing modulo UB in test vectors) should be ready for a CI run.

I should mention I'm currently in my mid-semester exams (ending March 20), so my response times might be slightly delayed this week, but I'm absolutely committed to getting this across the finish line. I'm applying for GSoC 2026 on the #2 knowledge based deep learning OpenVINO project and would love to land this contribution.

Could someone trigger CI when convenient? Happy to address anything that comes up.

Thanks!

  if (is_integer_type(input_1_type)) {
      if (ew.mode == EltwiseMode::MODULU)
-         op += input0_str + " % " + input1_str;
+         op += "INPUT_" + op_num_str + "_0 % INPUT_" + op_num_str + "_1";
Contributor


How are cases that need casting handled now? I mean vector types and cases where the inputs are of a different type than ACCUMULATOR_TYPE.

@michal-miotk
Contributor

2026-03-12T21:11:15.4504067Z [ RUN ] eltwise_gpu_f32_int.basic_in4x4x4x4
2026-03-12T21:11:15.4504392Z Ref val: 2 Second val: 1
2026-03-12T21:11:15.4504774Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2809: Failure
2026-03-12T21:11:15.4505281Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-03-12T21:11:15.4505597Z Actual: false
2026-03-12T21:11:15.4505797Z Expected: true
2026-03-12T21:11:15.4506139Z [ FAILED ] eltwise_gpu_f32_int.basic_in4x4x4x4 (876 ms)
[ RUN ] eltwise_gpu_int.basic_in4x4x4x4
2026-03-12T21:11:27.2833557Z Ref val: 2 Second val: 1
2026-03-12T21:11:27.2833909Z src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2648: Failure
2026-03-12T21:11:27.2834422Z Value of: are_equal(std::floor(expected), output_ptr[i])
2026-03-12T21:11:27.2834722Z Actual: false
2026-03-12T21:11:27.2834910Z Expected: true
2026-03-12T21:11:27.2835231Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4 (732 ms)
2026-03-12T21:11:27.2835653Z [----------] 1 test from eltwise_gpu_int (732 ms total)
2026-03-12T21:11:27.2835886Z
2026-03-12T21:11:27.2836063Z [----------] Global test environment tear-down
2026-03-12T21:11:27.2836462Z [==========] 1 test from 1 test suite ran. (733 ms total)
2026-03-12T21:11:27.2836807Z [ PASSED ] 0 tests.
2026-03-12T21:11:27.2837136Z [ FAILED ] 1 test, listed below:
2026-03-12T21:11:27.2837491Z [ FAILED ] eltwise_gpu_int.basic_in4x4x4x4
2026-03-12T21:11:27.2837700Z
2026-03-12T21:11:27.2837776Z 1 FAILED TEST

@Lagmator22
Author

@Lyamin-Roman @michal-miotk I traced through GetAccumulatorType() and realized that for INT8/INT16/UINT8/UINT16, it falls through to F32 as default. That means cast_type becomes (float), and we'd generate min(float, float) which is invalid in OpenCL. So I added a check: if the accumulator is FP, use fmin/fmax; otherwise keep the integer min/max. This preserves the original behavior for INT32/INT64/UINT32 where the accumulator is already an integer type.

Also replaced remaining zero values in test vectors to prevent division/modulo by zero.
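
The selection rule described in this comment could look roughly like the following. This is a simplified model, not the real `GetAccumulatorType()`: `DType`, `accumulator_for`, and `max_builtin_for` are hypothetical, the function takes a single input type instead of the full set of inputs, and the fall-through to F32 is taken from the description above rather than from the source file.

```cpp
#include <string>

// Hypothetical stand-in for the plugin's data type enum.
enum class DType { F32, INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32 };

// Simplified model of the fall-through: INT32, INT64, and UINT32 keep an
// integer accumulator; every other type ends up with F32.
DType accumulator_for(DType input) {
    switch (input) {
        case DType::INT32:
        case DType::INT64:
        case DType::UINT32:
            return input;
        default:
            return DType::F32;
    }
}

// Pick the builtin from the accumulator type, since the operands are cast
// to that type before the call is emitted.
std::string max_builtin_for(DType input) {
    return accumulator_for(input) == DType::F32 ? "fmax" : "max";
}
```

Under this model a UINT8 input yields `fmax` on `(float)` operands, while INT32 keeps the integer `max`.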

…y-zero test vectors

- Use fmin/fmax instead of min/max when ACCUMULATOR_TYPE is F32/F16 but both
  inputs are integer types (INT8/INT16/UINT8/UINT16). OpenCL min/max builtins
  only accept integer arguments; float operands need fmin/fmax.
- Replace remaining zero values in signed input_1_vec to prevent division by
  zero and modulo by zero undefined behavior in test assertions.
@Lagmator22 Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from 8c7da39 to 6c53a18 Compare March 19, 2026 19:09
@michal-miotk
Contributor

Note: Google Test filter = eltwise_gpu_int.basic_in4x4x4x4
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from eltwise_gpu_int
[ RUN ] eltwise_gpu_int.basic_in4x4x4x4
Ref val: 2 Second val: 1
src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2648: Failure
Value of: are_equal(std::floor(expected), output_ptr[i])
Actual: false
Expected: true

@Lagmator22 Lagmator22 force-pushed the fix/33618-gpu-clip-uint8-ambiguity branch from 9837f16 to b100ccb Compare March 26, 2026 10:50
@Lagmator22
Author

Lagmator22 commented Mar 26, 2026

Hi @Lyamin-Roman @michal-miotk , I have restored the missing set_values calls in the eltwise GPU tests. This should resolve the failures reported by Michal. Ready for a new CI run.

…t accumulator

INT8/INT16/UINT8/UINT16 fall through to F32 in GetAccumulatorType(),
so division and modulo produce float results (e.g. 2/5 = 0.4f), not
integer truncation. Test expectations assumed integer truncation which
caused CI failures. Skip these operations for affected types instead
of testing behavior the GPU does not guarantee for these accumulator types.
@Lagmator22
Author

@michal-miotk @Lyamin-Roman
thanks again for running the tests and for your patience with the back and forth here.

After digging through the latest failure (Ref val: 2 Second val: 1, are_equal(std::floor(expected), output_ptr[i]) returning false), I traced the root cause more carefully:

GetAccumulatorType() doesn't list INT8, INT16, UINT8, or UINT16 in its priority table, so they fall through to F32 as the accumulator type. This means for division and modulo, the GPU correctly performs float arithmetic (e.g. 2.f / 5.f = 0.4f) rather than integer truncation. The test expectations were written assuming integer truncation so that's the mismatch, not a kernel bug.

This push adds a uses_float_acc guard that skips div, mod, and floor_mod for those small integer types in both test functions, since the GPU's behavior (float arithmetic via F32 accumulator) is actually correct and consistent with how eltwise_kernel_base.cpp generates the OpenCL code.
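
To make the mismatch concrete, here is a small sketch of the two reference models. The name `uses_float_acc` comes from this comment, but its body below is an assumption based on the types listed; `DType` and `reference_div` are hypothetical. The PR itself skips div, mod, and floor_mod for these types rather than computing a second reference, so this only illustrates why the expectations diverged.

```cpp
// Hypothetical stand-in for the integer data types iterated by the test.
enum class DType { INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32 };

// Assumed shape of the guard: true for the small integer types that are
// described as falling through to an F32 accumulator.
bool uses_float_acc(DType t) {
    return t == DType::INT8 || t == DType::INT16 ||
           t == DType::UINT8 || t == DType::UINT16;
}

// Reference value for a / b under each accumulator model.
float reference_div(DType t, int a, int b) {
    if (uses_float_acc(t)) {
        // Float accumulator: 2 / 5 gives 0.4f.
        return static_cast<float>(a) / static_cast<float>(b);
    }
    // Integer accumulator: 2 / 5 truncates to 0.
    return static_cast<float>(a / b);
}
```

For the operands 2 and 5, the integer model gives 0 and the float model gives 0.4f, which matches the example in the commit message.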

Test vectors are also cleaned up, no zeros in divisor positions, and separate unsigned-safe vectors to avoid wrapping on u8/u16/u32 subtraction.

Could you please trigger a CI run when convenient? Happy to address any further feedback. Thank you!


Labels

category: GPU OpenVINO GPU plugin ExternalPR External contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants