[GPU] Fix ambiguous fmin/fmax calls for unsigned integer types in eltwise kernel#33667
[GPU] Fix ambiguous fmin/fmax calls for unsigned integer types in eltwise kernel#33667Lagmator22 wants to merge 9 commits intoopenvinotoolkit:masterfrom
Conversation
6ef2d3e to
dbbd41a
Compare
|
Hi @michal-miotk, I've rebased this PR to the latest master and cleaned up the branch (earlier had an unrelated commit(topk nan fix) mixed in, now fixed). Changes:
Thank you |
|
LGTM |
|
build_jenkins |
Lyamin-Roman
left a comment
There was a problem hiding this comment.
LGTM, but for confidence and full coverage, you can expand the values of int data types in this test:
src/plugins/intel_gpu/tests/unit/test_cases/eltwise_gpu_test.cpp:2554
TEST(eltwise_gpu_int, basic_in4x4x4x4)
|
Hi @Lyamin-Roman added test coverage for all integer types. ready for merge when CI passes. Thank u |
|
build_jenkins |
|
build_jenkins |
|
on dg2 it_tests eltwise_gpu_f32_int.basic_in4x4x4x4 error: invalid operands to binary expression ('float' and 'float') |
|
@michal-miotk Thanku for finding this, the issue was that in the mixed-precision cases (float accumulator), the code generator was casting inputs to float before applying the % operator which was leading to float % float. |
…wise kernel Fixes openvinotoolkit#33618 When MIN/MAX eltwise operations are used with UINT8/UINT16/UINT32 inputs, the generated OpenCL kernel incorrectly used fmin/fmax functions which are only defined for floating-point types, causing compilation errors: 'call to fmin/fmax is ambiguous'. This fix adds UINT8, UINT16, UINT32, and INT16 to the integer type checks, ensuring that OpenCL's min/max functions (which support integer types) are used instead of fmin/fmax. This bug affects ONNX Clip operations (opset 11+) which are lowered to Maximum + Minimum operations in OpenVINO.
Add i16, u8, u16, u32 to data_types_to_test in eltwise integer tests for full coverage of the fmin/fmax fix for all integer types. Addresses review feedback from @Lyamin-Roman
f54687a to
ba2e979
Compare
|
Rebased onto latest master to bring the branch up to date. All changes are the same as before - just updated to resolve the stale branch status. @Lyamin-Roman @p-durandin Since this already has your approvals, could you trigger the CI / merge when convenient? Happy to address any further feedback. Thanks! |
|
Hi @p-durandin, could you trigger the CI when you get a chance? The branch merges cleanly, all changes are the same as before just the integer type fix + modulo fix + expanded tests. Thank you. |
…ests The expanded integer type coverage included unsigned types (u8, u16, u32) but the test data contained negative values (-1.f) which wrap around when cast to unsigned types, causing incorrect expected results. Use separate non-negative test vectors for unsigned types where input1 >= input2 to avoid both negative wrapping and subtraction underflow.
|
Fixed the CI failure, the unsigned type test data contained negative values (-1.f) that wrapped around when cast to u8/u16/u32, causing incorrect expected results. Added separate non-negative test vectors for unsigned type iterations where input1 >= input2 to avoid both wrapping and subtraction underflow. @Lyamin-Roman @p-durandin ready for CI when you get a chance. |
Replaced zero-divisors with non-zero values (1.f, 3.f, 2.f) in signed integer test vectors to safely evaluate mod and div operators without triggering hardware UB.
|
@michal-miotk thx for pointing it out so I finally figured out why the CI was crashing with those modulo errors. turns out the issue was actually in the test vectors themselves, not the kernel logic. I was looking at the input_2_vec array for the signed types in both the eltwise_gpu_int and eltwise_gpu_f32_int tests and realized there were literal zeros at indices 0, 2, and 14 from my previous commits. When the test loop ran the mod and div ops with those vectors, it was just doing a straight divide by zero which caused undefined behavior on the hardware level. Thats why the CPU reference and GPU were giving mismatched results and tripping the CI runners. I just pushed a quick fix replacing those three specific zeros with 1.f, 3.f, and 2.f so the math actually works safely now without crashing. thx again. |
|
Also a quick question, since I don't have Intel iGPU hardware locally, I've been unable to catch these issues before CI runs, which has caused more round trips than either of us would like. Is there a way I could get access to the Intel Tiber Developer Cloud or similar hardware, or a way to self trigger CI on this PR rather than waiting each time? I'm participating in GSoC 2026 and trying to be as efficient a contributor as possible, happy to set anything up on my end if there's a path forward. |
|
Hi @michal-miotk @p-durandin @Lyamin-Roman, Hope you're doing well. Just checking in on this PR, it has two approvals and the latest push (fixing modulo UB in test vectors) should be ready for a CI run. I should mention I'm currently in my mid-semester exams (ending March 20), so my response times might be slightly delayed this week, but I'm absolutely committed to getting this across the finish line. I'm applying for GSoC 2026 on the #2 knowledge based deep learning OpenVINO project and would love to land this contribution. Could someone trigger CI when convenient? Happy to address anything that comes up. Thanks! |
| if (is_integer_type(input_1_type)) { | ||
| if (ew.mode == EltwiseMode::MODULU) | ||
| op += input0_str + " % " + input1_str; | ||
| op += "INPUT_" + op_num_str + "_0 % INPUT_" + op_num_str + "_1"; |
There was a problem hiding this comment.
How are cases where casting is needed handled now for this case? I mean vector types and cases where inputs is of a different type than ACCUMULATOR_TYPE
|
2026-03-12T21:11:15.4504067Z �[0;32m[ RUN ] �[meltwise_gpu_f32_int.basic_in4x4x4x4 |
|
@Lyamin-Roman @michal-miotk I traced through Also replaced remaining zero values in test vectors to prevent division/modulo by zero. |
…y-zero test vectors - Use fmin/fmax instead of min/max when ACCUMULATOR_TYPE is F32/F16 but both inputs are integer types (INT8/INT16/UINT8/UINT16). OpenCL min/max builtins only accept integer arguments; float operands need fmin/fmax. - Replace remaining zero values in signed input_1_vec to prevent division by zero and modulo by zero undefined behavior in test assertions.
8c7da39 to
6c53a18
Compare
…of-2 exact inverses
|
Note: Google Test filter = eltwise_gpu_int.basic_in4x4x4x4 |
9837f16 to
b100ccb
Compare
|
Hi @Lyamin-Roman @michal-miotk , I have restored the missing |
…t accumulator INT8/INT16/UINT8/UINT16 fall through to F32 in GetAccumulatorType(), so division and modulo produce float results (e.g. 2/5 = 0.4f), not integer truncation. Test expectations assumed integer truncation which caused CI failures. Skip these operations for affected types instead of testing behavior the GPU does not guarantee for these accumulator types.
|
@michal-miotk @Lyamin-Roman After digging through the latest failure (
This push adds a Test vectors are also cleaned up, no zeros in divisor positions, and separate unsigned-safe vectors to avoid wrapping on Could you please trigger a CI run when convenient? Happy to address any further feedback. Thank you! |

Details:
added the unsigned types UINT8 UINT16 UINT32 and also INT16 since that was missing.
fixes the ONNX Clip issue since Clip gets lowered to Maximum and Minimum ops.
Tickets: