[Backport 2026.1] Skip FP16 compression for constants with high absolute roundtrip error #34885
mryzhov wants to merge 2 commits into openvinotoolkit:releases/2026/1
Extend the existing scalar error check (PR openvinotoolkit#34110) to non-scalar constants. Add a max absolute FP16 roundtrip error threshold (1.0) that protects constants like RoPE frequency tables, where large values (>1024) lose significant precision in FP16. Out-of-range values are excluded from the check because they are already handled by the 75% threshold with clamping.

This fixes catastrophic accuracy degradation in LTX-Video FP16 export (WWB similarity 0.831 -> 0.956) caused by a RoPE cosine frequency table (341 elements, max FP16 abs error 7.93) being compressed to FP16. The corrupted positional encoding compounded through 28 blocks x 50 denoising steps x 2 CFG passes.

CVS-180611

(cherry picked from commit 18e681d)
Pull request overview
Backport that improves CompressFloatConstants accuracy by skipping FP16 compression for non-scalar FP32/FP64 constants whose FP16 roundtrip introduces large absolute error, extending the earlier scalar-only relative-error safeguard (PR #34110).
Changes:
- Add an absolute FP16 roundtrip error check for non-scalar constants (threshold: 1.0) to decide whether to skip compression.
- Keep existing scalar relative-error check (1e-4) unchanged and apply it only to numel == 1.
- Add regression tests covering both skipping (high abs error) and compressing (low abs error) non-scalar constants.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/common/transformations/src/transformations/common_optimizations/compress_float_constants.cpp | Adds non-scalar absolute FP16 roundtrip error detection and uses it to skip compression when the error is too large. |
| src/common/transformations/tests/common_optimizations/compress_float_constants_test.cpp | Adds tests verifying the new non-scalar absolute-error behavior (skip vs compress). |
    @@ -176,14 +202,22 @@ CompressFloatConstantsImpl::CompressFloatConstantsImpl(bool postponed) {

         auto c_type = const_node->get_element_type();

    -    // Skip FP16 compression for scalar constants with significant rounding error.
    -    // Scalar constants often serve as mathematical scale factors (e.g., log(16) in attention
    -    // bucketing) where FP16 rounding error cascades through every computation that uses them.
    +    // Skip FP16 compression for constants with significant rounding error.
    +    // Scalar: tight relative threshold (1e-4) — protects math scale factors (e.g. log(16)).
    +    // Non-scalar: absolute threshold (1.0) — protects frequency tables (e.g. RoPE) where
    +    // large values (>1024) lose significant precision in FP16 and the error compounds
    +    // through iterative computations (e.g. 50-step denoising with CFG).
         if (ov::shape_size(const_node->get_shape()) == 1) {
             if (c_type == ov::element::f32 && scalar_has_high_f16_error<float>(*const_node))
                 return false;
             if (c_type == ov::element::f64 && scalar_has_high_f16_error<double>(*const_node))
                 return false;
    +    } else {
    +        constexpr double max_abs_error = 1.0;
    +        if (c_type == ov::element::f32 && has_high_f16_abs_error<float>(*const_node, max_abs_error))
    +            return false;
    +        if (c_type == ov::element::f64 && has_high_f16_abs_error<double>(*const_node, max_abs_error))
    +            return false;
[MEDIUM] has_high_f16_abs_error() adds an extra full scan over every non-scalar f32/f64 constant before the existing out-of-range scan + conversion. On x86 f32 this makes the pass do three O(N) passes (has_high_f16_abs_error + count_out_of_f16_range + convert_from_f32_to_f16_with_clamp), which can noticeably slow FP16 compression on large weight constants.
Consider folding the abs-error check into an existing loop (e.g., during conversion / range-counting) or adding a single helper that computes both “out-of-range count” and “max abs roundtrip error” in one traversal, so large constants aren’t re-scanned multiple times.
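The single-traversal idea suggested above could look roughly like the following sketch. It is not the OpenVINO implementation: `F16ScanResult`, `scan_for_f16`, and `f16_roundtrip` are hypothetical names, the FP16 rounding is emulated with standard math functions rather than `ov::float16`, and ignoring non-finite values is an assumption made only to keep the example small.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest FP16-representable value for a finite input with |x| <= 65504.
// Illustrative emulation only: FP16 has 10 explicit mantissa bits, so a normal
// value with binary exponent e has spacing 2^(e-10); subnormals use 2^-24.
inline double f16_roundtrip(double x) {
    const double ax = std::fabs(x);
    const double ulp = ax < std::ldexp(1.0, -14) ? std::ldexp(1.0, -24)
                                                 : std::ldexp(1.0, std::ilogb(ax) - 10);
    return std::nearbyint(x / ulp) * ulp;
}

// Hypothetical result type: both statistics gathered in one pass.
struct F16ScanResult {
    std::size_t out_of_range = 0;  // finite elements whose magnitude exceeds the FP16 maximum
    double max_abs_error = 0.0;    // largest |roundtrip(v) - v| over in-range elements
};

inline F16ScanResult scan_for_f16(const std::vector<float>& values) {
    constexpr double f16_max = 65504.0;  // largest finite FP16 value
    F16ScanResult result;
    for (const float f : values) {
        const double v = f;
        if (!std::isfinite(v))
            continue;  // assumption: non-finite values are ignored in this sketch
        if (std::fabs(v) > f16_max) {
            ++result.out_of_range;  // excluded from the error statistic
            continue;
        }
        result.max_abs_error = std::max(result.max_abs_error, std::fabs(f16_roundtrip(v) - v));
    }
    return result;
}
```

A caller could then compare `out_of_range` against the 75% threshold and `max_abs_error` against 1.0 without rescanning the constant; whether the conversion itself can also be folded into the same loop depends on the existing conversion helpers.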
Description:
Summary
- Extend the scalar FP16 error check from PR #34110 to non-scalar constants
- Add max absolute roundtrip error threshold (1.0) to CompressFloatConstantsImpl
- Out-of-range values are excluded from the check (handled separately by the 75% threshold with clamping)
Details
compress_float_constants.cpp already skips FP16 compression for scalar constants with high relative roundtrip error (PR #34110). However, non-scalar constants with large values (>1024) can have absolute FP16 error exceeding 1.0 due to limited mantissa resolution, and these were not checked.
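As a rough illustration of the two checks described here, the sketch below applies a relative threshold to scalars and an absolute threshold to everything else. The function names are hypothetical, the FP16 rounding is emulated with standard math functions, and details such as the exact relative-error formula, the strictness of the comparisons, and the treatment of zero or out-of-range scalars are assumptions rather than a copy of the pass.

```cpp
#include <cmath>
#include <vector>

// Nearest FP16-representable value for a finite input with |x| <= 65504
// (illustrative emulation, not OpenVINO code).
inline double f16_roundtrip(double x) {
    const double ax = std::fabs(x);
    const double ulp = ax < std::ldexp(1.0, -14) ? std::ldexp(1.0, -24)
                                                 : std::ldexp(1.0, std::ilogb(ax) - 10);
    return std::nearbyint(x / ulp) * ulp;
}

// Hypothetical stand-in for the decision: true means "keep the constant in FP32".
inline bool should_skip_f16_compression(const std::vector<float>& values) {
    constexpr double f16_max = 65504.0;     // largest finite FP16 value
    constexpr double max_rel_error = 1e-4;  // scalar threshold quoted in the PR
    constexpr double max_abs_error = 1.0;   // non-scalar threshold quoted in the PR

    if (values.size() == 1) {
        const double v = values[0];
        if (v == 0.0 || !std::isfinite(v) || std::fabs(v) > f16_max)
            return false;  // assumption: left to the existing out-of-range handling
        return std::fabs(f16_roundtrip(v) - v) / std::fabs(v) > max_rel_error;
    }
    for (const float f : values) {
        const double v = f;
        if (!std::isfinite(v) || std::fabs(v) > f16_max)
            continue;  // out-of-range values are excluded from the check
        if (std::fabs(f16_roundtrip(v) - v) > max_abs_error)
            return true;
    }
    return false;
}
```

With this sketch, a scalar close to log(16) (about 2.7725887) rounds to 2.7734375 in FP16, a relative error of roughly 3e-4, so it stays in FP32; a table containing 31415.0 rounds to 31408.0, an absolute error of 7, and is skipped as well.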
This caused accuracy degradation in LTX-Video FP16 export: a RoPE cosine frequency table (341 elements, values up to ~31416, max FP16 abs error 7.93) was compressed to FP16 and then applied multiplicatively to Q/K vectors across 28 transformer blocks in a 50-step denoising loop, compounding the error. WWB similarity improved from 0.831 to 0.956 (FP32 baseline: 0.984).
The threshold of 1.0 is chosen because the FP16 ULP first reaches 1.0 in the value range [1024, 2048) and doubles with each power of two above it. Normal neural network weights (typically in [-10, 10]) have a max absolute error of ~0.005, so no false positives are expected. The size impact is negligible: only a few small frequency/scale constants per model stay in FP32.
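The arithmetic behind these numbers can be checked with a small sketch. The helpers `f16_ulp` and `f16_roundtrip` are illustrative names, not OpenVINO functions; they rely only on the fact that FP16 stores 10 explicit mantissa bits, and they assume finite inputs within FP16 range.

```cpp
#include <cmath>

// Spacing between adjacent FP16 values around x (illustrative helper).
// For a normal value with binary exponent e the spacing is 2^(e-10);
// below the smallest normal (2^-14) the fixed subnormal spacing 2^-24 applies.
inline double f16_ulp(double x) {
    const double ax = std::fabs(x);
    if (ax < std::ldexp(1.0, -14))
        return std::ldexp(1.0, -24);
    return std::ldexp(1.0, std::ilogb(ax) - 10);
}

// Round x to the nearest FP16-representable value, using the current
// floating-point rounding mode (round-to-nearest-even by default).
// Assumes x is finite and |x| <= 65504.
inline double f16_roundtrip(double x) {
    const double ulp = f16_ulp(x);
    return std::nearbyint(x / ulp) * ulp;
}
```

Under these assumptions `f16_ulp(10.0)` is 2^-7 (0.0078125), so the worst-case rounding error for weights of that magnitude is about 0.004, while `f16_ulp(31416.0)` is 16, which allows errors of up to 8 and is consistent with the reported 7.93.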
Tickets: