for ``inference_precision`` attribute.

Activation Scaling
##################

Since ``f16`` has a smaller dynamic range than ``f32`` or ``bf16``, overflow may occur when ``f16`` is used for ``inference_precision``.
To address this issue, activation scaling divides the input of linear operations such as ``MatMul`` or ``Convolution`` by an activations scale factor, ensuring that the layer's output does not exceed the ``f16`` dynamic range.
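
The effect can be illustrated with a small NumPy sketch (illustrative only; this is not OpenVINO's implementation, which applies the scaling inside the graph transformations):

.. code-block:: python

    import numpy as np

    x = np.full(4, 300.0, dtype=np.float16)  # activations
    w = np.full(4, 300.0, dtype=np.float16)  # weights

    # Naive f16 multiply: 300 * 300 = 90000 exceeds f16's max (~65504) -> inf
    naive = x * w
    assert np.isinf(naive).all()

    # Divide the activations by a scale factor first so the product fits in f16
    scale = np.float16(16.0)
    scaled = (x / scale) * w  # 18.75 * 300 = 5625, representable in f16
    assert np.isfinite(scaled).all()

    # Multiplying back in f16 would overflow again; doing it in a wider type
    # (analogous to delaying the multiplication, as LPT does) is safe.
    restored = scaled.astype(np.float32) * np.float32(scale)
    assert np.allclose(restored, 90000.0, rtol=1e-2)  # within f16 rounding error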

The layer's output is then multiplied back by the activations scale factor to restore its original value, but overflow can occur again during this step.
Activation scaling uses :doc:`LPT (Low Precision Transformations) <../../../documentation/openvino-extensibility/openvino-plugin-library/advanced-guides/low-precision-transformations>` to delay the multiplication by the scale factor as long as possible, preventing overflow.
The activations scale factor can be specified in the ``rt_info`` of the model IR or via ``ov::hint::activations_scale_factor``.
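
For example, the factor could be passed as a compile-time property. The sketch below is a configuration fragment under stated assumptions: the model path and the value ``8.0`` are placeholders, and the Python-side ``hints.activations_scale_factor`` name is assumed to mirror the C++ ``ov::hint::activations_scale_factor`` hint.

.. code-block:: python

    import openvino as ov
    import openvino.properties.hint as hints

    core = ov.Core()
    # "model.xml" and the scale value 8.0 are placeholders.
    compiled_model = core.compile_model(
        "model.xml",
        "GPU",
        {
            hints.inference_precision: ov.Type.f16,
            # Assumed Python counterpart of ov::hint::activations_scale_factor.
            hints.activations_scale_factor: 8.0,
        },
    )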