Commit 16beb34

[feature] arm: speed up exp_ps floor step on aarch64 (#6657)
Summary: On aarch64, compute the floor step in exp_ps with a single vrndmq_f32 (round toward minus infinity) instead of the legacy truncate-and-correct sequence of five intrinsics. Non-aarch64 targets keep the legacy fallback path. This shortens the exp_ps hot path on ARM without changing approximation behavior.
1 parent e366f48 commit 16beb34

1 file changed: +4 -0 lines changed


src/layer/arm/neon_mathfun.h

Lines changed: 4 additions & 0 deletions
@@ -141,13 +141,17 @@ static inline float32x4_t exp_ps(float32x4_t x)
     fx = VFMAQ_F32(vdupq_n_f32(0.5f), x, vdupq_n_f32(c_cephes_LOG2EF));
 
     /* perform a floorf */
+#if defined(__aarch64__)
+    fx = vrndmq_f32(fx);
+#else
     tmp = vcvtq_f32_s32(vcvtq_s32_f32(fx));
 
     /* if greater, substract 1 */
     uint32x4_t mask = vcgtq_f32(tmp, fx);
     mask = vandq_u32(mask, vreinterpretq_u32_f32(one));
 
     fx = vsubq_f32(tmp, vreinterpretq_f32_u32(mask));
+#endif
 
     tmp = vmulq_f32(fx, vdupq_n_f32(c_cephes_exp_C1));
     float32x4_t z = vmulq_f32(fx, vdupq_n_f32(c_cephes_exp_C2));
