RMS_NORM dispatcher unconditionally calls AVX-512 kernel, SIGILL on Zen3 #23

@bghimireamd

Description

normalization_kernel_wrapper() in zendnnl/src/lowoha_operators/normalization/lowoha_normalization.cpp unconditionally dispatches RMS_NORM and FUSED_ADD_RMS_NORM to rms_norm_avx512() without checking whether the CPU supports AVX-512. This causes SIGILL (Illegal Instruction, exit code 132) on Zen 3 and earlier processors that only support AVX2.

Reproduction

CPU: AMD EPYC 7313 16-Core (Zen 3 / Milan) — AVX2 only, no AVX-512

$ OMP_NUM_THREADS=4 ./build/install/examples/bin/examples
...
Thread 1 "examples" received signal SIGILL, Illegal instruction.
0x... in zendnnl::lowoha::normalization::rms_norm_avx512(...) ()
#0  rms_norm_avx512()
#1  normalization_kernel_wrapper()
#2  normalization_direct()
#3  run_lowoha_rms_norm_fp32_example()
#4  main()

The crash originates from run_lowoha_rms_norm_fp32_example() in the examples binary. Any user code calling normalization_direct() with norm_type_t::RMS_NORM will also crash on non-AVX-512 hardware.

Root cause

In lowoha_normalization.cpp:37-46:

if (params.norm_type == norm_type_t::RMS_NORM ||
    params.norm_type == norm_type_t::FUSED_ADD_RMS_NORM) {
  log_info("Using AVX512 kernel for ", norm_type_to_str(params.norm_type));
  status_t status = rms_norm_avx512(input, output, residual, gamma, params);
  // ...
  return status;
}

There is no get_avx512f_status() check before the AVX-512 path is called. Other operators (e.g., matmul) correctly guard ISA-specific paths: see lowoha_matmul.cpp:351, which checks get_f16_status() before F16 dispatch.

LayerNorm and BatchNorm are unaffected because they take the else branch to normalization_reference_wrapper(), which is a portable reference implementation. RMS_NORM has no fallback path.
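For illustration, a portable RMS norm needs nothing beyond scalar code. The following is a minimal sketch, assuming the usual definition y[i] = x[i] / sqrt(mean(x^2) + eps) * gamma[i] over a single row. The name rms_norm_scalar, the std::vector interface, and the default eps are hypothetical and are not ZenDNN's kernel or signature.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar RMS norm over one row (not ZenDNN's API).
// Computes y[i] = x[i] / sqrt(mean(x^2) + eps) * gamma[i].
std::vector<float> rms_norm_scalar(const std::vector<float>& x,
                                   const std::vector<float>& gamma,
                                   float eps = 1e-6f) {
  assert(!x.empty() && gamma.size() == x.size());

  // Accumulate the sum of squares in double to limit rounding error.
  double sum_sq = 0.0;
  for (float v : x) {
    sum_sq += static_cast<double>(v) * v;
  }

  const float mean_sq = static_cast<float>(sum_sq / x.size());
  const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);

  // Scale each element by the inverse RMS and the per-element weight.
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = x[i] * inv_rms * gamma[i];
  }
  return y;
}
```

Code like this uses no ISA-specific intrinsics, so it runs on any x86-64 CPU when compiled without AVX-512 target flags. It would be much slower than a vectorized kernel.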

Suggested fix

Add the platform check and fall through to the existing reference kernel:

if (params.norm_type == norm_type_t::RMS_NORM ||
    params.norm_type == norm_type_t::FUSED_ADD_RMS_NORM) {
  if (zendnnl_platform_info().get_avx512f_status()) {
    log_info("Using AVX512 kernel for ", norm_type_to_str(params.norm_type));
    status_t status = rms_norm_avx512(input, output, residual, gamma, params);
    if (status != status_t::success) {
      log_error(norm_type_to_str(params.norm_type), " kernel failed");
    }
    return status;
  }
  log_info("AVX512 not available, using reference kernel for ",
           norm_type_to_str(params.norm_type));
}

This requires adding #include "common/zendnnl_global.hpp" for zendnnl_platform_info().
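The guard pattern can also be shown in isolation from the library. This is a minimal sketch under stated assumptions: status_t here is a stand-in enum, cpu_has_avx512f and dispatch_guarded are hypothetical helpers, and the probe uses the GCC/Clang builtin __builtin_cpu_supports rather than zendnnl_platform_info().get_avx512f_status().

```cpp
#include <cassert>
#include <string>

// Stand-in for the library's status type (hypothetical, not ZenDNN's definition).
enum class status_t { success, failure };

// Runtime ISA probe using the GCC/Clang x86 builtin.
// On other compilers or architectures, report "not available" so the
// portable path is taken.
inline bool cpu_has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx512f") != 0;
#else
  return false;
#endif
}

// Hypothetical guard: call the fast kernel only when the ISA is present,
// otherwise call the portable kernel. `path` records which one ran.
template <typename Fast, typename Portable>
status_t dispatch_guarded(bool isa_available, Fast fast, Portable portable,
                          std::string& path) {
  if (isa_available) {
    path = "avx512";
    return fast();
  }
  path = "reference";
  return portable();
}
```

A caller would pass cpu_has_avx512f() as the first argument and wrap the two kernels in lambdas. Because the AVX-512 kernel is only invoked inside the guarded branch, no AVX-512 instruction from that kernel executes on Zen 3.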

Impact

  • Severity: Any call to LOWOHA RMS_NORM or FUSED_ADD_RMS_NORM crashes with SIGILL on Zen 3 and earlier
  • Workaround: None without a code change, because the only code path for RMS_NORM is the AVX-512 kernel
  • Scope: Affects all users running on non-AVX-512 AMD CPUs (Zen 1/2/3, EPYC 7001/7002/7003 series)
