[Metal] Default fast-math produces wrong f32 results #23914

@dbrll

What happened?

Running att * mask + (-1e10) * (1 - mask) gives wrong results on the Metal backend:

metal: [0 -1E+10 ...] (actual)
llvm-cpu, cuda: [1 -1E+10 ...] (expected)
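One possible explanation for the 0 can be sketched in plain Python by emulating f32 rounding. The sketch below assumes that att is 1 everywhere (as in the repro input) and that the mask holds only 0.0 or 1.0; the mask contents are a guess based on the printed outputs, since the attached MLIR is not reproduced here. The `reassociated` form is a hypothetical rewrite that fast-math would be allowed to make, not a confirmed description of what the Metal compiler actually emits.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


NEG = f32(-1e10)  # -1e10 is exactly representable in f32


def strict(att: float, mask: float) -> float:
    # att * mask + (-1e10) * (1 - mask), rounding to f32 after every operation,
    # in the order the expression is written.
    left = f32(att * mask)
    right = f32(NEG * f32(1.0 - mask))
    return f32(left + right)


def reassociated(att: float, mask: float) -> float:
    # Hypothetical fast-math rewrite: (att - NEG) * mask + NEG.
    # Equal to the expression above over the reals, but att + 1e10 rounds to
    # 1e10 in f32 (the spacing between f32 values near 1e10 is 1024), so att
    # is lost before the final cancellation.
    return f32(f32(f32(att - NEG) * mask) + NEG)


print(strict(1.0, 1.0), strict(1.0, 0.0))              # 1.0 -10000000000.0
print(reassociated(1.0, 1.0), reassociated(1.0, 0.0))  # 0.0 -10000000000.0
```

Under these assumptions the strict evaluation matches the llvm-cpu and cuda output, and the reassociated one matches the Metal output.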

Setting compile_options.mathMode = MTLMathModeSafe in runtime/src/iree/hal/drivers/metal/executable.m solves the issue.

I ran a benchmark to evaluate the performance impact of this build option, and it appears to be negligible (within noise).

I also noticed CUDA doesn't enable fast-math for compute shaders, so I assume it's probably safe to turn it off for Metal as well.

Attached is a sample MLIR to reproduce the problem. I'll be glad to submit a patch if that helps.

fastmath_test.mlir.zip

Steps to reproduce your issue

  1. Compile the attached MLIR for both the metal-spirv and llvm-cpu backends:
iree-compile --iree-hal-target-backends=metal-spirv --iree-metal-compile-to-metallib=false ./fastmath_test.mlir -o fastmath_test_metal.vmfb

iree-compile --iree-hal-target-backends=llvm-cpu ./fastmath_test.mlir -o fastmath_test_cpu.vmfb

  2. Run them with iree-run-module (the llvm-cpu module runs on the local-task device):
iree-run-module --module=fastmath_test_metal.vmfb --device=metal --input="1x6x256x256xf32=1"

iree-run-module --module=fastmath_test_cpu.vmfb --device=local-task --input="1x6x256x256xf32=1"
  3. Check the first output value:
metal:

result[0]: hal.buffer_view
1x6x256x256xf32=[[[0 -1E+10

llvm-cpu:

result[0]: hal.buffer_view
1x6x256x256xf32=[[[1 -1E+10
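To compare the two runs without reading the tensors by eye, the leading values can be pulled out of the printed buffer view lines. This is a minimal sketch: `leading_values` is a hypothetical helper written for this issue, not part of IREE, and it only handles the line format shown above.

```python
import re


def leading_values(buffer_view_line: str, count: int = 2) -> list[float]:
    """Return the first `count` numbers after '=' in a line such as
    '1x6x256x256xf32=[[[0 -1E+10'."""
    # Drop the shape/type prefix so its digits are not mistaken for values.
    _, _, payload = buffer_view_line.partition("=")
    tokens = re.findall(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?", payload)
    return [float(t) for t in tokens[:count]]


metal = leading_values("1x6x256x256xf32=[[[0 -1E+10")
cpu = leading_values("1x6x256x256xf32=[[[1 -1E+10")

# (index, metal value, cpu value) for every position where the backends disagree.
mismatches = [(i, m, c) for i, (m, c) in enumerate(zip(metal, cpu)) if m != c]
print(mismatches)  # [(0, 0.0, 1.0)]
```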

What component(s) does this issue relate to?

Runtime

Version information

HEAD (fa12ca2)

Additional context

No response

Labels: bug (Something isn't working)
