Problem Description
There is a considerable performance regression in llama.cpp when going from ROCm 6.4.4 to 7.2 or to the ROCm nightly builds from TheRock.
Source: kyuz0/amd-strix-halo-toolboxes#45 (comment)
llama.cpp built against ROCm 6.4.4 is faster than against 7.2, which shows the worst regression (roughly 3x slower prompt processing), and faster than the ROCm 7 nightlies from TheRock, which are almost 2x slower than 6.4.4:
Examples:
rocm-7-nightlies

| model | size | params | backend | ngl | n_ubatch | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 2048 | 1 | pp512 | 815.27 ± 7.37 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 2048 | 1 | tg128 | 72.97 ± 0.29 |

build: 8f91ca54e (7822)
rocm-7.2

| model | size | params | backend | ngl | n_ubatch | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 2048 | 1 | pp512 | 545.11 ± 6.65 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 2048 | 1 | tg128 | 73.21 ± 0.06 |

build: 8f91ca54e (7822)
rocm-6.4.4

| model | size | params | backend | ngl | n_ubatch | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 2048 | 1 | pp512 | 1648.22 ± 20.43 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 999 | 2048 | 1 | tg128 | 72.96 ± 0.05 |

build: 8f91ca54e (7822)
Full results for many model architectures and quantizations are here: https://kyuz0.github.io/amd-strix-halo-toolboxes/
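For reference, a llama-bench invocation matching the parameters in the tables above would look roughly like the sketch below (the GGUF file name is a placeholder, not the exact file used):

```shell
# Sketch of a llama-bench run matching the table parameters above.
# Assumes a llama.cpp build with the ROCm backend and a local
# gpt-oss 20B MXFP4 GGUF; the model path is a placeholder.
# -ngl 999 offloads all layers to the GPU, -ub 2048 sets n_ubatch,
# -fa 1 enables flash attention; pp512 and tg128 are llama-bench's
# default prompt-processing and text-generation tests.
./llama-bench \
  -m gpt-oss-20b-mxfp4.gguf \
  -ngl 999 \
  -ub 2048 \
  -fa 1
```

Running the same command against builds linked to ROCm 6.4.4, 7.2, and the TheRock nightlies isolates the ROCm version as the only variable.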
Operating System
Fedora 43 (6.18.3-200)
CPU
AMD Ryzen AI MAX 395+
GPU
Strix Halo gfx1151
ROCm Version
ROCm 7+
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response