`--verify-numerics` is slow for large conv configs, and a handful of configs dominate total runtime (slack thread, initial report).
Sorted timings for the worst offenders (conv only, all NHWC):
| Time | Command |
|---|---|
| 555s | convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1 |
| 141s | convfp16 -n 32 -c 256 -H 50 -W 50 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1 |
| 141s | convbfp16 -n 16 -c 768 -H 48 -W 32 -k 2048 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -t 1 -b 0 -F 4 |
| 124s | convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 1 -t 1 |
| 122s | convbfp16 -n 16 -c 768 -H 48 -W 32 -k 2048 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC --iter 100 |
| 66s | convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 2 -t 1 |
| 66s | convbfp16 -n 16 -c 2 -H 450 -W 450 -k 2 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -t 1 -b 0 -F 4 |
| 54s | convbfp16 -n 16 -c 2048 -H 48 -W 32 -k 2048 -y 3 -x 1 -p 1 -q 0 -u 3 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC --iter 100 |
| 43s | convbfp16 -n 16 -c 2048 -H 48 -W 32 -k 2048 -y 3 -x 1 -p 1 -q 0 -u 3 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -t 1 -b 1 -F 4 |
| 38s | convfp16 -n 32 -c 256 -H 25 -W 25 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1 |
| 36s | convfp16 -n 32 -c 256 -H 100 -W 100 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1 |
The top 5 configs alone account for ~1083s (~18 min). The common pattern is a large output channel count (k=2376, k=2048) combined with non-trivial spatial dimensions, so runtime is likely dominated by the CPU reference computation.
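To put numbers on "large k with non-trivial spatial dims", here is a small sketch that parses the driver commands from the table and computes the multiply-accumulate (MAC) count of each convolution. It assumes MIOpenDriver-style flag meanings (`-p`/`-q` padding, `-u`/`-v` stride, `-l`/`-j` dilation, `-g` groups) and that every flag takes one value, which holds for the rows above; `parse_driver_command` and `conv_macs` are hypothetical helpers, not part of the boo driver.

```python
import shlex


def parse_driver_command(cmd: str) -> tuple[str, dict[str, str]]:
    """Split a driver command line into (op name, {flag: value}).

    Assumes every flag takes exactly one value, which holds for the
    configs in the table above.
    """
    tokens = shlex.split(cmd)
    op, rest = tokens[0], tokens[1:]
    # Pair each flag with the token after it: "-n" -> "n", "--in_layout" -> "in_layout".
    flags = {flag.lstrip("-"): value for flag, value in zip(rest[0::2], rest[1::2])}
    return op, flags


def conv_macs(flags: dict[str, str]) -> int:
    """Multiply-accumulate count of a dense 2D convolution.

    Assumes MIOpenDriver-style flags (-p/-q padding, -u/-v stride,
    -l/-j dilation, -g groups). Forward, backward-data and backward-weight
    passes over the same shapes perform the same number of MACs.
    """
    def get(name: str, default: int) -> int:
        return int(flags.get(name, default))

    n, c, k = get("n", 1), get("c", 1), get("k", 1)
    h, w, y, x = get("H", 1), get("W", 1), get("y", 1), get("x", 1)
    pad_h, pad_w = get("p", 0), get("q", 0)
    stride_h, stride_w = get("u", 1), get("v", 1)
    dil_h, dil_w = get("l", 1), get("j", 1)
    groups = get("g", 1)

    # Standard output-size formula for a padded, strided, dilated convolution.
    out_h = (h + 2 * pad_h - dil_h * (y - 1) - 1) // stride_h + 1
    out_w = (w + 2 * pad_w - dil_w * (x - 1) - 1) // stride_w + 1
    return n * k * out_h * out_w * (c // groups) * y * x


if __name__ == "__main__":
    base = ("convfp16 -n 32 -c 256 -H {hw} -W {hw} -k 2376 -y 3 -x 3 -p 1 -q 1 "
            "-u 1 -v 1 -l 1 -j 1 -g 1 -F 4 -t 1")
    # (measured seconds from the table, spatial size H = W)
    for seconds, hw in [(555, 100), (141, 50), (38, 25)]:
        _, flags = parse_driver_command(base.format(hw=hw))
        macs = conv_macs(flags)
        print(f"H=W={hw:3d}: {macs / 1e9:7.1f} GMAC, {macs / seconds / 1e9:.2f} GMAC/s")
```

For the three k=2376, `-F 4` rows that differ only in H and W (555s, 141s, 38s), this gives about 1752, 438 and 109 GMAC, i.e. roughly 3 GMAC/s in each case, which is consistent with runtime scaling linearly in MAC count. The same H=W=100 shape with `-F 1` and `-F 2` takes 124s and 66s for an identical MAC count, so the direction-specific reference path probably matters as well.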
Total config counts: 1071 proxy, 591 prod conv, 320 gemm, 33 batch norm.
The `--verify-numerics` flag is handled in `iree-turbine/.../boo/driver/driver.py`, which calls into `numerics.py` and the per-op numerics runners (e.g., `conv_exports/numerics_runner.py`).
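One way to check whether the CPU reference really dominates is to profile the verification path. A minimal standard-library sketch follows; `profile_call` and `stand_in_reference` are hypothetical names, and in practice `fn` would be whatever entry point in `numerics.py` or a per-op runner performs the comparison (not shown in this issue). Running the whole driver script under `python -m cProfile -s cumulative` is an alternative when that entry point is awkward to call directly.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top: int = 15, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (fn's return value, text report of the `top` entries
    sorted by cumulative time).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def stand_in_reference(n: int) -> int:
    """Hypothetical placeholder for a CPU reference computation."""
    acc = 0
    for i in range(n):
        acc += i * i
    return acc


if __name__ == "__main__":
    result, report = profile_call(stand_in_reference, 1_000_000)
    print(result)
    print(report)
```

Sorting by cumulative time surfaces the call chain (driver, numerics, reference op) that owns most of the wall clock, which is the question here; sorting by `"tottime"` instead would point at the innermost hot function.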