--verify-numerics is slow for large conv configs #2841

@rkayaith

Description

--verify-numerics is slow for large conv configs, with a handful of configs dominating total runtime (slack thread, initial report).

Sorted timings for the worst offenders (conv only, all NHWC):

Time Command
555s convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1
141s convfp16 -n 32 -c 256 -H 50 -W 50 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1
141s convbfp16 -n 16 -c 768 -H 48 -W 32 -k 2048 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -t 1 -b 0 -F 4
124s convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 1 -t 1
122s convbfp16 -n 16 -c 768 -H 48 -W 32 -k 2048 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC --iter 100
66s convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 2 -t 1
66s convbfp16 -n 16 -c 2 -H 450 -W 450 -k 2 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -t 1 -b 0 -F 4
54s convbfp16 -n 16 -c 2048 -H 48 -W 32 -k 2048 -y 3 -x 1 -p 1 -q 0 -u 3 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 --in_layout NHWC --out_layout NHWC --fil_layout NHWC --iter 100
43s convbfp16 -n 16 -c 2048 -H 48 -W 32 -k 2048 -y 3 -x 1 -p 1 -q 0 -u 3 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -t 1 -b 1 -F 4
38s convfp16 -n 32 -c 256 -H 25 -W 25 -k 2376 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1
36s convfp16 -n 32 -c 256 -H 100 -W 100 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC --out_layout NHWC -m conv -g 1 -F 4 -t 1

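The top 5 configs alone account for ~1083s (~18 min). The common pattern is a large output channel count (k=2376 or k=2048) combined with non-trivial spatial dims, so the time is likely dominated by the CPU reference computation. One exception is the 66s convbfp16 config with c=2, k=2 and a 450x450 input, which is slow despite its tiny channel counts.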
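As a rough sanity check on the "CPU reference dominates" hypothesis, here is a minimal sketch that estimates the multiply-accumulate (MAC) count of a config from its command line. It assumes the flags follow the usual MIOpen driver meaning (-n batch, -c/-k input/output channels, -H/-W input size, -y/-x filter size, -p/-q padding, -u/-v stride, -l/-j dilation, -g groups) and that every flag takes exactly one value, as in the table above. `parse_flags`, `out_dim` and `conv_macs` are names made up for this example and are not part of the driver.

```python
import shlex


def parse_flags(command):
    """Map each flag name (dashes stripped) to its value string.

    Assumes the first token is the op name (e.g. "convfp16") and that
    flags and values strictly alternate after it.
    """
    tokens = shlex.split(command)[1:]
    return dict(zip((t.lstrip("-") for t in tokens[0::2]), tokens[1::2]))


def out_dim(size, pad, dilation, kernel, stride):
    """Standard convolution output-size formula for one spatial dim."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def conv_macs(command):
    """Estimate MACs for a direct convolution described by `command`.

    The count is the same to first order for forward, backward-data and
    backward-weights, so the -F flag is ignored here.
    """
    # Keep only integer-valued flags; layouts and "-m conv" are dropped.
    f = {k: int(v) for k, v in parse_flags(command).items() if v.isdigit()}
    h_out = out_dim(f["H"], f["p"], f["l"], f["y"], f["u"])
    w_out = out_dim(f["W"], f["q"], f["j"], f["x"], f["v"])
    return f["n"] * f["k"] * h_out * w_out * (f["c"] // f["g"]) * f["y"] * f["x"]


slowest = (
    "convfp16 -n 32 -c 256 -H 100 -W 100 -k 2376 -y 3 -x 3 -p 1 -q 1 "
    "-u 1 -v 1 -l 1 -j 1 --in_layout NHWC --fil_layout NHWC "
    "--out_layout NHWC -m conv -g 1 -F 4 -t 1"
)
tiny_channels = (
    "convbfp16 -n 16 -c 2 -H 450 -W 450 -k 2 -y 1 -x 1 -p 0 -q 0 "
    "-u 1 -v 1 -l 1 -j 1 -g 1 --in_layout NHWC --fil_layout NHWC "
    "--out_layout NHWC -t 1 -b 0 -F 4"
)

print(f"{conv_macs(slowest):.3e}")        # 1.752e+12 MACs for the 555s config
print(f"{conv_macs(tiny_channels):.3e}")  # 1.296e+07 MACs for the 66s config
```

By this estimate the 555s config needs about 1.75e12 MACs, which is consistent with a compute-bound reference. The c=2, k=2 config needs only about 1.3e7 MACs yet takes 66s, so raw reference FLOPs probably do not explain every entry in the table.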

Total config counts: 1071 proxy, 591 prod conv, 320 gemm, 33 batch norm.

The --verify-numerics flag is in iree-turbine/.../boo/driver/driver.py, which calls into numerics.py and the per-op numerics runners (e.g., conv_exports/numerics_runner.py).
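To confirm where the time actually goes before optimizing, one option is to wrap each stage of the numerics check in a small timer. The sketch below is a generic stdlib-only helper, not existing driver code. The phase names in the usage note ("cpu_reference", "gpu_run", "compare") are hypothetical and would need to be mapped onto whatever stages numerics.py and the per-op runners really have.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
phase_seconds = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the wall-clock time spent inside the `with` body to `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[phase] += time.perf_counter() - start


def report():
    """Return (phase, seconds) pairs, slowest phase first."""
    return sorted(phase_seconds.items(), key=lambda kv: kv[1], reverse=True)
```

Usage would look like `with timed("cpu_reference"): ...` around the reference computation, with similar blocks around the device run and the comparison, followed by printing `report()` once per config.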
