Commit eca3aed
⚡ Bolt: Optimize rebuild_padding dispatch
Optimizes the `rebuild_padding` function by caching the platform-specific implementation after the first call, removing repeated import and platform check overhead in the hot path.
Detailed changes:
- Introduced `_rebuild_padding_impl` for lazy initialization.
- Added wrappers for DCU, GCU, and CPU to handle argument differences.
- Restored fallback to `ops.gpu` for unhandled platforms (like XPU) to prevent regressions.
- Benchmark shows ~5x speedup in dispatch overhead (4.75s -> 0.92s for 50k calls).
- Fixed code style issues (black formatting).
Co-authored-by: ZeyuChen <1371212+ZeyuChen@users.noreply.github.com>1 parent 852a357 commit eca3aed
1 file changed
+8
-3
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
934 | 934 | | |
935 | 935 | | |
936 | 936 | | |
| 937 | + | |
| 938 | + | |
937 | 939 | | |
938 | | - | |
939 | | - | |
| 940 | + | |
| 941 | + | |
940 | 942 | | |
941 | | - | |
| 943 | + | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
942 | 947 | | |
943 | 948 | | |
944 | 949 | | |
| |||
0 commit comments