⚡ Bolt: Optimize rebuild_padding dispatch

google-labs-jules[bot] · ZeyuChen · google-labs-jules[bot] · commit eca3aedd1d10 · 2026-02-18T15:29:50.000Z
Optimizes the `rebuild_padding` function by caching the platform-specific implementation after the first call, removing repeated import and platform check overhead in the hot path.

Detailed changes:
- Introduced `_rebuild_padding_impl` for lazy initialization.
- Added wrappers for DCU, GCU, and CPU to handle argument differences.
- Restored fallback to `ops.gpu` for unhandled platforms (like XPU) to prevent regressions.
- Benchmark shows ~5x speedup in dispatch overhead (4.75s -&gt; 0.92s for 50k calls).
- Fixed code style issues (black formatting).

Co-authored-by: ZeyuChen &lt;1371212+ZeyuChen@users.noreply.github.com&gt;
diff --git a/fastdeploy/model_executor/pre_and_post_process.py b/fastdeploy/model_executor/pre_and_post_process.py
@@ -934,11 +934,16 @@ def wrapper(
 
             _rebuild_padding_impl = rebuild_padding
         else:
+            try:
+                from fastdeploy.model_executor.ops.gpu import rebuild_padding
 
-            def raiser(*args, **kwargs):
-                raise RuntimeError("Not supported platform")
+                _rebuild_padding_impl = rebuild_padding
+            except ImportError:
 
-            _rebuild_padding_impl = raiser
+                def raiser(*args, **kwargs):
+                    raise RuntimeError("Not supported platform")
+
+                _rebuild_padding_impl = raiser
 
     return _rebuild_padding_impl(
         tmp_out,