Disable MathDx GEMM for tiled kernel launches #1194

Open
adenzler-nvidia wants to merge 6 commits into google-deepmind:main from adenzler-nvidia:adenzler/mathdx-guards

@adenzler-nvidia
Collaborator

Summary

  • Add scoped_mathdx_gemm_disabled context manager to temporarily set wp.config.enable_mathdx_gemm = False during wp.launch_tiled calls that use tile_matmul/tile_cholesky
  • Guard tiled launches in solver.py (JTDAJ sparse/dense), derivative.py (qderiv dense), and clean up duplicate imports from a prior merge
  • These kernels don't benefit from MathDx GEMM but pay the full compilation cost; disabling it avoids unnecessary JIT overhead
  • The context manager is a no-op on Warp versions that don't expose enable_mathdx_gemm (< 1.13.0)
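
For readers who want a picture of the pattern described in the summary, here is a minimal sketch of a scoped "disable and restore" context manager. It is not the PR's implementation: to keep it runnable without Warp installed, this version takes the config object as a parameter (a hypothetical signature), and `SimpleNamespace` objects stand in for `wp.config`. Only the names `scoped_mathdx_gemm_disabled` and `enable_mathdx_gemm` come from the PR description.

```python
import contextlib
from types import SimpleNamespace

_FLAG = "enable_mathdx_gemm"


@contextlib.contextmanager
def scoped_mathdx_gemm_disabled(config):
    """Temporarily set config.enable_mathdx_gemm to False.

    `config` is an assumption of this sketch; it stands in for wp.config.
    """
    if not hasattr(config, _FLAG):
        # Flag not exposed (older Warp): behave as a no-op.
        yield
        return

    previous = getattr(config, _FLAG)
    setattr(config, _FLAG, False)
    try:
        yield
    finally:
        # Restore the caller's setting even if the body raises.
        setattr(config, _FLAG, previous)


# Stand-ins for wp.config on a new and an old Warp version.
new_config = SimpleNamespace(enable_mathdx_gemm=True)
old_config = SimpleNamespace()

with scoped_mathdx_gemm_disabled(new_config):
    # A wp.launch_tiled(...) call would go here in real code.
    inside = new_config.enable_mathdx_gemm
after = new_config.enable_mathdx_gemm

with scoped_mathdx_gemm_disabled(old_config):
    old_has_flag = hasattr(old_config, _FLAG)
```

The `try`/`finally` is what makes the change scoped: the previous value is restored regardless of how the block exits, so kernels launched elsewhere keep whatever setting the user chose.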

Benchmark results

Benchmarked on the full suite (10 benchmarks, --clear_warp_cache=true), comparing the feature branch against main, both on warp-lang==1.13.0.dev20260227.

Euler integrator (default)

| Benchmark | Main JIT (s) | Feature JIT (s) | JIT Delta | Main steps/sec | Feature steps/sec | Runtime Delta |
|---|---|---|---|---|---|---|
| aloha_cloth | 43.2 | 43.0 | +0% | 866 | 886 | +2% |
| aloha_pot | 66.9 | 52.0 | -22% | 3,064,615 | 3,065,498 | +0% |
| aloha_sdf | 76.0 | 61.4 | -19% | 853,909 | 856,498 | +0% |
| apollo_flat | 51.3 | 37.2 | -28% | 4,764,393 | 4,711,253 | -1% |
| apollo_hfield | 78.5 | 64.7 | -18% | 2,923,074 | 2,931,270 | +0% |
| apollo_terrain | 50.4 | 36.5 | -28% | 1,285,420 | 1,290,697 | +0% |
| cloth | 14.7 | 14.8 | +0% | 793 | 784 | -1% |
| franka_panda | 43.9 | 20.1 | -54% | 29,578,420 | 29,501,112 | +0% |
| humanoid | 34.4 | 20.7 | -40% | 5,820,047 | 5,850,498 | +1% |
| three_humanoids | 64.6 | 52.4 | -19% | 794,637 | 794,710 | +0% |

Implicitfast integrator (exercises derivative.py path)

| Benchmark | Main JIT (s) | Feature JIT (s) | JIT Delta | Main steps/sec | Feature steps/sec | Runtime Delta |
|---|---|---|---|---|---|---|
| aloha_cloth | 44.0 | 44.6 | +1% | 842 | 858 | +2% |
| aloha_pot | 91.7 | 51.5 | -44% | 2,954,617 | 2,998,061 | +1% |
| aloha_sdf | 100.7 | 60.9 | -40% | 816,150 | 820,188 | +0% |
| apollo_flat | 65.5 | 38.8 | -41% | 4,453,510 | 4,470,702 | +0% |
| apollo_hfield | 92.9 | 67.8 | -27% | 2,677,550 | 2,625,060 | -2% |
| apollo_terrain | 65.5 | 37.9 | -42% | 1,121,283 | 1,224,617 | +9% |
| cloth | 15.7 | 15.1 | -4% | 720 | 762 | +6% |
| franka_panda | 44.7 | 19.7 | -56% | 29,440,158 | 29,877,063 | +1% |
| humanoid | 49.4 | 22.0 | -55% | 5,221,367 | 5,245,457 | +0% |
| three_humanoids | 66.4 | 53.2 | -20% | 774,656 | 770,729 | ~0%* |

*Averaged over 6 runs each. The steps/sec ranges overlap (main: 758k-791k, feature: 727k-781k), so the difference is within noise and no regression is indicated.
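
As a quick sanity check on the delta columns, the reported percentages appear consistent with a relative change against main, rounded to the nearest whole percent. The helper below is a hypothetical illustration of that arithmetic (the name `percent_delta` is not from the PR), applied to a few rows from the tables above.

```python
def percent_delta(main: float, feature: float) -> int:
    """Relative change of feature vs. main, as a whole-number percentage."""
    return round((feature - main) / main * 100)


# (main, feature) pairs copied from the benchmark tables.
jit_franka_euler = percent_delta(43.9, 20.1)                  # -54
jit_humanoid_euler = percent_delta(34.4, 20.7)                # -40
jit_aloha_pot_implicitfast = percent_delta(91.7, 51.5)        # -44
runtime_apollo_terrain_implicitfast = percent_delta(1_121_283, 1_224_617)  # 9
```

Negative JIT deltas mean shorter compile times on the feature branch; positive runtime deltas mean more steps per second.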

Test plan

  • Verify all existing tests pass
  • Benchmark with --clear_warp_cache=true to confirm JIT improvement
  • Verify no runtime regression (steps/sec within noise)
