Disable MathDx GEMM for tiled kernel launches by adenzler-nvidia · Pull Request #1194 · google-deepmind/mujoco_warp

adenzler-nvidia · 2026-02-27T17:19:27Z

Summary

Add scoped_mathdx_gemm_disabled context manager to temporarily set wp.config.enable_mathdx_gemm = False during wp.launch_tiled calls that use tile_matmul/tile_cholesky
Guard tiled launches in solver.py (JTDAJ sparse/dense), derivative.py (qderiv dense), and clean up duplicate imports from a prior merge
These kernels don't benefit from MathDx GEMM but pay the full compilation cost; disabling it avoids unnecessary JIT overhead
The context manager is a no-op on warp versions that don't expose enable_mathdx_gemm (< 1.13.0)
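The guard described above can be sketched as a save/restore context manager. This is a minimal illustration, not the PR's code. The PR's helper presumably reads `wp.config` directly. The explicit `config` parameter here is an assumption, so the sketch runs without Warp installed. The `hasattr` check provides the no-op behaviour on older warp versions.

```python
import contextlib


@contextlib.contextmanager
def scoped_mathdx_gemm_disabled(config):
    """Temporarily set config.enable_mathdx_gemm = False.

    Sketch only: `config` stands in for `wp.config` (an assumption made so the
    example runs without Warp). If the attribute does not exist (older warp
    versions), nothing is changed and the block runs as-is.
    """
    has_flag = hasattr(config, "enable_mathdx_gemm")
    previous = config.enable_mathdx_gemm if has_flag else None
    if has_flag:
        config.enable_mathdx_gemm = False
    try:
        # The guarded wp.launch_tiled(...) call would run inside this block.
        yield
    finally:
        # Restore the caller's setting even if the launch raises.
        if has_flag:
            config.enable_mathdx_gemm = previous
```

Usage would look like `with scoped_mathdx_gemm_disabled(wp.config):` wrapped around each guarded `wp.launch_tiled` call. Each entry saves the previous value, so nested uses restore correctly. However, the flag lives on a module-level config object, so the guard is not thread-safe.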

Benchmark results

Benchmarked on the full suite (10 benchmarks, --clear_warp_cache=true), comparing the feature branch against main, both on warp-lang==1.13.0.dev20260227.

Euler integrator (default)

Benchmark	Main JIT (s)	Feature JIT (s)	JIT Delta	Main steps/sec	Feature steps/sec	Runtime Delta
aloha_cloth	43.2	43.0	+0%	866	886	+2%
aloha_pot	66.9	52.0	-22%	3,064,615	3,065,498	+0%
aloha_sdf	76.0	61.4	-19%	853,909	856,498	+0%
apollo_flat	51.3	37.2	-28%	4,764,393	4,711,253	-1%
apollo_hfield	78.5	64.7	-18%	2,923,074	2,931,270	+0%
apollo_terrain	50.4	36.5	-28%	1,285,420	1,290,697	+0%
cloth	14.7	14.8	+0%	793	784	-1%
franka_panda	43.9	20.1	-54%	29,578,420	29,501,112	+0%
humanoid	34.4	20.7	-40%	5,820,047	5,850,498	+1%
three_humanoids	64.6	52.4	-19%	794,637	794,710	+0%
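The delta columns are relative changes from main to the feature branch. For JIT, lower is better; for steps/sec, higher is better. The following minimal sketch recomputes a few Euler rows from the values in the table. The helper names are illustrative only.

```python
def pct_delta(main: float, feature: float) -> float:
    """Relative change from main to feature, in percent (negative = below main)."""
    return (feature - main) / main * 100.0


# Selected Euler rows copied from the table:
# (main JIT s, feature JIT s, main steps/sec, feature steps/sec)
EULER_ROWS = {
    "aloha_pot": (66.9, 52.0, 3_064_615, 3_065_498),
    "apollo_terrain": (50.4, 36.5, 1_285_420, 1_290_697),
    "humanoid": (34.4, 20.7, 5_820_047, 5_850_498),
}


def format_row(name: str) -> str:
    """Format the JIT and runtime deltas for one benchmark, rounded to whole percent."""
    jit_main, jit_feat, sps_main, sps_feat = EULER_ROWS[name]
    return (
        f"{name}: JIT {pct_delta(jit_main, jit_feat):+.0f}%, "
        f"runtime {pct_delta(sps_main, sps_feat):+.0f}%"
    )


for row in EULER_ROWS:
    print(format_row(row))
```

The JIT times in the table are rounded to 0.1 s, so recomputing from them can differ by one point on some rows. For example, apollo_flat gives -27% from the rounded values, while the table shows -28%.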

Implicitfast integrator (exercises derivative.py path)

Benchmark	Main JIT (s)	Feature JIT (s)	JIT Delta	Main steps/sec	Feature steps/sec	Runtime Delta
aloha_cloth	44.0	44.6	+1%	842	858	+2%
aloha_pot	91.7	51.5	-44%	2,954,617	2,998,061	+1%
aloha_sdf	100.7	60.9	-40%	816,150	820,188	+0%
apollo_flat	65.5	38.8	-41%	4,453,510	4,470,702	+0%
apollo_hfield	92.9	67.8	-27%	2,677,550	2,625,060	-2%
apollo_terrain	65.5	37.9	-42%	1,121,283	1,224,617	+9%
cloth	15.7	15.1	-4%	720	762	+6%
franka_panda	44.7	19.7	-56%	29,440,158	29,877,063	+1%
humanoid	49.4	22.0	-55%	5,221,367	5,245,457	+0%
three_humanoids	66.4	53.2	-20%	774,656	770,729	~0%*

*Averaged over 6 runs each; the min-max ranges overlap (main: 758k-791k, feature: 727k-781k), so the difference is within noise and no regression is indicated.
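The overlap argument in the footnote reduces to a closed-interval intersection test. This is a minimal sketch using the footnote's approximate ranges, with "k" expanded to thousands. It is a coarse noise check, not a statistical significance test.

```python
def ranges_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# three_humanoids (implicitfast) steps/sec ranges over 6 runs, from the footnote.
main_range = (758_000, 791_000)
feature_range = (727_000, 781_000)
print(ranges_overlap(main_range, feature_range))
```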

Test plan

Verify all existing tests pass
Benchmark with --clear_warp_cache=true to confirm JIT improvement
Verify no runtime regression (steps/sec within noise)


thowell and others added 6 commits February 25, 2026 16:02

guard sparse constraint jacobian

e489767

Guard targeted tile_matmul kernels from MathDx GEMM

315e888

Merge remote-tracking branch 'upstream/main' into feature/mathdx-guar…ds-pr1183

000cdee

Remove duplicate imports and update warp-lang to 1.13.0.dev20260227

8193f91

Remove duplicate SPARSE_CONSTRAINT_JACOBIAN imports in io.py and smooth.py introduced during merge. Update uv.lock to warp-lang 1.13.0.dev20260227 which includes enable_mathdx_gemm support.

Merge remote-tracking branch 'upstream/main' into adenzler/mathdx-guards

7055b45

Fix ruff import sorting in smooth.py

e6ee9d6

adenzler-nvidia wants to merge 6 commits into google-deepmind:main from adenzler-nvidia:adenzler/mathdx-guards
