Determinism support 1/N #1281

Draft
mar-yan24 wants to merge 12 commits into google-deepmind:main from mar-yan24:mark/determinism1

@mar-yan24
Contributor

Add opt.deterministic flag with post-narrowphase contact sort (#562)

I was previously working on differentiation support for MJWarp, but I am taking a break from that because the contacts are giving me a hard time: I can't figure out how to optimize the gradient landscape while keeping good dynamics from rigid contact and Coulomb friction. So I decided that spending some time on this would be more useful for now lol.

Summary

This is the first of several phased additions. It adds an opt-in opt.deterministic flag that sorts contacts after narrowphase by (worldid, geom0, geom1, geomcollisionid) using wp.utils.radix_sort_pairs. This fixes the most upstream source of run-to-run non-determinism on GPU: contact index permutation caused by atomic_add counters in narrowphase and CCD. After sorting, d.contact.* is rewritten in canonical order before any downstream kernel reads it.
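To illustrate why the sort helps, here is a minimal pure-Python sketch. It is not the PR's Warp code: the contact records and the shuffle inside `narrowphase` are hypothetical stand-ins for slots handed out by atomic_add counters, and only the sort key follows the tuple named above.

```python
import random

# Hypothetical contact records: (worldid, geom0, geom1, geomcollisionid, dist).
CONTACTS = [
    (0, 1, 2, 0, 0.010),
    (0, 1, 2, 1, 0.012),
    (0, 0, 3, 0, -0.001),
    (1, 0, 3, 0, 0.004),
]


def narrowphase(seed):
    """Mimic atomic_add slot allocation: arrival order can differ per run."""
    rng = random.Random(seed)
    arrival = list(CONTACTS)
    rng.shuffle(arrival)  # stands in for GPU thread scheduling
    return arrival


def sort_contacts(contacts):
    """Canonical order: lexicographic on (worldid, geom0, geom1, geomcollisionid)."""
    return sorted(contacts, key=lambda c: c[:4])


# Two "runs" with different arrival orders end up in the same canonical order.
sorted_a = sort_contacts(narrowphase(seed=1))
sorted_b = sort_contacts(narrowphase(seed=2))
```

Whatever order the contacts arrive in, downstream code sees the same sequence once the sort has run.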

Downstream state (qacc, qvel, qpos, constraint force, solver reductions) is not yet bitwise reproducible. Follow-ups are needed; see Roadmap below.

Changes

  • types.py: Option.deterministic: bool (default False). Docstring notes phase 1 scope and ~5-10% overhead.
  • io.py: Wires the default in put_model, adds the field to override_model so opt.deterministic=True works from the CLI.
  • collision_driver.py: _sort_contacts() runs after _narrowphase() when the flag is set. Composite 32-bit key: ((world * ngeom + geom0) * ngeom + geom1) * gcid_max + gcid. Falls back to gcid_max = 1 on int32 overflow. Three gather-permute kernels rewrite d.contact.* from temp buffers.
  • determinism_test.py: 8 parameterized tests covering contact ordering, field bitwise equality across repeat runs, sort key monotonicity, and a default-false smoke check.
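The composite key and overflow fallback described for collision_driver.py can be sketched in plain Python as follows. This is an illustration under assumptions, not the PR's kernels: `choose_gcid_max`, `contact_key`, and the sample data are hypothetical, dropping gcid from the key in the fallback case is my guess at the behavior, and a host-side stable sort stands in for wp.utils.radix_sort_pairs.

```python
INT32_MAX = 2**31 - 1


def choose_gcid_max(nworld, ngeom, gcid_max):
    """Keep gcid_max if the largest possible key fits in int32, else use 1."""
    largest_key = nworld * ngeom * ngeom * gcid_max - 1
    return gcid_max if largest_key <= INT32_MAX else 1


def contact_key(world, geom0, geom1, gcid, ngeom, gcid_max):
    """((world * ngeom + geom0) * ngeom + geom1) * gcid_max + gcid."""
    # Assumption: with the gcid_max = 1 fallback, gcid is left out of the key.
    gcid_term = gcid if gcid_max > 1 else 0
    return ((world * ngeom + geom0) * ngeom + geom1) * gcid_max + gcid_term


# Hypothetical unsorted contacts: (worldid, geom0, geom1, geomcollisionid).
ids = [(1, 0, 3, 0), (0, 1, 2, 1), (0, 0, 3, 0), (0, 1, 2, 0)]
dist = [0.004, 0.012, -0.001, 0.010]  # one contact field, same order as ids

ngeom = 4
gcid_max = choose_gcid_max(nworld=2, ngeom=ngeom, gcid_max=4)
keys = [contact_key(w, g0, g1, c, ngeom, gcid_max) for (w, g0, g1, c) in ids]

# Sort (key, index) pairs, then gather every contact field through the permutation.
perm = sorted(range(len(keys)), key=keys.__getitem__)
ids_sorted = [ids[i] for i in perm]
dist_sorted = [dist[i] for i in perm]
```

Packing the four ids into one integer lets a single key/value sort produce the permutation, and each contact field is then gathered through that same permutation.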

Test results

8/8 pass on RTX 4060 Laptop (sm_89, Ada Lovelace), Warp 1.13.0.dev20260302:

test_contact_ordering_deterministic[collision.xml, nworld=1]   PASSED
test_contact_ordering_deterministic[collision.xml, nworld=4]   PASSED
test_contact_ordering_deterministic[humanoid.xml, nworld=1]    PASSED
test_contact_ordering_deterministic[humanoid.xml, nworld=4]    PASSED
test_contact_fields_deterministic[collision.xml, nworld=1]     PASSED
test_contact_fields_deterministic[humanoid.xml, nworld=1]      PASSED
test_contacts_sorted_by_geom                                   PASSED
test_deterministic_flag_default_false                          PASSED

Coverage: contact geom arrays bitwise identical across 3 runs x 10 steps at two nworld sizes. All contact fields (dist, pos, frame, dim, worldid, geomcollisionid) bitwise identical. Sort key monotonicity verified. Default False confirmed (no cost unless opted in).
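For readers unfamiliar with what "bitwise identical" means in practice, here is a small sketch of such a check. It is an assumption about the shape of the comparison, not the code in determinism_test.py; `float_bits`, `bitwise_equal`, and `assert_runs_match` are hypothetical helpers.

```python
import struct


def float_bits(values):
    """Pack values as float32 so the comparison is bit-for-bit."""
    return struct.pack(f"<{len(values)}f", *values)


def bitwise_equal(a, b):
    """True only if both sequences have identical float32 bit patterns."""
    return len(a) == len(b) and float_bits(a) == float_bits(b)


def assert_runs_match(run_fn, nruns=3):
    """Compare the first run against every repeat run."""
    reference = run_fn()
    for _ in range(nruns - 1):
        assert bitwise_equal(reference, run_fn())


# A bitwise check is stricter than ==: it distinguishes 0.0 from -0.0.
same_value = 0.0 == -0.0
same_bits = bitwise_equal([0.0], [-0.0])
```

A tolerance-based comparison would hide small run-to-run differences, which is why the tests compare raw bits.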

Benchmarks

I had Claude help me put together some benchmarks to measure the overhead of this implementation. Each configuration runs 3 trials x 500 steps after a 50-step warmup, with wp.synchronize() fences around the timing window.
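A timing loop with that shape might look like the sketch below. It is not the benchmark script used for the numbers in this PR: `time_steps` is a hypothetical helper, `step` is whatever advances the simulation, and `sync` would be wp.synchronize when timing GPU work.

```python
import statistics
import time


def time_steps(step, sync=lambda: None, nstep=500, nwarmup=50, ntrial=3):
    """Return (mean, stdev) of microseconds per step across ntrial trials."""
    for _ in range(nwarmup):
        step()  # warm up caches and any lazy compilation before timing
    per_step_us = []
    for _ in range(ntrial):
        sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        for _ in range(nstep):
            step()
        sync()  # wait for queued work before stopping the clock
        elapsed = time.perf_counter() - start
        per_step_us.append(1e6 * elapsed / nstep)
    return statistics.mean(per_step_us), statistics.stdev(per_step_us)
```

Without the fences, asynchronous kernel launches would return before the work finishes and the loop would mostly measure launch latency.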

Newton + Dense, RTX 4060 Laptop (sm_89)

| model | nworld | nacon | off (us/step) | on (us/step) | overhead |
|---|---|---|---|---|---|
| humanoid.xml | 1 | 7 | 3,459 | 3,903 | +12.8% |
| humanoid.xml | 64 | 448 | 3,570 | 4,006 | +12.2% |
| humanoid.xml | 512 | 3,584 | 3,866 | 4,550 | +17.7% |
| collision.xml | 1 | 6 | 4,185 | 4,449 | +6.3% |
| collision.xml | 64 | 384 | 4,219 | 4,610 | +9.3% |
| collision.xml | 512 | 3,072 | 5,258 | 5,829 | +10.9% |

CG + Sparse, RTX 4060 Laptop (sm_89)

| model | nworld | nacon | off (us/step) | on (us/step) | overhead |
|---|---|---|---|---|---|
| humanoid.xml | 1 | 7 | 6,445 | 6,858 | +6.4% |
| humanoid.xml | 64 | 413 | 7,205 | 7,612 | +5.6% |
| humanoid.xml | 512 | 3,385 | 7,245 | 7,600 | +4.9% |
| collision.xml | 1 | 6 | 4,505 | 4,930 | +9.4% |
| collision.xml | 64 | 384 | 5,037 | 5,399 | +7.2% |
| collision.xml | 512 | 3,072 | 6,342 | 6,833 | +7.7% |

All configs are under 25% overhead. The worst case is +17.7% (humanoid, nworld=512, Newton + Dense). One trial in that config actually hit +28.9%, but it had 208 ms stdev vs ~65 ms for adjacent configs, so I'm pretty sure that was thermal throttling on my crappy laptop lol.
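For reference, the overhead column is the relative slowdown of the flag-on time over the flag-off time. The helper below is a hypothetical one-liner that reproduces two of the table entries from their us/step values.

```python
def overhead_pct(off_us, on_us):
    """Relative slowdown of the flag-on run, in percent."""
    return 100.0 * (on_us - off_us) / off_us


# (solver path, model, nworld, off us/step, on us/step), copied from the tables.
rows = [
    ("Newton + Dense", "humanoid.xml", 512, 3866, 4550),
    ("CG + Sparse", "humanoid.xml", 512, 7245, 7600),
]
overheads = [round(overhead_pct(off, on), 1) for (_, _, _, off, on) in rows]
```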

Overhead % is roughly flat across nworld within each solver path. The bottleneck is the 17 wp.empty_like calls in _sort_contacts, not the GPU sort itself. I plan to switch to pre-allocated scratch buffers in a follow-up; let me know your thoughts.
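One possible shape for the pre-allocated scratch buffers is sketched below. This is only an idea for the follow-up, not code from this PR: `ScratchPool` is hypothetical, the buffer names are placeholders, and a plain Python list stands in for a Warp array.

```python
class ScratchPool:
    """Allocate each named buffer once and reuse it on later steps."""

    def __init__(self, alloc):
        self._alloc = alloc  # callable: size -> buffer
        self._buffers = {}
        self.nalloc = 0  # number of real allocations performed

    def get(self, name, size):
        buf = self._buffers.get(name)
        if buf is None or len(buf) < size:
            # First request, or the buffer is too small: allocate and cache it.
            buf = self._alloc(size)
            self._buffers[name] = buf
            self.nalloc += 1
        return buf


pool = ScratchPool(alloc=lambda n: [0.0] * n)
for _step in range(100):
    for i in range(17):  # 17 temporaries, matching the count mentioned above
        pool.get(f"tmp{i}", size=64)
```

With a pool like this, 100 steps trigger 17 allocations in total instead of 1,700.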

Roadmap

Full reproducibility obviously needs more phases.

My rough plan at the moment is to work on constraint row allocation next, since this is probably what unlocks determinism in the downstream stages. After that I will work on actuator moment allocation. Both will use a prefix sum.
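The prefix-sum idea can be sketched in plain Python as follows. This is a generic illustration of count-then-scan allocation, not the planned implementation: `exclusive_scan`, `allocate_rows`, and the row labels are hypothetical.

```python
from itertools import accumulate


def exclusive_scan(counts):
    """Offsets such that producer i owns rows [offsets[i], offsets[i] + counts[i])."""
    return list(accumulate(counts, initial=0))[:-1]


def allocate_rows(rows_per_producer, order):
    """Write rows at scanned offsets; `order` stands in for thread scheduling."""
    counts = [len(rows) for rows in rows_per_producer]
    offsets = exclusive_scan(counts)
    out = [None] * sum(counts)
    for i in order:
        for k, row in enumerate(rows_per_producer[i]):
            out[offsets[i] + k] = row
    return out


rows = [["a0", "a1"], [], ["c0", "c1", "c2"], ["d0"]]
forward = allocate_rows(rows, order=[0, 1, 2, 3])
backward = allocate_rows(rows, order=[3, 2, 1, 0])
```

Because each producer's slot range depends only on the counts, the output layout no longer depends on which thread reaches an atomic counter first.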

The biggest fix later will be making the solver reductions deterministic, i.e. cost, grad_dot, search_dot. This should make d.qacc bitwise stable, and qpos and qvel should follow.
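The reason reductions matter is that floating-point addition is not associative, so accumulating in arrival order can change the result. The sketch below shows the effect and one generic remedy, a fixed-order pairwise reduction over index-addressed slots. It is not the planned solver change; `arrival_sum`, `tree_sum`, and `deterministic_sum` are hypothetical.

```python
def arrival_sum(contribs):
    """Accumulate (index, value) pairs in whatever order they arrive."""
    total = 0.0
    for _, value in contribs:
        total += value
    return total


def tree_sum(values):
    """Pairwise reduction whose pairing depends only on position."""
    vals = list(values)
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # odd element carries over to the next level
        vals = nxt
    return vals[0] if vals else 0.0


def deterministic_sum(contribs, n):
    """Scatter each value to its own slot, then reduce in fixed order."""
    slots = [0.0] * n
    for index, value in contribs:
        slots[index] = value
    return tree_sum(slots)


order_a = [(0, 1e16), (1, 1.0), (2, -1e16)]
order_b = [(0, 1e16), (2, -1e16), (1, 1.0)]  # same values, different arrival order

naive_a, naive_b = arrival_sum(order_a), arrival_sum(order_b)
fixed_a, fixed_b = deterministic_sum(order_a, 3), deterministic_sum(order_b, 3)
```

The arrival-order sums differ between the two orderings, while the fixed-order reduction returns the same bits for both.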

This PR does not make simulation bitwise reproducible end to end. It guarantees only that d.contact.* is stable across runs with the same input. Full end-to-end state reproducibility will probably come after a few more phases land.

thowell self-requested a review on April 14, 2026, 21:59
thowell linked an issue on Apr 14, 2026 that may be closed by this pull request
@thowell
Collaborator

thowell commented Apr 15, 2026

@mar-yan24 thank you for contributing this feature to mujoco warp!

Comment thread mujoco_warp/_src/types.py Outdated
zeros out the contacts at each step)
contact_sensor_maxmatch: max number of contacts considered by contact sensor matching criteria
contacts matched after this value is exceded will be ignored
deterministic: enable deterministic contact ordering after narrowphase (opt-in, ~5-10% overhead)
Collaborator

Let's update this to remove '(opt-in, ...', and please add a TODO to update this comment when more parts of the simulation pipeline have optional deterministic results.

@thowell
Collaborator

thowell commented Apr 15, 2026

@mar-yan24 FYI, there is a draft Warp PR for introducing determinism in Warp: NVIDIA/warp#1355

@erikfrey
Collaborator

We just discussed this. It could be worth pursuing this approach in parallel with Warp's low-level support for determinism, as the two approaches may have different performance tradeoffs.

@mar-yan24
Contributor Author

@thowell, thanks for the input! I haven't kept up with Warp as closely recently, so I'll take a look at that PR and see how its ideas compare with my current plan.

Regarding @erikfrey's comment, I don't mind working on the rest of the determinism implementation for this PR and comparing the performance once finished. I'll keep working on this over the coming week and try to have the full end-to-end implementation done about a week from now.

Thank you both for the info/updates!

Comment thread mujoco_warp/_src/collision_driver.py Outdated
Comment thread mujoco_warp/_src/collision_driver.py
Comment thread mujoco_warp/_src/collision_driver.py
Comment thread mujoco_warp/_src/determinism_test.py Outdated
Comment thread mujoco_warp/_src/determinism_test.py
Comment thread mujoco_warp/_src/collision_driver.py Outdated
@mar-yan24
Contributor Author

Thanks for the review @thowell! The changes should be good to go. I am planning the next determinism steps after this PR, like constraint row allocation and actuator moment allocation. Before I continue building, would you prefer that I keep extending this branch so all the work stays on this PR, or split it up into separate PRs for review? Either works for me.

@thowell
Collaborator

thowell commented Apr 17, 2026

@mar-yan24 let's create separate PRs for the next deterministic features. Thanks!

Development

Successfully merging this pull request may close these issues.

Determinism

3 participants