torchcomms: add UCC+MPI backends + benchmarking #2052

Draft: d4l3k wants to merge 1 commit into main from d4l3k/ucc

@d4l3k (Member) commented Apr 13, 2026

This adds UCC and MPI backends for torchcomms as a prototype. The backends were tested on a single node over TCP and shared memory (shm) and compared against the Gloo TCP backend. The goal is to prototype switching from Gloo to a more widely maintained backend.

Key considerations:

  • Gloo supports Windows and macOS; UCC does not, and the MPI ecosystem is much more fragmented.
  • UCC is significantly faster than Gloo and MPI on both TCP and shm (lower latency and higher bandwidth).
  • shm can be very important for inference.
  • UCC provides RDMA, EFA, and CUDA backends, which could make it easier to use PyTorch for non-standard use cases.

See the full report, including benchmark results, at https://github.com/meta-pytorch/torchcomms/blob/d4l3k/ucc/comms/torchcomms/tests/perf/py/gloo_vs_ucc_report.md.
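For readers unfamiliar with how latency and bandwidth are usually derived in this kind of comparison, here is a minimal sketch. It is not the benchmark harness from this PR: `benchmark_collective`, its parameters, and the result keys are hypothetical names chosen for illustration, and `op` stands in for whatever call issues one collective on a given backend.

```python
import time
from statistics import median
from typing import Callable, Dict


def benchmark_collective(
    op: Callable[[], None],
    message_bytes: int,
    warmup: int = 5,
    iters: int = 20,
) -> Dict[str, float]:
    """Time `op` and report median latency plus a simple bandwidth figure.

    Hypothetical helper for illustration only; `op` is assumed to block
    until one collective of `message_bytes` bytes has completed.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warm-up runs are discarded so connection setup and caches
    # do not skew the measured samples.
    for _ in range(warmup):
        op()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)

    latency_s = median(samples)
    # Bandwidth here is simply bytes per second of wall time, in GB/s.
    bandwidth = message_bytes / latency_s / 1e9 if latency_s > 0 else float("inf")
    return {"latency_us": latency_s * 1e6, "bandwidth_gbs": bandwidth}
```

In a real multi-process run, each rank would typically synchronize before timing and `op` would wrap the backend's collective call; both are omitted here because they depend on the backend API.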

@meta-cla (bot) added the "CLA Signed" label Apr 13, 2026
@d4l3k force-pushed the d4l3k/ucc branch 3 times, most recently from 568b37c to df691bd, April 13, 2026 22:53
@d4l3k changed the title from "torchcomms: add UCC backend" to "torchcomms: add UCC+MPI backends + benchmarking" Apr 13, 2026
@d4l3k force-pushed the d4l3k/ucc branch 3 times, most recently from 3aac734 to 5813930, April 13, 2026 23:09