
Add SamplingRegistry to decouple profiler sampling from CVAR (#2063) #2063

Open
mlunar-meta wants to merge 1 commit into meta-pytorch:main from mlunar-meta:export-D100706980

mlunar-meta commented Apr 14, 2026

Summary:

Introduce a SamplingRegistry class in ctran/profiler/ that encapsulates the sampling decision (shouldTrace based on opCount % samplingWeight). The Profiler now reads the sampling weight from ctranConfig at construction instead of receiving it as a parameter at every initForEachColl() call.
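
For illustration, the sampling rule can be sketched as a small class. This is a minimal, hypothetical sketch, not the code in SamplingRegistry.h/.cc: it assumes that "opCount % samplingWeight" means "trace when the remainder is zero" and that a non-positive weight disables tracing (which also avoids a modulo by zero). The accessor name `samplingWeight()` is illustrative only.

```cpp
#include <cstdint>

// Hypothetical sketch of the sampling decision; the real class in
// ctran/profiler/ may differ in types, naming, and edge-case handling.
class SamplingRegistry {
 public:
  explicit SamplingRegistry(int64_t samplingWeight)
      : samplingWeight_(samplingWeight) {}

  // Trace one out of every `samplingWeight_` collectives, keyed on opCount.
  // Assumption: a weight <= 0 disables tracing instead of dividing by zero.
  bool shouldTrace(uint64_t opCount) const {
    if (samplingWeight_ <= 0) {
      return false;
    }
    return opCount % static_cast<uint64_t>(samplingWeight_) == 0;
  }

  int64_t samplingWeight() const { return samplingWeight_; }

 private:
  int64_t samplingWeight_;
};
```

With a weight of 4, operations 0, 4, 8, ... are traced; a weight of 1 traces every collective.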

Changes:

  • Add SamplingRegistry class (SamplingRegistry.h/.cc) with shouldTrace(opCount) method
  • Add profilingSamplingWeight field to ctranConfig (default: NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT)
  • Profiler::initForEachColl() no longer takes samplingWeight parameter
  • Update all 7 algorithm call sites to use simplified initForEachColl(opCount)
  • MCCL sets profilingSamplingWeight on ctranConfig before ctranInit() [See D100730174]
  • Add SamplingRegistryTest with 5 test cases
  • Update ProfilerTest to construct Profiler with config-based sampling weight
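
The constructor and call-site changes listed above can be sketched as follows. This is a hypothetical, self-contained approximation: `CtranConfig` stands in for ctranConfig and models only the `profilingSamplingWeight` field; its default of 1 is a placeholder (per the PR, the real default comes from NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT); and the sampling check is inlined where the PR delegates to SamplingRegistry.

```cpp
#include <cstdint>

// Hypothetical stand-in for ctranConfig; only the field named in the PR is modeled.
struct CtranConfig {
  // Placeholder default; per the PR the real one comes from the
  // NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT CVAR.
  int64_t profilingSamplingWeight = 1;
};

class Profiler {
 public:
  // The sampling weight is captured once, at construction, from the config.
  explicit Profiler(const CtranConfig& config)
      : samplingWeight_(config.profilingSamplingWeight) {}

  // Simplified signature: call sites pass only opCount.
  void initForEachColl(uint64_t opCount) {
    opCount_ = opCount;
    tracing_ = shouldTrace(opCount);
  }

  bool isTracing() const { return tracing_; }

 private:
  // Stand-in for SamplingRegistry::shouldTrace(), inlined so this compiles alone.
  bool shouldTrace(uint64_t opCount) const {
    return samplingWeight_ > 0 &&
           opCount % static_cast<uint64_t>(samplingWeight_) == 0;
  }

  int64_t samplingWeight_;
  uint64_t opCount_ = 0;
  bool tracing_ = false;
};
```

Because the weight is fixed when the Profiler is built, the 7 algorithm call sites no longer need to know which CVAR supplies it.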

Differential Revision: D100706980

meta-cla bot added the CLA Signed label Apr 14, 2026

meta-codesync bot commented Apr 14, 2026

@mlunar-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100706980.

mlunar-meta added a commit to mlunar-meta/torchcomms that referenced this pull request Apr 14, 2026
…torch#2063)

Summary:

Introduce a SamplingRegistry class in ctran/profiler/ that encapsulates the sampling decision (shouldTrace based on opCount % samplingWeight). The Profiler now reads the sampling weight from ctranConfig at construction instead of receiving it as a parameter at every initForEachColl() call.

This decouples the profiler sampling from the hardcoded NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT CVAR, allowing MCCL to control its sampling rate independently via a new MCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT CVAR set on ctranConfig before ctranInit().

Changes:
- Add SamplingRegistry class (SamplingRegistry.h/.cc) with shouldTrace(opCount) method
- Add profilingSamplingWeight field to ctranConfig (default: NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT)
- Profiler::initForEachColl() no longer takes samplingWeight parameter
- Update all 7 algorithm call sites to use simplified initForEachColl(opCount)
- MCCL sets profilingSamplingWeight on ctranConfig before ctranInit()
- Add SamplingRegistryTest with 5 test cases
- Update ProfilerTest to construct Profiler with config-based sampling weight

Differential Revision: D100706980
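
The MCCL-side override described in this commit message can be sketched as below. The sketch is hypothetical throughout: the helper names `resolveSamplingWeight` and `applyMcclSamplingOverride` are illustrative, and `std::getenv` is used only to keep the example self-contained, since MCCL presumably reads MCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT through its own CVAR machinery. The point is the ordering: the value is written into the config before ctranInit(), so the Profiler picks it up at construction.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for ctranConfig (only the field named in the PR).
struct CtranConfig {
  int64_t profilingSamplingWeight = 1;
};

// Hypothetical helper: use the MCCL value if it is a clean integer,
// otherwise keep the existing (NCCL-derived) default.
int64_t resolveSamplingWeight(const char* mcclValue, int64_t ncclDefault) {
  if (mcclValue == nullptr || *mcclValue == '\0') {
    return ncclDefault;
  }
  char* end = nullptr;
  long long parsed = std::strtoll(mcclValue, &end, 10);
  if (*end != '\0') {
    return ncclDefault;  // Trailing garbage or no digits: ignore the override.
  }
  return static_cast<int64_t>(parsed);
}

// Hypothetical MCCL setup step, to be run before ctranInit().
void applyMcclSamplingOverride(CtranConfig& config) {
  config.profilingSamplingWeight = resolveSamplingWeight(
      std::getenv("MCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT"),
      config.profilingSamplingWeight);
}
```
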
meta-codesync bot changed the title from "Add SamplingRegistry to decouple profiler sampling from CVAR" to "Add SamplingRegistry to decouple profiler sampling from CVAR (#2063)" Apr 14, 2026
mlunar-meta force-pushed the export-D100706980 branch 2 times, most recently from 104f655 to 5272a76, April 14, 2026 16:01
mlunar-meta force-pushed the export-D100706980 branch 2 times, most recently from b14f3ad to b55633d, April 14, 2026 22:00

Labels

CLA Signed, fb-exported, meta-exported

1 participant