Add profilingSampleCount to decouple profiler sampling source from opCount (#2067) by mlunar-meta · Pull Request #2067 · meta-pytorch/torchcomms

mlunar-meta · 2026-04-14T06:18:10Z

Summary:

Add a `profilingSampleCount_` pointer to `CtranComm` that lets callers choose which counter the profiler uses for sampling decisions. When set, the profiler samples based on the pointed-to counter instead of `opCount`. When null (the default), existing behavior is preserved and `opCount` is used.

MCCL uses this to sample based on a new `collectiveCount_` counter (incremented once per `allReduce` call) instead of `opCount`, so its sampling cadence is controlled independently of `opCount`.
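
A minimal sketch of the selection rule, assuming the profiler only needs the counter's current value at each sampling decision. `effectiveSampleCount` is a hypothetical helper for illustration, not the actual `Profiler` code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: prefer the caller-supplied counter when the pointer is
// set, otherwise fall back to opCount (the pre-existing behavior).
inline uint64_t effectiveSampleCount(const uint64_t* profilingSampleCount,
                                     uint64_t opCount) {
  return profilingSampleCount != nullptr ? *profilingSampleCount : opCount;
}
```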

Changes:

- Add `uint64_t* profilingSampleCount_` to `CtranComm` (`nullptr` = use `opCount`)
- Add `uint64_t collectiveCount_` to `McclComm`, incremented in `allReduce`
- Wire `profilingSampleCount_` to `collectiveCount_` in `McclComm::finishInit`
- `Profiler::initForEachColl` uses `profilingSampleCount` when set, else `opCount`
- Add AUTODEPS fix for `profiler_test` BUCK (`ncclx-cvars` dep)
- Add 2 test cases: override via `profilingSampleCount`, and `nullptr` fallback to `opCount`
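
The wiring listed above can be sketched with simplified stand-ins for `CtranComm` and `McclComm`. Only `opCount`, `profilingSampleCount_`, `collectiveCount_`, `finishInit`, and `allReduce` come from this PR; the `*Sketch` types, `sampleCount()`, `otherOp()`, and the assumption that `allReduce` also advances `opCount` are illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for CtranComm: only the counter fields relevant here are modeled.
struct CtranCommSketch {
  uint64_t opCount = 0;
  uint64_t* profilingSampleCount_ = nullptr;  // nullptr = use opCount

  // Counter the profiler would consult for a sampling decision.
  uint64_t sampleCount() const {
    return profilingSampleCount_ != nullptr ? *profilingSampleCount_ : opCount;
  }
};

// Stand-in for McclComm. Copying is disabled because ctran holds a pointer
// into this object; a copy would keep pointing at the original's counter.
struct McclCommSketch {
  McclCommSketch() = default;
  McclCommSketch(const McclCommSketch&) = delete;
  McclCommSketch& operator=(const McclCommSketch&) = delete;

  // Mirrors the wiring in McclComm::finishInit.
  void finishInit() { ctran.profilingSampleCount_ = &collectiveCount_; }

  // Assumption: allReduce advances both counters.
  void allReduce() {
    ++collectiveCount_;
    ++ctran.opCount;
  }

  // Hypothetical non-allReduce operation: advances opCount only.
  void otherOp() { ++ctran.opCount; }

  CtranCommSketch ctran;
  uint64_t collectiveCount_ = 0;
};
```

Because the profiler holds a pointer rather than a copy, it reads the live value of `collectiveCount_` on each decision. Under this sketch's assumptions, the trade-off is that the counter must outlive the `CtranComm` that points at it and must not move, which is why copying `McclCommSketch` is disabled.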

Differential Revision: D100730174

meta-codesync · 2026-04-14T06:18:18Z

@mlunar-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100730174.

Summary:

Introduce a `SamplingRegistry` class in `ctran/profiler/` that encapsulates the sampling decision (`shouldTrace` based on `opCount % samplingWeight`). The Profiler now reads the sampling weight from `ctranConfig` at construction instead of receiving it as a parameter on every `initForEachColl()` call. This decouples profiler sampling from the hardcoded `NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT` CVAR, allowing MCCL to control its sampling rate independently via a new `MCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT` CVAR set on `ctranConfig` before `ctranInit()`.

Changes:

- Add `SamplingRegistry` class (`SamplingRegistry.h`/`.cc`) with a `shouldTrace(opCount)` method
- Add `profilingSamplingWeight` field to `ctranConfig` (default: `NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT`)
- Add `MCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT` CVAR (int, default: 1)
- `Profiler::initForEachColl()` no longer takes a `samplingWeight` parameter
- Update all 7 algorithm call sites to use the simplified `initForEachColl(opCount)`
- MCCL sets `profilingSamplingWeight` on `ctranConfig` before `ctranInit()`
- Add `SamplingRegistryTest` with 5 test cases
- Update `ProfilerTest` to construct `Profiler` with a config-based sampling weight

Differential Revision: D100706980
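
A rough sketch of how `SamplingRegistry` and the config-driven `Profiler` construction might fit together. The `shouldTrace(opCount)` name and the `opCount % samplingWeight` rule come from the commit message; the `*Sketch` types, tracing when the remainder is zero, and treating a non-positive weight as "never trace" are assumptions:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the ctranConfig field. In the real code its default comes from
// NCCL_CTRAN_ALGO_PROFILING_SAMPLING_WEIGHT; the literal 1 is a placeholder.
struct CtranConfigSketch {
  int profilingSamplingWeight = 1;
};

class SamplingRegistrySketch {
 public:
  explicit SamplingRegistrySketch(int samplingWeight)
      : samplingWeight_(samplingWeight) {}

  // Assumption: trace when opCount % samplingWeight == 0.
  bool shouldTrace(uint64_t opCount) const {
    if (samplingWeight_ <= 0) {
      return false;  // assumption: a non-positive weight disables tracing
    }
    return opCount % static_cast<uint64_t>(samplingWeight_) == 0;
  }

 private:
  int samplingWeight_;
};

// The weight is read once at construction, so the per-collective call only
// needs the counter, mirroring initForEachColl(opCount).
class ProfilerSketch {
 public:
  explicit ProfilerSketch(const CtranConfigSketch& config)
      : registry_(config.profilingSamplingWeight) {}

  bool initForEachColl(uint64_t opCount) const {
    return registry_.shouldTrace(opCount);
  }

 private:
  SamplingRegistrySketch registry_;
};
```

Reading the weight once at construction keeps `initForEachColl()` down to a single counter argument, and lets MCCL and NCCL use different weights by setting different values on `ctranConfig`.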

meta-cla bot added the CLA Signed label Apr 14, 2026

meta-codesync bot added fb-exported meta-exported labels Apr 14, 2026

meta-codesync bot changed the title ~~Add profilingSampleCount to decouple profiler sampling source from opCount~~ Add profilingSampleCount to decouple profiler sampling source from opCount (#2067) Apr 14, 2026

mlunar-meta force-pushed the export-D100730174 branch from be63eea to 270ec30 April 14, 2026 06:26

mlunar-meta force-pushed the export-D100730174 branch from 270ec30 to 77bf57d April 14, 2026 06:30

mlunar-meta force-pushed the export-D100730174 branch from 77bf57d to 6a70f4d April 14, 2026 22:01

mlunar-meta force-pushed the export-D100730174 branch from 6a70f4d to 236824f April 14, 2026 22:04

mlunar-meta wants to merge 2 commits into meta-pytorch:main from mlunar-meta:export-D100730174
