Skip to content

Remove old collTrace fields#2312

Closed
YulunW wants to merge 5 commits intometa-pytorch:mainfrom
YulunW:export-D102736358
Closed

Remove old collTrace fields#2312
YulunW wants to merge 5 commits intometa-pytorch:mainfrom
YulunW:export-D102736358

Conversation

@YulunW
Copy link
Copy Markdown
Contributor

@YulunW YulunW commented Apr 29, 2026

Summary: Remove the old colltrace fields

Differential Revision: D102736358

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 29, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented Apr 29, 2026

@YulunW has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102736358.

yulunwang and others added 5 commits May 4, 2026 01:23
Summary:
Remove the old colltrace fallback paths that were guarded by NCCL_COLLTRACE_USE_NEW_COLLTRACE. The new colltrace is now the only path.

Changes:
- CollTraceFunc.cc: collTraceBaselineGetHandle() now always uses the new colltrace path. Delete all old event acquisition/recording functions (collTraceAquireEvent*, collTraceRecordStartEvent, collTraceRecordEndEvent, etc.) that served the old CollTraceEvent pipeline.
- CollTraceFunc.h: Remove declarations for deleted functions. Remove CollTrace.h include.
- Delete CollTraceLegacyHandle.h/.cc (adapter from old events to ICollTraceHandle interface, no longer used).
- ctran CollTraceWrapper: Remove legacyFunc static variable, setCollTraceLegacyHandleFunc(), and the CVAR-guarded fallback in getCollTraceHandle(). Always use getNewCollTraceHandle().
- v2_27/v2_28/v2_29 param.cc: Delete initLegacyColltraceForCtran() and its call_once invocation. Remove includes for CollTraceFunc.h, CollTraceLegacyHandle.h, and ctran CollTraceWrapper.h.
- Delete old colltrace tests: CollTraceDistTest.cc (tested old colltrace with USE_NEW_COLLTRACE=0) and SlowCollReporterUT.cc (tested SlowCollReporter from old CollTrace). Equivalent coverage exists in NewCollTraceDistTest{NoLocal,Local}.cc.
- AllToAllTest.cc: Remove commented-out old colltrace dump code.
- CommWithCtranTest.cc: Remove assertion comparing old collTrace_ field.

Differential Revision: D102560243
Summary: CommDesc is a string, so in json it should be quoted. Previously it was unquoted, change it to quoted.

Differential Revision: D103649809
Summary:
Fix issues in commdump test by:

1. Make the sleep deterministic by using waitForCollTraceDrain, it will return false if the colltrace is not drained after 3 seconds.
2. Fix tests by removing `codepath` field which no longer exists.
3. Fix DumpAfterColl by temporarily disable CommsMonitor, it currently has an issue if 2 communicators used the same memory address -- shouldn't happen in prod but would happen in tests. Temporarily fix it as it requires redesign of the CommsMonitor Will properly fix it in a follow-up diff.
4. Disabled DumpWhileCommsInDestruct as it is too expensive, will try to make it lightweight and turn back on
5. Disabled TestDumpAllWithTwoComms, there is an issue with destruction for CtranEX Comm, not related to the current chage. Current debugging it, temporarily disable it to not block this diff stack from landing.

Differential Revision: D103649812
Summary: Remove the old colltrace fields

Differential Revision: D102736358
@meta-codesync meta-codesync Bot changed the title Remove old collTrace fields and init/destroy functions Remove old collTrace fields May 4, 2026
@YulunW YulunW force-pushed the export-D102736358 branch from afdedf2 to 647c0dc Compare May 4, 2026 08:29
@meta-codesync meta-codesync Bot closed this in 202a24a May 4, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 4, 2026

This pull request has been merged in 202a24a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported Merged meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant