Skip to content

Delete old colltrace files and inline CollTraceFunc (#2313)#2313

Open
YulunW wants to merge 6 commits intometa-pytorch:mainfrom
YulunW:export-D102736359
Open

Delete old colltrace files and inline CollTraceFunc (#2313)#2313
YulunW wants to merge 6 commits intometa-pytorch:mainfrom
YulunW:export-D102736359

Conversation

@YulunW
Copy link
Copy Markdown
Contributor

@YulunW YulunW commented Apr 29, 2026

Summary:

Delete original files and dependencies

Reviewed By: minsii

Differential Revision: D102736359

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 29, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented Apr 29, 2026

@YulunW has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102736359.

yulunwang and others added 6 commits May 4, 2026 01:23
Summary:
Remove the old colltrace fallback paths that were guarded by NCCL_COLLTRACE_USE_NEW_COLLTRACE. The new colltrace is now the only path.

Changes:
- CollTraceFunc.cc: collTraceBaselineGetHandle() now always uses the new colltrace path. Delete all old event acquisition/recording functions (collTraceAquireEvent*, collTraceRecordStartEvent, collTraceRecordEndEvent, etc.) that served the old CollTraceEvent pipeline.
- CollTraceFunc.h: Remove declarations for deleted functions. Remove CollTrace.h include.
- Delete CollTraceLegacyHandle.h/.cc (adapter from old events to ICollTraceHandle interface, no longer used).
- ctran CollTraceWrapper: Remove legacyFunc static variable, setCollTraceLegacyHandleFunc(), and the CVAR-guarded fallback in getCollTraceHandle(). Always use getNewCollTraceHandle().
- v2_27/v2_28/v2_29 param.cc: Delete initLegacyColltraceForCtran() and its call_once invocation. Remove includes for CollTraceFunc.h, CollTraceLegacyHandle.h, and ctran CollTraceWrapper.h.
- Delete old colltrace tests: CollTraceDistTest.cc (tested old colltrace with USE_NEW_COLLTRACE=0) and SlowCollReporterUT.cc (tested SlowCollReporter from old CollTrace). Equivalent coverage exists in NewCollTraceDistTest{NoLocal,Local}.cc.
- AllToAllTest.cc: Remove commented-out old colltrace dump code.
- CommWithCtranTest.cc: Remove assertion comparing old collTrace_ field.

Differential Revision: D102560243
Summary: CommDesc is a string, so in json it should be quoted. Previously it was unquoted, change it to quoted.

Differential Revision: D103649809
Summary:
Fix issues in commdump test by:

1. Make the sleep deterministic by using waitForCollTraceDrain, it will return false if the colltrace is not drained after 3 seconds.
2. Fix tests by removing `codepath` field which no longer exists.
3. Fix DumpAfterColl by temporarily disable CommsMonitor, it currently has an issue if 2 communicators used the same memory address -- shouldn't happen in prod but would happen in tests. Temporarily fix it as it requires redesign of the CommsMonitor Will properly fix it in a follow-up diff.
4. Disabled DumpWhileCommsInDestruct as it is too expensive, will try to make it lightweight and turn back on
5. Disabled TestDumpAllWithTwoComms, there is an issue with destruction for CtranEX Comm, not related to the current chage. Current debugging it, temporarily disable it to not block this diff stack from landing.

Differential Revision: D103649812
Summary: Remove the old colltrace fields

Differential Revision: D102736358
Summary:
Pull Request resolved: meta-pytorch#2313

Delete original files and dependencies

Reviewed By: minsii

Differential Revision: D102736359
@YulunW YulunW force-pushed the export-D102736359 branch from 3d533e8 to d7df8c2 Compare May 4, 2026 08:30
@meta-codesync meta-codesync Bot changed the title Delete old colltrace files and inline CollTraceFunc Delete old colltrace files and inline CollTraceFunc (#2313) May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant