Delete old colltrace files and inline CollTraceFunc (#2313)#2313
Open
YulunW wants to merge 6 commits intometa-pytorch:mainfrom
Open
Delete old colltrace files and inline CollTraceFunc (#2313)#2313YulunW wants to merge 6 commits intometa-pytorch:mainfrom
YulunW wants to merge 6 commits intometa-pytorch:mainfrom
Conversation
Contributor
|
@YulunW has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102736359. |
Summary:
Remove the old colltrace fallback paths that were guarded by NCCL_COLLTRACE_USE_NEW_COLLTRACE. The new colltrace is now the only path.
Changes:
- CollTraceFunc.cc: collTraceBaselineGetHandle() now always uses the new colltrace path. Delete all old event acquisition/recording functions (collTraceAquireEvent*, collTraceRecordStartEvent, collTraceRecordEndEvent, etc.) that served the old CollTraceEvent pipeline.
- CollTraceFunc.h: Remove declarations for deleted functions. Remove CollTrace.h include.
- Delete CollTraceLegacyHandle.h/.cc (adapter from old events to ICollTraceHandle interface, no longer used).
- ctran CollTraceWrapper: Remove legacyFunc static variable, setCollTraceLegacyHandleFunc(), and the CVAR-guarded fallback in getCollTraceHandle(). Always use getNewCollTraceHandle().
- v2_27/v2_28/v2_29 param.cc: Delete initLegacyColltraceForCtran() and its call_once invocation. Remove includes for CollTraceFunc.h, CollTraceLegacyHandle.h, and ctran CollTraceWrapper.h.
- Delete old colltrace tests: CollTraceDistTest.cc (tested old colltrace with USE_NEW_COLLTRACE=0) and SlowCollReporterUT.cc (tested SlowCollReporter from old CollTrace). Equivalent coverage exists in NewCollTraceDistTest{NoLocal,Local}.cc.
- AllToAllTest.cc: Remove commented-out old colltrace dump code.
- CommWithCtranTest.cc: Remove assertion comparing old collTrace_ field.
Differential Revision: D102560243
Differential Revision: D102736360
Summary: CommDesc is a string, so in json it should be quoted. Previously it was unquoted, change it to quoted. Differential Revision: D103649809
Summary: Fix issues in commdump test by: 1. Make the sleep deterministic by using waitForCollTraceDrain, it will return false if the colltrace is not drained after 3 seconds. 2. Fix tests by removing `codepath` field which no longer exists. 3. Fix DumpAfterColl by temporarily disable CommsMonitor, it currently has an issue if 2 communicators used the same memory address -- shouldn't happen in prod but would happen in tests. Temporarily fix it as it requires redesign of the CommsMonitor Will properly fix it in a follow-up diff. 4. Disabled DumpWhileCommsInDestruct as it is too expensive, will try to make it lightweight and turn back on 5. Disabled TestDumpAllWithTwoComms, there is an issue with destruction for CtranEX Comm, not related to the current chage. Current debugging it, temporarily disable it to not block this diff stack from landing. Differential Revision: D103649812
Summary: Remove the old colltrace fields Differential Revision: D102736358
Summary: Pull Request resolved: meta-pytorch#2313 Delete original files and dependencies Reviewed By: minsii Differential Revision: D102736359
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Delete original files and dependencies
Reviewed By: minsii
Differential Revision: D102736359