Make IpcRemHandle trivially destructible to prevent secondary segfaults (#2068)
Open
dsjohns2 wants to merge 1 commit into meta-pytorch:main
8a2399a to be9bff7
dsjohns2 added a commit to dsjohns2/torchcomms-1 that referenced this pull request on Apr 14, 2026:

…ts (meta-pytorch#2068)

Summary: IpcRemHandle::peerId is changed from std::string to char[kMaxPeerIdLen].

When an IB request double-completion causes refCount_ to go negative, commInternalError propagates up through FB_COMMCHECKTHROW_EX in AllReduceRing, and stack unwinding destroys the local vector<unique_ptr<CtranMapperRequest>>. The destruction chain reaches IpcRemHandle::~IpcRemHandle(), where ~std::string() tries to free a corrupted heap pointer and segfaults, masking the real IB bug.

With char[], the destructor is trivial (no-op), so the error path completes cleanly and the actual commInternalError is reported.

Reviewed By: elvinlife
Differential Revision: D97314372
be9bff7 to 7637c2f
7637c2f to 5f02941
5f02941 to 82a6298
82a6298 to c379ab3
c379ab3 to 3ffb99e
3ffb99e to 6db33ce
6db33ce to 2c11571
Summary:
IpcRemHandle::peerId is changed from std::string to char[kMaxPeerIdLen].
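A fixed-size char array does not manage its own length, so code that fills peerId has to bound the copy itself. The following is a minimal sketch of one way to do that, not the code in this PR: the capacity value, the struct `IpcRemHandleSketch`, the helpers `setPeerId` and `getPeerId`, and the sample id `"host0:1234"` are all hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical capacity: the real value of kMaxPeerIdLen is not shown here.
inline constexpr std::size_t kMaxPeerIdLen = 64;

// Simplified stand-in for IpcRemHandle that models only the peerId member.
struct IpcRemHandleSketch {
  char peerId[kMaxPeerIdLen];
};

// Hypothetical helper: copy `id` into the fixed buffer, truncating if it does
// not fit, and always NUL-terminate. Returns false if `id` was truncated.
inline bool setPeerId(IpcRemHandleSketch& handle, std::string_view id) {
  const std::size_t n = std::min(id.size(), kMaxPeerIdLen - 1);
  std::copy_n(id.data(), n, handle.peerId);
  handle.peerId[n] = '\0';
  return n == id.size();
}

// Hypothetical helper: read the id back without scanning past the buffer,
// even if the terminator is missing.
inline std::string getPeerId(const IpcRemHandleSketch& handle) {
  const void* end = std::memchr(handle.peerId, '\0', kMaxPeerIdLen);
  const std::size_t len = end != nullptr
      ? static_cast<std::size_t>(static_cast<const char*>(end) - handle.peerId)
      : kMaxPeerIdLen;
  return std::string(handle.peerId, len);
}

// Usage: a short id round-trips; an oversized id is truncated and reported.
inline void peerIdExample() {
  IpcRemHandleSketch handle{};
  assert(setPeerId(handle, "host0:1234"));
  assert(getPeerId(handle) == "host0:1234");
  assert(!setPeerId(handle, std::string(kMaxPeerIdLen, 'x')));
  assert(getPeerId(handle).size() == kMaxPeerIdLen - 1);
}
```

Returning a truncation flag lets the caller decide whether an oversized id is an error. Whether the real code truncates, rejects, or asserts on oversized ids is not stated in this summary.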
When an IB request double-completion causes refCount_ to go negative, commInternalError propagates up through FB_COMMCHECKTHROW_EX in AllReduceRing, and stack unwinding destroys the local vector<unique_ptr<CtranMapperRequest>>. The destruction chain reaches IpcRemHandle::~IpcRemHandle(), where ~std::string() tries to free a corrupted heap pointer and segfaults, masking the real IB bug.
With char[], the destructor is trivial (no-op), so the error path completes cleanly and the actual commInternalError is reported.
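The difference between the two member types can be checked at compile time. The sketch below assumes simplified stand-in structs (`HandleWithString`, `HandleWithArray`) and a hypothetical capacity value; the real IpcRemHandle has other members that are not shown in this summary, and all of them must be trivially destructible for the property to hold.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical capacity: the real value of kMaxPeerIdLen is not shown here.
inline constexpr std::size_t kMaxPeerIdLen = 64;

// Simplified stand-ins for the handle before and after the change.
struct HandleWithString {
  std::string peerId;
};

struct HandleWithArray {
  char peerId[kMaxPeerIdLen];
};

// std::string has a non-trivial destructor, so any struct that holds one
// needs destructor code to run, and that code may free heap memory.
static_assert(!std::is_trivially_destructible_v<HandleWithString>);

// A char array has nothing to release, so destroying the struct is a no-op.
static_assert(std::is_trivially_destructible_v<HandleWithArray>);
```

A similar `static_assert` placed next to the real struct definition would be one way to keep a later change from reintroducing a non-trivial member unnoticed.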
Reviewed By: elvinlife
Differential Revision: D97314372