Make IpcRemHandle trivially destructible to prevent secondary segfaults (#2068) #2068

Open
dsjohns2 wants to merge 1 commit into meta-pytorch:main from dsjohns2:export-D97314372

dsjohns2 (Contributor) commented Apr 14, 2026

Summary:

IpcRemHandle::peerId is changed from std::string to char[kMaxPeerIdLen].

When an IB request double-completion causes refCount_ to go negative, commInternalError propagates up through FB_COMMCHECKTHROW_EX in AllReduceRing, and stack unwinding destroys the local vector<unique_ptr<CtranMapperRequest>>. The destruction chain reaches IpcRemHandle::~IpcRemHandle(), where ~std::string() tries to free a corrupted heap pointer and segfaults, masking the real IB bug.
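The unwinding step can be illustrated with a minimal sketch. The types below (FakeHandle, FakeRequest, runCollective) are hypothetical stand-ins, not the real torchcomms classes, and std::runtime_error stands in for the exception raised on commInternalError. The sketch only shows that every element destructor of a local vector<unique_ptr<...>> runs while the exception propagates, which is the point at which a corrupted std::string member would be freed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the real torchcomms types.
static int g_destroyed = 0;

struct FakeHandle {
  std::string peerId;  // non-trivial member: ~std::string() runs on destruction
  ~FakeHandle() { ++g_destroyed; }
};

struct FakeRequest {
  FakeHandle handle;
};

// Mimics a collective that owns its requests locally and then hits an error.
void runCollective() {
  std::vector<std::unique_ptr<FakeRequest>> reqs;
  for (int i = 0; i < 3; ++i) {
    reqs.push_back(std::make_unique<FakeRequest>());
    reqs.back()->handle.peerId = "peer" + std::to_string(i);
  }
  assert(reqs.size() == 3);
  // Stand-in for the exception thrown when an internal error is detected.
  throw std::runtime_error("simulated internal error");
}

// Returns how many handles were destroyed while the exception propagated.
int countDestroyedDuringUnwind() {
  g_destroyed = 0;
  try {
    runCollective();
  } catch (const std::runtime_error&) {
    // The local vector in runCollective() has already been destroyed here.
  }
  return g_destroyed;
}
```

In this sketch the heap is intact, so the destructors run harmlessly. In the scenario described above, the same destructor calls would operate on already-corrupted memory.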

With char[], the destructor is trivial (no-op), so the error path completes cleanly and the actual commInternalError is reported.
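As a rough sketch of the property this change relies on, the following compares the two member types with std::is_trivially_destructible (C++17). The struct names, the setPeerId helper, and the value 64 for kMaxPeerIdLen are assumptions made for illustration; the actual definition of IpcRemHandle and the real constant are not shown in this PR text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Assumed value for illustration only; the real kMaxPeerIdLen is not shown here.
constexpr std::size_t kMaxPeerIdLen = 64;

// Hypothetical "before" layout: owns heap memory through std::string.
struct HandleWithString {
  std::string peerId;
};

// Hypothetical "after" layout: fixed-size inline buffer, no heap ownership.
struct HandleWithArray {
  char peerId[kMaxPeerIdLen];
};

static_assert(!std::is_trivially_destructible_v<HandleWithString>,
              "std::string member requires a destructor call");
static_assert(std::is_trivially_destructible_v<HandleWithArray>,
              "char array member makes destruction a no-op");

// Hypothetical helper: copies an id into the buffer, truncating if needed,
// and always NUL-terminates.
void setPeerId(HandleWithArray& h, const std::string& id) {
  const std::size_t n =
      id.size() < kMaxPeerIdLen - 1 ? id.size() : kMaxPeerIdLen - 1;
  std::memcpy(h.peerId, id.data(), n);
  h.peerId[n] = '\0';
}
```

One trade-off of a fixed-size buffer is that ids longer than kMaxPeerIdLen - 1 characters have to be truncated or rejected; the helper above truncates, which may or may not match what the real code does.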

Reviewed By: elvinlife

Differential Revision: D97314372

meta-cla bot added the CLA Signed label Apr 14, 2026
meta-codesync bot (Contributor) commented Apr 14, 2026

@dsjohns2 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97314372.

meta-codesync bot changed the title from "Make IpcRemHandle trivially destructible to prevent secondary segfaults" to "Make IpcRemHandle trivially destructible to prevent secondary segfaults (#2068)" Apr 14, 2026
dsjohns2 added a commit to dsjohns2/torchcomms-1 that referenced this pull request Apr 14, 2026
…ts (meta-pytorch#2068)
Labels

CLA Signed, fb-exported, meta-exported
