Skip to content

fix: fix connection panic caused by WaitGroup misuse on close#1484

Merged
nodece merged 6 commits into
apache:masterfrom
nodece:fix/issue-1483-waitgroup-panic
May 15, 2026
Merged

fix: fix connection panic caused by WaitGroup misuse on close#1484
nodece merged 6 commits into
apache:masterfrom
nodece:fix/issue-1483-waitgroup-panic

Conversation

@nodece
Copy link
Copy Markdown
Member

@nodece nodece commented Apr 24, 2026

Fixes #1483

Motivation

When closing a connection, failLeftRequestsWhenClose() waits on incomingRequestsWG before draining queued requests.

Previously, only SendRequest / SendRequestNoWait were tracked by the WaitGroup. WriteData could still enqueue into writeRequestsCh after close selection started, creating a race where buffered data requests were left undrained and never released.

Additionally, in Go 1.25+, calling Add(1) concurrently with Wait() on a WaitGroup can panic with sync: WaitGroup is reused before previous Wait has returned if close and in-flight request paths are not synchronized correctly.

Modifications

Introduced a shared registerIncomingRequest() helper used by all three entry points:

  • SendRequest
  • SendRequestNoWait
  • WriteData

The registerIncomingRequest() helper performs:

  1. c.mu.RLock()
  2. state check (returns false if connectionClosed)
  3. incomingRequestsWG.Add(1) (only if still open)

This ensures the check-and-add is atomic with respect to the close state transition:

  • Close() sets state under c.mu.Lock(), creating a clear happens-before relationship
  • After state becomes connectionClosed, no new request/write path can increment the WaitGroup
  • Prevents concurrent Add() calls after Wait() has started, eliminating the panic

The non-blocking drain after Wait() is now safe and sufficient:

  • All tracked producers have completed and called Done()
  • All enqueues have completed before Wait() returns
  • Drain releases leftover writeRequestsCh buffers and fails queued request callbacks

Testing

Added regression test TestConnectionSendRequestRaceWithClose that:

  • Directly exercises the WaitGroup Add/Wait race by calling registerIncomingRequest() concurrently
  • Calls failLeftRequestsWhenClose() to invoke Wait()
  • Runs 10 trials with 50 concurrent goroutines to maximize race timing variations
  • Verifies no panic occurs in Go 1.25+ with proper synchronization
  • Confirms all three entry points properly reject requests after close via assertConnectionClosed

@nodece nodece force-pushed the fix/issue-1483-waitgroup-panic branch 2 times, most recently from f070589 to 154a0ac Compare April 24, 2026 11:30
@nodece nodece changed the title [Issue 1483] Fix connection panic caused by WaitGroup misuse on close fix: fix connection panic caused by WaitGroup misuse on close Apr 24, 2026
@nodece nodece requested a review from Copilot May 14, 2026 02:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a connection-close race that can panic when SendRequest/SendRequestNoWait interact with WaitGroup while the connection is closing.

Changes:

  • Synchronizes request state checks and WaitGroup.Add(1) with c.mu.RLock().
  • Replaces sentinel-based channel draining with a non-blocking drain after waiting for in-flight requests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pulsar/internal/connection.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pulsar/internal/connection.go
@nodece nodece force-pushed the fix/issue-1483-waitgroup-panic branch from c992cf9 to 9785098 Compare May 14, 2026 06:31
@nodece nodece requested a review from Copilot May 14, 2026 06:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread pulsar/internal/connection_test.go Outdated
Comment thread pulsar/internal/connection_test.go Outdated
nodece and others added 2 commits May 14, 2026 14:57
The previous test did not exercise the actual Add/Wait race because it
didn't call failLeftRequestsWhenClose() which contains the Wait() call.

This updated test:
- Directly calls registerIncomingRequest() to exercise WaitGroup.Add()
- Concurrently calls failLeftRequestsWhenClose() to exercise Wait()
- Runs 10 trials with 50 goroutines each to maximize race window
- Verifies no panic occurs in Go 1.25+ with improper synchronization
- Completes successfully with proper locking under mu.RLock()

Also verify all three entry points (SendRequest, SendRequestNoWait, WriteData)
properly reject calls after connection close via assertConnectionClosed test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pulsar/internal/connection_test.go Outdated
@nodece nodece merged commit dfdbc46 into apache:master May 15, 2026
14 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

connection panic When close

3 participants