Skip to content

[Issue 1483] Fix sync: WaitGroup is reused panic in connection close path#1489

Open
SAY-5 wants to merge 1 commit intoapache:masterfrom
SAY-5:fix/issue-1483-waitgroup-reuse-panic
Open

[Issue 1483] Fix sync: WaitGroup is reused panic in connection close path#1489
SAY-5 wants to merge 1 commit intoapache:masterfrom
SAY-5:fix/issue-1483-waitgroup-reuse-panic

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented Apr 30, 2026

Fixes #1483.

Motivation

failLeftRequestsWhenClose calls c.incomingRequestsWG.Wait() to drain in-flight SendRequest / SendRequestNoWait callers, but those callers do incomingRequestsWG.Add(1) before checking c.getState() == connectionClosed. A caller that started concurrently with the close path can therefore execute Add(1) at the exact moment Wait() is finishing draining to zero, which Go treats as a reuse violation:

panic: sync: WaitGroup is reused before previous Wait has returned
goroutine ...
sync.(*WaitGroup).Wait(...)
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).failLeftRequestsWhenClose(...)
github.com/apache/pulsar-client-go/pulsar/internal.(*connection).run(...)

This is the production crash captured in #1483.

Modifications

Add a sync.RWMutex (incomingRequestsLock) that gates the Add(1) so it cannot interleave with Wait():

  • SendRequest / SendRequestNoWait take RLock, check state, Add(1), RUnlock. Multiple senders still proceed in parallel.
  • failLeftRequestsWhenClose takes Lock, sets state=closed, Unlocks, then Wait(). The exclusive section guarantees:
    1. no in-flight Add(1) straddles the Wait(), and
    2. any caller arriving after this point sees state == connectionClosed and returns ErrConnectionClosed without touching the WaitGroup.

Inline comments in both call sites document why the lock is necessary so future contributors don't re-introduce the race.

Verifying this change

  • go build ./pulsar/internal/... passes
  • go vet ./pulsar/internal/... passes
  • go test -count=1 -short ./pulsar/internal/ — all internal tests still pass

Documentation

  • doc-not-needed (bug fix in internal package; no public API change)

failLeftRequestsWhenClose previously called incomingRequestsWG.Wait()
without first stopping new SendRequest / SendRequestNoWait callers.
Both callers did Add(1) before checking the connection state, so a
caller that started concurrently with the close path could Add(1) at
the moment Wait() was draining to zero, tripping Go's

  panic: sync: WaitGroup is reused before previous Wait has returned

(reported in apache#1483 and seen in production close paths).

Add a sync.RWMutex (incomingRequestsLock) that:

* SendRequest / SendRequestNoWait take RLock, check state, Add(1),
  RUnlock — so multiple senders still proceed in parallel; and
* failLeftRequestsWhenClose takes Lock, sets state=closed, Unlocks,
  then Wait()s. The exclusive section guarantees no in-flight
  Add(1) is straddling the Wait() and that any new caller arriving
  after the close sees state=closed and returns ErrConnectionClosed
  without touching the WaitGroup.

Closes apache#1483.

Signed-off-by: SAY-5 <say.apm35@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

connection panic When close

1 participant