[cleanup][test] Remove PortManager and use kernel-assigned ports everywhere#25694
[cleanup][test] Remove PortManager and use kernel-assigned ports everywhere#25694merlimat wants to merge 8 commits intoapache:masterfrom
Conversation
…ywhere `PortManager` (in pulsar-common) and the inner `BasePortManager` of `LocalBookkeeperEnsemble` allocated "free" ports via `ServerSocket(0)` and tracked them in a JVM-level lock set. The lock was JVM-only, so the OS-level port could still be grabbed by another process between allocation and bind, producing flaky `BindException`s in tests. The class has outlived its usefulness — modern BookKeeper identifies bookies by `bookieId`, not host:port, so port 0 (kernel-assigned) is fine across the board. Removed: - `pulsar-common/...PortManager.java` and its test. - `pulsar-package-management/...test/PortManager.java` (unused duplicate). - `BasePortManager` inner class in `LocalBookkeeperEnsemble`. - `bkBasePort` constructor variants of `LocalBookkeeperEnsemble` (no callers left after the standalone change). - `--bookkeeper-port` CLI option from `PulsarStandalone` (and its setter, getter, builder method, plus 3 test calls). Multi-bookie standalone now always uses kernel-assigned ports — bookies are identified by `bookieId`. - `useDynamicBrokerPorts()` toggle in `MultiBrokerBaseTest` (now always true in spirit; ports come from `getBrokerListenPort()` after start). - `bkPort` field in `BKCluster.BKClusterConf`. Refactored: - `BKCluster`: bookies bind to port 0; `bookieId` is derived from the data dir hash so multiple cluster instances on a shared metadata store don't collide on cookie validation, while restarts on the same dir keep cookies stable. The unused cookie-parsing port-recovery logic is dropped. - `MultiBrokerBaseTest`: `mainBrokerPort` and `additionalBrokerPorts` are populated post-start from `pulsar.getBrokerListenPort()` instead of pre-allocated. - `PulsarTestContext.handlePreallocatePorts`: inlined `ServerSocket(0)` allocation. Still needed by tests that build advertised-listener URLs before the broker starts. - 17 test files: `PortManager.nextLockedFreePort()` calls replaced with `0` (let kernel pick) or inline `ServerSocket(0)` (only where pre-allocation is genuinely needed, e.g. for advertised-listener URLs). - `PulsarStandaloneTest.testStandaloneWithRocksDB`: assertion switched from `getBookiePort()` (now returns kernel-assigned ports that change per restart) to `getBookieId()` (the persistent identity). Behavior change: `--bookkeeper-port` is gone from `pulsar standalone`. Bookies get kernel-assigned ports — no impact for normal usage since clients only connect to the broker.
Now that all callers pass () -> 0, the supplier is dead weight. Remove it from LocalBookkeeperEnsemble entirely; bookies always bind to port 0. Updated all 41 call sites accordingly.
lhotari
left a comment
There was a problem hiding this comment.
There's still a need for PortManager in cases where ephemeral ports (0) can't be used. Where they can, ephemeral is obviously preferred.
To avoid collisions for assigned ports across multiple processes, PortManager should allocate from outside the ephemeral range. On Linux the default emphemeral range is 32768–60999:
❯ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768 60999
Proposal: reserve a block per JVM starting at a fixed offset. Each instance claims block n by binding a server socket on 20000 + (n * 1000), giving it 999 usable ports in that range. That handles coordination across multiple test JVMs without a shared registry. If ports run out, a new block could be allocated.
On release, PortManager should verify the port is actually free before returning it to the pool — sockets can linger in TIME_WAIT after close, so the free-list update needs to handle async release rather than marking the port available immediately.
Address review feedback: PortManager is still useful for tests that need to know a port BEFORE binding (advertised-listener URLs, znode pre-creation, unavailable-node URL synthesis). The old implementation suffered from the JVM-level lock not preventing OS-level collisions, since it allocated inside the kernel's ephemeral range. The restored PortManager addresses both problems: - Allocates from outside the ephemeral range (Linux default 32768-60999) by reserving a 1000-port block per JVM. Each JVM claims a block by binding a lock ServerSocket on the block's base port for the JVM's lifetime; other JVMs that hit the same range observe the bind failure and pick the next. - Released ports do not return to the pool immediately. They sit in a pending-release set until a fresh bind to the port succeeds, which avoids handing out a port that's still in TIME_WAIT. Wired PortManager back into the call sites that genuinely need pre-allocation: - ServiceUrlQuarantineTest (broker port + 40 unavailable-node ports). - ExtensibleLoadManagerImplWithAdvertisedListenersTest (advertised listeners). - ModularLoadManagerImplTest.testOwnBrokerZnodeByMultipleBroker (znode pre-creation). - AdvertisedListenersTest (advertised listeners per broker). - SimpleProtocolHandlerTestsBase / SimpleProxyExtensionTestBase (handler listener address). - PulsarTestContext.handlePreallocatePorts (preallocatePorts(true) preserved). Everything else still uses port 0 with kernel assignment.
|
@lhotari thanks — restored The new implementation in
Wired it back into the call sites that genuinely need pre-allocation: |
…Socket(0) version
…-manager # Conflicts: # pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/ShadowTopicRealBkTest.java
Two distinct issues surfaced on https://github.com/apache/pulsar/actions/runs/25449522387: 1. BookKeeperClusterTestCase (3 copies in pulsar-metadata, managed-ledger, pulsar-package-management) now starts bookies on port 0. Multiple bookies ended up sharing the cookie identity 127.0.0.1:0, breaking cookie validation on the second bookie. This code path (unlike BKCluster, which uses bookieId in metadata) needs the bookie to be reachable by host:port for the test client's address resolver to work — bookieId-only registration leads to UnknownHostException when clients try to dial the bookie. Fix: route through PortManager.nextLockedFreePort() again for these cases. This is exactly the use case lhotari called out in the PR review. 2. PulsarFunctionTlsTest configured both brokers with WebServicePortTls=0. PulsarService.initializeWorkerConfigFromBrokerConfig builds the workerId from "{cluster}-fw-{host}-{configuredPort}", which means both brokers got the same workerId, the function-worker membership manager never elected a leader, and the test timed out waiting for one. Fix: pre-allocate WebServicePortTls via PortManager so each broker gets a unique configured port (and therefore a unique workerId).
Motivation
PortManager(inpulsar-common) and the innerBasePortManagerofLocalBookkeeperEnsembleallocated "free" ports viaServerSocket(0)and tracked them in a JVM-level lock set. The lock was JVM-only, so the OS-level port could still be grabbed by another process between allocation and bind, producing flakyBindExceptions in tests (most recentlyShadowTopicRealBkTest.setup).The class has outlived its usefulness — modern BookKeeper identifies bookies by
bookieId, not host:port, so kernel-assigned ports (port 0) work across the board. The "lock" never actually prevented OS-level collisions anyway.Modifications
Removed
pulsar-common/.../PortManager.javaand its test.pulsar-package-management/.../test/PortManager.java(unused duplicate).BasePortManagerinner class inLocalBookkeeperEnsemble.bkBasePortconstructor variants ofLocalBookkeeperEnsemble.Supplier<Integer> portManagerparameter fromLocalBookkeeperEnsemble— every caller was passing() -> 0. Bookies now always bind to kernel-assigned ports internally.--bookkeeper-portCLI option frompulsar standalone(and its setter, getter, builder method, plus the test calls). Multi-bookie standalone gets kernel-assigned ports.useDynamicBrokerPorts()toggle inMultiBrokerBaseTest(now always dynamic).bkPortfield inBKCluster.BKClusterConf.Refactored
BKCluster: bookies bind to port 0;bookieIdis derived from the data dir hash so multiple cluster instances on a shared metadata store don't collide on cookie validation, while restarts on the same dir keep cookies stable. The unused cookie-parsing port-recovery logic is dropped.MultiBrokerBaseTest:mainBrokerPortandadditionalBrokerPortsare populated post-start frompulsar.getBrokerListenPort()instead of pre-allocated.PulsarTestContext.handlePreallocatePorts: inlinedServerSocket(0)allocation. Still needed by tests that build advertised-listener URLs before the broker starts (those URLs require a known port up front).PortManager.nextLockedFreePort()calls replaced with0(let kernel pick) or inlineServerSocket(0)(only where pre-allocation is genuinely required, e.g. for advertised-listener URLs).PulsarStandaloneTest.testStandaloneWithRocksDB: assertion switched fromgetBookiePort()(now returns kernel-assigned ports that change per restart) togetBookieId()(the persistent BK identity).Behavior change
--bookkeeper-portis gone frompulsar standalone. Bookies get kernel-assigned ports — no impact for normal usage since clients only connect to the broker, not directly to bookies.Verifying this change
Locally exercised the affected tests:
LocalBookkeeperEnsembleTest(2/2)ShadowTopicRealBkTest(1/1)PulsarStandaloneTest(5/5)EndToEndTestforBKCluster(10 passed, 1 skipped)AdvertisedListenersTest,LoadManagerFailFastTest,AdvertisedListenersMultiBrokerLeaderElectionTestFull
compileJavaandcompileTestJavaclean.Does this pull request potentially affect one of the following parts:
--bookkeeper-portCLI option frompulsar standalone--bookkeeper-portremoved frompulsar standalone