Automatically update the endpoint list when the network configuration changes #2020
Hugal31 wants to merge 32 commits into eclipse-zenoh:main
On Unix systems, use netlink to detect added or deleted IPv4 addresses to:
* Renew the interface list.
* Update the node locators.
* Restart the scouting to use the new addresses/discard the old addresses.

This is still quite hacky, with the following shortcomings:
* We do not handle IPv6 addresses.
* We perform the locator update and the scouting reset even if the new/old addresses are not used as per the configuration.
* The code overall is not the best quality.

Remove the scouting reset upon interface change. This is in preparation for an improvement of the scouting update upon interface change.
When updating the locators after a network change, send the new locators to the routers and the peers if in linkstate mode.
…TLINK
NETLINK is only available on Linux. The poll interval may be less reactive and efficient, but it is available everywhere. Moreover, this fixes the case where a new locator was not detected if an interface IP was added before the interface was UP and RUNNING.
…ators in the same order
PR missing one of the required labels: {'new feature', 'dependencies', 'internal', 'bug', 'documentation', 'breaking-change', 'enhancement'}
Force-pushed from ffbdb8e to 3d8d2b6.
Codecov Report ❌

Coverage diff, main vs. #2020:

| | main | #2020 | +/- |
|---|---|---|---|
| Coverage | 72.56% | 72.40% | -0.16% |
| Files | 390 | 391 | +1 |
| Lines | 63360 | 63587 | +227 |
| Hits | 45975 | 46043 | +68 |
| Misses | 17385 | 17544 | +159 |
Fix cyclic reference with Runtime and Scouting.
pub retry: Option<connection_retry::ConnectionRetryModeDependentConf>,
/// Interval in millisecond to check if the listening endpoints changed (e.g. when listening on 0.0.0.0).
/// Also update the multicast scouting listening interfaces. Use -1 to disable.
pub endpoint_poll_interval_ms: Option<i64>,
Is there any specific reason to use i64 for a purely positive value? Can Option::None be used for disabling?
I copied the idea from timeout_ms, which has the same semantics if I'm not mistaken (-1 = infinite / disabled).
After verification, I disable the poll if the interval is <= 0, not just < 0, while timeout_ms actually has a timeout of 0 if you set it to 0.
I don't mind changing it.
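The two designs discussed in this thread can be compared with a minimal sketch. The helper names `poll_period` and `poll_period_unsigned` are hypothetical and are not taken from the PR; the first follows the behavior described above (any value <= 0 disables polling), the second follows the reviewer's suggestion of an unsigned value where `None` disables polling.

```rust
use std::time::Duration;

/// Hypothetical helper mirroring the signed design described in the thread:
/// any value <= 0 disables polling, a positive value is a period in milliseconds.
fn poll_period(raw_ms: i64) -> Option<Duration> {
    if raw_ms <= 0 {
        None
    } else {
        // `raw_ms` is strictly positive here, so `unsigned_abs` is lossless.
        Some(Duration::from_millis(raw_ms.unsigned_abs()))
    }
}

/// Hypothetical helper for the reviewer's alternative: an unsigned value,
/// with `None` meaning "polling disabled". In this sketch 0 is passed through
/// unchanged; a real implementation would have to decide what 0 means.
fn poll_period_unsigned(raw_ms: Option<u64>) -> Option<Duration> {
    raw_ms.map(Duration::from_millis)
}
```

One trade-off to keep in mind with the unsigned variant: if an absent field is also expected to mean "use the default", `None` cannot mean "disabled" at the same time, which may be one reason to keep a sentinel value.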
@Hugal31 could you please push any update to restart the CI? For some reason I don't see a way to restart it from the GitHub interface. It's not clear right now whether the CI failure is caused by the PR itself or is sporadic.
I think the macOS tests fail only because they are flaky; they fail on main right now.
Force-pushed from 69f05b9 to d14c043.
@Hugal31 sorry for the delay in reviewing. If it's still relevant, could you please merge the latest main into it to restart the tests and make sure that the update is still valid?
Pull request overview
This PR adds periodic monitoring to detect network configuration changes and react by refreshing resolved listen locators and multicast scouting interfaces, so peers/routers can be updated without restarting the node.
Changes:
- Introduces a configurable polling loop (endpoint_poll_interval_ms, default 10s) that refreshes interface cache (Unix), updates resolved locators, and notifies routing neighbors when locators change.
- Refactors multicast scouting logic into a dedicated Scouting runtime module and wires it into the orchestrator and API scouting paths.
- Adds routing “self locator update” hooks so locator changes propagate via gossip/network linkstate mechanisms.
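The "notify only when locators change" step can be illustrated with a small, order-insensitive comparison of two locator snapshots. This is a sketch under assumptions: locators are modeled as plain strings, and `LocatorDiff` and `diff_locators` are hypothetical names, not the types or functions used in the PR.

```rust
use std::collections::BTreeSet;

/// Result of comparing two snapshots of resolved locators (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct LocatorDiff {
    added: Vec<String>,
    removed: Vec<String>,
}

impl LocatorDiff {
    /// True when both snapshots contain the same locators.
    fn is_empty(&self) -> bool {
        self.added.is_empty() && self.removed.is_empty()
    }
}

/// Compare the previous and current locator lists, ignoring their order.
/// The output vectors are sorted because `BTreeSet` iterates in order.
fn diff_locators(previous: &[String], current: &[String]) -> LocatorDiff {
    let old: BTreeSet<&String> = previous.iter().collect();
    let new: BTreeSet<&String> = current.iter().collect();
    LocatorDiff {
        added: new.difference(&old).map(|s| s.to_string()).collect(),
        removed: old.difference(&new).map(|s| s.to_string()).collect(),
    }
}
```

A poller built this way would call `diff_locators` on every tick and only notify neighbors when `is_empty()` returns false, so a mere reordering of the same addresses does not trigger an update.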
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| zenoh/src/net/runtime/scouting.rs | New scouting module with restart/update logic for multicast interfaces and scout routines. |
| zenoh/src/net/runtime/orchestrator.rs | Switches to Scouting module and replaces print_locators() with change-detecting update_locators(). |
| zenoh/src/net/runtime/mod.rs | Adds background poller, locator refresh + neighbor notification, and stores Scouting in runtime state. |
| zenoh/src/net/routing/hat/mod.rs | Adds update_self_locators() hook to HAT base trait. |
| zenoh/src/net/routing/hat/router/mod.rs | Implements locator update propagation for router HAT. |
| zenoh/src/net/routing/hat/p2p_peer/mod.rs | Implements locator update propagation for p2p/gossip HAT. |
| zenoh/src/net/routing/hat/linkstate_peer/mod.rs | Implements locator update propagation for linkstate peer HAT. |
| zenoh/src/net/protocol/network.rs | Adds Network::update_locators() to send locator updates on links. |
| zenoh/src/net/protocol/gossip.rs | Adds Gossip::{update_locators} and GossipNet::update_locators() to propagate locator updates. |
| zenoh/src/api/scouting.rs | Uses Scouting::scout after moving the scout routine out of Runtime. |
| commons/zenoh-util/src/net/mod.rs | Makes interface list cache mutable (RwLock) and adds update_iface_cache() (Unix). |
| commons/zenoh-config/src/lib.rs | Adds listen.endpoint_poll_interval_ms configuration field and docs. |
| commons/zenoh-config/src/defaults.rs | Provides default value (10s) for endpoint_poll_interval_ms. |
| DEFAULT_CONFIG.json5 | Documents and sets default endpoint_poll_interval_ms. |
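As a rough illustration of the "mutable interface cache" row above, the following sketch keeps a snapshot of interface addresses behind an `RwLock` and reports whether a refresh changed it. `IfaceCache` and its methods are hypothetical and use only the standard library; the actual `update_iface_cache()` in zenoh-util may be structured differently.

```rust
use std::net::IpAddr;
use std::sync::RwLock;

/// Hypothetical cache of interface addresses, shared between readers
/// (locator resolution) and a single periodic writer (the poller).
struct IfaceCache {
    addrs: RwLock<Vec<IpAddr>>,
}

/// Sort and deduplicate so that snapshots compare independently of order.
fn normalize(mut addrs: Vec<IpAddr>) -> Vec<IpAddr> {
    addrs.sort();
    addrs.dedup();
    addrs
}

impl IfaceCache {
    fn new(initial: Vec<IpAddr>) -> Self {
        IfaceCache { addrs: RwLock::new(normalize(initial)) }
    }

    /// Replace the cached snapshot with `fresh`; returns true if it changed.
    fn update(&self, fresh: Vec<IpAddr>) -> bool {
        let fresh = normalize(fresh);
        let mut guard = self.addrs.write().unwrap();
        if *guard == fresh {
            false
        } else {
            *guard = fresh;
            true
        }
    }

    /// Return a copy of the current snapshot without holding the lock.
    fn snapshot(&self) -> Vec<IpAddr> {
        self.addrs.read().unwrap().clone()
    }
}
```

Returning a boolean from `update` lets the caller skip the locator refresh and neighbor notification entirely when nothing changed between two polls.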
let sockets = zasyncread!(this.state.sockets);
tokio::select! {
    _ = this.responder(&sockets.mcast_socket, &sockets.ucast_sockets) => {},
    _ = this.autoconnect_all(
        &sockets.ucast_sockets,
        this.state.autoconnect,
        &this.state.addr
    ) => {},
    _ = token.cancelled() => (),
Hmm, the mutex is only read and written by this struct, and I don't want to clone the sockets. They are write-locked only when the scouting coroutine is terminated.
let scouting = self.state.scouting.lock().await;
if let Some(scouting) = scouting.as_ref() {
I'm not sure how to go about the unit tests, since they would require adding/removing network interfaces...
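Set up a configurable 10s poll interval to check whether the listening endpoints changed and to update the multicast scouting listening interfaces.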
This allows Zenoh to react to a network configuration change (i.e. connections and disconnections).
I tried to use the NETLINK socket in #1824 to be more efficient and reactive, but it only worked on Linux and I didn't know how to make it work with IPv6.
Replaces #1824
Closes #1823
🏷️ Label-Based Checklist
Based on the labels applied to this PR, please complete these additional requirements:
Labels: enhancement
✨ Enhancement Requirements
Since this PR enhances existing functionality:
Remember: Enhancements should not introduce new APIs or breaking changes.