Automatically update the endpoint list when the network configuration changes #2020

Open
Hugal31 wants to merge 32 commits into eclipse-zenoh:main from
Hugal31:feature/endpoint-auto-update

Conversation

@Hugal31
Contributor

@Hugal31 Hugal31 commented Jun 26, 2025

Set up a configurable 10s poll interval to:

  • Update the endpoint list (i.e. resolve 0.0.0.0 & [::] endpoints).
  • If changed, send an update to the neighbors.
  • If needed, join new multicast interfaces and restart the scouting routine to scout & listen on the new interfaces.

This allows Zenoh to react to a network configuration change (i.e. connections and disconnections).
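
As a rough illustration of the change detection described above, here is a minimal sketch of a single poll iteration, assuming locators are represented as plain strings. `LocatorChange`, `diff_locators`, and `poll_once` are hypothetical names chosen for this example and are not taken from the PR; the resolver is passed in as a closure so the step can be exercised without touching real network interfaces.

```rust
use std::collections::BTreeSet;

/// Hypothetical result of one poll: which resolved locators appeared or vanished.
#[derive(Debug, PartialEq, Eq)]
pub struct LocatorChange {
    pub added: Vec<String>,
    pub removed: Vec<String>,
}

/// Compare the previously known locators with the freshly resolved ones.
/// Returns `None` when nothing changed, so the caller can skip notifying neighbors.
pub fn diff_locators(
    previous: &BTreeSet<String>,
    current: &BTreeSet<String>,
) -> Option<LocatorChange> {
    let added: Vec<String> = current.difference(previous).cloned().collect();
    let removed: Vec<String> = previous.difference(current).cloned().collect();
    if added.is_empty() && removed.is_empty() {
        None
    } else {
        Some(LocatorChange { added, removed })
    }
}

/// One iteration of the poll: re-resolve, diff, and store the new set on change.
/// `resolve` stands in for whatever expands 0.0.0.0 / [::] into concrete addresses.
pub fn poll_once<F>(known: &mut BTreeSet<String>, resolve: F) -> Option<LocatorChange>
where
    F: FnOnce() -> BTreeSet<String>,
{
    let current = resolve();
    let change = diff_locators(known, &current);
    if change.is_some() {
        *known = current;
    }
    change
}
```

A caller would run `poll_once` on every tick of the interval and only send an update to the neighbors (and restart scouting, if needed) when it returns `Some`.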

I tried to use the NETLINK socket in #1824 to be more efficient and reactive, but it only worked on Linux and I didn't know how to make it work with IPv6.

Replaces #1824
Closes #1823


🏷️ Label-Based Checklist

Based on the labels applied to this PR, please complete these additional requirements:

Labels: enhancement

✨ Enhancement Requirements

Since this PR enhances existing functionality:

  • Enhancement scope documented - Clear description of what is being improved
  • Minimum necessary code - Implementation is as simple as possible, doesn't overcomplicate the system
  • Backwards compatible - Existing code/APIs still work unchanged
  • No new APIs added - Only improving existing functionality
  • Tests updated - Existing tests pass, new test cases added if needed
  • Performance improvement measured - If applicable, before/after metrics provided
  • Documentation updated - Existing docs updated to reflect improvements
  • User impact documented - How users benefit from this enhancement

Remember: Enhancements should not introduce new APIs or breaking changes.

Instructions:

  1. Check off items as you complete them (change - [ ] to - [x])
  2. The PR checklist CI will verify these are completed

This checklist updates automatically when labels change, but preserves your checked boxes.

Hugo Laloge added 16 commits April 7, 2025 09:41
On Unix systems, use netlink to detect added or deleted IPv4 addresses to:
  * Renew the interface list.
  * Update the node locators.
  * Restart the scouting to use the new addresses/discard the old addresses.

This is still quite hacky, with the following shortcomings:
* We do not handle IPv6 addresses.
* We perform the locator update and the scouting reset even if the
  new/old addresses are not used as per the configuration.
* The code overall is not the best quality.
Remove the scouting reset upon interface change.
This is in preparation for an improvement of the scouting update upon interface change.
When updating the locators after a network change, send the new
locators to the routers and the peers if in linkstate mode.
…TLINK

NETLINK is only available on Linux. The poll interval may be less
reactive and efficient, but it is available everywhere. Moreover, this
fixes the case where a new locator was not detected if an interface IP
was added before the interface was UP and RUNNING.
@github-actions

PR missing one of the required labels: {'new feature', 'dependencies', 'internal', 'bug', 'documentation', 'breaking-change', 'enhancement'}

@Hugal31 Hugal31 force-pushed the feature/endpoint-auto-update branch from ffbdb8e to 3d8d2b6 Compare June 26, 2025 09:06

@codecov

codecov bot commented Jun 26, 2025

Codecov Report

❌ Patch coverage is 57.00713% with 181 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.40%. Comparing base (aefcca7) to head (3a56692).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zenoh/src/net/runtime/scouting.rs | 60.79% | 118 Missing ⚠️ |
| zenoh/src/net/protocol/gossip.rs | 0.00% | 18 Missing ⚠️ |
| zenoh/src/net/runtime/orchestrator.rs | 70.58% | 10 Missing ⚠️ |
| zenoh/src/net/protocol/network.rs | 0.00% | 8 Missing ⚠️ |
| zenoh/src/net/routing/hat/router/mod.rs | 0.00% | 8 Missing ⚠️ |
| zenoh/src/net/routing/hat/linkstate_peer/mod.rs | 0.00% | 5 Missing ⚠️ |
| zenoh/src/net/routing/hat/p2p_peer/mod.rs | 0.00% | 5 Missing ⚠️ |
| zenoh/src/net/runtime/mod.rs | 81.48% | 5 Missing ⚠️ |
| commons/zenoh-util/src/net/mod.rs | 75.00% | 3 Missing ⚠️ |
| zenoh/src/net/routing/hat/mod.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2020      +/-   ##
==========================================
- Coverage   72.56%   72.40%   -0.16%     
==========================================
  Files         390      391       +1     
  Lines       63360    63587     +227     
==========================================
+ Hits        45975    46043      +68     
- Misses      17385    17544     +159     

@diogomatsubara diogomatsubara added the enhancement Existing things could work better label Jul 1, 2025
pub retry: Option<connection_retry::ConnectionRetryModeDependentConf>,
/// Interval in millisecond to check if the listening endpoints changed (e.g. when listening on 0.0.0.0).
/// Also update the multicast scouting listening interfaces. Use -1 to disable.
pub endpoint_poll_interval_ms: Option<i64>,
Contributor

Is there any specific reason to use i64 for a strictly positive value? Could Option::None be used to disable the poll instead?

Contributor Author

I copied the idea from timeout_ms, which has the same semantics if I'm not mistaken (-1 = infinite / disabled).

After checking, I disable the poll if the interval is <= 0, not just < 0, whereas timeout_ms really does use a timeout of 0 if you set it to 0.
I don't mind changing it.
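
To make the semantics under discussion concrete, here is a minimal sketch of how the configured value could be mapped to an optional polling period. The helper name `poll_interval` and the constant are hypothetical; the 10 s default and the "<= 0 disables" rule come from this thread, while treating an unset value (`None`) as "use the default" is an assumption of this sketch.

```rust
use std::time::Duration;

/// Default poll period stated in the PR description (10 s).
const DEFAULT_ENDPOINT_POLL_INTERVAL_MS: i64 = 10_000;

/// Hypothetical helper: turn the configured value into an optional period.
/// - `None` falls back to the default (an assumption of this sketch).
/// - Any value <= 0 disables polling, matching the behavior described above.
pub fn poll_interval(endpoint_poll_interval_ms: Option<i64>) -> Option<Duration> {
    let ms = endpoint_poll_interval_ms.unwrap_or(DEFAULT_ENDPOINT_POLL_INTERVAL_MS);
    if ms <= 0 {
        None // polling disabled
    } else {
        // Safe cast: `ms` is strictly positive here.
        Some(Duration::from_millis(ms as u64))
    }
}
```

With an unsigned field, as the review comment suggests, the `<= 0` branch would disappear and `None` would have to carry the "disabled" meaning instead of "unset".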

@milyin
Contributor

milyin commented Aug 4, 2025

@Hugal31 could you please push any update to make the CI restart? For some reason I don't see a way to restart it from the GitHub interface. It's not clear right now whether the CI failure is caused by the PR itself or is something sporadic.

@Hugal31
Contributor Author

Hugal31 commented Sep 1, 2025

I think the macOS tests fail only because they are flaky; they fail on main right now.

@Hugal31 Hugal31 force-pushed the feature/endpoint-auto-update branch from 69f05b9 to d14c043 Compare November 25, 2025 14:21
@milyin
Contributor

milyin commented Mar 16, 2026

@Hugal31 sorry for the delay in reviewing. If it's still relevant, could you please merge the latest main into it to restart the tests and make sure that the update is still valid?

Contributor

Copilot AI left a comment

Pull request overview

This PR adds periodic monitoring to detect network configuration changes and react by refreshing resolved listen locators and multicast scouting interfaces, so peers/routers can be updated without restarting the node.

Changes:

  • Introduces a configurable polling loop (endpoint_poll_interval_ms, default 10s) that refreshes interface cache (Unix), updates resolved locators, and notifies routing neighbors when locators change.
  • Refactors multicast scouting logic into a dedicated Scouting runtime module and wires it into the orchestrator and API scouting paths.
  • Adds routing “self locator update” hooks so locator changes propagate via gossip/network linkstate mechanisms.
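
The "self locator update" hook mentioned in the last bullet can be pictured roughly as follows. This is only a sketch under stated assumptions: the method name `update_self_locators` appears in the file summary of this review, but its signature, the default no-op body, the `HatBase`, `RouterHat`, and `ClientHat` types, and the returned count are all hypothetical and chosen purely for illustration.

```rust
/// Hypothetical stand-in for the HAT base trait; only the hook name comes from the PR.
pub trait HatBase {
    /// Called when the node's own resolved locators change.
    /// Returns how many neighbors were notified (illustrative only).
    /// The default does nothing, for modes that do not advertise locators.
    fn update_self_locators(&mut self, _locators: &[String]) -> usize {
        0
    }
}

/// Hypothetical router-like HAT that advertises its locators on every link.
pub struct RouterHat {
    pub links: Vec<String>,
    pub advertised: Vec<String>,
}

impl HatBase for RouterHat {
    fn update_self_locators(&mut self, locators: &[String]) -> usize {
        // Remember the new locators, then "send" them on each link.
        self.advertised = locators.to_vec();
        self.links.len()
    }
}

/// Hypothetical HAT that relies on the default no-op hook.
pub struct ClientHat;

impl HatBase for ClientHat {}
```

The point of the design is that the runtime only needs to call one hook after a locator change, and each routing mode decides how (or whether) to propagate it.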

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
zenoh/src/net/runtime/scouting.rs New scouting module with restart/update logic for multicast interfaces and scout routines.
zenoh/src/net/runtime/orchestrator.rs Switches to Scouting module and replaces print_locators() with change-detecting update_locators().
zenoh/src/net/runtime/mod.rs Adds background poller, locator refresh + neighbor notification, and stores Scouting in runtime state.
zenoh/src/net/routing/hat/mod.rs Adds update_self_locators() hook to HAT base trait.
zenoh/src/net/routing/hat/router/mod.rs Implements locator update propagation for router HAT.
zenoh/src/net/routing/hat/p2p_peer/mod.rs Implements locator update propagation for p2p/gossip HAT.
zenoh/src/net/routing/hat/linkstate_peer/mod.rs Implements locator update propagation for linkstate peer HAT.
zenoh/src/net/protocol/network.rs Adds Network::update_locators() to send locator updates on links.
zenoh/src/net/protocol/gossip.rs Adds Gossip::{update_locators} and GossipNet::update_locators() to propagate locator updates.
zenoh/src/api/scouting.rs Uses Scouting::scout after moving the scout routine out of Runtime.
commons/zenoh-util/src/net/mod.rs Makes interface list cache mutable (RwLock) and adds update_iface_cache() (Unix).
commons/zenoh-config/src/lib.rs Adds listen.endpoint_poll_interval_ms configuration field and docs.
commons/zenoh-config/src/defaults.rs Provides default value (10s) for endpoint_poll_interval_ms.
DEFAULT_CONFIG.json5 Documents and sets default endpoint_poll_interval_ms.

Comment on lines +297 to +305
let sockets = zasyncread!(this.state.sockets);
tokio::select! {
_ = this.responder(&sockets.mcast_socket, &sockets.ucast_sockets) => {},
_ = this.autoconnect_all(
&sockets.ucast_sockets,
this.state.autoconnect,
&this.state.addr
) => {},
_ = token.cancelled() => (),
Contributor Author

Mmh, the mutex is only read/written by this struct, and I don't want to clone the sockets. They are write-locked only once the scouting coroutine has terminated.
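
The locking pattern described in this reply can be sketched with the standard library's synchronous `RwLock`. This is an analogy only: the PR uses an async lock via `zasyncread!`, and the `Sockets` and `ScoutState` types below are hypothetical simplifications. The idea is that the routine borrows the sockets through a read guard for as long as it runs, and the sockets are replaced under a write lock only after that guard is gone.

```rust
use std::sync::RwLock;

/// Hypothetical, simplified socket set (strings instead of real sockets).
pub struct Sockets {
    pub ucast: Vec<String>,
}

/// Hypothetical owner of the lock; nothing else reads or writes it.
pub struct ScoutState {
    pub sockets: RwLock<Sockets>,
}

impl ScoutState {
    /// Borrow the sockets for the duration of the routine, without cloning them.
    pub fn run_routine(&self) -> usize {
        let sockets = self.sockets.read().unwrap();
        sockets.ucast.len()
    } // the read guard is released here, when the routine ends

    /// Replace the sockets; only succeeds in taking the lock once no routine holds it.
    pub fn replace_sockets(&self, ucast: Vec<String>) {
        let mut sockets = self.sockets.write().unwrap();
        sockets.ucast = ucast;
    }
}
```

While a read guard is alive, `try_write` on the same lock fails instead of blocking, which is the property that makes "write-lock only after the routine has stopped" a safe ordering.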

Comment on lines +819 to +820
let scouting = self.state.scouting.lock().await;
if let Some(scouting) = scouting.as_ref() {
Contributor Author

What?

@Hugal31
Contributor Author

Hugal31 commented Mar 18, 2026

I'm not sure how to go about the unit tests, since they would require adding/removing network interfaces...

Labels

enhancement Existing things could work better

Development

Successfully merging this pull request may close these issues.

Handle the addition or suppression of network interfaces/addresses

5 participants