Feature Request: Use Latency as a Tie-Breaker for Routing Peers with the Same Metric #2297

Open
opened 2025-11-20 07:07:16 -05:00 by saavagebueno · 0 comments
Owner

Originally created by @tkloda on GitHub (Sep 23, 2025).

Is your feature request related to a problem? Please describe.
Yes, the current routing peer selection process can lead to suboptimal network performance. When multiple routing peers advertise the same network with an identical metric, NetBird appears to select the active route randomly. This means a client might be routed through a peer with significantly higher latency, even when a lower-latency peer with the same metric is available. This negatively impacts the performance of latency-sensitive applications and results in an inconsistent user experience.

Describe the solution you'd like
I propose enhancing the routing logic to use latency as a secondary tie-breaker when selecting a routing peer.

The new selection process should be:

  1. First, select the routing peer(s) with the lowest metric for a given network prefix.

  2. If this results in two or more peers with the same metric, NetBird should then measure the latency of the connection to each of these peers, or reuse latency information it already has.

  3. Finally, it should select the peer with the lowest latency as the active route.

This would ensure that traffic is always sent via the most performant path available among equally prioritized peers, leading to a more intelligent and efficient network.

Describe alternatives you've considered
The primary alternative is to manually configure more granular metrics for each peer to force a specific routing path. However, this approach has several drawbacks:

  1. Static: It doesn't adapt to dynamic changes in network conditions or latency. If the preferred peer goes down or experiences high latency, manual intervention is required to adjust the metrics.

  2. Tedious: It adds significant administrative overhead, especially in larger or more complex networks with many routing peers.

  3. Error-prone: Manual configuration can easily lead to mistakes and misconfigurations.

Additional context
Here's a practical example to illustrate the issue and the proposed solution:

- Network: 192.168.10.0/24

- Routing Peer A: Located in Frankfurt, advertises the network with metric = 500.

- Routing Peer B: Located in London, also advertises the network with metric = 500.

A client located in Paris connects to the NetBird network.

- Latency to Peer A (Frankfurt): 25ms

- Latency to Peer B (London): 60ms

Current Behavior: NetBird sees that both peers have the same metric (500) and may randomly select Peer B as the route, in which case the client's traffic to 192.168.10.0/24 is routed through the higher-latency London peer.

Proposed Behavior: NetBird would first identify that both peers have the identical best metric (500). It would then compare their latencies (25ms vs. 60ms) and deterministically select Peer A (Frankfurt) as the active route, ensuring the best possible performance for the client.

This feature would also be incredibly powerful for creating anycast-style services over the NetBird network. By advertising a service's network prefix from multiple geographically distributed routing peers with the same metric, this latency-based selection would automatically direct clients to the nearest or most responsive service instance. This is ideal for improving the resilience and performance of internal services, databases, or application gateways.

saavagebueno added the feature-request label 2025-11-20 07:07:16 -05:00

Reference: SVI/netbird#2297