Per-route latency measurement for client route pick #1464

Open
opened 2025-11-20 05:31:02 -05:00 by saavagebueno · 1 comment

Originally created by @mohamed-essam on GitHub (Nov 30, 2024).

Is your feature request related to a problem? Please describe.
Currently the client connects routes to peers based only on the latency between the client and the peer; this doesn't take into account that some routes' targets could be geographically much further from one peer than another.

For example, if I have a route pointing to a resource in western Europe, two peers that can handle this route (one in western Europe and one in the western US), and the client is in the eastern US, the fastest path would be through the peer in Europe.

Describe the solution you'd like
Calculate the latency of each route from each routing peer. This information could be communicated either peer-to-peer or through the management service, which would keep a cache of per-peer, per-route latency as reported by each routing peer.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
I understand that this feature could be difficult to implement, since peers know nothing about which ports are open on a route's target or how to check its latency, but this could be configured by the user per route.

For example, when creating the route, the user can choose to use TCP 443 to check latency.

saavagebueno added the feature-request, client, routes labels 2025-11-20 05:31:02 -05:00

@mohamed-essam commented on GitHub (Dec 4, 2024):

I suggest the following approach:

  1. [Management] Add a new field to Route specifying the latency-check method (Protocol, IP/Domain, Port).
    1. [Dashboard] Add the necessary UI.
    2. [Management] Add the necessary APIs.
  2. [Client] ServerRouter calculates latency per route based on its configured latency-check settings.
  3. [Client] Store latency measurements and send latency reports with sync requests.
  4. [Management] Receive and store latency reports from peers.
  5. [Management] Send the available latency reports along with Route objects in pb.
  6. [Client] clientNetwork includes the peer-reported route latency plus the p2p latency in route selection.

Notes:

  1. Should the latencies be kept in an in-memory cache on the management side, or persisted in the Store?
  2. If I understand correctly, sync requests are only sent on the very first connection to management, or when the connection is interrupted and restored. Would there be a way for the client to send updates to management periodically, or, in this case, whenever a route's latency changes significantly?

Draft data structure diff (note: proto3 has no 16-bit scalar, so the port field uses `uint32`):

```diff
diff --git a/management/proto/management.proto b/management/proto/management.proto
index fe6a828b..5a0dd74c 100644
--- a/management/proto/management.proto
+++ b/management/proto/management.proto
@@ -59,6 +59,12 @@ message EncryptedMessage {
 message SyncRequest {
   // Meta data of the peer
   PeerSystemMeta meta = 1;
+  repeated LatencyReport latencyReport = 2;
+}
+
+message LatencyReport {
+  string RouteID = 1;
+  float Latency = 2;
 }
 
 // SyncResponse represents a state that should be applied to the local peer (e.g. Wiretrustee servers config as well as local peer and remote peers configs)
@@ -351,6 +357,16 @@ message Route {
   string NetID = 7;
   repeated string Domains = 8;
   bool keepRoute = 9;
+  LatencyCheck latencyCheck = 10;
+}
+
+message LatencyCheck {
+  bool Enabled = 1;
+  string Protocol = 2;
+  string Domain = 3;
+  string IP = 4;
+  uint32 Port = 5; // proto3 has no uint16; validate the range in code
+  float Latency = 6;
 }
 
 // DNSConfig represents a dns.Update
diff --git a/route/route.go b/route/route.go
index e23801e6..71bcbe72 100644
--- a/route/route.go
+++ b/route/route.go
@@ -45,10 +45,18 @@ const (
        DomainNetwork
 )
 
+const (
+       LatencyICMP LatencyProtocol = "ICMP"
+       LatencyTCP  LatencyProtocol = "TCP"
+       LatencyUDP  LatencyProtocol = "UDP"
+)
+
 type ID string
 
 type NetID string
 
+type LatencyProtocol string
+
 type HAMap map[HAUniqueID][]*Route
 
 // NetworkType route network type
@@ -101,6 +109,15 @@ type Route struct {
        Enabled             bool
        Groups              []string `gorm:"serializer:json"`
        AccessControlGroups []string `gorm:"serializer:json"`
+       LatencyCheck        LatencyCheck
+}
+
+type LatencyCheck struct {
+       Enabled  bool
+       Protocol LatencyProtocol
+       Domain   string
+       IP       netip.Addr
+       Port     uint16
 }
 
 // EventMeta returns activity event meta related to the route
@@ -125,6 +142,7 @@ func (r *Route) Copy() *Route {
                Enabled:             r.Enabled,
                Groups:              slices.Clone(r.Groups),
                AccessControlGroups: slices.Clone(r.AccessControlGroups),
+               LatencyCheck:        r.LatencyCheck,
        }
        return route
 }
@@ -150,7 +168,8 @@ func (r *Route) IsEqual(other *Route) bool {
                other.Enabled == r.Enabled &&
                slices.Equal(r.Groups, other.Groups) &&
                slices.Equal(r.PeerGroups, other.PeerGroups) &&
-               slices.Equal(r.AccessControlGroups, other.AccessControlGroups)
+               slices.Equal(r.AccessControlGroups, other.AccessControlGroups) &&
+               r.LatencyCheck == other.LatencyCheck
 }
```

Reference: SVI/netbird#1464