Direct Connection should check if connection actually is there #285

Closed
opened 2025-11-20 05:09:06 -05:00 by saavagebueno · 9 comments

Originally created by @CommanderRedYT on GitHub (Mar 8, 2023).

Describe the problem
Client A (Laptop) wants to connect to Client B (Server). Client B has a public IP, so netbird selects Direct Connection. However, ufw blocks the port (51820, as found in /var/log/netbird/client.log), so ping, ssh, etc. don't work. As soon as I run ufw allow 51820, everything works as intended.
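For reference, the workaround can be narrowed to UDP, since WireGuard traffic is UDP-only. A minimal sketch, assuming the port is 51820 as reported in client.log in this case (it may differ on other hosts):

```shell
# Allow only UDP on the WireGuard port reported in client.log (51820 in this report).
sudo ufw allow 51820/udp
# Confirm that the rule is now listed.
sudo ufw status verbose
```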

To Reproduce
Steps to reproduce the behavior:

  1. Connect to a Client via Direct Connection and ufw enabled
  2. Try to ping
  3. Allow port from client.log
  4. Try to ping again

Expected behavior
Check if direct connection actually works or if hole-punching or something else is needed.

NetBird status -d output:

[root@CENSORED netbird]# netbird status -d
Peers detail:
 ...

 myserver.netbird.cloud:
  NetBird IP: 100.foo.bar.faz
  Public key: Y19yoxTrPamPCHeQyb6cQTWwom/Bo7+KGHH9fEcK0BI=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/host
  Last connection update: 2023-03-08 09:48:44

 ...

Daemon version: 0.14.2
CLI version: 0.14.2
Management: Connected to https://api.wiretrustee.com:443
Signal: Connected to https://signal2.wiretrustee.com:443
FQDN: myclient.netbird.cloud
NetBird IP: 100.....
Interface type: Kernel
Peers count: 8/9 Connected

Additional context
For some reason this only happens on my VPS; my home server, which also has a public IP, does not have this problem. Maybe this is because it is behind a router?

Laptop Netbird Version: 0.14.2
Server (VPS) Netbird Version: 0.14.2
Server (Home) Netbird Version: 0.14.2

saavagebueno added the waiting-feedback label 2025-11-20 05:09:06 -05:00

@mlsmaycon commented on GitHub (Jun 16, 2023):

Since release 0.20.0 we've updated the core connectivity layer and improved the direct connection check. Can you test the latest version and see if the issue still exists?


@galexrt commented on GitHub (Aug 8, 2023):

This still seems to happen. We have people behind DSL routers (with neither static IPv4 nor static IPv6 addresses) showing as direct: true and thus not being able to connect to any other peers.
Is there any way to force direct: false? Could we get a flag to force it?

This is with Netbird server + client version 0.22.3.

Status detail output of a client that shouldn't be direct:

[...]
Status: Connected
-- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/srflx
[...]

(More info about that client: a windows PC with only private IPs)

It says connected but it isn't; it should be using the coturn server.
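To find every peer in this state at once, the status output can be filtered. This is only a sketch: `list_direct_srflx` is a hypothetical helper name, and it assumes the field labels and ordering (Direct before ICE candidate) shown in the outputs quoted in this thread.

```shell
# list_direct_srflx: read `netbird status -d` text on stdin and print the name
# of every peer reported as "Direct: true" whose ICE candidate pair includes
# srflx. Field labels are taken from the outputs quoted in this thread.
list_direct_srflx() {
  awk '
    # A peer block starts with a line holding only a name and a colon.
    /^[ \t]*[^ \t]+:[ \t]*$/ { peer = $1; sub(/:$/, "", peer); direct = 0; next }
    # Remember whether the current peer is flagged as direct.
    /^[ \t]*Direct:[ \t]*true/ { direct = 1; next }
    /^[ \t]*Direct:/ { direct = 0; next }
    # Report the peer if either side of the candidate pair is srflx.
    /^[ \t]*ICE candidate \(Local\/Remote\):/ {
      if (direct && peer != "" && $0 ~ /srflx/) print peer
    }
  '
}
```

Usage would be `netbird status -d | list_direct_srflx`. A listed peer is not necessarily broken; it is only a candidate for closer inspection.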


@galexrt commented on GitHub (Aug 15, 2023):

I'm not going to call it a definite fix (more info below), but in my case the following code change stops the srflx clients from being set as direct: true:

diff --git a/client/internal/peer/conn.go b/client/internal/peer/conn.go
index 9247ed3..5e69cac 100644
--- a/client/internal/peer/conn.go
+++ b/client/internal/peer/conn.go
@@ -348,7 +348,7 @@ func (conn *Conn) Open() error {
 }

 func isRelayCandidate(candidate ice.Candidate) bool {
-       return candidate.Type() == ice.CandidateTypeRelay
+       return candidate.Type() == ice.CandidateTypeRelay || candidate.Type() == ice.CandidateTypeServerReflexive
 }

 // configureConnection starts proxying traffic from/to local Wireguard and sets connection status to StatusConnected

I have tried to read up on RTC and ICE candidate states to better understand the srflx state. I don't fully understand it, but so far, with all the clients that are put into srflx state, it was seemingly always wrongly chosen for the "small set" (roughly 12-15+) of clients/nodes I use Netbird on.

  • Is this the "right" direction to relay for both candidate types?
  • What about adding a config option/flag on the client side for a client to always use the relay?

@mlsmaycon commented on GitHub (Aug 15, 2023):

Hello @galexrt,

The term srflx refers to a public IP address and port combination (IP:PORT). This is the address obtained when a device reaches out to a STUN/TURN server to learn about its public-facing address. It's crucial for enabling direct connections between two devices over the internet, especially when they're behind certain network configurations like NAT.

To make things more efficient, starting from version 0.19, we've decided to use the same port as Wireguard (a VPN protocol) for these discovery processes. This way, we can utilize the srflx address directly, eliminating the need for any intermediary proxy.

So currently, assuming both peers are using a more recent version, the srflx would work just fine. But it depends on the firewall between peers.

That said, there is an environment variable available to force use of Relay:

NB_ICE_FORCE_RELAY_CONN=true

To configure it on Windows, you unfortunately have to use regedit and add it as a value under:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netbird\

using a multi-string value named Environment, see example below:

[image: https://github.com/netbirdio/netbird/assets/7747744/54ce02eb-f8f3-49fd-b747-0311fb437e59]

Once that is created, you can restart the NetBird service with:

netbird service stop
netbird service start
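On a Linux host running the daemon under systemd, the same variable could presumably be set with a drop-in file. This is a sketch under assumptions not confirmed in this thread: the unit name `netbird.service` and the drop-in file name `force-relay.conf` are both guesses to adapt to the actual installation.

```shell
# Assumption: the daemon runs as a systemd unit named netbird.service.
sudo mkdir -p /etc/systemd/system/netbird.service.d
# Write a drop-in that adds the variable to the service environment.
printf '[Service]\nEnvironment="NB_ICE_FORCE_RELAY_CONN=true"\n' |
  sudo tee /etc/systemd/system/netbird.service.d/force-relay.conf
# Reload unit files and restart so the daemon picks up the variable.
sudo systemctl daemon-reload
sudo systemctl restart netbird
```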

@galexrt commented on GitHub (Aug 19, 2023):

Thank you for the info about the env var and info for windows clients as well!


> The term srflx refers to a public IP address and port combination (IP:PORT). This is the address obtained when a device reaches out to a STUN/TURN server to learn about its public-facing address. It's crucial for enabling direct connections between two devices over the internet, especially when they're behind certain network configurations like NAT.

I see, so it seems that the discovered public IP is correct, but because the clients are behind a "typical ISP IPv4 NAT"/"shared IPv4 address" where only the ISP controls the firewall, the port can't work.
So why would the system assume that the port works? Maybe the ISP firewall is messing up these results?

Is there a way to actually make sure that the "public IP + port" is actually reaching the VPN client so this issue wouldn't arise?
Or am I misunderstanding what you wrote in regards to the discovery/usage of the public IP?


@CommanderRedYT commented on GitHub (Aug 19, 2023):

I guess most people have CG-NAT, i.e. IPv6 on the user end and a shared IPv4 on the ISP end. Here in Austria, we have the option of disabling IPv6 and getting a public IPv4 for the modem instead.


@CertainLach commented on GitHub (Apr 1, 2024):

I have a laptop (192.168.1.200) and a nas (192.168.1.100), both in the same local network, behind NAT, behind CGNAT.

Usually, this setup works perfectly: a direct connection is established between those two peers.

Fragment of netbird status -d called from the laptop:

 nas.my.vpn:
  NetBird IP: 100.103.11.12
  Public key: vIy9n3pE7knxL9tSm11eyhbxnLe1sxAojzYPSvb/RHs=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/prflx
  ICE candidate endpoints (Local/Remote): 192.168.1.200:51820/192.168.1.100:51820
  Last connection update: 2024-03-29 19:35:04
  Last WireGuard handshake: 2024-03-29 19:37:09
  Transfer status (received/sent) 1.7 MiB/108.6 KiB
  Quantum resistance: true
  Routes: -

However, sometimes, after the laptop has been asleep, Netbird decides to try another route, which doesn't work:

nas.my.vpn:
 NetBird IP: 100.103.11.12
 Public key: vIy9n3pE7knxL9tSm11eyhbxnLe1sxAojzYPSvb/RHs=
 Status: Connected
 -- detail --
 Connection type: P2P
 Direct: true
 ICE candidate (Local/Remote): srflx/prflx
 ICE candidate endpoints (Local/Remote): 109.111.114.115:51820/192.168.1.100:51820
 Last connection update: 2024-03-29 22:42:38
 Last WireGuard handshake: 2024-03-29 22:44:49
 Transfer status (received/sent) 223.8 KiB/277.1 KiB
 Quantum resistance: true
 Routes: -

109.111.114.115 here is my public IP.
At first I thought UDP hole punching just happened to work here and that port 51820 was somehow being preserved across two layers of NAT. But I set up tcpdump on the STUN server (which belongs to me and sits outside this CGNAT), and the source IP:PORT it receives is never 109.111.114.115:51820, so this route is clearly wrong and will never work, yet the connection is somehow marked as connected.
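The comparison described here can be scripted. A rough sketch, assuming IPv4 endpoints in the ip:port/ip:port layout of the status output above; `local_ice_endpoint` and `mapping_matches` are hypothetical helper names, and the observed value has to be read manually from tcpdump on the STUN server.

```shell
# local_ice_endpoint: read `netbird status -d` text for one peer on stdin and
# print the local half of "ICE candidate endpoints (Local/Remote)".
# Assumes IPv4 endpoints in the ip:port/ip:port layout quoted above.
local_ice_endpoint() {
  awk -F': ' '/ICE candidate endpoints \(Local\/Remote\)/ { split($2, ep, "/"); print ep[1]; exit }'
}

# mapping_matches: succeed only if the endpoint NetBird reports equals the
# source ip:port actually observed on the STUN server (e.g. via tcpdump).
mapping_matches() {
  claimed=$1
  observed=$2
  [ "$claimed" = "$observed" ]
}
```

With the second output above, `local_ice_endpoint` yields 109.111.114.115:51820. If the STUN server sees a different source port, `mapping_matches` fails and the srflx endpoint cannot be the real NAT mapping.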

(Identifiers here are anonymized, this isn't my real public IP/pubkey/netbird ips/NAT IPs)

System information:

Daemon version: 0.26.3
CLI version: 0.26.3
Management: Connected to https://my.netbird.mydomain.com:443
Signal: Connected to https://my.netbird.mydomain.com:443
Relays: 
  [stun:stun0.netbird.mydomain.com:3478] is Available
  [turn:turn0.netbird.mydomain.com?transport=udp] is Available
Nameservers: 
FQDN: laptop.my.vpn
NetBird IP: 100.103.20.30/16
Interface type: Kernel
Quantum resistance: true (permissive)
Routes: -
Peers count: 12/14 Connected

@nazarewk commented on GitHub (Sep 4, 2024):

I think I'm observing the same issue after getting my RutOS mobile router behind CG-NAT to connect to Netbird Cloud in the course of #2530:

  1. connectivity to Netbird Cloud is enabled with NB_DISABLE_CUSTOM_ROUTING=true (actually NB_SKIP_SOCKET_MARK=true in the patched version, https://github.com/netbirdio/netbird/commit/885a5744ab9fff3d542d6aaeb29f3323eadd38eb)
  2. netbird status -d doesn't show anything after Last WireGuard handshake: for non-LAN peers
  3. adding NB_ICE_FORCE_RELAY_CONN=true helps with WAN connectivity (from my home network and cloud server), but now LAN peers communicate over the internet
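The symptom in item 2 can be listed per peer with a small filter. A sketch only: `list_no_handshake` is a hypothetical helper name, and it assumes the handshake field is printed empty (or as a lone "-") when no handshake has happened, which may vary between versions.

```shell
# list_no_handshake: read `netbird status -d` text on stdin and print every
# peer shown as "Status: Connected" whose "Last WireGuard handshake:" field
# is empty (or a lone "-"), i.e. connected on paper but never handshaken.
list_no_handshake() {
  awk '
    # A peer block starts with a line holding only a name and a colon.
    /^[ \t]*[^ \t]+:[ \t]*$/ { peer = $1; sub(/:$/, "", peer); connected = 0; next }
    /^[ \t]*Status:[ \t]*Connected/ { connected = 1; next }
    /^[ \t]*Last WireGuard handshake:[ \t]*-?[ \t]*$/ {
      if (connected && peer != "") print peer
    }
  '
}
```

Usage would be `netbird status -d | list_no_handshake`.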

@mlsmaycon commented on GitHub (Jun 1, 2025):

Closing issue due to no recent feedback. Feel free to open a new one if the issue persists, or reopen this one if it was a feature request.

Reference: SVI/netbird#285