Intermittent Connection Issues Across Different Regions #1490

Closed
opened 2025-11-20 05:31:36 -05:00 by saavagebueno · 5 comments

Originally created by @ginsul on GitHub (Dec 10, 2024).

Describe the problem

I have a coordinator server on AWS in Virginia and an exit node in Singapore.

When connecting from a Windows client, I randomly hit one of the following conditions:

  • Sometimes, every request to the internet timed out.
  • Sometimes, ping was >400ms, even though my public IP showed Singapore and latency should have been only ~10ms.
  • Sometimes, ping was 10ms, with my public IP correctly showing Singapore.
  • When I had a 10ms ping and tried to disconnect and reconnect, the connection failed and seemed to hang.

I have already searched the existing issues but found no solution.
I captured logs for 2 minutes using netbird -A debug and have attached them.
I have already allowed all inbound traffic in the AWS security group.

netbird.debug.1914716158.zip (https://github.com/user-attachments/files/18082596/netbird.debug.1914716158.zip)

Expected behavior

A Singapore public IP with a ~10ms ping, and a connection that does not hang when disconnecting and reconnecting.

NetBird version

netbird selfhosted 0.34.1

NetBird status -dA output:

Peers detail:
 ip-172-31-5-27.netbird.selfhosted:
  NetBird IP: 100.127.129.240
  Public key: mP047JXbCIyJ+qvk53Fd9HP2sDpRNlhxbXij4AQYQhI=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/prflx
  ICE candidate endpoints (Local/Remote): 172.31.12.91:51820/172.31.5.27:51820
  Relay server address: rel://terbang.anon-5Ev56.domain:33080
  Last connection update: 12 minutes, 38 seconds ago
  Last WireGuard handshake: 2 minutes, 19 seconds ago
  Transfer status (received/sent) 940 B/1.5 KiB
  Quantum resistance: false
  Routes: -
  Latency: 380.987µs

 abcd.netbird.selfhosted:
  NetBird IP: 100.127.226.41
  Public key: 83ef9I0e6R6VtuReCzhSbzJ/UPZAHRle1k0T2kriZQA=
  Status: Disconnected
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: 5 minutes, 56 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 12.867329ms

OS: linux/amd64
Daemon version: 0.34.1
CLI version: 0.34.1
Management: Connected to https://terbang.anon-5Ev56.domain:33073
Signal: Connected to http://terbang.anon-5Ev56.domain:10000
Relays:
  [stun:terbang.anon-5Ev56.domain:3478] is Available
  [turn:terbang.anon-5Ev56.domain:3478?transport=udp] is Available
  [rel://terbang.anon-5Ev56.domain:33080] is Available
Nameservers:
FQDN: ip-172-31-12-91.netbird.selfhosted
NetBird IP: 100.127.83.39/16
Interface type: Kernel
Quantum resistance: false
Routes: 0.0.0.0/0
Peers count: 1/2 Connected
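To compare several runs like the one above, the peer section of the netbird status -dA output can be split into per-peer fields. The following is a minimal Python sketch that assumes the exact layout shown here: one space before each peer name, two before each field, and "OS:" as the first unindented line after the peers. That layout is not a documented, stable format and may differ between NetBird versions. The latency helper only handles single-unit Go-style durations such as 380.987µs and 12.867329ms.

```python
from __future__ import annotations

import re

# Single-unit Go-style durations as printed by netbird status (e.g. "380.987µs").
# Both the micro sign (U+00B5) and Greek mu (U+03BC) spellings are accepted.
_UNIT_TO_MS = {"ns": 1e-6, "µs": 1e-3, "μs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1000.0}
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ns|µs|μs|us|ms|s)$")


def latency_ms(value: str) -> float | None:
    """Convert a duration such as '12.867329ms' to milliseconds, or return None."""
    match = _DURATION.match(value.strip())
    if match is None:
        return None  # composite values like "1m2s" are out of scope for this sketch
    return float(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def parse_peers(status_text: str) -> dict[str, dict[str, str]]:
    """Group the 'Peers detail:' section into {peer_fqdn: {field: value}}."""
    peers: dict[str, dict[str, str]] = {}
    current: dict[str, str] | None = None
    in_section = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped == "Peers detail:":
            in_section = True
            continue
        if not in_section or not stripped or stripped == "-- detail --":
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            break  # first unindented line (e.g. "OS: linux/amd64") ends the peer list
        if indent == 1 and stripped.endswith(":"):
            current = peers.setdefault(stripped[:-1], {})  # peer FQDN header line
            continue
        key, sep, value = stripped.partition(":")
        if current is not None and sep:  # lines without ":" (Transfer status) are skipped
            current[key.strip()] = value.strip()
    return peers


# Shortened copy of the status output from this report.
SAMPLE = """\
Peers detail:
 ip-172-31-5-27.netbird.selfhosted:
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate endpoints (Local/Remote): 172.31.12.91:51820/172.31.5.27:51820
  Relay server address: rel://terbang.anon-5Ev56.domain:33080
  Transfer status (received/sent) 940 B/1.5 KiB
  Latency: 380.987µs

 abcd.netbird.selfhosted:
  Status: Disconnected
  -- detail --
  Connection type:
  Latency: 12.867329ms

OS: linux/amd64
"""

if __name__ == "__main__":
    for fqdn, fields in parse_peers(SAMPLE).items():
        kind = fields.get("Connection type") or "no connection"
        print(fqdn, fields.get("Status"), kind, latency_ms(fields.get("Latency", "")))
```

On this output, the sketch reports abcd.netbird.selfhosted as Disconnected with an empty connection type, and ip-172-31-5-27 as a P2P peer with sub-millisecond latency.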

Screenshots

image: https://github.com/user-attachments/assets/e9933990-9715-4eba-b9d5-f5302aefc3de

Additional context

Any insights or suggestions would be greatly appreciated. Thank you.

saavagebueno added the waiting-feedback and triage-needed labels 2025-11-20 05:31:36 -05:00

@ginsul commented on GitHub (Dec 10, 2024):

Hi, as a temporary workaround, the following setup seems to work for now.
I'm still not sure why the latest version, 0.34.1 (and the other versions from 0.30 to 0.34 that I tested), still has these issues.

Current working setup:

Coordinator Server: 0.34.1
Node Server: 0.29.4
Client (Windows): 0.29.4


@ginsul commented on GitHub (Dec 11, 2024):

Hi,
The intermittent issue disappeared with version 0.29.4, but latency is still poor.
I suspect this is because the client is using the Coordinator Server as its exit node instead of the peer actually selected as the exit node.

How can I make sure the client does not use the Coordinator Server as the exit node?
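This suspicion can be sanity-checked with a back-of-the-envelope lower bound on round-trip time. In the reporter's status output, the relay (rel://terbang.anon-5Ev56.domain:33080) uses the same host name as Management, and the coordinator is in AWS Virginia. A Relayed connection from the client to the Singapore exit node would therefore plausibly detour through Virginia. The following Python sketch compares the fiber-propagation lower bound for a direct path with the bound for a path via Virginia. It rests on three assumptions: light in fiber covers about 200 km per millisecond; Ashburn stands in for the AWS Virginia region; and the client location, which the thread does not give, is hypothetically Jakarta (roughly 900 km from Singapore, consistent with the ~10ms the reporter expects).

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fiber travels at roughly 2/3 of c, about 200 km per ms.
FIBER_KM_PER_MS = 200.0

# Approximate (lat, lon) coordinates. Ashburn is a proxy for AWS Virginia;
# the client location is hypothetical because the thread does not state it.
SITES = {
    "singapore_exit": (1.3521, 103.8198),
    "virginia_coordinator": (39.0438, -77.4874),
    "client_jakarta": (-6.2088, 106.8456),  # hypothetical client location
}


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(*hops):
    """Lower-bound RTT along a chain of sites: straight-line fiber, no routing or queuing."""
    one_way_km = sum(great_circle_km(SITES[x], SITES[y]) for x, y in zip(hops, hops[1:]))
    return 2 * one_way_km / FIBER_KM_PER_MS


direct = min_rtt_ms("client_jakarta", "singapore_exit")
relayed = min_rtt_ms("client_jakarta", "virginia_coordinator", "singapore_exit")
print(f"direct P2P lower bound:        {direct:.0f} ms")
print(f"relayed via Virginia bound:    {relayed:.0f} ms")
```

Under these assumptions, the direct bound is about 9 ms and the Virginia detour about 320 ms. Real routes add detours and queuing on top, so the >400ms readings are consistent with (though not proof of) traffic passing through the relay on the coordinator rather than going P2P to the exit node.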


@rihards-simanovics commented on GitHub (Dec 15, 2024):

Hey @ginsul, I'm not certain whether my issue #3042 is similar to yours, but I found the following on my network of ~15-18 peers (a mix of Ubuntu Linux and a couple of Windows peers). All of a sudden, whenever a Windows 11 peer connects to a Linux server with a static public IP, almost all peers are Relayed instead of establishing a P2P connection. This doesn't make sense to me, especially since it used to work fine. From the new Relay docs, it sounds as though a Relayed connection should only exist until a P2P connection is established, but in my case that never happens, even once the network "settles" after all peers are connected. I only observe this on Windows 11 and macOS peers connecting to servers, not between the Linux server peers themselves.
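To check whether a given peer shows this pattern, you can tally the "Connection type" lines in its netbird status -d output (the original report ran it as netbird status -dA). For example, compare one run on a Windows 11 peer with one on a Linux server peer. The following is a minimal Python sketch. It assumes the per-peer line format shown in the original report, and the peer names in the sample are hypothetical.

```python
from collections import Counter


def connection_type_counts(status_text: str) -> Counter:
    """Count 'Connection type:' values (e.g. P2P, Relayed) in netbird status -d output."""
    counts = Counter()
    for line in status_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Connection type":
            # Peers without an established connection print an empty value.
            counts[value.strip() or "(none)"] += 1
    return counts


# Hypothetical excerpt; peer names are made up for illustration.
SAMPLE = """\
 peer-a.netbird.selfhosted:
  Connection type: P2P
 peer-b.netbird.selfhosted:
  Connection type: Relayed
 peer-c.netbird.selfhosted:
  Connection type: Relayed
 peer-d.netbird.selfhosted:
  Connection type:
"""

if __name__ == "__main__":
    counts = connection_type_counts(SAMPLE)
    connected = counts["P2P"] + counts["Relayed"]
    print(dict(counts), f"relayed share: {counts['Relayed'] / connected:.0%}")
```

A Windows or macOS peer showing a much higher relayed share than a Linux server peer on the same network would match the behaviour described in this comment.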


@nazarewk commented on GitHub (Apr 28, 2025):

Hello @ginsul,

We're currently reviewing our open issues and would like to verify whether this problem still exists in the latest NetBird version (https://github.com/netbirdio/netbird/releases).

Could you please confirm if the issue is still there?

We may close this issue temporarily if we don't hear back from you within 2 weeks, but feel free to reopen it with updated information.

Thanks for your contribution to improving the project!


@mlsmaycon commented on GitHub (Jun 1, 2025):

Closing this issue due to no recent feedback. Feel free to open a new one if the issue persists, or reopen this one if it was a feature request.

Reference: SVI/netbird#1490