Netbird won't reconnect to other windows peer on reboot until I ping the peer #1405

Open
opened 2025-11-20 05:29:44 -05:00 by saavagebueno · 12 comments

Originally created by @Mikesco3 on GitHub (Nov 7, 2024).

Describe the problem

Windows clients can't reach peers until I ping them.
I believe the issue mostly appears after reboots; I haven't yet monitored whether the connection also drops later during the day.

To Reproduce

Steps to reproduce the behavior:

  1. Reboot the computer
  2. When it comes back on, I can't reconnect to the windows peer even if I wait 5-15 minutes
  3. When I ping it, there is a 3 - 5 second delay but then the peer replies to pings and I can access it normally after that.
  4. If either one reboots, we're back to the same issue.

Are you using NetBird Cloud?

I'm using the NetBird SelfHosted control plane.

NetBird version

0.31.0

  • I have also made sure they are all running the latest version of netbird.
  • Windows 11 Pro

Expected behavior

A clear and concise description of what you expected to happen.

I have a few computers connected on different locations and I'm using Netbird to connect them together.

I installed the Netbird client, joined it to the control plane, and did the same on other Windows PCs across the internet.

When I reboot the computer, I cannot reach the other Peer across netbird.
What seems to help is if I ping the computer, and then after that I can reach the computer.
I haven't checked if after a while it drops or not.

What I would hope is that if I have another Windows computer connected from a different location via Netbird, I would be able to reconnect to it, even if I have to wait for a bit until the other services come back online (at least that works with ZeroTier and other similar products).

Screenshots

If applicable, add screenshots to help explain your problem.
image: https://github.com/user-attachments/assets/868a1c79-4e44-48f4-ab35-0772af6dc603


Additional context

Then if I ping the peer I'm trying to access, after a (3 to 5 second) pause it starts replying to pings and then I can connect to it fine

ping server-cr

Slight pause (3 to 5 seconds)

Pinging server-cr.netbird.selfhosted [100.65.118.84] with 32 bytes of data:
Reply from 100.65.118.84: bytes=32 time=22ms TTL=128
Reply from 100.65.118.84: bytes=32 time=22ms TTL=128
Reply from 100.65.118.84: bytes=32 time=21ms TTL=128
Reply from 100.65.118.84: bytes=32 time=23ms TTL=128

Ping statistics for 100.65.118.84:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 21ms, Maximum = 23ms, Average = 22ms

Then we can connect to the Peer Windows PC with no issues...
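One way to narrow down where the 3 to 5 second pause comes from is to time name resolution on its own, separately from the ping. The following is a minimal sketch, not something posted in the thread: the function name `time_resolution` and the injectable `resolver` parameter are hypothetical, and it assumes Python is available on the affected Windows client.

```python
import socket
import time


def time_resolution(name, resolver=socket.gethostbyname):
    """Resolve `name` and return (address_or_None, seconds_taken).

    `resolver` is injectable so the timing logic can be exercised
    without a real network; by default the OS resolver is used.
    """
    start = time.perf_counter()
    try:
        address = resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        address = None
    return address, time.perf_counter() - start


# Example (run manually right after a reboot, before any ping):
#   address, seconds = time_resolution("server-cr")
#   print(address, round(seconds, 2))
```

If the lookup itself takes several seconds or fails, that would point at name resolution rather than at the tunnel; if it returns immediately, the delay is more likely in connection setup.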


Attachments

I've attached a copy of my config files:

  • My Server's docker-compose.yml
  • My Server's management.json
  • One of my Windows Netbird App config.json
  • A screenshot of the connection error

NetBird status -dA output:

If applicable, add the `netbird status -dA` command output.

I grabbed the output while the computer wasn't having the connection issues and before I pinged the peer.

> NetBird status -dA
Peers detail:
 server-nl.netbird.selfhosted:
  NetBird IP: 100.65.114.233
  Public key: MyRandomGibberishKey
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/host
  ICE candidate endpoints (Local/Remote): 127.0.0.1:51820/192.168.1.2:51820
  Relay server address: rels://netbird.anon-rvQ2R.domain:443
  Last connection update: 1 minute, 41 seconds ago
  Last WireGuard handshake: 1 minute, 36 seconds ago
  Transfer status (received/sent) 392 B/396 B
  Quantum resistance: false
  Routes: -
  Latency: 570µs

 server-cr.netbird.selfhosted:
  NetBird IP: 100.65.118.84
  Public key: MyRandomGibberishKey
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:27911/198.51.100.1:1032
  Relay server address: rels://netbird.anon-rvQ2R.domain:443
  Last connection update: 1 minute, 41 seconds ago
  Last WireGuard handshake: 1 minute, 36 seconds ago
  Transfer status (received/sent) 1.2 KiB/548 B
  Quantum resistance: false
  Routes: -
  Latency: 21.9377ms

 beth-lt.netbird.selfhosted:
  NetBird IP: 100.65.130.238
  Public key: MyRandomGibberishKey
  Status: Disconnected
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: -
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

OS: windows/amd64
Daemon version: 0.31.0
CLI version: 0.31.0
Management: Connected to https://netbird.anon-rvQ2R.domain:443
Signal: Connected to https://netbird.anon-rvQ2R.domain:443
Relays:
  [stun:netbird.anon-rvQ2R.domain:3478] is Available
  [turn:netbird.anon-rvQ2R.domain:3478?transport=udp] is Available
  [rels://netbird.anon-rvQ2R.domain:443] is Available
Nameservers:
FQDN: beth-nlvm.netbird.selfhosted
NetBird IP: 100.65.55.28/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 2/3 Connected
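For comparing several captures of this output (for example before and after the ping), a small parser can pull out each peer's fields. This is a rough sketch that assumes the text layout shown above: a `Peers detail:` header, one `name:` line per peer, and the daemon summary starting at the `OS:` line. The function names `parse_peers` and `disconnected_peers` are made up for illustration and are not part of NetBird.

```python
def parse_peers(status_text):
    """Return {peer_name: {field: value}} from the 'Peers detail:' section.

    Assumes the layout of the output above; lines without a colon
    (such as 'Transfer status ...') are skipped.
    """
    peers = {}
    current = None
    in_peers = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "Peers detail:":
            in_peers = True
            continue
        if line.startswith("OS:"):
            break  # daemon summary begins, peers section is over
        if not in_peers or not line or line.startswith("--"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if " " not in key and value.strip() == "":
            # A bare 'name:' line without spaces starts a new peer.
            current = peers.setdefault(key, {})
        elif current is not None:
            current[key.strip()] = value.strip()
    return peers


def disconnected_peers(peers):
    """Names of peers whose Status field is not 'Connected'."""
    return [name for name, fields in peers.items()
            if fields.get("Status") != "Connected"]
```

Fed the output above, this would report `beth-lt.netbird.selfhosted` as the only disconnected peer, which matches the `2/3 Connected` summary line.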
saavagebueno added the waiting-feedback, triage-needed labels 2025-11-20 05:29:44 -05:00

@mlsmaycon commented on GitHub (Nov 7, 2024):

It seems like a crash. @Mikesco3 can you check if there was any event in the event viewer (system and applications)?


@Mikesco3 commented on GitHub (Nov 7, 2024):

I just have 2 Event ID 86

SCEP Certificate enrollment initialization for Local system via https://-KeyId-844db4655e5dcb9f989f2082a7662b019449d1bd.microsoftaik.azure.net/templates/Aik/scep failed:

GetCACaps

Method: GET(172ms)
Stage: GetCACaps
The connection with the server was terminated abnormally 0x80072efe (WinHttp: 12030 ERROR_WINHTTP_CONNECTION_ERROR)

- System 

  - Provider 

   [ Name]  Microsoft-Windows-CertificateServicesClient-CertEnroll 
   [ Guid]  {54164045-7C50-4905-963F-E5BC1EEF0CCA} 
   [ EventSourceName]  CertEnroll 
 
  - EventID 86 

   [ Qualifiers]  49754 
 
   Version 0 
 
   Level 2 
 
   Task 0 
 
   Opcode 0 
 
   Keywords 0x80000000000000 
 
  - TimeCreated 

   [ SystemTime]  2024-11-07T18:35:02.8888095Z 
 
   EventRecordID 12728 
 
   Correlation 
 
  - Execution 

   [ ProcessID]  5048 
   [ ThreadID]  0 
 
   Channel Application 
 
   Computer BethNL-VM 
 
  - Security 

   [ UserID]  S-1-5-18 
 

- EventData 

  Context WORKGROUP\BETHNL-VM$ 
  Url https://-KeyId-844db4655e5dcb9f989f2082a7662b019449d1bd.microsoftaik.azure.net/templates/Aik/scep 
  MessageText GetCACaps  
  Method GET(62ms) 
  Stage GetCACaps 
  ErrorCode The connection with the server was terminated abnormally 0x80072efe (WinHttp: 12030 ERROR_WINHTTP_CONNECTION_ERROR) 


@mlsmaycon commented on GitHub (Nov 7, 2024):

Could you please run the following command in an elevated powershell:

[System.Environment]::SetEnvironmentVariable('NB_WINDOWS_PANIC_LOG', "$env:ProgramData\netbird\netbird.err", 'Machine')

then try to reproduce the issue? If you can reproduce it, you should see a file in C:\ProgramData\netbird\netbird.err. Please share it with us.
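After reproducing, it can help to report whether the panic log is missing, empty, or has content, since an empty file and a missing file suggest different things. A minimal sketch follows; the helper name `panic_log_state` is hypothetical and the path passed in is whatever `NB_WINDOWS_PANIC_LOG` was set to.

```python
import os


def panic_log_state(path):
    """Classify a log file as 'missing', 'empty', or 'has content'."""
    if not os.path.exists(path):
        return "missing"
    return "empty" if os.path.getsize(path) == 0 else "has content"


# Example, using the path from the command above:
#   print(panic_log_state(r"C:\ProgramData\netbird\netbird.err"))
```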


@Mikesco3 commented on GitHub (Nov 7, 2024):

  1. I ran your command in an elevated PowerShell...
  2. I killed netbird with `taskkill /f /im netbi*`
  3. restarted netbird (still no connection) (your file was still empty)
  4. rebooted the computer
  5. attempted to open the UNC path (network drive).

I get the Error where windows cannot reach the computer

  6. ping the computer and after a 3 to 5 second delay, the peer responds.
  7. I can access the Peer's network share normally
  8. Your netbird.err file is still empty
  9. The Windows Logs \ Application has two new Event ID 86 entries, as I just posted.
    I also have some Information entries in the System part of the Windows logs:

Attempted to reserve URL http://*:5357/. Status 0x0. Process Id 0x4 Executable path , User SYSTEM
Attempted to reserve URL http://+:80/Temporary_Listen_Addresses/. Status 0x0. Process Id 0x4 Executable path , User SYSTEM
Attempted to reserve URL http://+:80/Temporary_Listen_Addresses/. Status 0x0. Process Id 0x4 Executable path , User SYSTEM
Attempted to reserve URL https://+:5986/wsman/. Status 0x0. Process Id 0x4 Executable path , User SYSTEM

etc...

Create URL group 0xFE00000220000001. Status 0x0. Process Id 0xAA8 Executable path \Device\HarddiskVolume3\Windows\System32\svchost.exe, User LOCAL SERVICE

etc...


@Mikesco3 commented on GitHub (Nov 7, 2024):

I can reach the system practically right away via its Netbird IP...
So it seems to be related to DNS??

Update
If I add the Netbird IP address of the peer in question to the hosts file and reboot, then it is available right away...

Temporary solution

  1. ping the netbird peer you want to reach
  2. Grab the IP address that replies
    In my example server-cr points to 100.65.118.84
  3. Enter the information into the hosts file
    C:\Windows\System32\drivers\etc\hosts
    At the end of the file add (Adjust for your case):
100.65.118.84   server-cr
  4. Rebooted and the drives were available immediately after I logged in.
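The hosts-file step above can be scripted so the entry is only added once. This is a hedged sketch rather than anything from the thread: `add_hosts_entry` is a hypothetical helper that works on the file's text, so reading and writing `C:\Windows\System32\drivers\etc\hosts` (which needs an elevated prompt) is left to the caller.

```python
def add_hosts_entry(hosts_text, ip, name):
    """Return hosts_text with an 'ip<TAB>name' line appended.

    If `name` is already mapped on an active (non-comment) line,
    the text is returned unchanged, even if it maps to another IP.
    """
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and host names.
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and name.lower() in (h.lower() for h in entry[1:]):
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{name}\n"


# Example with the values from this thread:
#   new_text = add_hosts_entry(old_text, "100.65.118.84", "server-cr")
```

Because an existing mapping is left untouched, a peer whose Netbird IP changes would still need its old line edited by hand.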

@mlsmaycon commented on GitHub (Nov 7, 2024):

can you share the client logs?

You can bundle them with:

netbird debug bundle -A


@Mikesco3 commented on GitHub (Nov 7, 2024):

Here are the logs...
I checked lightly to see if I needed to obfuscate anything and there only seem to be public keys...

netbird.debug.984634646.zip (https://github.com/user-attachments/files/17669201/netbird.debug.984634646.zip)


@Mikesco3 commented on GitHub (Nov 7, 2024):

BTW, I don't know if this makes a difference or not...
Before running the current setup of Netbird,

  1. I had previously set up netbird on another vps.
  2. Installed the clients and connected.
  3. Once that worked,
  4. I removed the netbird clients and the config folder from \programdata\
  5. set up the current vps
  6. re-installed netbird...

So I don't know if there are any leftovers in the registry, that could point to the old vps??


@Mikesco3 commented on GitHub (Nov 7, 2024):

> BTW, I don't know if this makes a difference or not... Before running the current setup of Netbird,
>
>   1. I had previously set up netbird on another vps.
>   2. Installed the clients and connected.
>   3. Once that worked,
>   4. I removed the netbird clients and the config folder from \programdata\
>   5. set up the current vps
>   6. re-installed netbird...
>
> So I don't know if there are any leftovers in the registry, that could point to the old vps??

However this wouldn't make sense, because I'm also having the issue on machines that never had Netbird before...


@Mikesco3 commented on GitHub (Nov 9, 2024):

For now my solution has been to add the IPs of the machines I need to reach to the hosts file, and that worked.

It's not a pretty fix but it's working like a charm.


@nazarewk commented on GitHub (Apr 28, 2025):

Hello @Mikesco3,

We're currently reviewing our open issues and would like to verify if this problem still exists in the latest NetBird version (https://github.com/netbirdio/netbird/releases).

Could you please confirm if the issue is still there?

We may close this issue temporarily if we don't hear back from you within 2 weeks, but feel free to reopen it with updated information.

Thanks for your contribution to improving the project!


@ergleb78 commented on GitHub (May 13, 2025):

Hi @nazarewk
We are running into a somewhat similar (??) problem on a Windows peer, but related to DNS resolution.
We have a network configured that is supposed to resolve internal DNS records using "routing peer internal resolution".

It works everywhere except on a single Windows peer.

When I run:
netbird list routes

it shows all networks correctly. However, on the Windows peer the first ping of the destination peer in the target network always fails after a reboot.
Then, after the ping fails, the remote peer's FQDN shows up in the resolved IPs:

  - ID: x1-dev-namespace
    Domains: *.x1-dev.svc.cluster.local
    Status: Selected
    Resolved IPs:
      [mongo-replicaset-1.mongo-replicaset-svc.x1-dev.svc.cluster.local.]: 10.<...real ip here...>
     

All pings later work fine.

Reference: SVI/netbird#1405