Network Map not updating after loss of HA routing peer #1922

Closed
opened 2025-11-20 06:09:17 -05:00 by saavagebueno · 3 comments
Owner

Originally created by @drewhemm on GitHub (May 27, 2025).

Describe the problem

I am running multiple routing peers in HA for a network. When I connect a client peer to NetBird, everything works fine. If I run netbird down on the routing peer that the client is using for the routes, the client does not receive any further network map updates. Because no network map update comes through, the client peer does not switch the routes over to the other routing peer.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy two (or more) routing peers and connect them to an instance of NetBird
  2. Configure one or more networks to be accessible via those peers, individually or via a group
  3. Connect a client peer to NetBird. Ensure it has an access policy that causes it to receive the route(s)
  4. Run netbird down on the routing peer that the client is currently using for the route(s)
  5. Run netbird status -d on the client peer. The routing peer that was taken down shows a status of 'Connecting', and the routes remain associated with that peer; they do not fail over to the other peer, even though that peer's status is 'Connected'

At this point, the only workarounds are to restart the client peer, or to take action in the management UI that triggers the sending of a network map update, such as toggling the routing peer or peer group off and on.
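The check in step 5 can also be done programmatically rather than by eye. The following is a minimal sketch, not part of NetBird: it assumes the text layout of `netbird status -d` matches the output pasted in this issue (a peer name on its own line ending in a colon, followed by `Status:` and `Networks:` fields). The function names `parse_peers` and `serving_peer` are hypothetical helpers introduced here for illustration.

```python
import re

# A peer header is a single token followed by a colon, e.g.
# "netbird-rpi5-1.netbird.selfhosted:". Field names such as
# "Connection type:" contain spaces and therefore do not match.
PEER_HEADER = re.compile(r"^(\S+):$")


def parse_peers(status_text):
    """Parse the 'Peers detail:' section of `netbird status -d` output.

    Returns {peer_name: {"status": str, "networks": [str, ...]}}.
    Assumes the layout shown in this issue; other versions may differ.
    """
    peers = {}
    current = None
    in_section = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "Peers detail:":
            in_section = True
            continue
        if not in_section or not line:
            continue
        # The peers section ends where the events or daemon summary begins.
        if line == "Events:" or line.startswith("OS:"):
            break
        header = PEER_HEADER.match(line)
        if header:
            current = peers.setdefault(
                header.group(1), {"status": "", "networks": []}
            )
            continue
        if current is None:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Status":
            current["status"] = value
        elif key == "Networks" and value != "-":
            current["networks"] = [n.strip() for n in value.split(",")]
    return peers


def serving_peer(peers, network):
    """Return the name of a Connected peer that lists `network`, or None."""
    for name, info in peers.items():
        if info["status"] == "Connected" and network in info["networks"]:
            return name
    return None
```

Feeding it the status output from this report, `serving_peer(parse_peers(text), "10.243.0.0/22")` returns the `netbird-dev-optimiser-k8s-...` peer, while a network that no Connected peer lists returns `None`, which is the condition this issue describes after the routing peer goes down.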

Expected behavior

The networks/routes should fail over to the remaining routing peer(s) within a few seconds and without requiring any manual action

Are you using NetBird Cloud?

Self-hosted

NetBird version

0.45.1

Is any other VPN software installed?

No

Debug output

To help us resolve the problem, please attach the following anonymized status output

netbird status -dA

Peers detail:
 rpi4b-window.netbird.selfhosted:
  NetBird IP: 100.71.117.141
  Public key: wAKpG4Ol+aSzF9wAlhFwFYLxrIwQT3qKSndxVVUscEE=
  Status: Connecting
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: 8 minutes, 6 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Networks: -
  Latency: 0s

 netbird-dev-optimiser-k8s-78dfdb66c6-zfkk6.netbird.selfhosted:
  NetBird IP: 100.71.177.114
  Public key: RwHmJKTVPU3/P0d5tXPmWww9qi4lgmAldJ7Wf3XgKDM=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): srflx/relay
  ICE candidate endpoints (Local/Remote): 198.51.100.0:39513/198.51.100.1:56945
  Relay server address:
  Last connection update: 8 minutes, 3 seconds ago
  Last WireGuard handshake: 1 minute, 36 seconds ago
  Transfer status (received/sent) 688 B/1.5 KiB
  Quantum resistance: false
  Networks: 10.243.0.0/22, 10.243.4.0/22
  Latency: 52.9895ms

 netbird-a2sdv-1-1.netbird.selfhosted:
  NetBird IP: 100.71.188.65
  Public key: glJyPm+D1gLQYtABng2oGZCyQqLF5QRBrMripmcmYTg=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): relay/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.1:57256/198.51.100.2:53772
  Relay server address:
  Last connection update: 8 minutes, 4 seconds ago
  Last WireGuard handshake: 1 minute, 54 seconds ago
  Transfer status (received/sent) 4.9 KiB/11.1 KiB
  Quantum resistance: false
  Networks: 10.7.0.0/23, 10.7.3.0/28, 192.168.7.0/24, 198.51.100.3/32
  Latency: 60.0205ms

 netbird-rpi5-1.netbird.selfhosted:
  NetBird IP: 100.71.246.55
  Public key: ZsqyMXTm0tRK3JCztt+lw51dnqo1BLZ6yKhZOMblj3E=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): relay/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.1:52976/198.51.100.2:47204
  Relay server address:
  Last connection update: 3 minutes, 55 seconds ago
  Last WireGuard handshake: 1 minute, 54 seconds ago
  Transfer status (received/sent) 9.9 KiB/5.7 KiB
  Quantum resistance: false
  Networks: -
  Latency: 37.9216ms

Events:
  [WARNING] DNS (0e7e2d68-97e5-4d71-8443-a5a172192813)
    Message: All upstream servers failed (probe failed)
    Time: 8 minutes, 6 seconds ago
    Metadata: upstreams: 10.243.4.10:53
  [WARNING] DNS (1a1f6725-b683-4eb4-8d9b-114529713fbe)
    Message: All upstream servers failed (probe failed)
    Time: 8 minutes, 6 seconds ago
    Metadata: upstreams: 10.243.4.10:53
  [WARNING] DNS (4b5b2beb-88ee-4996-9858-c4edab27e051)
    Message: All upstream servers failed (probe failed)
    Time: 8 minutes, 6 seconds ago
    Metadata: upstreams: 10.7.0.1:53
  [INFO] SYSTEM (da0c16a9-63a7-4089-8a5c-f8ea6e0d7fa6)
    Message: Network map updated
    Time: 8 minutes, 6 seconds ago
  [INFO] SYSTEM (e222a455-5dd1-4795-b756-e19e6d79f8ad)
    Message: Network map updated
    Time: 5 minutes, 45 seconds ago
  [INFO] SYSTEM (093620b4-b95c-49cf-9504-ddb29c1a2e2f)
    Message: Network map updated
    Time: 5 minutes, 16 seconds ago
  [INFO] SYSTEM (348b01e5-0dd8-4261-aa74-5da2b1376d1a)
    Message: Network map updated
    Time: 5 minutes, 15 seconds ago
  [INFO] SYSTEM (8bb8a0fb-6bdf-47c4-b797-ac6be1f90b89)
    Message: Network map updated
    Time: 4 minutes, 9 seconds ago
  [INFO] SYSTEM (3aab7764-894f-4b68-a293-9cf2703303b5)
    Message: Network map updated
    Time: 4 minutes, 8 seconds ago
  [INFO] SYSTEM (fc89477f-e309-4937-8a66-0bad56d48db1)
    Message: Network map updated
    Time: 4 minutes, 8 seconds ago
OS: windows/amd64
Daemon version: 0.45.1
CLI version: 0.45.1
Management: Connected to https://nb.anon-zkiwj.domain:33073/
Signal: Connected to http://nb.anon-zkiwj.domain:10000/
Relays:
  [stun:nb.anon-ZkiWj.domain:3478] is Available
  [turn:nb.anon-ZkiWj.domain:3478?transport=udp] is Available
Nameservers:
  [10.243.4.10:53] for [argocd.svc.dev.gcp.anon-Hjt1a.domain, jupyter.svc.dev.gcp.anon-Hjt1a.domain] is Available
  [8.8.8.8:53, 8.8.4.4:53] for [.] is Available
  [10.7.0.1:53] for [office.anon-ZkiWj.domain] is Available
  [10.7.0.1:53] for [k3s-devel.anon-ZkiWj.domain] is Available
FQDN: zbduo8406.netbird.selfhosted
NetBird IP: 100.71.137.187/16
Interface type: Userspace
Quantum resistance: false
Lazy connection: false
Networks: -
Forwarding rules: 0
Peers count: 3/4 Connected

Create and upload a debug bundle, and share the returned file key:

netbird debug for 1m -AS -U

Uploaded files are automatically deleted after 30 days.

2c083968ec611b79db72ce4a8f4aae94746840c955f0ae82ae55b08e0a33a96b/abadb810-239a-4d67-b9bd-17e3c8181cf8

Alternatively, create the file only and attach it here manually:

netbird debug for 1m -AS

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

Have you tried these troubleshooting steps?

  • [x] Reviewed client troubleshooting (https://docs.netbird.io/how-to/troubleshooting-client) (if applicable)
  • [x] Checked for newer NetBird versions
  • [x] Searched for similar issues on GitHub (including closed ones)
  • [x] Restarted the NetBird client
  • [ ] Disabled other VPN software
  • [x] Checked firewall settings
saavagebueno added the triage-needed label 2025-11-20 06:09:17 -05:00

@lixmal commented on GitHub (Jun 1, 2025):

Hi @drewhemm,

Can you check whether https://github.com/netbirdio/netbird/releases/tag/v0.45.3 fixes the issue?

FYI, the network map is only updated on changes and is not relevant to your scenario.


@drewhemm commented on GitHub (Jun 2, 2025):

Hi @lixmal, I can confirm this resolves the reported issue! I spun up two NetBird peers in a Kubernetes cluster and ran netbird status -d on my laptop to confirm which peer was being used for the defined network. I then deleted that pod, and my peer immediately switched the network to the other K8s peer.
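For anyone who wants to repeat this verification and time the failover, here is a rough sketch of a polling loop. It assumes the `netbird status -d` text layout shown earlier in this issue; `serving_peer` and `wait_for_failover` are hypothetical helper names, and the timeout and interval values are arbitrary defaults rather than anything NetBird defines.

```python
import subprocess
import time


def netbird_status():
    """Return the text printed by `netbird status -d` on this machine."""
    return subprocess.run(
        ["netbird", "status", "-d"],
        capture_output=True, text=True, check=True,
    ).stdout


def serving_peer(status_text, network):
    """Name of the first Connected peer whose Networks line lists `network`."""
    name, status = None, None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith(("Events:", "OS:")):
            break  # end of the peers section
        if line.endswith(":") and " " not in line:
            # Peer header, e.g. "netbird-rpi5-1.netbird.selfhosted:"
            name, status = line[:-1], None
        elif line.startswith("Status:"):
            status = line.split(":", 1)[1].strip()
        elif line.startswith("Networks:") and name and status == "Connected":
            nets = [n.strip() for n in line.split(":", 1)[1].split(",")]
            if network in nets:
                return name
    return None


def wait_for_failover(network, old_peer, get_status=netbird_status,
                      timeout=30.0, interval=1.0, sleep=time.sleep):
    """Poll until `network` is served by a Connected peer other than `old_peer`.

    Returns the new peer's name, or None if `timeout` seconds of waiting
    pass without a switch. `get_status` and `sleep` are injectable so the
    loop can be exercised without a running NetBird client.
    """
    waited = 0.0
    while True:
        peer = serving_peer(get_status(), network)
        if peer is not None and peer != old_peer:
            return peer
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval
```

A typical run would note the current peer for a network, take that routing peer down (or delete its pod), and then call `wait_for_failover("10.243.0.0/22", old_peer)`; a `None` result corresponds to the behaviour originally reported against 0.45.1.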


@drewhemm commented on GitHub (Jun 2, 2025):

Resolved by https://github.com/netbirdio/netbird/pull/3889

Reference: SVI/netbird#1922