How to disconnect Netbird Peering Router without affecting clients #2025

Open
opened 2025-11-20 06:11:28 -05:00 by saavagebueno · 3 comments
Originally created by @karlveezoo on GitHub (Jun 30, 2025).

Hello,

I am running NetBird peering routers in 2 pods in my Kubernetes cluster for HA. When I upgrade either the deployment or the nodes, these pods are recreated in a rolling fashion. However, when this happens the clients seem to be affected: you either have to wait a few minutes or disconnect and reconnect the client to get a connection established again.

I added a preStop hook to the routers' Kubernetes deployment that runs "netbird down" before the pods are terminated, but it doesn't seem to make things noticeably better.

Any tips on how to make a router let the clients "know" when its pod is being terminated, so they can establish connections to the other router pod instead?

saavagebueno added the peer-management, client, routes, networking, performance labels 2025-11-20 06:11:28 -05:00

@nazarewk commented on GitHub (Jun 30, 2025):

Seems like a use case for https://github.com/netbirdio/netbird/issues/3636

Could you provide a set of debug bundles for ~2 client Peers affected by the slow reconnection times? It is expected to take seconds (not minutes), so we would like to take a look.

Could you tell if it's more than 10 minutes or less? It might be somehow related to Ephemeral Peers expiration time.

Which version are you running? Can you try with the latest one?

For gathering logs, I would suggest the following:

  1. set NB_LOG_LEVEL to trace
  2. increase NB_LOG_MAX_SIZE_MB to 200 MB to cover a few minutes of logs (there are lots of those)
  3. perform an upgrade
  4. upload a debug bundle https://docs.netbird.io/how-to/troubleshooting-client#debug-bundle-uploads
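
To help answer whether the outage lasts more or less than 10 minutes, the reconnection time could be measured from an affected client while the rollout runs. The following is only a minimal sketch using the Python standard library, not a NetBird tool: it repeatedly attempts a TCP connection to some resource reachable only through the routers and reports the longest continuous unreachable stretch. The function names, the target address `10.0.0.10:443`, and the probe intervals are all hypothetical and need to be adapted.

```python
import socket
import time


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples: list) -> float:
    """Longest continuous unreachable stretch, in seconds.

    `samples` is a time-ordered list of (timestamp, reachable) pairs.
    An outage starts at the first failed probe and ends at the next
    successful one (or at the last sample if it never recovers).
    """
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None
    if down_since is not None:
        longest = max(longest, samples[-1][0] - down_since)
    return longest


def probe(host: str, port: int, duration: float = 900.0, interval: float = 1.0) -> list:
    """Probe host:port roughly every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), is_reachable(host, port)))
        time.sleep(interval)
    return samples


# Example (hypothetical routed address; start this just before the rollout):
#   samples = probe("10.0.0.10", 443)
#   print(f"longest outage: {longest_outage(samples):.0f} s")
```

Because a failed probe can itself take up to `timeout` seconds, the result is accurate to a few seconds at best, which should be enough to tell seconds from minutes.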

@nazarewk commented on GitHub (Jun 30, 2025):

From what I gathered from the team, the current expectations are as follows:

  • Relay - Relay connections can take up to ~3 minutes to be reestablished, but those can take seconds in ideal network conditions.
  • P2P -> Relay connections aren't expected to take less than ~30 seconds due to WireGuard timeouts and connection reestablishments.
  • P2P - P2P connections should kick in faster than ~30 seconds.

We're planning to improve this sooner rather than later.
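
For comparing a measured reconnection time against the rough figures listed above, a small helper along these lines may be convenient. It is only an illustrative sketch: the dictionary keys and the function name are made up here, and the bounds are the approximate values quoted in this comment (about 180 seconds as a ceiling for Relay, about 30 seconds as a floor for P2P -> Relay with no ceiling stated, and about 30 seconds as a ceiling for P2P), not guarantees.

```python
# Approximate (floor, ceiling) in seconds per transition, taken from the
# figures quoted above. None means no bound was stated.
EXPECTED_SECONDS = {
    "relay": (None, 180),        # "up to ~3 minutes"
    "p2p_to_relay": (30, None),  # "not expected to take less than ~30 seconds"
    "p2p": (None, 30),           # "faster than ~30 seconds"
}


def classify(kind: str, seconds: float) -> str:
    """Return 'faster', 'within', or 'slower' relative to the rough expectation."""
    floor, ceiling = EXPECTED_SECONDS[kind]
    if floor is not None and seconds < floor:
        return "faster"
    if ceiling is not None and seconds > ceiling:
        return "slower"
    return "within"
```

A result of "slower" for a reconnection measured in minutes would suggest the case is worth a debug bundle.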


@karlveezoo commented on GitHub (Jul 7, 2025):

Hi,

Thanks for the reply. I will do an upgrade this week and do some tracing on my client for you.


Reference: SVI/netbird#2025