Random network disruptions on 0.33.0 #1457

Closed
opened 2025-11-20 05:30:57 -05:00 by saavagebueno · 4 comments

Originally created by @christian-schlichtherle on GitHub (Nov 25, 2024).

After upgrading from version 0.32.0 to 0.33.0, we see random network disruptions even though `netbird status` reports no disconnects. This happens several times per hour, even when the nodes sit on the same LAN. In the end, we downgraded to 0.32.0.

To reproduce: Install a large number of Linux nodes on the same LAN, put some serious workload on them, and watch for random network errors that `netbird status` does not report. In our case, we run Longhorn on the nodes, and every time this happens almost all of the volumes degrade, causing huge network activity to rebuild new volume replicas on other nodes.
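For anyone trying to reproduce this, a minimal sketch of an external reachability probe may help timestamp the disruptions so they can be compared against what `netbird status` shows. It is not part of the original report: the `probe` and `watch` helpers, the TCP-connect approach, and the one-second interval are all assumptions chosen for illustration.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def watch(peers, interval=1.0, rounds=None, probe_fn=probe):
    """Probe each peer once per round and record reachability changes.

    peers is a list of (host, port) tuples (hypothetical overlay addresses).
    Peers are assumed reachable at start, so only transitions are logged.
    Runs forever when rounds is None. Returns the list of recorded events.
    """
    last = {}
    events = []
    done = 0
    while rounds is None or done < rounds:
        for host, port in peers:
            ok = probe_fn(host, port)
            if last.get((host, port), True) != ok:
                stamp = datetime.now(timezone.utc).isoformat()
                events.append((stamp, host, port, ok))
                state = "recovered" if ok else "UNREACHABLE"
                print(f"{stamp} {host}:{port} {state}", flush=True)
            last[(host, port)] = ok
        done += 1
        time.sleep(interval)
    return events
```

Calling `watch([("100.64.0.2", 22), ("100.64.0.3", 22)])` on one node, with the placeholder addresses replaced by the overlay IPs of real peers, would print a timestamped line whenever a peer becomes unreachable or recovers. Those timestamps could then be lined up with the client logs.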

saavagebueno added the client, waiting-feedback, triage-needed labels 2025-11-20 05:30:57 -05:00

@mgarces commented on GitHub (Nov 26, 2024):

Hi, can you please clarify if this is on self-hosted or cloud?
Also, would it be possible to run some tests with 0.33.0 and collect logs with debug mode activated? [1] It would be helpful to understand the status of your nodes.

[1] https://docs.netbird.io/how-to/troubleshooting-client


@christian-schlichtherle commented on GitHub (Dec 3, 2024):

This was the cloud service.

I'm sorry to tell you that the recent regressions in Netbird and incompatibilities with Flannel have caused us to re-engineer our network design: We have now settled on Netmaker for the overlay network and Cilium for the CNI plugin.

In all fairness, Netbird isn't entirely to blame for our struggles: It seems more like a compatibility issue with Flannel that caused outages of the clusters every few hours. Nevertheless, there were definitely regressions in Netbird too; I've created tickets before. As the situation became unbearable, I had to look for a change. I hope you can solve the regression issues because this is a really impressive project.


@nazarewk commented on GitHub (Apr 28, 2025):

Hello @christian-schlichtherle,

We're currently reviewing our open issues and would like to verify if this problem still exists in the [latest NetBird version](https://github.com/netbirdio/netbird/releases).

Could you please confirm if the issue is still there?

We may close this issue temporarily if we don't hear back from you within 2 weeks, but feel free to reopen it with updated information.

Thanks for your contribution to improving the project!


@christian-schlichtherle commented on GitHub (Apr 28, 2025):

As stated above, we have abandoned Netbird for Netmaker.

Reference: SVI/netbird#1457