Migration from v0.26.0 to v0.35.2 resulted in some devices failing to connect to NetBird (self-hosted) #1550

Closed
opened 2025-11-20 05:32:40 -05:00 by saavagebueno · 2 comments

Originally created by @realfresh on GitHub (Jan 12, 2025).

Hey everyone, I'm in quite a problematic situation right now, having lost connection to around 20 IoT devices.

We have had NetBird running for quite a while on v0.26.0. A couple of days ago, we decided to upgrade to the latest version. We noted that the release notes said the `jsonfile` storage mechanism would be automatically replaced with `sqlite` storage from v0.28.0 onwards. Here is how we did the upgrade:

  1. Clone the NetBird v0.27.0 repo, copy the artifacts directory and run the configuration script. Run the new docker compose file.
  2. While v0.27.10 is running, use the migration CLI to migrate the JSON file to the new SQLite file.
  3. Clone v0.35.2, copy the artifacts directory and run the configuration script again, making sure the storage driver in the `management.json` file is set to `sqlite`.

After doing these steps, we noticed that all existing clients were offline, but slowly, over the course of several hours, 33/71 clients came back online.

We somehow managed to get 50/71 clients back online and connecting to our self-hosted NetBird instance after trying many different things, such as:

  1. Downgrading back to v0.27.10
  2. Using the old JSON file instead of the SQLite database
  3. Disabling authentication on the coturn server so we didn't have auth failures there
  4. Trying various `Turns[].Password` values in the `management.json` file
  5. And a lot more we did in a panic that I can't remember.

However, the remaining devices don't seem to want to come back. The management server logs show occasional lines like this:

WARN [accountID: UNKNOWN, peerID: <<REDACTED>>, context: GRPC, requestID: a1b1b45a-01af-4a66-a196-58dcf7c1cde7] management/server/grpcserver.go:471: failed logging in peer <<REDACTED>: no peer auth method provided, please use a setup key or interactive SSO login
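
To get a sense of how many distinct peers are being rejected, and whether they all fail for the same reason, the management log could be tallied. This is only a minimal sketch: the regular expression is an assumption based on the single WARN line quoted above, and `count_login_failures` is a hypothetical helper, not part of NetBird.

```python
import re
from collections import Counter

# Pattern assumed from the one WARN line quoted above; other releases or
# log levels may format this message differently.
LOGIN_FAILURE = re.compile(
    r"peerID: (?P<peer>[^,\]]+).*failed logging in peer .*?: (?P<reason>.+)$"
)


def count_login_failures(lines):
    """Count failed peer logins, keyed by (peer ID, failure reason)."""
    failures = Counter()
    for line in lines:
        match = LOGIN_FAILURE.search(line)
        if match:
            key = (match.group("peer").strip(), match.group("reason").strip())
            failures[key] += 1
    return failures
```

Passing an open log file (or any iterable of lines) to `count_login_failures` gives one entry per peer and reason, which could be compared against the list of devices that are still offline.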

I happen to have one test IoT device on hand that is also unable to connect (the other devices are spread all over the country). Looking at the NetBird daemon logs on that device, I see this:

systemd[1]: Started netbird.service - A WireGuard-based mesh network that connects your devices into a single private network..
netbird[1021]: 2025-01-12T16:40:13+13:00 INFO client/cmd/service_controller.go:24: starting Netbird service
netbird[1021]: 2025-01-12T16:40:13+13:00 INFO client/cmd/service_controller.go:64: started daemon server: /var/run/netbird.sock
netbird[1021]: 2025-01-12T16:40:13+13:00 INFO client/internal/connect.go:119: starting NetBird client version 0.28.9 on linux/arm64
netbird[1021]: 2025-01-12T16:40:14+13:00 ERRO management/client/grpc.go:350: failed to login to Management Service: rpc error: code = PermissionDenied desc = no peer auth method provided, please use a setup key or interactive SSO login

Restarting the daemon only produces the same result.
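
For checking more than one device, the daemon log above can be condensed into the two facts that matter here: the client version and the gRPC error returned at login. The sketch below assumes the log lines look exactly like the ones quoted above; `summarize_client_log` is a hypothetical helper name.

```python
import re

# Both patterns are assumptions taken from the daemon log lines quoted above.
VERSION = re.compile(
    r"starting NetBird client version (?P<version>\S+) on (?P<platform>\S+)"
)
LOGIN_ERROR = re.compile(
    r"failed to login to Management Service: rpc error: "
    r"code = (?P<code>\w+) desc = (?P<desc>.+)$"
)


def summarize_client_log(lines):
    """Extract the client version, platform and login errors from daemon log lines."""
    summary = {"version": None, "platform": None, "login_errors": []}
    for line in lines:
        version = VERSION.search(line)
        if version:
            summary["version"] = version.group("version")
            summary["platform"] = version.group("platform")
            continue
        error = LOGIN_ERROR.search(line)
        if error:
            summary["login_errors"].append(
                (error.group("code"), error.group("desc").strip())
            )
    return summary
```

The input could be, for example, the saved output of `journalctl -u netbird` from each device, assuming the daemon logs to the systemd journal as it does on the test device.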

Is there any workaround or solution that would let us get the remaining devices connected again? If there were some way to temporarily bypass auth so that those devices could authenticate successfully and reconnect, that would seem to solve it.

Any suggestions and ideas are much appreciated!

PS: I'm certain this whole problem is "user error" and not an issue with NetBird itself. Hopefully it's possible to add safeguards so that issues like this don't happen to others.

saavagebueno added the waiting-feedback label 2025-11-20 05:32:40 -05:00

@nazarewk commented on GitHub (Apr 28, 2025):

Hello @realfresh,

We're currently reviewing our open issues and would like to verify if this problem still exists in the [latest NetBird version](https://github.com/netbirdio/netbird/releases).

Could you please confirm if the issue is still there?

We may close this issue temporarily if we don't hear back from you within **2 weeks**, but feel free to reopen it with updated information.

Thanks for your contribution to improving the project!


@mlsmaycon commented on GitHub (Jun 1, 2025):

Closing issue due to no recent feedback. Feel free to open a new one if the issue persists, or reopen if this was a feature request.

Reference: SVI/netbird#1550