mirror of
https://github.com/netbirdio/netbird.git
synced 2026-06-10 17:31:53 -04:00
Regression: Frequent disconnects with version 0.30.2 #1356
Closed
opened 2025-11-20 05:28:55 -05:00 by saavagebueno
·
25 comments
No Branch/Tag Specified
main
peer-acl-multi-source
feature/affected-peers
ui-refactor
fix/ios-debug-bundle
mdm_integration
dependabot/go_modules/testcontainers-9a9ed843ba
dependabot/go_modules/github.com/fsnotify/fsnotify-1.10.1
relay-transport-observability
embedded-vnc
windows-dns-firewall
tests/enable-race-on-tests
dependabot/go_modules/aws-sdk-e0d7f0be02
dependabot/github_actions/actions-1b76ec1a46
dependabot/go_modules/pion-04391f0276
dependabot/go_modules/otel-e34c790afd
dependabot/go_modules/gorm-2271c8195b
dependabot/go_modules/wireguard-dbd6b95108
ui-refactor-gtk3
wasm-websocket-dial
feature/affected-peers-grpc
profile-id-name
profile-id
lazyconn-first-packet-fix-v2
claude/focused-gates-VMTgb
feature/immediate-handshake-on-endpoint-change
refactor/mgmt-bootstrap
dependabot/go_modules/github.com/quic-go/quic-go-0.59.1
fix/ios-login-expiry-blackhole
fix/exit-node-v6-deselect-propagation
ui-tray-linux-leftclick
dependabot/go_modules/github.com/rs/cors-1.11.1
dependabot/go_modules/github.com/ebitengine/purego-0.10.1
dependabot/go_modules/github.com/c-robinson/iplib-1.0.8
dependabot/go_modules/github.com/redis/go-redis/v9-9.20.0
dependabot/go_modules/github.com/cilium/ebpf-0.21.0
dependabot/go_modules/github.com/coreos/go-iptables-0.8.0
dependabot/go_modules/golang.org/x/mod-0.36.0
dependabot/go_modules/github.com/spf13/pflag-1.0.10
fix/ctx-enrichment
nmap/components-impl
daemon-owner
dependabot/go_modules/github.com/crowdsecurity/crowdsec-1.7.8
client-json-socket
feature/android-client-ssh
feature/ios-ssh
worktree-accept-ra-forwarding
nmap/combined-deploy
task/align_protobuff_toolset
feature/session-extend
add-json-yaml-flags
refactor/ephemeral-cleanup
claude/webtransport-relay-wasm-mUjY9
claude/vnc-udp-feasibility-6KB1U
fix-ssh-authorized-users-multi-rule
fix/wgport-config
drop-candidateviaroutes-filter
e2e-windows-dns-combined
dependabot/go_modules/github.com/Azure/go-ntlmssp-0.1.1
debug-logs
dependabot/go_modules/github.com/jackc/pgx/v5-5.9.2
fix/login-cmd-root-flags
feat/reseller-openapi-spec
github-issue-resolver
add-steamos-support
fix-darwin-uninstaller
flutter-test
dependabot/npm_and_yarn/proxy/web/postcss-8.5.12
ci/freebsd-pkg-bootstrap
cached-serial-check-on-sync
fix-mgmt-cache-bypass-overlay
revert-easyjson-5938
revert-ice-5820
revert-firewalld-5928
refactor/permissions-manager
revert-dns-5935-systemd-resolved
revert-dns-5935-5945
revert-dns-5945-mgmt-cache
feature/log-most-busy-peers
prototype/ui-wails
coderabbitai/utg/8ae8f20
feature/use-peer-fqdn-on-https
dependabot/go_modules/golang.org/x/image-0.38.0
feature/metrics-push-management-control
release/0.68.3
dependabot/go_modules/github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream-1.7.8
dependabot/go_modules/github.com/aws/aws-sdk-go-v2/service/s3-1.97.3
add-slack-channel
claude/rdp-token-passthrough-eNcqW
transparent-proxy
fix/macos-stale-route-eexist
crowdsec-selfhosted
fix/remove-otel-units
entire/checkpoints/v1
dependabot/go_modules/github.com/go-jose/go-jose/v4-4.1.4
fix/getting-started
feat/static-connectors-combined-server
feature/use-local-keys-embedded
feature/fleetdm
set-env-only-if-not-fork
feature/expose-has-channel
fix/connection-status-race
fix/filter-cgnat-cni-ice-candidates
feature/check-cert-locker-before-acme
test/proxy-fixes
test/proxy-mtu
prototype/ui-tauri
test/proxy-speed
fix-reused-ports
feat/migrate-to-embedded-idp
feature/add-serial-to-proxy-merged
deploy/proxy-serial
test/connection
feature/disable-legacy-port
feature/flag-to-disable-legacy-port
test/perftest
dependabot/go_modules/github.com/pion/dtls/v3-3.0.11
fix/http-redirect
poc-token-command
dn-reverse-proxy
prototype/reverse-proxy-rename
prototype/reverse-proxy-logs-pagination
feature/client-metrics
prototype/reverse-proxy-clusters
debug-dns-route
fix/win-dns-batch
add-extra-route-logs
job-stream-notify-disconnection-eof
deploy/secrets-manager
trigger-proxy-update
bug/update-ios-client-code-build-tags
sync-client-netmap-serial
log/conn-disconn
nmap/compaction-deploy
ci-win-test
feature/disk-encryption-check
wasm-debug
swap-dns-prio
fix/dex-config
feature/migrate-auto-groups-to-table
dependabot/go_modules/github.com/quic-go/quic-go-0.57.0
nmap/compaction
dex-nocgo-stub
feature/exclude-terraform-from-rate-limiting
test-freebsd
retries-refactor
coderabbitai/docstrings/b7e98ac
feat/integrate-zitadel
bug/ios-hanging-reconection
zitadel-idp
feat/network-map-serial
refactor/get-account-no-users
feat/auto-upgrade
feature/report-high-pat-id
feature/temporary-access-for-resource
fix/nmap-fwrules
dont-restart-dns
prototype/ui
update-gomobile
go-dns-for-ice
wasm-ldflags
test-ldflags
wasmbuild-test
feature/networks-s2s
vk/compare-nmaps
dbg/bothmaps
feature/changeset
reorder-dns-shutdown
fix/relay-reconnection-race
fix/nmap-exitnodes
vk/debug/nmap-both
move-licensed-code
feat/better-daemon-connection-lost-message
feat/auto-update-2
test/timings
refactor/getaccount-raw
tests/nmap-getaccount
refactor/nmap
refactor/nmap-limit-buffer
feature/detect-mac-wakeup
feature/extract-modules
quick-setings
feat/sync-limiter
feature/store-cache-impl
fix-install-version
feature/store-metrics
feature/metrics-on-store
feature/use-gorm-cache
loadtest-signal
unsymmetrical-squash
refactor/reducate-signaling
test/update-reduce
feature/store-cache
feature/remote-debug
cli-ws-proxy-backend-addr
feat/mgmt-map-serial
snyk-fix-d9d0081a4c7f9137bdb59d0d50a141a2
snyk-fix-7415cea5a11acd66753540ca2c598c63
job-yml-update
feature/android-allow-selecting-routes
fix/up-sequence
fix/dns-hash-update
snyk-fix-967adae9863f17f108ce8948d9117b8d
log/getaccount-by-peer
signal-suppressor
dns-exit-node
feature/auto-updates
feature/cache-srv-key
merged-fixes
fix/missed-offers-and-debug
debug-and-fixes
poc-wasm-clean-backend-s2s
test/remote-debug
debug-api
dependabot/go_modules/github.com/docker/docker-28.0.0incompatible
fix/remove-gpo-if-empty
fix/test-freebsd
fix/mysql-setup
fix/remove-logout-btn
handle-existing-domain-user
chore/unify-domain-validation
snyk-fix-c5fafc8a50ce1f29046e25a1fc346185
feat/profile-edit-btn
snyk-fix-a54966211e18d4cf67e5a2757cc006d1
log-short-id
feat/logout-ephemeral
log-checks
batch-wg-ops
nb-interface-default
feat/aws-integration
add/race-test
feature/relay-feature-versioning
fix/systemd-service-logs
poc/preprocessed-map
add-account-onboarding
bind-ipv6
fix/merge-main
logs/peerlogs-addpeer
feature/net-297-network-migration
feature/support-skip-auto-apply-exit-node-routes
set-cmd
set-command-with-cursor
feature/limit-update-channel
stop-using-locking-share
feature/poc-lazy-detection
feature/net-248-removal-of-sync-mutex-locks
test/multiple-peer-logging
preresolve
add-ns-punnycode-support
apply-routes-early
windows-search-domains
fix/connecting-route-filter
feature/management/rest-client/impersonate
debug-local-records
resource-fields-snake-case
test/grpc-rate-limit
traffic-correlation-policy
feature/rest-client-options
feat/events-metrics
feature/buf-cli
test/add-ratelimiter
test/remove-write-lock-on-add-peer
fix/add-peer-semaphore
feature/users-roles-endpoint
mlsmaycon-patch-1
debug-user-role
chore/primary-key-on-networks
feature/update-account-peers-buffer-startup
remove-ubuntu2004-runners
refactor/permissions-no-pat-allowed
ref/logrus-factory
use-conntrack-zone
deploy/permissions-account
feature/lazy-connection-idle
ref/improve-test-cov
restore-pr-3440
test/increase-grpc-timeouts
feat/buffer-account-peers-update
test/networkmapgeneration-changes
feature/base-manager
feature/flow-receiver
chore/benchmark-with-large-runner
refactor/handshake-initiator
client/ui-update-systray-icons
userspace-router
wgwatcher-test
output-if-key-already-exists
fix/relay-reconnection
feature/port-forwarding-client-codecleaning
detached2
test/callbacks-nil-iceconninfo
refactor/optimize-peer-expiration
enable-udp-port-for-docker-template
fix/relay-update
feature/apply-posture-netmap
fix/group-update-existing-resource
conntrack-stats
upgrade-okta-sdk
multi-price
test/conn-stat
set-min-parallel-tests-for-management
dns-interceptor
debug-dns
router-dns
add-static-system-info
debug-0.29.4
debug-0.33.0
account-refactoring
relay/2800_quic
route-get-account-refactoring
test/seed-random-routes
feature/get-account-refactoring
test/reconnect-race-condition
refactor/get-account-usage
feature/add-session-id-to-update-channel
improve-ipv4conn
fix/async-pion-event-handling
debug
add-offload
feature/validate-group-association-debug
fix/limit-conn-for-sqlite
test/engine-iface
test/transaction-for-jwt-sync
fix/engine-stop-in-foreground
feature/add-mysql-support
test-migration
refactor/header-size-values
relay/eliminate-gob
test/signal-dispatcher-with-relay
relay/debug
validate-icon
feature/ipv6-support
use-pre-expanded-peers-map
feature/use-signal-dispatcher
validate/peer-status
add-read-write-times
fix/sync-peer-race
feature/relay-status
netmap
evaluate/network-map-hash
fix/lower-dns-resolve-interval-on-fail
feature/relay
fix/go-mod-version
upgrade-nftables
synology-userspace-mode
fix/use-ip-for-default-routes-on-darwin
fix/proxy_close
enable-release-workflow-on-pr
deploy/peer-performance
feature/permanent-turn
feature/permanent-turn-proxy
deploy/posture-check-sqlite
feature/optimize_sqlite_save
debug-ios-behavior
fix/delete-route-only-after-adding
tshoot/windows-logger
remove-new-routing
refactor/eliminate-repo-dependency
add-arm-to-ci
refactor-demo-account-object
test/abc2
test/abc
send-ssh-rosenpass-config-meta
refactor-demo
ensure-schedule-never-runs-non-positive
feature/peer-validator-groupmgm
feature/peer-validator-fix
fix/include-active-dashboard-users
fix/handle-canceling-schedule
fix/geo-download
debug-google-workspace
yury/resolve-ip-to-location
feature/extend-sysinfo
sqlite-async-peer-status
yury/add-postgresql-store
fix/route
test-build
posture-checks-poc
debug-keycloak-idp
poc/netstack
for-pascal-tmp
peer-logout-management
manual-peer-logout
detached
chore/refactor-management
test/dns-bind
fix/enforce-acl-for-containers
yury/use-sync-map-in-updatechannel
fix/events-key-handling
filter-cache-on-load-account
fix/user-expiration
handle-user-context-cancellation
nb-client-k8s-statefulset
fake-addr
fix/iptables_in_docker
ebpf-debug
update-getting-started-flow-use-postgres
fix/peer_list_notification
feature/device-authentication-with-client-secret
feature/keep_alive
feat-groups-from-jwt
separate_proxy_from_wgconfig
fix/wg_conn
wg_conn_fix
wg_bind_parallel_processing
fix-rollback-get-acls
proxy_cfg_cleanup
performance-improvement-rego
update-lock-log-level
feat-client-side-acl
refactor/move_grpcserver_logic_to_account_manager
feature/event-storage
feature/update-idp-redeeming-invite
feature/api-peer-info
return-groupminimum-setupkey
feature/interface-bind
documentation_enhancement
fix-peer-registration
ssh
users_cache
pass-client-caller
client_caller_type
revert-283-feat-fix-windows-installer
periodic-peer-updates
ebpf
braginini/wasm
v0.72.3
v0.72.2
v0.72.1
v0.72.0
v0.71.4
v0.71.3
v0.71.2
v0.71.1
v0.71.0
v0.70.5
v0.70.4
v0.70.3
v0.70.2
v0.70.1
v0.70.0
v0.69.0
v0.68.3
v0.68.2
v0.68.1
v0.68.0
v0.67.4
v0.67.3
v0.67.2
v0.67.1
v0.67.0
v0.66.4
v0.66.3
v0.66.2
v0.66.1
v0.66.0
v0.65.3
v0.65.2
v0.65.1
v0.65.0
v0.64.6
v0.64.5
v0.64.4
v0.64.3
v0.64.2
v0.64.1
v0.64.0
v0.63.0
v0.62.3
v0.62.2
v0.62.1
v0.62.0
v0.61.2
v0.61.1
v0.61.0
v0.60.9
v0.60.8
v0.60.7
v0.60.6
v0.60.5
v0.60.4
v0.60.3
v0.60.2
v0.60.1
v0.60.0
v0.59.13
v0.59.12
v0.59.11
v0.59.10
v0.59.9
v0.59.8
v0.59.7
v0.59.6
v0.59.5
v0.59.4
v0.59.3
v0.59.2
v0.59.1
v0.59.0
v0.58.2
v0.58.1
v0.58.0
v0.57.1
v0.57.0
v0.56.1
v0.56.0
v0.55.1
v0.55.0
v0.54.2
v0.54.1
v0.54.0
v0.53.0
v0.52.2
v0.52.1
v0.52.0
v0.51.2
v0.51.1
v0.51.0
v0.50.3
v0.50.2
v0.50.1
v0.50.0
v0.49.0
v0.48.0-dev2
v0.48.0
v0.47.2
v0.47.1
v0.47.0
v0.46.0
v0.45.3
v0.45.2
v0.45.1
v0.45.0
v0.44.0
v0.43.3
v0.43.2
v0.43.1
v0.43.0
v0.42.0
v0.41.3
v0.41.2
v0.41.1
v0.41.0
v0.40.1
v0.40.0
v0.39.2
v0.39.1
v0.39.0
v0.38.2
v0.38.1
v0.38.0
v0.37.2
v0.37.1
v0.37.0
v0.36.7
v0.36.6
v0.36.5
v0.36.4
v0.36.3
v0.36.2
v0.36.1
v0.36.0
v0.35.2
v0.35.1
v0.35.0
v0.34.1
v0.34.0
v0.33.0
v0.32.0
v0.31.1
v0.31.0
v0.30.3
v0.30.2
v0.30.1
v0.30.0
v0.29.4
v0.29.3
0.29.3
v0.29.2
v0.29.1
v0.29.0
v0.28.9
v0.28.8
v0.28.7
v0.28.6
v0.28.5
v0.28.4
v0.28.3
v0.28.2
v0.28.1
v0.28.0
v0.27.10
v0.27.9
v0.27.8
v0.27.7
v0.27.6
v0.27.5
v0.27.4
v0.27.3
v0.27.2
v0.27.1
v0.27.0
v0.26.7
v0.26.6
v0.26.5
v0.26.4
v0.26.3
v0.26.2
v0.26.1
v0.26.0
v0.25.9
v0.25.8
v0.25.7
v0.25.6
v0.25.5
v0.25.4
v0.25.3
v0.25.2
v0.25.1
v0.25.0
v0.24.4
v0.24.3
v0.24.2
v0.24.1
v0.24.0
v0.23.9
v0.23.8
v0.23.7
v0.23.6
v0.23.5
v0.23.4
v0.23.3
v0.23.2
v0.23.1
v0.23.0
v0.22.7
v0.22.6
v0.22.5
v0.22.4
v0.22.3
v0.22.2
v0.22.1
v0.22.0
v0.21.11
v0.21.10
v0.21.9
v0.21.8
v0.21.7
v0.21.6
v0.21.5
v0.21.4
v0.21.3
v0.21.2
v0.21.1
v0.21.0
v0.20.8
v0.20.7
v0.20.6
v0.20.5
v0.20.4
v0.20.3
v0.20.2
v0.20.1
v0.20.0
v0.19.0
v0.18.1
v0.18.0
v0.17.0
v0.16.0
v0.15.3
v0.15.2
v0.15.1
v0.15.0
v0.14.6
v0.14.5
v0.14.4
v0.14.3
v0.14.2
v0.14.1
v0.14.0
v0.13.0
v0.12.0
v0.11.6
v0.11.5
v0.11.4
v0.11.3
v0.11.2
v0.11.1
v0.11.0
v0.10.10
v0.10.9
v0.10.8
v0.10.7
v0.10.6
v0.10.5
v0.10.4
v0.10.3
v0.10.2
v0.10.1
v0.10.0
v0.9.8
v0.9.7
v0.9.6
v0.9.5
v0.9.4
v0.9.3
v0.9.2
v0.9.1
v0.9.0
v0.8.12
v0.8.11
v0.8.10
v0.8.9
v0.8.8
v0.8.7
v0.8.6
v0.8.5
v0.8.4
v0.8.3
v0.8.2
v0.8.1
v0.8.0
v0.7.1
v0.7.0
v0.6.4
v0.6.3
v0.6.2
v0.6.1
v0.6.0
v0.5.11
v0.5.10
v0.5.1
v0.5.0
v0.4.0
v0.3.5
v0.3.4
v0.3.3
v0.3.2
v0.3.1
v0.3.0
v0.2.3
v0.2.2-beta.1
v0.2.1-beta.5
v0.2.0-beta.5
v0.2.0-beta.4
v0.2.0-beta.3
v0.2.0-beta.2
v0.2.0-beta.1
v0.1.0-beta.3
v0.1.0-beta.2
v0.1.0-beta.1
v0.1.0-rc.2
v0.1.0-rc-1
v0.0.8-hotfix-1
v0.0.8
v0.0.7
v0.0.6
v0.0.5
v0.0.4
v0.0.3
v0.0.2
v0.0.1
v0.0.0
Labels
Clear labels
2021 Q4
2022 Q1
2022 Q1
accessibility
acl
agent
agent
Android
Android
api
authentik
automation
azure
battery-usage
bug
cache
client
client-ui
cloud
cloud-only
cloudflare
community
compatibility
config-idp
config-issue
connection
contribution
coturn
cross-vpn
dashboard
data-usage
distribution
dns
docker
documentation
duplicate
enhancement
enhancement
event-stream
feature-request
freebsd
getting-started
go
good first issue
gui
help wanted
home-assistant
idp
inconsistency
integration
integrations
ios
ipv6
jwt
k8s
keycloak
linux
login
macos
management-service
missing-docs
mobile
moved-internal
needs-review
netbird-ui
networking
new-platform
nginx
notification
okta
openwrt
packaging
peer-management
peer-management
peer-management
performance
postgres
posture-checks
psk
pull-request
question
refactor
relay
release
rfc
routes
security
security-related
self-hosting
server
signal
sleep-issue
ssh
ssl
status
store
synology
system-compatibility-issue
test-suite
third-party-integration
triage
triage-needed
troubleshooting
UX
waiting-feedback
windows
wontfix
zitadel
Mirrored from GitHub Pull Request
Milestone
No items
No Milestone
Projects
Clear projects
No project
Assignees
saavagebueno
Clear assignees
No Assignees
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: SVI/netbird#1356
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @christian-schlichtherle on GitHub (Oct 21, 2024).
Describe the problem
We are running an IoT project where some Linux based K3s nodes on the edge are located at customer premises and communicate with some other Linux based K3s nodes in the cloud. This project is running for almost three years now. Previously, we have been connecting and managing all nodes via OpenVPN (so we can SSH into every node, even when it's connected at customer premises) and then installed K3s on each node. A few months ago we replaced OpenVPN with Netbird because of its many advantages like performance, peer-to-peer topology with central management etc.
Ever since, we were following updates as soon as possible. We started at 0.27.10 and now we are (or were) at 0.30.2. Unfortunately, starting somewhere between version 0.28.4 and 0.30.2 we started to observe frequent network partitions (disconnects). They would happen randomly after some hours, mostly several times a day, at least once per day, following no particular pattern. I checked many potential causes, including IP address changes which happens to CPE equipment every night (according to the Internet provider's plan), but none of this was the root cause.
Recently I decided to downgrade the network from version 0.30.2 to version 0.28.4 and since then, we didn't have a single network partition / disconnect any more.
To Reproduce
Setup a bunch of nodes and run them 24/7. If you setup the nodes using Ansible, you can discover network partitions like this:
If there is no network partition, then each node produces an empty output, otherwise it lists the nodes it cannot connect to.
Expected behavior
These Linux nodes should stay connected 24/7, real Internet outages aside.
Are you using NetBird Cloud?
Yes
NetBird version
see above
NetBird status -dA output:
n/a
Do you face any (non-mobile) client issues?
n/a
Screenshots
n/a
Additional context
n/a
@wiiun commented on GitHub (Oct 22, 2024):
I also encountered the same situation
@mgarces commented on GitHub (Oct 22, 2024):
hello @christian-schlichtherle thank you for your issue; we will investigate this, also, thank you for your provided commands.
It would be beneficial to have debug logs for those clients, is it possible to turn them on and run them for a long period of time? You can achieve this by following these docs.
Thanks
@mgarces commented on GitHub (Oct 22, 2024):
Hi again! We've found a bug in our reconnection logic, and we are working on improvements for it. We currently have a PullRequest ongoing, would you be willing to test it out before we release it?
@christian-schlichtherle commented on GitHub (Oct 23, 2024):
I could install an update on our development cluster for some limited time over the weekend. Since we are using Ansible, how would I go about installing a pre-release?
@DutchCloud4Work commented on GitHub (Oct 23, 2024):
I`m also experiencing problems, whenever a workstation (Mostly windows) goes into sleep mode (because of lunch or something) it will never reconnect anymore, only fix is a reboot of the entire system.
Started after the upgrade 0.29 to 0.30
@christian-schlichtherle commented on GitHub (Oct 24, 2024):
@DutchCloud4Work you can try
netbird service restartto fix this issue. A reboot should not be required. I'm on macOS however, so your situation may be different.@DutchCloud4Work commented on GitHub (Oct 24, 2024):
@christian-schlichtherle
Doesn`t work, even restarting the service in Windows en restarting the GUI will still give me no connectivity (the gui does say its connected, but no traffic)
@ngtrthanh commented on GitHub (Oct 24, 2024):
I have nearly same problem with disconnect to some peers.
Host: Ubuntu 24.02.
Netbird version: 0.30.2
I got netbird status report in inverse
8 of 13 Accessible Peers on Dashboard but at CLI 5/13.
A down/ up cycle solve problems, but it persisted after 12h.
I want to join test new version or roll back to prev.
@mgarces commented on GitHub (Oct 24, 2024):
hi,
v0.30.3is now live! While we have conducted thorough testing, if you encounter any unusual behaviour, it might be related to this change. Your feedback will be invaluable in ensuring the stability and performance of this update. Please report back if your connectivity issues are resolved or if you encounter any other hurdle.@christian-schlichtherle commented on GitHub (Oct 24, 2024):
Will give it a try on our development cluster over the weekend - thank you so much!
@christian-schlichtherle commented on GitHub (Oct 24, 2024):
Actually, I was installing it now. Here's my findings:
Remote installation with
apt install netbird=0.30.3on the cloud nodes in the dev cluster went well.Installation on the single edge node in the dev cluster using the same command hung up. When I SSH to the node using the LAN port and do
netbird statusI get:So, apparently the service could not get restarted on upgrading from 0.28.4 to 0.30.3. I'm glad this is only a single node in the dev cluster which I can power cycle manually. For the prod cluster, this incident would be a disaster as I would have to call the customers and ask everyone to reboot the edge nodes manually.
Some more diagnostic output:
From journalctl:
Obviously it complains about an invalid argument, but I haven't configured anything different on this node than any other. Maybe this error message is a false positive?
Finally, I was doing a reboot of the node and the problem disappeared.
Now I will start to monitor the stability.
@christian-schlichtherle commented on GitHub (Oct 24, 2024):
PS: I noticed that the troubled edge node has changed it's DNS name to
my-name-1.netbird.cloud(note the additional-1). I'm not sure when that happened and how this is related, if at all.@christian-schlichtherle commented on GitHub (Oct 25, 2024):
Two more hosts failed the remote install, so there is definitely a problem.
@lixmal commented on GitHub (Oct 25, 2024):
@christian-schlichtherle could you provide the log file, please?
netbird debug bundle -AThe log from the zip should be sufficient.
Or, if that fails, just
/var/log/netbird/client.log@DutchCloud4Work commented on GitHub (Oct 25, 2024):
update went without issues at my side (0.29 / 0.30.2 > 0.30.3), seems to have fixed my issue.
After my laptop is now waking up from sleep, after +- 60 it will reconnect
@christian-schlichtherle commented on GitHub (Oct 25, 2024):
@lixmal
After installation, the netbird service is not starting up. I tried several restarts using
netbird service restartas well assystemctl restart netbirdwith no luck. This is the tail of/var/log/netbird/client.logOnly a reboot helped.
@lixmal commented on GitHub (Oct 25, 2024):
Ah, I see. I don't think we get logs.
Can you run
and return the output?
@mlsmaycon commented on GitHub (Oct 25, 2024):
Can you check if there is something at /var/log/netbird/netbird.err
@christian-schlichtherle commented on GitHub (Oct 25, 2024):
BTW: The affected systems all run Ubuntu 24.04.1 LTS.
From the same node, this is
/var/log/netbird/netbird.err:Unfortunately, it doesn't report the time, so this could be entirely unrelated.
@lixmal commented on GitHub (Oct 25, 2024):
@christian-schlichtherle
Can you please run
I need to see the preceding log
@christian-schlichtherle commented on GitHub (Oct 25, 2024):
@lixmal I'm sorry, but isn't that too late already? I rebooted the failing nodes to fix the service startup. Now everything works as designed.
@lixmal commented on GitHub (Oct 25, 2024):
I see. There must've been something gone wrong when creating nftables that was cleared up by the reboot. Thanks
@christian-schlichtherle commented on GitHub (Oct 25, 2024):
@lixmal Maybe there is another approach to isolate the issue: We are using a huge Ansible playbook to setup Netbird, K3s and a ton of other stuff on each node. This ensures that the configuration of the nodes is as identical as possible. That being said, one of the biggest differences is the base image. For the edge devices (which are the ones which have been failing the service restart after installation), we use Ubuntu 24.04.1 LTS. Maybe you can recreate the issue on such nodes exclusively?
@christian-schlichtherle commented on GitHub (Nov 1, 2024):
I'm closing this ticket because the upgrade to 0.31.0 went well for all our nodes in all our environments, including the Ubuntu 24.04.1 LTS nodes. Kudos to the team for their dedication - you rock!
@mlsmaycon commented on GitHub (Nov 1, 2024):
Thanks for the feedback @christian-schlichtherle