mirror of
https://github.com/netbirdio/netbird.git
synced 2026-05-15 20:51:57 -04:00
Peer link is being dropped #1521
Open
opened 2025-11-20 05:32:08 -05:00 by saavagebueno · 27 comments
Originally created by @bmansfie on GitHub (Dec 28, 2024).
Describe the problem
I have netbird installed as an overlay network. I have an ingress server and another server in another location. Neither is behind NAT. As far as I can tell everything is working properly, and it generally does. The overlay peer link was dropped and the ingress server stopped talking to the other server. I noticed it quickly because of my host down alerts. I had to restart netbird on the ingress server to get the peer connection back up.
I've been running netbird for more than a year.
To Reproduce
Steps to reproduce the behavior:
1: Run netbird for a long amount of time
2: Monitor connections for loss
3: Run upgrades and configuration changes
I suspect that this happens around an upgrade or configuration change somewhere in the overall system. I am not certain as this only happens rarely. I suspect that there are bugs, probably race conditions, in the teardown and setup procedures that create this condition.
Expected behavior
Does not drop peers.
Are you using NetBird Cloud?
Self-hosted
NetBird version
Ingress was on 0.29.2 this latest time (this has been observed with numerous versions), and the other server was on 0.35.0.
Additional context
I don't have time to track this down and be more specific as it's not consistent. This last incident was a production outage that I never had with nebula, so I'm switching back. The system looks nice and I've seen a lot of improvements. But I need reliability above all else and I haven't found it here. Good luck.
@rihards-simanovics commented on GitHub (Dec 28, 2024):
Hey, just to chime in: based on my testing, on Linux peers with versions equal to or above 0.34.0, the connection drops after about 5 minutes without recovering. In my case this is the setup:
Management Server: netbird-mgmt version 0.35.1 (docker),
Peer Server 1: 0.34.0-0.35.1
Peer Server 2: 0.34.0-0.35.1
Peer Server 3: 0.34.0-0.35.1
All servers run on static IPs and all three peers would be running the same version of the client.
Peer Server 1 would drop the connection to the other server peers roughly 5 minutes after starting Netbird.
I have already tried adding a common "allow all UDP ports" rule, but to no avail. So essentially, even if we assume that the management and peer servers are running the same 0.35.1, one server will always fail after some time, specifically Peer Server 1. Just to clarify, I've been running Netbird since version 0.27.0, and everything was working fine up until recently.
I can get the logs; however, as it is a live environment, doing so will cause downtime, so I will have to wait until the maintenance window, which will be some time in early January. In the meantime I will try to get at least the logs for Peer Server 1 so there is at least some data.
@hadleyrich commented on GitHub (Dec 28, 2024):
Interesting timing. I've been seeing some more link instability in the last few days. Since 0.35 maybe. Requiring a restart of some peers to reconnect. Sometimes they think they are connected but are not passing traffic.
I did have some stability issues back around pre-0.20 or so and required restarting clients. Then things have been quite stable for the last many months.
I know this is very vague and doesn't provide useful information in and of itself, but I just wanted to add my anecdotal experience that the current instability hasn't shown up in my environment for quite some time.
@hadleyrich commented on GitHub (Dec 28, 2024):
Logs from a peer at the time it dropped off:
@rihards-simanovics commented on GitHub (Dec 29, 2024):
Hey @hadleyrich, I agree, that's pretty much what I've been battling with for the past couple of weeks. I have a load balancer which uses the VPN to connect to various other VPS peers so that we can have a simple HTTP reverse proxy on port :80. As of 0.34.0, the load balancer drops the connection to the other VPS peers without retrying to connect, needing a manual restart of the Netbird client.
That's pretty much my experience. I joined at around version 0.27.0, I think. I fully converted from a traditional VPN by around 0.28.0, and things were relatively stable, so I stayed. That said, I think they need to have a nightly and a stable release at this point, as I agree with @bmansfie: having this run in production, I need stability before anything else. Yesterday I had a 2-hour downtime because the 0.29.4 client did something when I was applying the access policy and took down all external ports, which absolutely wrecked my DNS server and all DNS records for a good 4 hours; thankfully, nowadays, it only takes around 2 hours to re-propagate. That said, I'd like for that not to happen again...
I wouldn't really call it "anecdotal". I have a monthly maintenance window during which I upgrade all of the packages on the OS, so when I do eventually upgrade, I may jump many minor and patch releases. Because things were more or less stable, I had no issues upgrading to the latest. Right now, all of my servers are sitting on a downgraded version of 0.33.0 as it seems to be the last stable release, at least for the previous 24, before it was 0.29.4. That said, after yesterday, I am fearful of all versions 😅.
@hadleyrich commented on GitHub (Dec 30, 2024):
I just noticed on a peer that had lost communication with another peer that "Last WireGuard handshake" was hours old and "Last connection update" was minutes so it certainly points to something at the WG level becoming out of sync.
I think you're probably right, I think I probably saw stability issues reappearing around 0.34. I had become quite (probably overly) comfortable with the level of stability over the past months and been happily tracking the latest releases. I don't yet run netbird in a production setting. More of a long term stability test on my homelab "production" services before deploying to real customer facing workloads.
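The comparison described above (an hours-old handshake next to a minutes-old connection update) can be checked mechanically. Below is a minimal Python sketch, not part of NetBird. It assumes the age strings printed by netbird status --detail look like "1 minute, 27 seconds ago", or "-" when no handshake has happened; the hour/day spellings, the helper names, and the 180-second threshold are assumptions made for illustration. An idle link can legitimately show an old handshake, so the check only means something while traffic is expected to flow.

```python
import re

# Seconds per unit; the "hour" and "day" spellings are assumed, not confirmed.
_UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def age_seconds(value: str):
    """Convert e.g. '1 minute, 27 seconds ago' to seconds; return None for '-'."""
    matches = re.findall(r"(\d+)\s+(second|minute|hour|day)s?", value)
    if not matches:
        return None
    return sum(int(number) * _UNITS[unit] for number, unit in matches)


def handshake_stale(handshake: str, max_age: int = 180) -> bool:
    """True when there is no handshake at all or it is older than max_age seconds."""
    age = age_seconds(handshake)
    return age is None or age > max_age
```

Feeding it the "Last WireGuard handshake" value of a peer that should be carrying traffic gives a quick yes/no that a monitoring job could alert on.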
@freebs65 commented on GitHub (Dec 30, 2024):
Hmm.. it's funny, I have one machine that drops and it's a Windows Server 2022.. I don't see other clients stop. A simple restart fixes it, but I have to do it every day. I have Linux clients and an older Windows SBS server that all seem to be ok.. I also have Windows 11 clients.. again they seem fine.. even my Arch desktop is fine. Very odd.
@hadleyrich commented on GitHub (Dec 31, 2024):
Another data point. A long running ping in screen to keep traffic going over the link appears to keep the peer connected.
@rihards-simanovics commented on GitHub (Dec 31, 2024):
It seems like the issue is with the WireGuard handshake. For instance, my Windows 11 PC seemingly struggles to connect to other Linux Server Peers despite everything running the latest Netbird version, in this case, Netbird 0.35.2. One of my Load Balancer servers running Ubuntu 22.04 just refuses to keep the connection to other Linux servers for longer than 5 minutes before dying and needing to be restarted. I don't know what I'm doing wrong, but I always update the management server first and only then move on to the client nodes, first on the Linux servers and then on devices such as PCs/Laptops/Phones.
@rihards-simanovics commented on GitHub (Jan 1, 2025):
Hi Everyone, happy New Year!
Hey @mlsmaycon, sorry to ping you directly. Would you like me to run the same steps as listed last time? I will email the logs so you have a better picture. I am approaching a maintenance window for all our org servers and will be able to run a full debugging trace like last time. Also, I need to know if the logging persists across client updates or whether I need to run it first on the old version and then after the upgrade.
@hadleyrich commented on GitHub (Jan 1, 2025):
I think (in my case at least) this appears to be something triggered by, or relating to relaying.
Previously I was not running the relay in my set up and only running coturn. The peer I was having most trouble with was connecting over relay.
Adding in the new relay service appears to have made that peer more stable for the last 12 hours or so.
@rihards-simanovics commented on GitHub (Jan 1, 2025):
Hmm, interesting. In my case, I am already running a new relay service. Strangely, some client versions seem to overuse the relay, and some underuse it; since 0.35.0, the client seems to bypass it altogether and go straight for P2P.
Okay, you know what? It's late at night here in the UK, so let me try upgrading and getting at least some logs.
@rihards-simanovics commented on GitHub (Jan 2, 2025):
Ok, without looking at the trace logs generated by the client, my anecdotal research log shows this:
@mlsmaycon I've collected a full trace from UK1 and UK7 using the method listed in https://github.com/netbirdio/netbird/issues/3112#issuecomment-2562361089 and am now parsing it to see if there is anything obvious. I will send it to the support email once I've reviewed everything.
@fiikra commented on GitHub (Jan 7, 2025):
We're encountering the same issue with our Netbird instance. Version 0.33 was incredibly stable for over 90 days, maintaining continuous communication with our peer. However, after upgrading directly to version 0.35, we observed the same problem that @hadleyrich mentioned starting from version 0.34. Although the peer is online and connected to our server, there is no communication. It seems the bug might have been introduced in that version. We've debugged the issue and found a temporary workaround: disabling and re-enabling the policy in access control, which restores communication. I'm happy to provide more details to help resolve this.
@TomSipacom commented on GitHub (Jan 14, 2025):
We have the same issue with some peers. @fiikra, I have tested your workaround, but it is not working for me.
When I run netbird status --detail and look at the peer I have no connection to, I see that it's connected but has no Last WireGuard handshake.
Status: Connected
-- detail --
Connection type: Relayed
ICE candidate (Local/Remote): -/-
ICE candidate endpoints (Local/Remote): -/-
Relay server address: rel://vpn.example.com:33080 <-- here stands my real domain, obviously
Last connection update: 25 seconds ago
Last WireGuard handshake: -
Transfer status (received/sent) 0 B/740 B
Quantum resistance: false
Routes: -
Networks: -
Latency: 0s
and I have other peers that just work fine
Status: Connected
-- detail --
Connection type: Relayed
ICE candidate (Local/Remote): -/-
ICE candidate endpoints (Local/Remote): -/-
Relay server address: rel://vpn.example.com:33080 <-- here stands my real domain, obviously
Last connection update: 9 minutes, 33 seconds ago
Last WireGuard handshake: 1 minute, 27 seconds ago
Transfer status (received/sent) 22.0 KiB/16.7 KiB
Quantum resistance: false
Routes: -
Networks: -
Latency: 0s
It worked before and all peers mentioned are using version 0.35.2.
When I reinstall my client with version 0.27.3 it works.
My own peer is installed on Windows; the 2 peers from the example are Linux.
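The symptom shown in the two outputs above (Status: Connected, but an empty Last WireGuard handshake) can be detected with a small script. Below is a minimal Python sketch, not part of NetBird. It assumes each peer block uses the "Key: value" layout pasted above, and the function names are hypothetical.

```python
def parse_peer_block(text: str) -> dict:
    """Parse the 'Key: value' lines of one peer block into a dict.

    Lines without a ': ' separator (for example the 'Transfer status' line
    as pasted above) and the '-- detail --' marker are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):
            continue
        # Split on the first ': ' only, so values such as rel://host:port stay whole.
        key, separator, value = line.partition(": ")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def handshake_missing(fields: dict) -> bool:
    """True when the peer reports Connected but has never completed a handshake."""
    connected = fields.get("Status") == "Connected"
    handshake = fields.get("Last WireGuard handshake", "-")
    return connected and handshake == "-"
```

Run against the first block above, handshake_missing returns True; against the second block it returns False. This flags the broken peers without anyone having to read the output by eye.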
@rihards-simanovics commented on GitHub (Jan 18, 2025):
Hey @mlsmaycon, I am beginning to get really frustrated by this. We are getting new releases that introduce more features, but none address the issue of the peer link being dropped. I've sent an email to support with the attached debug trace logs for two servers that keep dropping links after upgrading. Has anyone looked at it? I really don't want to be that kind of person, but at this point, I'm getting frustrated enough that I am this 🤏🏻 far away from trying and perhaps even switching to headscale.
@the-project-group commented on GitHub (Jan 20, 2025):
Can you guys check if you have "redundant" ACLs like:
Toggling one of the ACLs off / on brings connectivity back for me:
@rihards-simanovics commented on GitHub (Jan 20, 2025):
I just spoke with someone from the headscale community who has used Netbird before. They suggested disabling rosenpass and rosenpass-permissive modes on the affected clients. After doing that and upgrading all clients, the issue appears to have disappeared, though I will keep monitoring it. I assume the problem is somewhere in the rotation of the rosenpass keys, as the peers drop connection almost exactly 5 minutes after establishing a link.
@mlsmaycon commented on GitHub (Jan 20, 2025):
@rihards-simanovics did you have peers with different versions of NetBird and rosenpass enabled? After the upgrade, did you enable rosenpass again?
@rihards-simanovics commented on GitHub (Jan 20, 2025):
Hi @mlsmaycon, thanks for replying. No, the versions were precisely the same across all peers when the issue occurred. To better illustrate the environment, all 9 peers (Ubuntu 22.04/24.04 servers):
0.36.2 or .3,
0.33.0 within roughly 2 minutes of one another,
rosenpass and rosenpass-permissive flags set to true,
When the peer dropped the connection, which happened roughly every 5 minutes, I restarted all of the servers so that if there was anything strange with the OS, it would have been accounted for. However, after around 10 minutes of things going down, I had to revert back to 0.33.0.
By the way, here is a quick update on the stability after disabling rosenpass and rosenpass-permissive mode; all 9 peers have been running 0.36.3 since I posted this comment and nothing has dropped connection yet.
@drixtol commented on GitHub (Jan 22, 2025):
Piping in to state that my org is also having this same issue. We do not have rosenpass options enabled.
All Windows clients, all on versions >0.34.1. Only a subset of users are having issues, and only users who have authentication; Expiration disabled clients have not had any issues.
Server version is 0.35.2; we have upgraded several times trying to resolve this issue.
Typically the affected clients are coming back from idle, but it can also be a fresh connection. The WireGuard handshake never completes.
If I change the peer group it immediately resolves the handshake issue.
@Bonnevie commented on GitHub (Feb 3, 2025):
I am also affected by this issue (as far as I can tell), with netbird status -d sporadically showing no recent WireGuard handshake. The issues are with a specific ssh-enabled server only; the other peers in the network always seem to have recent handshakes listed. The workaround of cycling the access control policy seems to work.
My computer is on the bottom, the problematic peer up top. I have tried various versions though, and a colleague on 35.2 can connect without issue.

@ugurtam commented on GitHub (Feb 14, 2025):
Hi,
Same issue, with no rosenpass or rosenpass-permissive activated. My practical fix is to disable and re-enable the policy in the problem group. But we need a solution; we can't do that every day.
@SuperKali commented on GitHub (May 22, 2025):
I think I have the same issue: Peer 1 is the node from which I access the resources of Peer 2. However, if I reboot Peer 2 for maintenance, Peer 1 can no longer access the subnets on Peer 2, unless I also reboot Peer 1. Only after restarting Peer 1 do the resources on Peer 2 become accessible again.
@pscriptos commented on GitHub (Jun 10, 2025):
I have exactly the same problem. Thanks for the tip with peer number 1, I have just restarted it and now access to the resource is working again for the time being.
@nazarewk commented on GitHub (Jun 10, 2025):
@pscriptos @SuperKali Can you update to 0.46.0, watch out for the issue/further new versions and report back with results after some time?
We have identified some form of race condition that was partially fixed in https://github.com/netbirdio/netbird/pull/3910 and is still being worked on in https://github.com/netbirdio/netbird/pull/3929
@SuperKali commented on GitHub (Jun 10, 2025):
Hi @nazarewk,
thanks for mentioning me. I usually keep all peers updated to the latest version to check if the issue has already been resolved. Unfortunately, recently—and randomly—my Uptime Kuma reported a loss of connectivity to some VPCs from one of the peers, and I had to restart the current peer using netbird service restart.
Let me know if you need any additional information.
Thanks again!
@pscriptos commented on GitHub (Jun 10, 2025):
All affected peers were already updated to version 0.46.0 a few days ago.
When I got stuck with my problem, I looked at this Github repo and saw that at that time the update to version 0.46.0 had just been out for 3 hours. However, I also wrote about it here: #3699