Critical: peer connection issue with 0.37.0 and 0.37.1 #1658

Closed
opened 2025-11-20 06:04:15 -05:00 by saavagebueno · 9 comments

Originally created by @IanMoroney on GitHub (Feb 27, 2025).

Describe the problem

Peers upgraded from 0.36.6 or 0.36.7 to 0.37.0 or 0.37.1 are unable to connect.

When I downgrade to 0.36.6 or 0.36.7, they are able to connect again.

Debug log from 0.37.0 and 0.37.1 (they are the same):

2025-02-27T11:57:17Z INFO client/internal/config.go:188: generating new config /etc/netbird/config.json
2025-02-27T11:57:17Z INFO client/internal/config.go:254: using default Management URL https://api.netbird.io:443
2025-02-27T11:57:17Z INFO client/internal/config.go:261: new Management URL provided, updated to "https://endpoint.my-domain.com:443" (old value "https://api.netbird.io:443")
2025-02-27T11:57:17Z INFO client/internal/config.go:278: using default Admin URL https://api.netbird.io:443
2025-02-27T11:57:17Z INFO client/internal/config.go:285: new Admin Panel URL provided, updated to "https://dashboard-endpoint.my-domain.com:443" (old value "https://app.netbird.io:443")
2025-02-27T11:57:17Z INFO client/internal/config.go:296: generated new Wireguard key
2025-02-27T11:57:17Z INFO client/internal/config.go:302: generated new SSH key
2025-02-27T11:57:17Z INFO client/internal/config.go:318: using default Wireguard port 51820
2025-02-27T11:57:17Z INFO client/internal/config.go:329: using default Wireguard interface wt0
2025-02-27T11:57:17Z INFO client/internal/config.go:382: filling in interface blacklist with defaults: [ wt0 wt utun tun0 zt ZeroTier wg ts Tailscale tailscale docker veth br- lo ]
2025-02-27T11:57:17Z INFO client/internal/config.go:428: using default DNS route interval 1m0s
2025-02-27T11:57:17Z DEBG client/internal/login.go:94: connecting to the Management service https://endpoint.my-domain.com:443
2025-02-27T11:57:17Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:19Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:20Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:22Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:27Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:33Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:43Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T11:57:47Z INFO util/grpc/dialer.go:89: DialContext error: context deadline exceeded
2025-02-27T11:57:47Z INFO management/client/grpc.go:57: createConnection error: context deadline exceeded
2025-02-27T11:57:47Z ERRO management/client/grpc.go:65: failed creating connection to Management Service: context deadline exceeded
2025-02-27T11:57:47Z ERRO client/internal/login.go:97: failed connecting to the Management service https://endpoint.my-domain.com:443 context deadline exceeded

Debug log from 0.36.7:

2025-02-27T12:05:23Z INFO client/internal/config.go:178: generating new config /etc/netbird/config.json
2025-02-27T12:05:23Z INFO client/internal/config.go:244: using default Management URL https://api.netbird.io:443
2025-02-27T12:05:23Z INFO client/internal/config.go:251: new Management URL provided, updated to "https://endpoint.my-domain.com:443" (old value "https://api.netbird.io:443")
2025-02-27T12:05:23Z INFO client/internal/config.go:268: using default Admin URL https://api.netbird.io:443
2025-02-27T12:05:23Z INFO client/internal/config.go:275: new Admin Panel URL provided, updated to "https://dashboard-endpoint.my-domain.com:443" (old value "https://app.netbird.io:443")
2025-02-27T12:05:23Z INFO client/internal/config.go:286: generated new Wireguard key
2025-02-27T12:05:23Z INFO client/internal/config.go:292: generated new SSH key
2025-02-27T12:05:23Z INFO client/internal/config.go:308: using default Wireguard port 51820
2025-02-27T12:05:23Z INFO client/internal/config.go:319: using default Wireguard interface wt0
2025-02-27T12:05:23Z INFO client/internal/config.go:372: filling in interface blacklist with defaults: [ wt0 wt utun tun0 zt ZeroTier wg ts Tailscale tailscale docker veth br- lo ]
2025-02-27T12:05:23Z INFO client/internal/config.go:418: using default DNS route interval 1m0s
2025-02-27T12:05:23Z DEBG client/internal/login.go:94: connecting to the Management service https://endpoint.my-domain.com:443
2025-02-27T12:05:23Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T12:05:23Z DEBG client/internal/login.go:64: connected to the Management service https://endpoint.my-domain.com:443
2025-02-27T12:05:23Z ERRO management/client/grpc.go:350: failed to login to Management Service: rpc error: code = PermissionDenied desc = no peer auth method provided, please use a setup key or interactive SSO login
2025-02-27T12:05:23Z DEBG client/internal/login.go:73: peer registration required
2025-02-27T12:05:23Z DEBG client/internal/login.go:94: connecting to the Management service https://endpoint.my-domain.com:443
2025-02-27T12:05:23Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T12:05:23Z DEBG client/internal/login.go:64: connected to the Management service https://endpoint.my-domain.com:443
2025-02-27T12:05:23Z ERRO management/client/grpc.go:350: failed to login to Management Service: rpc error: code = PermissionDenied desc = no peer auth method provided, please use a setup key or interactive SSO login
2025-02-27T12:05:23Z DEBG client/internal/login.go:73: peer registration required
2025-02-27T12:05:23Z DEBG client/internal/login.go:132: sending peer registration request to Management Service
2025-02-27T12:05:24Z INFO client/internal/login.go:149: peer has been successfully registered on Management Service
2025-02-27T12:05:24Z INFO client/internal/connect.go:111: starting NetBird client version 0.36.7 on linux/amd64
2025-02-27T12:05:24Z INFO util/net/env_linux.go:61: advanced routing has been requested to be disabled
2025-02-27T12:05:24Z DEBG client/internal/connect.go:168: connecting to the Management service endpoint.my-domain.com:443
2025-02-27T12:05:24Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T12:05:24Z DEBG client/internal/connect.go:176: connected to the Management service endpoint.my-domain.com:443
2025-02-27T12:05:24Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T12:05:24Z DEBG signal/client/grpc.go:83: connected to Signal Service: endpoint.my-domain.com:443
2025-02-27T12:05:24Z INFO client/internal/connect.go:245: connecting to the Relay service(s): rels://endpoint.my-domain.com:443/relay
2025-02-27T12:05:24Z DEBG relay/client/manager.go:109: starting relay client manager with [rels://endpoint.my-domain.com:443/relay] relay servers
2025-02-27T12:05:24Z DEBG relay/client/picker.go:45: pick server from list: [rels://endpoint.my-domain.com:443/relay]
2025-02-27T12:05:24Z INFO relay/client/picker.go:72: try to connecting to relay server: rels://endpoint.my-domain.com:443/relay
2025-02-27T12:05:24Z INFO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/client.go:164: create new relay connection: local peerID: cs3HWssRaz3hDNluAEgncvDIwt2H2EbkEarGt70taUo=, local peer hashedID: sha-2oE+eU2xI4ryfsf09cCzkZZ06dqldRPmS2sspJMFo/o=
2025-02-27T12:05:24Z INFO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/client.go:170: connecting to relay server
2025-02-27T12:05:24Z INFO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/dialer/race_dialer.go:64: dialing Relay server via quic
2025-02-27T12:05:24Z INFO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/dialer/race_dialer.go:64: dialing Relay server via WS
2025-02-27T12:05:24Z DEBG util/net/dialer_dial.go:52: Dialing tcp endpoint.my-domain.com:443
2025-02-27T12:05:24Z ERRO relay/client/dialer/quic/quic.go:46: failed to resolve UDP address: lookup udp/443/relay: unknown port
2025-02-27T12:05:24Z ERRO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/dialer/race_dialer.go:77: failed to dial via quic: lookup udp/443/relay: unknown port
2025-02-27T12:05:24Z INFO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/dialer/race_dialer.go:89: successfully dialed via: WS
2025-02-27T12:05:24Z INFO [relay: rels://endpoint.my-domain.com:443/relay] relay/client/client.go:186: relay connection established
2025-02-27T12:05:24Z INFO relay/client/picker.go:90: connected to Relay server: rels://endpoint.my-domain.com:443/relay
2025-02-27T12:05:24Z INFO relay/client/picker.go:64: chosen home Relay server: rels://endpoint.my-domain.com:443/relay

and so on until it's connected.

Yes, I am currently relying on a relayed connection for my peers (P2P is in progress but not ready yet).
Comparing the two logs, it seems that from 0.37.0 onward the client doesn't connect to the Management service, the relay connection doesn't trigger, and the gRPC connection doesn't trigger.

The format of the admin and management URLs is the same in the client in both versions (has the URL format changed between versions?), so I can't find a reason why the newer version fails to connect.

To Reproduce

Steps to reproduce the behavior:

  1. sudo docker run -e NB_SETUP_KEY=mysetupkey -e NB_MANAGEMENT_URL=https://endpoint.my-domain.com:443 -e NB_ADMIN_URL=https://dashboard-endpoint.my-domain.com:443 -e NB_HOSTNAME=docker-peer -e NB_LOG_LEVEL=debug -e NB_FOREGROUND_MODE=true -e NB_USE_NETSTACK_MODE=true netbirdio/netbird:0.37.0
  2. See error
  3. sudo docker run -e NB_SETUP_KEY=mysetupkey -e NB_MANAGEMENT_URL=https://endpoint.my-domain.com:443 -e NB_ADMIN_URL=https://dashboard-endpoint.my-domain.com:443 -e NB_HOSTNAME=docker-peer -e NB_LOG_LEVEL=debug -e NB_FOREGROUND_MODE=true -e NB_USE_NETSTACK_MODE=true netbirdio/netbird:0.36.7
  4. it works!

Are you using NetBird Cloud?

self-hosted control plane.

NetBird version

Client: 0.37.0 or 0.37.1
Server: 0.36.5 or 0.37.1 (tried both)

saavagebueno added the triage-needed label 2025-11-20 06:04:15 -05:00

@IanMoroney commented on GitHub (Feb 27, 2025):

Additionally, it's worth mentioning that SSO-based logins suffer from the same issue, so it's not just setup-key-based auth that isn't working.


@mlsmaycon commented on GitHub (Feb 27, 2025):

@IanMoroney, can you please run another test using the gRPC debug environment variables below?

GRPC_GO_LOG_VERBOSITY_LEVEL=99
GRPC_GO_LOG_SEVERITY_LEVEL=info
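
For the Docker reproduction in this issue, one way to pass these variables is as extra `-e` flags. A minimal sketch, assuming the same placeholder setup key and URLs as in the reproduction steps; the helper name `netbird_debug_cmd` is hypothetical, and it only prints the command so it can be reviewed before running:

```shell
# Print a `docker run` command that matches the reproduction steps and adds
# the two gRPC debug variables. Hypothetical helper; all values are placeholders.
netbird_debug_cmd() {
  image_tag="$1" # for example 0.37.0 or 0.36.7
  echo docker run \
    -e NB_SETUP_KEY=mysetupkey \
    -e NB_MANAGEMENT_URL=https://endpoint.my-domain.com:443 \
    -e NB_ADMIN_URL=https://dashboard-endpoint.my-domain.com:443 \
    -e NB_HOSTNAME=docker-peer \
    -e NB_LOG_LEVEL=debug \
    -e NB_FOREGROUND_MODE=true \
    -e NB_USE_NETSTACK_MODE=true \
    -e GRPC_GO_LOG_VERBOSITY_LEVEL=99 \
    -e GRPC_GO_LOG_SEVERITY_LEVEL=info \
    "netbirdio/netbird:${image_tag}"
}
```

Running `sudo $(netbird_debug_cmd 0.37.0)` would then start the container with gRPC logging enabled, since none of the arguments contain spaces.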

@IanMoroney commented on GitHub (Feb 27, 2025):

Looks like there is a hidden error returned:

2025/02/27 16:50:53 INFO: [core] Creating new client transport to "{Addr: \"endpoint.my-domain.com:443\", ServerName: \"endpoint.my-domain.com:443\", }": connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434"
2025/02/27 16:50:53 WARNING: [core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "endpoint.my-domain.com:443", ServerName: "endpoint.my-domain.com:443", }. Err: connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434"
2025/02/27 16:50:53 INFO: [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434"
2025/02/27 16:50:53 INFO: [pick-first-lb] [pick-first-lb 0xc000ac1dd0] Received SubConn state update: 0xc000aad3b0, {ConnectivityState:TRANSIENT_FAILURE ConnectionError:connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434" connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}
2025/02/27 16:50:53 INFO: [core] [Channel #1]Channel Connectivity change to TRANSIENT_FAILURE
2025/02/27 16:50:54 INFO: [core] [Channel #1 SubChannel #2]Subchannel Connectivity change to IDLE, last error: connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434"
2025/02/27 16:50:54 INFO: [pick-first-lb] [pick-first-lb 0xc000ac1dd0] Received SubConn state update: 0xc000aad3b0, {ConnectivityState:IDLE ConnectionError:connection error: desc = "transport: authentication handshake failed: credentials: cannot check peer: missing selected ALPN property. If you upgraded from a grpc-go version earlier than 1.67, your TLS connections may have stopped working due to ALPN enforcement. For more details, see: https://github.com/grpc/grpc-go/issues/434" connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}

@IanMoroney commented on GitHub (Feb 27, 2025):

Sounds like it's because of this commit:
https://github.com/netbirdio/netbird/commit/a854660402f68400ee1e0ab618e151c9bc7e67d6


@kasimeka commented on GitHub (Mar 2, 2025):

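It is because of https://github.com/netbirdio/netbird/commit/a854660402f68400ee1e0ab618e151c9bc7e67d6. Following https://github.com/grpc/grpc-go/issues/434#issuecomment-2586217197, I added GRPC_ENFORCE_ALPN_ENABLED="false" to my netbird systemd service env vars and was able to connect with client v0.37.1 after previously failing to do so.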
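
As a sketch of that workaround, a systemd drop-in can carry the variable. This assumes the client runs as a unit named `netbird.service` (an assumption; check the unit name on your system), and the helper name `write_alpn_override` is hypothetical. It only relaxes the client-side ALPN check and does not change what the server or load balancer negotiates:

```shell
# Write a systemd drop-in that sets GRPC_ENFORCE_ALPN_ENABLED=false.
# Assumption: the unit is netbird.service; pass another directory if it is not.
write_alpn_override() {
  dropin_dir="${1:-/etc/systemd/system/netbird.service.d}"
  mkdir -p "$dropin_dir" || return 1
  cat >"$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="GRPC_ENFORCE_ALPN_ENABLED=false"
EOF
}
```

After calling `write_alpn_override` as root, `systemctl daemon-reload` followed by a restart of the unit should make the service pick up the variable.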


@kasimeka commented on GitHub (Mar 2, 2025):

I can see the management server sets its ALPN protocols to h2 and HTTP/1.1, which means the issue doesn't exist in NetBird's code. I traced it back to the AWS Network Load Balancer our NetBird setup is behind: NLBs have their ALPN policy set to none by default. I'll change it to h2 preferred and update the issue if this fixes it for me.
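
One way to check what the TLS endpoint in front of the management service actually negotiates is `openssl s_client` with the `-alpn` flag. A minimal sketch: the function names are hypothetical, the hostname in the usage note is the placeholder used in this issue, and the matched strings are the result lines `s_client` normally prints, which may differ between OpenSSL versions:

```shell
# Classify `openssl s_client` output read from stdin by its ALPN result line.
classify_alpn() {
  out="$(cat)"
  case "$out" in
    *"ALPN protocol: h2"*) echo "h2" ;;
    *"No ALPN negotiated"*) echo "none" ;;
    *) echo "unknown" ;;
  esac
}

# Offer h2 via ALPN to host:port and report what the endpoint selected.
check_alpn() {
  host="$1"
  port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" -alpn h2 \
    </dev/null 2>/dev/null | classify_alpn
}
```

With this, `check_alpn endpoint.my-domain.com` would be expected to print `none` for a listener that does not negotiate ALPN, which matches the "missing selected ALPN property" error from grpc-go, and `h2` once the listener offers it.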


@kasimeka commented on GitHub (Mar 3, 2025):

I can confirm that setting the NLB's ALPN policy to h2 preferred fixed the issue for me ✅


@IanMoroney commented on GitHub (Mar 3, 2025):

In Terraform, for the NLB, this would be done in the listener config:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lb_listener
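
For listeners not managed by Terraform, the same setting can be changed with the AWS CLI; in Terraform, the corresponding `aws_lb_listener` argument is `alpn_policy`. A sketch, assuming a TLS listener on the NLB; the function name is hypothetical and the ARN is a placeholder you would supply:

```shell
# Set the ALPN policy of an existing NLB TLS listener to HTTP2Preferred.
set_listener_alpn_h2() {
  listener_arn="$1" # placeholder: ARN of the TLS listener on the NLB
  aws elbv2 modify-listener \
    --listener-arn "$listener_arn" \
    --alpn-policy HTTP2Preferred
}
```

`HTTP2Preferred` is the policy value that corresponds to the "h2 preferred" setting mentioned above; clients that do not offer h2 still fall back to HTTP/1.1.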


@IanMoroney commented on GitHub (Mar 3, 2025):

I can also confirm that setting the NLB's ALPN policy to h2 worked 👍
Thank you @janw4ld for finding that :)

Reference: SVI/netbird#1658