Fresh install not working in modern kubernetes #1810

Closed
opened 2025-11-20 06:07:13 -05:00 by saavagebueno · 7 comments

Originally created by @jalberto on GitHub (Apr 11, 2025).

Originally assigned to: @nazarewk on GitHub.

Describe the problem

Deployed the operator following the instructions, but I am not able to connect or resolve any service in the cluster.

To Reproduce

  1. Install the operator
  2. Add a DNS and enable wildcard DNS
  3. Add a policy to connect to the k8s control plane
ingress:
  enabled: true
  router:
    enabled: true
  kubernetesAPI:
    enabled: true
cluster:
  name: k8s-vl-scw
  dns: svc.cluster.local
netbirdAPI:
  keyFromSecret: "netbird-mgmt-api-key"
% netbird networks ls
Available Networks:

  - ID: default-kubernetes-api
    Domains: kubernetes.default.svc.cluster.local
    Status: Not Selected
    Resolved IPs: -
% kubectl exec -n netbird -it router-69d994fc45-dn6wb -- nslookup kubernetes.default.svc.cluster.local
Server:         10.32.0.10
Address:        10.32.0.10:53


Name:   kubernetes.default.svc.cluster.local
Address: 10.32.0.1
  4. No IP is resolved
% curl kubernetes.default.svc.cluster.local:443
curl: (6) Could not resolve host: kubernetes.default.svc.cluster.local

Expected behavior

Resolve and connect to k8s services.

Are you using NetBird Cloud?

Yes

NetBird version

0.40.1

Is any other VPN software installed?

Yes, Tailscale, but it is disabled

Debug output

Peers detail:
 router-69d994fc45-9jlfx.netbird.cloud:
  NetBird IP: 100.120.23.61
  Public key: jf/0/Vd5lrWaemixrlreQsHdbRNX9ey9+PtMhvaKcwI=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.1:51820
  Relay server address: rels://streamline-es-mad1-1.relay.netbird.io:443
  Last connection update: 4 minutes, 14 seconds ago
  Last WireGuard handshake: 1 second ago
  Transfer status (received/sent) 276 B/1008 B
  Quantum resistance: false
  Networks: -
  Latency: 41.916536ms

 router-69d994fc45-4cvvl.netbird.cloud:
  NetBird IP: 100.120.82.252
  Public key: i8ejSuOVRmMH7zhKZewUSRqVdX6Qn9kcc9ySQ+iJJ3U=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://streamline-es-mad1-1.relay.netbird.io:443
  Last connection update: 4 minutes, 17 seconds ago
  Last WireGuard handshake: 6 seconds ago
  Transfer status (received/sent) 276 B/860 B
  Quantum resistance: false
  Networks: -
  Latency: 0s

 router-69d994fc45-dn6wb.netbird.cloud:
  NetBird IP: 100.120.156.217
  Public key: 2tQSCNwNCEqYOFbD7oC7+LQfPBdfmVo1eE3TKjhK/Hw=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.0:51820/198.51.100.2:51820
  Relay server address: rels://streamline-es-mad1-0.relay.netbird.io:443
  Last connection update: 4 minutes, 14 seconds ago
  Last WireGuard handshake: 1 second ago
  Transfer status (received/sent) 276 B/1008 B
  Quantum resistance: false
  Networks: -
  Latency: 42.451535ms

Events:
  [INFO] SYSTEM (ab92baea-4aaf-45f4-9082-9446a8a78e7d)
    Message: Network map updated
    Time: 5 minutes, 49 seconds ago
  [INFO] SYSTEM (04ddf39e-0b88-4275-abd8-335ef61998c9)
    Message: Network map updated
    Time: 4 minutes, 40 seconds ago
  [INFO] SYSTEM (f55a8850-264f-4dbe-9d1d-ee8bccedb6e1)
    Message: Network map updated
    Time: 4 minutes, 18 seconds ago
  [INFO] SYSTEM (147c52b9-f490-4902-aa90-aad1531104ca)
    Message: Network map updated
    Time: 3 minutes, 7 seconds ago
OS: linux/amd64
Daemon version: 0.40.1
CLI version: 0.40.1
Management: Connected to https://api.netbird.io:443
Signal: Connected to https://signal.netbird.io:443
Relays:
  [stun:stun.netbird.io:5555] is Available
  [turns:turn.netbird.io:443?transport=tcp] is Available
  [rels://streamline-es-mad1-0.relay.netbird.io:443] is Available
Nameservers:
  [9.9.9.9:53, 149.112.112.112:53] for [svc.anon-0qS8k.domain] is Available
FQDN: midori.netbird.cloud
NetBird IP: 100.120.15.88/16
Interface type: Kernel
Quantum resistance: false
Networks: -
Forwarding rules: 0
Peers count: 3/3 Connected

As well as the file created by

netbird debug for 1m -AS

Additional context

  • k8s 1.32.3
  • cilium and coreDNS
  • managed by scaleway
  • there are several k8s-related issues that are 1-2 years old and may be related

Have you tried these troubleshooting steps?

  • Checked for newer NetBird versions
  • Searched for similar issues on GitHub (including closed ones)
  • Restarted the NetBird client
  • Disabled other VPN software
  • Checked firewall settings
saavagebueno added the waiting-feedback and triage-needed labels 2025-11-20 06:07:13 -05:00

@lixmal commented on GitHub (Apr 14, 2025):

Hi @jalberto,

I believe we have resolved this issue in Slack: the route was not selected.

% netbird networks ls
Available Networks:

  - ID: default-kubernetes-api
    Domains: kubernetes.default.svc.cluster.local
    Status: Not Selected
    Resolved IPs: -

Could you please confirm and close the issue?
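
For readers hitting the same symptom, the check described above can be scripted. This is a minimal sketch, not an official NetBird tool: the `unselected_networks` helper is hypothetical and assumes the text layout of `netbird networks ls` shown in this thread. The `netbird networks select` subcommand in the usage comment is believed to exist in recent clients, but verify it with `netbird networks --help` first.

```shell
# Sketch: print the IDs of networks that `netbird networks ls` reports as
# "Not Selected". The parsing assumes the text layout shown in this thread.
unselected_networks() {
  awk '
    /^[[:space:]]*- ID:/ { id = $3 }   # remember the most recent network ID
    /^[[:space:]]*Status:[[:space:]]*Not Selected/ { print id }
  '
}

# Usage (assumes the netbird CLI is installed and the daemon is running):
#   netbird networks ls | unselected_networks
#   netbird networks select default-kubernetes-api
```

Once a network is selected, its Status line should no longer read "Not Selected" and the helper prints nothing for it.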


@jalberto commented on GitHub (Apr 15, 2025):

Hi @lixmal, it is not really solved. The network is now selected and it resolves internal names, but it does not seem to be routing:

130 % curl http://api.api-stag.svc.cluster.local -v
* Host api.api-stag.svc.cluster.local:80 was resolved.
* IPv6: (none)
* IPv4: 10.34.174.96
*   Trying 10.34.174.96:80...

Even odder:

WARN client/internal/dnsfwd/forwarder.go:160: failed to resolve query for domain=api.api-stag.svc.cluster.local. server=10.32.0.10:53: lookup api.api-stag.svc.cluster.local. on 10.32.0.10:53: no such host

Yet the IP still resolves.
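
The two failure modes seen in this thread (name not resolved earlier, name resolved but connection hanging here) can be told apart from curl's exit status. The sketch below is only an illustration: `classify_curl_exit` is a hypothetical helper, and the interpretation of a timeout as a routing or policy problem is an assumption, not a confirmed diagnosis.

```shell
# Sketch: map a curl exit status to the layer that most likely failed.
# Exit codes used: 6 = could not resolve host, 7 = failed to connect,
# 28 = operation timed out.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns: host name could not be resolved" ;;
    7)  echo "connect: connection refused or host unreachable" ;;
    28) echo "timeout: name resolved but no reply, possibly a routing or policy problem" ;;
    *)  echo "other: curl exit status $1" ;;
  esac
}

# Usage (the URL is the one from the comment above):
#   curl -sS -o /dev/null --connect-timeout 5 http://api.api-stag.svc.cluster.local
#   classify_curl_exit "$?"
```

Using `--connect-timeout` keeps the check from hanging indefinitely at the "Trying ..." step.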


@nazarewk commented on GitHub (Apr 15, 2025):

This looks like the client peer is unable to reach the nameserver by IP.
How is the domain forwarding configured (Nameservers vs. Network Resource)?


@jalberto commented on GitHub (Apr 15, 2025):

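In the NetBird console, you mean?

Image: https://github.com/user-attachments/assets/5269969f-ec39-4ece-a95a-894e0875931c

I also have a network set up by the NetBird operator, and a policy that allows any peer to access the k8s peers:

Image: https://github.com/user-attachments/assets/736f928d-abe4-4e7c-af1b-13c098bfa073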


@nazarewk commented on GitHub (Apr 15, 2025):

Can you also show the Network Resource for the domain?

Or do you only have a Nameserver routing svc.cluster.local through Quad9? I don't think that would work in any scenario. In that case you will need to set up a Network Resource for svc.cluster.local; after that you can remove the Nameserver for clients v0.40.0 or newer.
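
To see which resolvers a client is using for which match domain, the Nameservers section of the status output can be summarised. This is a rough sketch under two assumptions: that the debug output earlier in this issue came from `netbird status -d`, and that nameserver lines keep the `[servers] for [domain] is ...` layout shown there. The `nameserver_domains` helper is hypothetical.

```shell
# Sketch: print "domain -> servers" for each nameserver line in the status
# output. Relay lines have no " for [" part, so they are skipped.
nameserver_domains() {
  sed -n 's/^[[:space:]]*\[\(.*\)] for \[\(.*\)] is .*$/\2 -> \1/p'
}

# Usage (assumes the netbird CLI is installed and the daemon is running):
#   netbird status -d | nameserver_domains
```

A cluster-internal domain mapped to public resolvers such as 9.9.9.9 in this output would match the misconfiguration described above.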


@nazarewk commented on GitHub (May 8, 2025):

@jalberto did you manage to resolve your issue, or do you still need help with it?


@mlsmaycon commented on GitHub (Jun 1, 2025):

Closing the issue due to no recent feedback. Feel free to open a new one if the issue persists, or reopen this one if it was a feature request.


Reference: SVI/netbird#1810