Can't run NetBird relay inside DigitalOcean DOKS within a VPC: the relay can't connect to the management domain, while curl connects to it just fine #1338

Closed
opened 2025-11-20 05:28:41 -05:00 by saavagebueno · 3 comments

Originally created by @adrian-moisa on GitHub (Oct 15, 2024).

Describe the problem

I'm trying to install a relay in a DigitalOcean DOKS cluster. The cluster is hosted in a VPC. The goal is to use CoreDNS as a custom DNS inside the VPN to route traffic to the relevant ports of the DOKS cluster. The problem is that with my current setup I have to point traffic from CoreDNS to a public load balancer.

My goal is to eliminate the public IP. I could not find how to set up a private load balancer in DO, so I tried the next trick: setting up a NetBird relay as a ClusterIP service. Maybe this is wrong as well, but I can't find enough info on the web. My hope was that by using a relay as ClusterIP I would be able to keep all the traffic inside the VPN and VPC with no public IPs involved.

  • The k8s config I'm using.
  • I checked the secret, I am convinced it's fine.
  • Maybe the entire config is complete junk. I was trying hard to shape it with GPT and Claude. They hallucinate really hard on this one.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpn-relay
  labels:
    app: vpn-relay
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vpn-relay
  template:
    metadata:
      labels:
        app: vpn-relay
    spec:
      containers:
      - name: vpn-relay
        image: netbirdio/netbird:latest
        ports:
          - containerPort: 51820
            protocol: UDP
        env:
          - name: NB_MANAGEMENT_URL
            value: "https://my-vpn-addr" # management domain, not relay
          - name: NB_RELAY
            value: "true"
          - name: NB_SETUP_KEY
            valueFrom:
              secretKeyRef:
                name: netbird-peer-key
                key: NETBIRD_KEY
          - name: NB_LOG_LEVEL
            value: "debug"
---
apiVersion: v1
kind: Service
metadata:
  name: vpn-relay
spec:
  selector:
    app: vpn-relay
  ports:
    - protocol: UDP
      port: 51820
      targetPort: 51820
  type: ClusterIP 
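
For reference, the Deployment above reads its setup key from a Secret named netbird-peer-key under the key NETBIRD_KEY. A minimal sketch of what such a Secret could look like is below. The value is a placeholder, and the namespace is left out on the assumption that it matches the Deployment's namespace.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: netbird-peer-key        # must match secretKeyRef.name in the Deployment
type: Opaque
stringData:
  # Placeholder: replace with a setup key generated in the NetBird dashboard.
  NETBIRD_KEY: "REPLACE-WITH-SETUP-KEY"
```

Using stringData lets the key be written as plain text; Kubernetes stores it base64-encoded under data.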

To Reproduce

  • Start the relay by applying the k8s yaml
  • Check the status with kubectl logs -l app=vpn-relay
    It will keep saying:
2024-10-15T09:50:37Z DEBG util/net/dialer_nonios.go:52: Dialing tcp my-vpn-addr:443
2024-10-15T09:50:40Z DEBG util/net/dialer_nonios.go:52: Dialing tcp my-vpn-addr:443
2024-10-15T09:50:45Z DEBG util/net/dialer_nonios.go:52: Dialing tcp my-vpn-addr:443
2024-10-15T09:50:52Z DEBG util/net/dialer_nonios.go:52: Dialing tcp my-vpn-addr:443
2024-10-15T09:51:03Z DEBG util/net/dialer_nonios.go:52: Dialing tcp my-vpn-addr:443
2024-10-15T09:51:05Z INFO util/grpc/dialer.go:75: DialContext error: context deadline exceeded
2024-10-15T09:51:05Z INFO management/client/grpc.go:56: createConnection error: context deadline exceeded
2024-10-15T09:51:05Z ERRO management/client/grpc.go:64: failed creating connection to Management Service: context deadline exceeded
2024-10-15T09:51:05Z ERRO client/internal/login.go:96: failed connecting to the Management service https://my-vpn-addr:443 context deadline exceeded
Error: foreground login failed: backoff cycle failed: context deadline exceeded

my-vpn-addr is currently the NetBird management domain, not the relay domain. AFAIK, I need to use the management one.

I tested access to the VPN via curl. It works just fine.
kubectl run test-curl --rm -it --image=curlimages/curl -- curl -v https://my-vpn-addr
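
The curl check above only exercises the default HTTPS port. A later comment in this thread gets the client connected with the management URL on port 33073, so the same check against that port may help narrow things down. This is a sketch only: the thread does not confirm that the port was the cause, and the pod name and timeout are arbitrary choices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-curl-mgmt          # hypothetical name
spec:
  restartPolicy: Never          # run once, do not restart on exit
  containers:
    - name: curl
      image: curlimages/curl
      command: ["curl"]
      # -v shows the TCP connect and TLS handshake; --max-time bounds the wait.
      args: ["-v", "--max-time", "10", "https://my-vpn-addr:33073"]
```

After applying it, kubectl logs test-curl-mgmt shows whether the connection to that port is established from inside the cluster. It only tests reachability, not a NetBird login.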

Expected behavior

Should create a new peer and use the created key.

Are you using NetBird Cloud?

No, I'm using self hosted.

NetBird version

0.29.4

NetBird status -dA output:

Can't share, too sensitive. All I can say is that it works 100% OK with other peers (droplets with the NetBird client installed).

Do you face any (non-mobile) client issues?

Not afaik

Screenshots

Too sensitive

Additional context

  • The entire cluster is supposed to be completely hidden by the VPN. So far I can hide the DNS. I wasn't able to start a relay. Maybe I'm completely misunderstanding the entire thing.
  • I was able to hide droplets in the VPN by installing the NetBird client. I used private IPs for the DNS records, and I blocked everything with UFW. I could go one step further by placing the droplet in a VPC to hide its IP, though I think I will end up in the same spot: something is publicly exposed.
  • Does what I'm trying to do even make sense? My concern is that a public load balancer can be the target of a DDoS attack, and since the DNS is private I can't use Cloudflare to secure it.
  • I saw the Release Notes for v0.29.0 and I read about the new relay, but I have no clue how it could fit my request. As I said before, maybe my yaml makes no sense. I created it with gen AI and could not find any useful samples on the web. So it's potentially complete garbage?
saavagebueno added the triage-needed label 2025-11-20 05:28:41 -05:00

@adrian-moisa commented on GitHub (Oct 15, 2024):

Found the official tutorial. I adapted my yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sys-netbird
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sys-netbird
  template:
    metadata:
      labels:
        app: sys-netbird
    spec:
      containers:
        - name: sys-netbird
          image: netbirdio/netbird:latest
          env:
            - name: NB_SETUP_KEY
              valueFrom:
                secretKeyRef:
                  name: netbird-peer-key
                  key: NETBIRD_KEY
            - name: NB_MANAGEMENT_URL
              value: "https://my-vpn-addr:33073"
            - name: NB_HOSTNAME
              value: "sys-netbird"
            - name: NB_LOG_LEVEL
              value: "info"
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
                - SYS_RESOURCE
                - SYS_ADMIN

I managed to get the pod connected as a peer. Now I'm trying to figure out how to route subdomains to the cluster. I already configured the custom CoreDNS to point the desired subdomain to one of the pods I want to publish. So far I can't connect to the demo app pod.


@adrian-moisa commented on GitHub (Oct 15, 2024):

OK, one more step forward. Looks like I'm finally getting access to my pods via the VPN, for now via ClusterIP. I still need to fix access via subdomain, but at least I'm getting something.

  • Get cluster IP - kubectl get svc argocd-server -n argocd - Managed to get the IP for the subdomain/pod combo I'm attempting to publish. Looks like I needed the cluster IP of the service in front of the pod. Added a DNS entry for it in CoreDNS.
  • Routes for Pods and Services CIDR - Looks like the routes are super important. I didn't need them when setting up individual droplets as peers, but they are needed for whole DOKS clusters. In DigitalOcean I found these CIDRs on the DOKS VPC page. I configured them as routes in NetBird with the masquerade option enabled. The peer group points to the NetBird pod. The routes are distributed to the groups of interested users.
    • Explanation: Without explicit routes, your local machine won't know how to send traffic to the cluster's Pod and Service networks. By adding routes for the Pod and Service CIDR ranges to NetBird, you're instructing your system to send traffic for these networks through the VPN.
  • Access Policies - Nothing fancy here, just allowed certain groups to access the NetBird peer.
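
The DNS entry from the first bullet could be expressed with the CoreDNS hosts plugin. The sketch below is one possible shape, not the configuration actually used in this thread. The ConfigMap name, the hostname, and the IP are all placeholders; the IP stands in for the ClusterIP reported by kubectl get svc.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vpn-coredns             # hypothetical name for the custom CoreDNS config
data:
  Corefile: |
    .:53 {
        errors
        log
        # Static record: placeholder hostname mapped to a placeholder ClusterIP.
        hosts {
            10.245.0.10 argocd.example.internal
            fallthrough
        }
        # Everything not matched above goes to the upstream resolvers.
        forward . /etc/resolv.conf
    }
```

The record only resolves usefully for VPN clients if the Service CIDR containing that IP is also routed through NetBird, as described in the second bullet.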

TBD - Setting up certificates for the domains. This one is easy by now; I've done it many times. Just worth mentioning that I'm not using ingress to auto-generate them. I'm defining them by hand. They need a DNS challenge to work.

These links helped me piece everything together.

  • Official - Deploy routing peers to a Kubernetes cluster: https://docs.netbird.io/how-to/routing-peers-and-kubernetes
  • Further reference - Using NetBird for Kubernetes Access: https://dev.to/braginini/using-netbird-for-kubernetes-access-3fc2

@adrian-moisa commented on GitHub (Oct 16, 2024):

Looks like the subdomain was not responding because of the local DNS cache. After restarting my computer I was able to visit it. Obviously, it does not have a certificate for now, but I can fix that. So, consider my question resolved.

Just one minor curiosity. In my new setup I used NB_MANAGEMENT_URL. My NetBird instance is on a separate droplet. In the tutorials above I did not see any mention of this setting, nor any mention of adding it as a peer. I saw the management URL in the Docker command in the create-peer modal. What would you advise? Is adding a peer needed or not? Or can I just add the routes and that's it?

            - name: NB_MANAGEMENT_URL
              value: "https://my-vpn-addr:33073"
Reference: SVI/netbird#1338