Problems with peer-to-peer connection for name resolutions with self-hosted Netbird/DNS server #1310

Closed
opened 2025-11-20 05:28:00 -05:00 by saavagebueno · 12 comments

Originally created by @joao-aveiro on GitHub (Oct 6, 2024).

Describe the problem

I have deployed a self-hosted Netbird instance alongside a CoreDNS server, which I use for local/private name resolution (both on the same machine, deployed in containers). The DNS server is mapped to an interface on the host, and access to it is provided by installing a Netbird client on the host side of the Netbird/CoreDNS machine and creating a policy that allows clients to connect to the DNS server port on that specific peer. A custom nameserver was also configured in the dashboard.

Since the initial deployment (version 0.28.x), this has been working without a problem. However, with a recent upgrade to version 0.30.0 (both the server and the clients), the clients can no longer resolve internal domains. It seems the connection to the DNS port on the server peer is being blocked. Downgrading the client in the host of the CoreDNS server solves this issue.

To Reproduce

My setup was achieved by:

  1. Having the Netbird client installed in the server
  2. Under "Nameservers", configure a new nameserver matching my local/private domain and use the Netbird-provided IP of the Netbird/CoreDNS server
  3. Assign the Netbird/CoreDNS server peer to a group (e.g. vm-grp-dns)
  4. Create a policy allowing connecting user groups (e.g., usr-grp-xxx) to the DNS server port, i.e. usr-grp-1,usr-grp-2 -> vm-grp-dns

Running dig @100.XXX.XXX.XXX -p8053 something.internal.org (where 100.XXX.XXX.XXX is the Netbird-assigned IP of the CoreDNS/Netbird host machine peer) results in:

  • The domain being resolved if the target peer has version <0.30.0
  • Timeout if the target peer has version =0.30.0

This happens regardless of the client version on the origin peer (i.e., my local machine).
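When repeating this test across several peers, it can help to bound the wait and to tell a timeout apart from an actual DNS reply. The following is a minimal sketch, assuming the exit codes documented in the dig man page (0 for any reply, 9 for no reply from the server). The function names `classify_dig_status` and `probe_dns` are made up for illustration, and the IP, port and domain are placeholders supplied by the caller.

```shell
#!/bin/sh
# Map dig's exit status to a short label.
# Per the dig man page: 0 = a reply was received (even NXDOMAIN),
# 9 = no reply from the server; other codes are usage/internal errors.
classify_dig_status() {
    case "$1" in
        0) echo "answered" ;;
        9) echo "timeout" ;;
        *) echo "error" ;;
    esac
}

# Query one name on a given server/port with a bounded wait.
# $1 = server IP, $2 = port, $3 = name to resolve (all supplied by the caller).
probe_dns() {
    dig "@$1" -p "$2" "$3" +short +time=2 +tries=1 >/dev/null 2>&1
    classify_dig_status "$?"
}
```

For example, `probe_dns 100.XXX.XXX.XXX 8053 something.internal.org` should print `timeout` when the target peer behaves as described for 0.30.0, and `answered` otherwise.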

Expected behavior

I expected that my setup would still work.
I've seen that with this release there are changes to how the network routes work, but from what I understood it shouldn't make a difference in this case, since I am not using Network Route for this purpose and instead I'm relying on direct peer connection. I can create a Network Route to solve this, but previously that wasn't needed and I don't see a point in having to create a route for a single peer which is already in the Netbird network.

I am not sure if this is a bug or intended behaviour, but in case it is the latter, I believe the details provided in the release and in the documentation regarding the changes in network routes seem insufficient.

Are you using NetBird Cloud?
No, a self-hosted instance.

NetBird version
Server: tried with 0.30.0 and 0.29.4
Clients: tried with 0.30.0 and 0.29.4 on both the origin and the target peers

saavagebueno added the client, triage-needed, self-hosting labels 2025-11-20 05:28:00 -05:00

@Marcus1Pierce commented on GitHub (Oct 6, 2024):

Same here. After I downgraded to version 0.29.4, peers were able to connect to the DNS server via the Netbird network.


@joao-aveiro commented on GitHub (Oct 7, 2024):

My temporary workaround was creating a custom route with the target peer IP address, such as 100.xxx.xxx.xxx/0, and setting the allowed peer group as the one I already had configured for the policy client-peers -> dns VM.

But, once again, I don't think this should be required. More so, even with this workaround, I still can't connect to some internal web apps using the iOS client (probably not related - solely - to this issue).

To all the devs, I appreciate the great work you have put into Netbird, but for me this lack of consistency, stability and reliable testing in new releases hints that it is not ready for enterprise production environments.


@mgarces commented on GitHub (Oct 9, 2024):

We regret knowing this is causing you pain and we are improving our release lifecycle to avoid future issues.

We are working on a fix; in the meantime, your current workaround is valid.


@flotpg commented on GitHub (Oct 10, 2024):

Same here.
It seems to be a general issue where peers can't access Docker containers if the Docker host has Netbird agent 0.30.0 installed. It's not only DNS on UDP 53, but basically every port exposed by the containers.

I reported this as well here: https://netbirdio.slack.com/archives/C05T5K65X7U/p1728084027960169

You can mitigate by downgrading the agent to 0.29.4

sudo apt install netbird=0.29.4
sudo apt-mark hold netbird
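Before downgrading, it may be worth confirming which hosts actually run the affected client. This is a rough sketch, assuming that `netbird version` prints just the version string and that only 0.30.0 is affected, as reported in this thread. `is_affected_version` is a hypothetical helper, not part of the Netbird CLI.

```shell
#!/bin/sh
# Return 0 (true) if the given version string is the release reported as
# broken in this thread (0.30.0). A leading "v" is tolerated.
is_affected_version() {
    case "${1#v}" in
        0.30.0) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as `is_affected_version "$(netbird version)" && echo "this peer runs 0.30.0"`.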

[Screenshot: CleanShot 2024-10-10 at 15 56 29@2x]

@mlsmaycon just announced that they are already working on a fix ❤️
https://netbirdio.slack.com/archives/C02KHAE8VLZ/p1728568216276659?thread_ts=1728129199.226249&cid=C02KHAE8VLZ


@mgarces commented on GitHub (Oct 10, 2024):

hi there; can you please update to and try our latest release v0.30.1 ?


@joao-aveiro commented on GitHub (Oct 10, 2024):

> hi there; can you please update to and try our latest release v0.30.1 ?

Hey there. I am out of office but my colleagues were able to perform the update. It seems it is working now! Though it seems that both peers (and the server) need to be updated for this to work. Next week, once I'm back in office, I'll test this further and let you know if there's anything else wrong. I'll mark this issue as solved once I do that.

Thank you for your good work!


@Marcus1Pierce commented on GitHub (Oct 10, 2024):

@mgarces After updating to 0.30.1, I can reach my DNS server via the Netbird network again, and netbird status -d shows all my DNS servers as available. Besides the DNS server, I can also access the services in Docker containers that expose ports via the Netbird network again. I haven't found any other issues so far.

OS: windows/amd64
Daemon version: 0.30.1
CLI version: 0.30.1
Management: Connected to https://domain1.tld:443
Signal: Connected to https://domain1.tld:443
Relays:
  [stun:domain1.tld:3478] is Available
  [turn:domain1.tld:3478?transport=udp] is Available
  [rels://domain1.tld:443] is Available
Nameservers:
  [100.111.19.240:53] for [domain2.tld] is Available
  [100.111.134.235:53] for [domain1.tld] is Available
FQDN: it-pc.netbird
NetBird IP: 100.111.52.41/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 14/42 Connected
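
To check many peers without reading the whole status output, the Nameservers section can be filtered for entries that are not reported as Available. This is only a sketch: it assumes the layout shown in the output above (a `Nameservers:` header followed by indented entries ending in `is Available`), and `unavailable_nameservers` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `netbird status -d` output on stdin and print every entry in the
# Nameservers section that does not end in "is Available".
unavailable_nameservers() {
    awk '
        /^Nameservers:/ { in_ns = 1; next }   # section header
        in_ns && /^[^ \t]/ { in_ns = 0 }      # next top-level key ends the section
        in_ns && NF > 0 && !/is Available[ \t]*$/ {
            sub(/^[ \t]+/, "")                # drop the indentation
            print
        }
    '
}
```

It could be used as `netbird status -d | unavailable_nameservers`; empty output would mean every listed nameserver is reported as Available.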

@florian-obradovic commented on GitHub (Oct 12, 2024):

Fixed it for me as well.
Thanks a lot for the fast fix!


@joao-aveiro commented on GitHub (Oct 12, 2024):

Just a quick update: my team has been experiencing a lot of connectivity and DNS problems since then. I don't know if it is related to this, but it seems that there are still some problems that need to be addressed. I'll provide more info once I'm able to test it myself.


@florian-obradovic commented on GitHub (Oct 12, 2024):

@joao-aveiro: can you specify "since then"? :)
Are your machines AD (Active Directory) domain-joined? (https://github.com/netbirdio/netbird/issues/987)


@joao-aveiro commented on GitHub (Oct 12, 2024):

> @joao-aveiro: can you specify "since then"? :)
>
> Are your machines AD (Active Directory) domain-joined? (https://github.com/netbirdio/netbird/issues/987)

Since the update to version 0.30.1.
No AD in this case. All clients are either MacOS or Linux.


@joao-aveiro commented on GitHub (Oct 15, 2024):

An update on accessing the private DNS server (port 8053) in the newest version (v0.30.1), after removing the route to the DNS server:

  • I can use the nameserver. My laptop is in a policy which allows all traffic between my device and the nameserver (not port-restricted)
  • Other members of the technical team who have similar rules can also resolve names
  • Non-technical team members, with policies only allowing port 8053 TCP/UDP, cannot resolve names/access the nameserver
  • Extending the policy of non-technical team members to allow all connections (not port-restricted) seems to solve the issue.

So it seems there is still a problem with using port 8053 when the policy is more restrictive (defining protocol/port).
@mgarces What do you think about this? Am I doing something wrong? Should I open an issue?
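
To narrow this down from an affected client, the same name can be queried over UDP and TCP separately, which would show whether one protocol or both are blocked by the port-restricted policy. This is a sketch under assumptions: `transport_flag` and `probe_transports` are hypothetical helper names, the nameserver IP and test name are supplied by the caller, and the only dig options relied on are `+tcp`, `+notcp`, `+time` and `+tries`.

```shell
#!/bin/sh
# Translate a protocol name into the dig option that forces it.
transport_flag() {
    case "$1" in
        udp) echo "+notcp" ;;
        tcp) echo "+tcp" ;;
        *) echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

# Probe the same name over UDP and then TCP and report dig's exit status
# (0 = reply received, 9 = no reply, per the dig man page).
# $1 = nameserver IP, $2 = port, $3 = name to resolve.
probe_transports() {
    for proto in udp tcp; do
        flag=$(transport_flag "$proto") || return 1
        dig "@$1" -p "$2" "$3" "$flag" +short +time=2 +tries=1 >/dev/null 2>&1
        echo "$proto: dig exit status $?"
    done
}
```

Running `probe_transports 100.XXX.XXX.XXX 8053 something.internal.org` once under the port-restricted policy and once under the allow-all policy would give a direct comparison to attach to a new issue.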

Reference: SVI/netbird#1310