Client 0.30.0 Exit Node not working as 0.29.4 #1318

Open
opened 2025-11-20 05:28:25 -05:00 by saavagebueno · 12 comments

Originally created by @rkleivel on GitHub (Oct 8, 2024).

Problem:
After upgrading clients to 0.30.0, nodes in an exit node distribution group lose their internet connection if the exit node is restarted.

To Reproduce

  1. Create 2 groups is_exit_node and uses_exit_node
  2. Add a node in each group (preferably behind different public IPs for easier testing)
  3. Create a policy that allows the groups to communicate (unless the Default policy All <-> All is active)
    image
  4. Add an exit node under network routes that makes the node in is_exit_node the Exit Node of the node in uses_exit_node
    image
  5. Run curl ipinfo.io on each node and verify that the public IPs are identical
  6. Run netbird down && netbird up on the exit node
  7. Wait a minute to allow settings to be updated
  8. Run curl ipinfo.io on each node
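
The check in steps 5 and 8 can be sketched as a small helper. This is only an illustration: the function name `compare_public_ips` is made up here, and the usage comment assumes the plain-text `ipinfo.io/ip` endpoint, which is not used in the original report.

```shell
# Compare the public IP reported by the client with the one reported by the exit node.
# Prints "same" and returns 0 when they match, prints "different" and returns 1 otherwise.
# An empty value (for example, curl timed out) is treated as a mismatch.
compare_public_ips() {
  local client_ip="$1" exit_ip="$2"
  if [ -n "$client_ip" ] && [ "$client_ip" = "$exit_ip" ]; then
    echo "same"
  else
    echo "different"
    return 1
  fi
}

# Usage (assumption: ipinfo.io/ip returns only the address as plain text):
#   client_ip="$(curl -s --max-time 10 ipinfo.io/ip)"
#   compare_public_ips "$client_ip" "<IP printed on the exit node>"
```

Treating an empty result as "different" means a client that has lost internet access entirely is also reported as a failure, which matches the symptom described in this issue.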

Expected behavior
Each node should still appear to be behind the same public IP.

  • With 0.30.0 I mostly experience that the node in uses_exit_node does not regain internet access.
  • After downgrading both nodes to 0.29.4, the setup behaves as expected.

Are you using NetBird Cloud?
Yes

NetBird version
0.30.0 (failing) and 0.29.4 (working)

Additional info
Both nodes are running Ubuntu 24.04 server

As this has been fairly easy to reproduce, I have not attached any logs at this stage. Please let me know if they are needed, and I'll happily provide them :)

saavagebueno added the triage-needed label 2025-11-20 05:28:25 -05:00

@mlsmaycon commented on GitHub (Oct 9, 2024):

Hello @rkleivel can you please share the output from nft list ruleset from the exit node?


@rkleivel commented on GitHub (Oct 9, 2024):

Thanks @mlsmaycon!
Here is the ruleset after netbird down / up on the exit node:

table ip filter {
	chain INPUT {
		type filter hook input priority filter; policy accept;
	}

	chain OUTPUT {
		type filter hook output priority filter; policy accept;
	}

	chain FORWARD {
		type filter hook forward priority filter; policy accept;
		oifname "wt0" ct state established,related counter packets 0 bytes 0 accept
		iifname "wt0" counter packets 0 bytes 0 accept
	}
}
table ip nat {
	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
	}
}
table ip netbird {
	set nb0000001 {
		type ipv4_addr
		flags dynamic
		elements = { 100.93.17.98 }
	}

	set nb0000002 {
		type ipv4_addr
		flags dynamic
		elements = { 100.93.17.98 }
	}

	chain netbird-rt-fwd {
		ct state established,related accept
		counter packets 0 bytes 0 accept
	}

	chain netbird-rt-nat {
		type nat hook postrouting priority srcnat - 1; policy accept;
		iifname "wt0" counter packets 1 bytes 176 masquerade
		oifname "wt0" counter packets 0 bytes 0 masquerade
	}

	chain netbird-acl-input-rules {
		ct state established,related accept
		ip saddr @nb0000001 accept
	}

	chain netbird-acl-output-rules {
		ct state established,related accept
		ip daddr @nb0000002 accept
	}

	chain netbird-acl-input-filter {
		type filter hook input priority filter; policy accept;
		iifname "wt0" jump netbird-acl-input-rules
		iifname "wt0" drop
	}

	chain netbird-acl-output-filter {
		type filter hook output priority filter; policy accept;
		oifname "wt0" ip daddr != 100.93.0.0/16 accept
		oifname "wt0" jump netbird-acl-output-rules
		oifname "wt0" drop
	}

	chain netbird-acl-forward-filter {
		type filter hook forward priority filter; policy accept;
		iifname "wt0" jump netbird-rt-fwd
		iifname "wt0" drop
	}
}

The diff against the ruleset from when it was working does not seem significant:

diff exit_node_working.txt exit_node_not_working.txt 
12,13c12,13
< 		oifname "wt0" ct state established,related counter packets 1048 bytes 2078521 accept
< 		iifname "wt0" counter packets 909 bytes 54393 accept
---
> 		oifname "wt0" ct state established,related counter packets 0 bytes 0 accept
> 		iifname "wt0" counter packets 0 bytes 0 accept
36c36
< 		counter packets 25 bytes 1580 accept
---
> 		counter packets 0 bytes 0 accept
41c41
< 		iifname "wt0" counter packets 18 bytes 1160 masquerade
---
> 		iifname "wt0" counter packets 1 bytes 176 masquerade
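
Every changed line in the diff above differs only in its packet and byte counters. A minimal sketch for hiding that noise before diffing is shown here. The helper name `strip_counters` is hypothetical; the file names in the usage comment are the ones used in the diff above.

```shell
# Replace "counter packets N bytes M" with a bare "counter" so that
# only structural rule changes remain visible in a diff.
strip_counters() {
  sed -E 's/counter packets [0-9]+ bytes [0-9]+/counter/g'
}

# Usage (bash process substitution):
#   diff <(strip_counters < exit_node_working.txt) \
#        <(strip_counters < exit_node_not_working.txt)
```

If the normalized files are identical, the rulesets have the same structure and the difference between the working and non-working state is likely elsewhere.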

@mgarces commented on GitHub (Oct 10, 2024):

hi there; can you try our latest release v0.30.1 please?


@rkleivel commented on GitHub (Oct 11, 2024):

> hi there; can you try our latest release v0.30.1 please?

Sure! Unfortunately I cannot see any improvement since 0.30.0


@mlsmaycon commented on GitHub (Oct 11, 2024):

Hello @rkleivel can you please run the following commands?

On exit node:

sysctl net.ipv4.ip_forward
sudo tcpdump -i any -nn host 1.1.1.1 and port 443 # keep this running while testing on client

On client:

ip route get 1.1.1.1
nc -vw 5 -z 1.1.1.1 443

Then, share the output with us.
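
To see how long the client stays offline after an exit node restart, the `nc` probe above can be repeated with timestamps. The following is a rough sketch, not part of the requested diagnostics; `probe_until_ok` and its arguments are hypothetical names.

```shell
# Run a probe command repeatedly, printing a UTC timestamp and OK/FAIL per attempt.
# Stops at the first success (return 0) or after max_attempts tries (return 1).
probe_until_ok() {
  local max_attempts="$1" pause="$2"
  shift 2
  local i=1
  while [ "$i" -le "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "$(date -u +%H:%M:%S) OK"
      return 0
    fi
    echo "$(date -u +%H:%M:%S) FAIL"
    sleep "$pause"
    i=$((i + 1))
  done
  return 1
}

# Usage on the client, probing once per second for up to 60 attempts:
#   probe_until_ok 60 1 nc -w 5 -z 1.1.1.1 443
```

Passing the probe as arguments keeps the loop independent of `nc`, so the same helper works with `curl` or any other command that signals success through its exit status.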


@rkleivel commented on GitHub (Oct 14, 2024):

Hi @mlsmaycon,

Contrary to my comment of Oct 11 at 10.11 GMT, I am currently unable to reproduce the issue. Below is the output from your commands.

However, High Availability with 2 exit nodes still does not seem to work. I did not mention that earlier because I did not get time to test it thoroughly, but noticed it last week while debugging the issue of this thread, and have a feeling it might be related. I will provide similar outputs in another comment.

Here is the log for a single exit node going down and up, showing that it now works as expected. (I added timestamps to make it easier to relate the two outputs.) Both nodes are on 0.30.1:

Exit node:

admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 07:46:36 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:46:48.250437 wt0   In  IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [S], seq 1080105978, win 64480, options [mss 1240,sackOK,TS val 4127900581 ecr 0,nop,wscale 7], length 0
07:46:48.250465 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [S], seq 1080105978, win 64480, options [mss 1240,sackOK,TS val 4127900581 ecr 0,nop,wscale 7], length 0
07:46:48.259774 ens18 In  IP 1.1.1.1.443 > 192.168.7.26.32836: Flags [S.], seq 3227839981, ack 1080105979, win 65535, options [mss 1460,sackOK,TS val 2170826905 ecr 4127900581,nop,wscale 13], length 0
07:46:48.259793 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.32836: Flags [S.], seq 3227839981, ack 1080105979, win 65535, options [mss 1460,sackOK,TS val 2170826905 ecr 4127900581,nop,wscale 13], length 0
07:46:48.321426 wt0   In  IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.321442 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.322012 wt0   In  IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.322025 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127900651 ecr 2170826905], length 0
07:46:48.331567 ens18 In  IP 1.1.1.1.443 > 192.168.7.26.32836: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 2170826977 ecr 4127900651], length 0
07:46:48.331598 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.32836: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 2170826977 ecr 4127900651], length 0
07:46:48.377526 wt0   In  IP 100.93.158.105.32836 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127900708 ecr 2170826977], length 0
07:46:48.377546 ens18 Out IP 192.168.7.26.32836 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127900708 ecr 2170826977], length 0
^C
12 packets captured
14 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ netbird down && netbird up
Disconnected
Connected
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443
Mon Oct 14 07:47:47 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:47:51.625745 wt0   In  IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [S], seq 2392515285, win 64480, options [mss 1240,sackOK,TS val 4127963956 ecr 0,nop,wscale 7], length 0
07:47:51.625771 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [S], seq 2392515285, win 64480, options [mss 1240,sackOK,TS val 4127963956 ecr 0,nop,wscale 7], length 0
07:47:51.635630 ens18 In  IP 1.1.1.1.443 > 192.168.7.26.56110: Flags [S.], seq 1033369258, ack 2392515286, win 65535, options [mss 1460,sackOK,TS val 3968834669 ecr 4127963956,nop,wscale 13], length 0
07:47:51.635670 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.56110: Flags [S.], seq 1033369258, ack 2392515286, win 65535, options [mss 1460,sackOK,TS val 3968834669 ecr 4127963956,nop,wscale 13], length 0
07:47:51.680578 wt0   In  IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.680599 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.680832 wt0   In  IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.680852 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4127964011 ecr 3968834669], length 0
07:47:51.691850 ens18 In  IP 1.1.1.1.443 > 192.168.7.26.56110: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 3968834725 ecr 4127964011], length 0
07:47:51.691877 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.56110: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 3968834725 ecr 4127964011], length 0
07:47:51.738217 wt0   In  IP 100.93.158.105.56110 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127964069 ecr 3968834725], length 0
07:47:51.738254 ens18 Out IP 192.168.7.26.56110 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4127964069 ecr 3968834725], length 0
^C
12 packets captured
13 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
admin@exit1:~$
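
One way to read the captures above is to count packets per interface and direction: when forwarding works, every packet arriving on `wt0` should have a matching packet leaving on the WAN interface (`ens18` on this host). The helper below is only a sketch; `summarize_capture` is a made-up name and it assumes the `tcpdump -i any` line layout shown above (timestamp, interface, direction).

```shell
# Count captured packets per "<interface> <direction>" pair from tcpdump -i any output.
# Lines without In/Out in the third field (banners, summaries) are ignored.
summarize_capture() {
  awk '$3 == "In" || $3 == "Out" { n[$2 " " $3]++ }
       END { for (k in n) print k, n[k] }' | sort
}

# Usage: save the capture to a file, then
#   summarize_capture < capture.txt
```

A `wt0 In` count with no corresponding `Out` count on the WAN interface would suggest that packets reach the exit node but are not forwarded.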

Client:

admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 07:46:38 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000 
    cache 
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:46:48 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:07 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 07:47:13 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000 
    cache 
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:16 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:22 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:29 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:35 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:42 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:47:51 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$ 
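
From the client log above, the last successful probe before the restart was at 07:46:48 and the first successful probe afterwards was at 07:47:51. The gap can be computed with a small helper. This is an illustrative sketch; `seconds_between` is a hypothetical name and it assumes both timestamps fall on the same day.

```shell
# Print the number of seconds between two HH:MM:SS timestamps on the same day.
seconds_between() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s, ":"); split(b, e, ":")
    print (e[1] * 3600 + e[2] * 60 + e[3]) - (s[1] * 3600 + s[2] * 60 + s[3])
  }'
}

# Usage:
#   seconds_between 07:46:48 07:47:51   # prints 63
#   seconds_between 07:47:07 07:47:51   # prints 44
```

So the interruption lasted at most 63 seconds, and 44 seconds passed between the first failed probe and recovery.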

@mlsmaycon commented on GitHub (Oct 14, 2024):

Thanks, @rkleivel, for sharing the outputs.

OK, to confirm what we see in the timestamps: the failure occurred after you restarted the connection and is probably related to the time it took for the peers to reconnect. Right?

With the previous release, we fixed an issue with forwarding rules caused by the number of peers in an access control rule, which shouldn't affect nodes with exit nodes and no access control groups set in any of the routing peer routes. So it may not have affected you unless you had an access control group for a network route.

We will wait for your check with HA as well.


@rkleivel commented on GitHub (Oct 14, 2024):

As promised, here are the outputs for 1 client with 2 exit nodes. Initially both exit nodes are up. I confirm that routing through exit node 1 is OK, then I take exit node 1 down. Routing does not switch to exit node 2 until I manually deactivate and reactivate the exit node entry in the web GUI.

Client:

admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 07:58:39 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000 
    cache 
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:59:10 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:59:19 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 07:59:35 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$ 
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:00:11 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:00:27 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:00:44 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:01:10 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:01:41 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:03:04 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 08:04:25 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000 
    cache 
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:04:42 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:04:56 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:05:12 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:05:35 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:06:00 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:06:26 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ 
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:06:57 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:07:32 UTC 2024
nc: connect to 1.1.1.1 port 443 (tcp) timed out: Operation now in progress
admin@client:~$ date && ip route get 1.1.1.1
Mon Oct 14 08:08:28 UTC 2024
1.1.1.1 dev wt0 table netbird src 100.93.158.105 uid 1000 
    cache 
admin@client:~$

EXIT Node deactivated and activated in GUI at this point

admin@client:~$ date && nc -vw 5 -z 1.1.1.1 443
Mon Oct 14 08:08:30 UTC 2024
Connection to 1.1.1.1 443 port [tcp/https] succeeded!
admin@client:~$ date
Mon Oct 14 08:08:54 UTC 2024
admin@client:~$

Exit Node 1:

admin@exit1:~$ date && sysctl net.ipv4.ip_forward
Mon Oct 14 07:58:46 UTC 2024
net.ipv4.ip_forward = 1
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443 
Mon Oct 14 07:59:01 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
07:59:39.508343 wt0   In  IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [S], seq 2382693363, win 64480, options [mss 1240,sackOK,TS val 4128671838 ecr 0,nop,wscale 7], length 0
07:59:39.508381 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [S], seq 2382693363, win 64480, options [mss 1240,sackOK,TS val 4128671838 ecr 0,nop,wscale 7], length 0
07:59:39.517940 ens18 In  IP 1.1.1.1.443 > 192.168.7.26.47854: Flags [S.], seq 2277416614, ack 2382693364, win 65535, options [mss 1460,sackOK,TS val 1861873 ecr 4128671838,nop,wscale 13], length 0
07:59:39.517982 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.47854: Flags [S.], seq 2277416614, ack 2382693364, win 65535, options [mss 1460,sackOK,TS val 1861873 ecr 4128671838,nop,wscale 13], length 0
07:59:39.563320 wt0   In  IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.563339 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.564156 wt0   In  IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.564175 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4128671893 ecr 1861873], length 0
07:59:39.573752 ens18 In  IP 1.1.1.1.443 > 192.168.7.26.47854: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 1861929 ecr 4128671893], length 0
07:59:39.573792 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.47854: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 1861929 ecr 4128671893], length 0
07:59:39.618865 wt0   In  IP 100.93.158.105.47854 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4128671948 ecr 1861929], length 0
07:59:39.618890 ens18 Out IP 192.168.7.26.47854 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4128671948 ecr 1861929], length 0
^C
12 packets captured
14 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ date && netbird down
Mon Oct 14 08:00:03 UTC 2024
Disconnected
admin@exit1:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443 
Mon Oct 14 08:04:40 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel
admin@exit1:~$ date
Mon Oct 14 08:08:58 UTC 2024
admin@exit1:~$

Exit Node 2:

admin@exit2:~$ date && sysctl net.ipv4.ip_forward
Mon Oct 14 07:58:49 UTC 2024
net.ipv4.ip_forward = 1
admin@exit2:~$ date && sudo tcpdump -i any -nn host 1.1.1.1 and port 443 
Mon Oct 14 07:59:02 UTC 2024
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
08:08:30.973700 wt0   In  IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [S], seq 1509625333, win 64480, options [mss 1240,sackOK,TS val 4129203321 ecr 0,nop,wscale 7], length 0
08:08:30.973717 eth0  Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [S], seq 1509625333, win 64480, options [mss 1240,sackOK,TS val 4129203321 ecr 0,nop,wscale 7], length 0
08:08:30.985416 eth0  In  IP 1.1.1.1.443 > 172.17.0.4.35580: Flags [S.], seq 1235011614, ack 1509625334, win 65535, options [mss 1460,sackOK,TS val 560577716 ecr 4129203321,nop,wscale 13], length 0
08:08:30.985430 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.35580: Flags [S.], seq 1235011614, ack 1509625334, win 65535, options [mss 1460,sackOK,TS val 560577716 ecr 4129203321,nop,wscale 13], length 0
08:08:30.996615 wt0   In  IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4129203343 ecr 560577716], length 0
08:08:30.996627 eth0  Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [.], ack 1, win 504, options [nop,nop,TS val 4129203343 ecr 560577716], length 0
08:08:30.996632 wt0   In  IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4129203344 ecr 560577716], length 0
08:08:30.996636 eth0  Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [F.], seq 1, ack 1, win 504, options [nop,nop,TS val 4129203344 ecr 560577716], length 0
08:08:31.009041 eth0  In  IP 1.1.1.1.443 > 172.17.0.4.35580: Flags [.], ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.009041 eth0  In  IP 1.1.1.1.443 > 172.17.0.4.35580: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.009062 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.35580: Flags [.], ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.009081 wt0   Out IP 1.1.1.1.443 > 100.93.158.105.35580: Flags [F.], seq 1, ack 2, win 8, options [nop,nop,TS val 560577740 ecr 4129203344], length 0
08:08:31.030793 wt0   In  IP 100.93.158.105.35580 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4129203377 ecr 560577740], length 0
08:08:31.030800 eth0  Out IP 172.17.0.4.35580 > 1.1.1.1.443: Flags [.], ack 2, win 504, options [nop,nop,TS val 4129203377 ecr 560577740], length 0
^C
14 packets captured
16 packets received by filter
0 packets dropped by kernel
admin@exit2:~$ date
Mon Oct 14 08:08:50 UTC 2024
admin@exit2:~$
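
For anyone who wants to repeat the client-side check without retyping `date && nc -vw 5 -z 1.1.1.1 443`, here is a minimal sketch of the same probe in Python. The names `probe` and `watch` are made up for this example and are not part of NetBird; the 5-second timeout mirrors the `-w 5` used above.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def watch(host: str, port: int, count: int, interval: float = 10.0,
          timeout: float = 5.0):
    """Yield `count` timestamped probe results, one line per attempt."""
    for i in range(count):
        stamp = datetime.now(timezone.utc).strftime("%H:%M:%S")
        status = "ok" if probe(host, port, timeout) else "FAIL"
        yield f"{stamp} {host}:{port} {status}"
        if i < count - 1:
            time.sleep(interval)
```

Running `for line in watch("1.1.1.1", 443, count=60): print(line)` on the client while the exit node is restarted should give a log comparable to the one above, with one line every ten seconds or so.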

@rkleivel commented on GitHub (Oct 14, 2024):

> Thanks, @rkleivel, for sharing the outputs.
>
> Ok, to confirm what we see with the timestamps, the failure was after you restarted the connection and is probably related to the time it took for the peers to connect. Right?
>
> With the previous release, we fixed an issue with forwarding rules caused by the number of peers in an access control rule, which shouldn't affect nodes with exit nodes and no access control groups set in any of the routing peer routes. So it may not have affected you unless you had an access control group for a network route.
>
> We will wait for your check with HA as well.

I can confirm that the failure occurred after running `netbird down && netbird up` on the exit node. Now it just takes up to a couple of minutes until the connection is restored. Last week, when I created this issue, I could easily wait half an hour and the connection was still not restored.
I can also confirm that `Access Control Groups (optional)` in the web GUI exit node definition was empty during my tests.
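
To put a number on how long the connection stays down, the timestamped probe results can be reduced to outage windows. This is only a sketch: `outage_windows` and the `(time, ok)` sample format are invented for this example, and the samples are an abridged copy of the client log posted earlier on Oct 14, where the outage ended only after the exit node entry was deactivated and activated in the GUI.

```python
from datetime import datetime


def outage_windows(samples):
    """Return (first_failure, recovery, seconds) for each outage.

    samples: chronological list of ("HH:MM:SS", ok) tuples from a single day.
    """
    windows = []
    start = None
    for stamp, ok in samples:
        t = datetime.strptime(stamp, "%H:%M:%S")
        if not ok and start is None:
            start = (stamp, t)  # first failed probe opens a window
        elif ok and start is not None:
            seconds = int((t - start[1]).total_seconds())
            windows.append((start[0], stamp, seconds))
            start = None  # first successful probe closes it
    return windows


# Abridged probe results from the client log of Oct 14 in this thread.
samples = [
    ("07:59:35", True),
    ("08:00:11", False),
    ("08:03:04", False),
    ("08:07:32", False),
    ("08:08:30", True),
]
print(outage_windows(samples))
```

For that log the single window runs from 08:00:11 to 08:08:30, i.e. 499 seconds between the first failed probe and the first successful one.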


@mgarces commented on GitHub (Nov 12, 2024):

Hi there, we have released [0.31.1](https://github.com/netbirdio/netbird/releases/tag/v0.31.1), which addresses some of the issues described here. Can you please test with this version?

@rkleivel commented on GitHub (Nov 26, 2024):

My apologies for the late response on this.
I have tested the initial scenario, as well as the high availability aspect, on 0.33.0, and everything now seems to work as expected. So thanks a lot for that!

I do, however, see some strange effects in Docker containers running on a NetBird node in the `uses_exit_node` group (see the test system setup in my initial post):

  • If a process inside the container tries to access some https endpoint
    AND
  • the exit node currently in use is a VM in Azure

the request times out. This does not happen if the exit node in use is hosted elsewhere, or if I do not use an exit node. If I run the same curl https://... on the Docker host, it always works, no matter where the exit node is hosted. HTTP endpoints are always OK.
All nodes involved in the test have the same setup: Ubuntu 24.04 with NetBird 0.33.0.

I do not necessarily expect this to be a netbird issue, but would be very thankful for any thoughts on where Netbird possibly could intersect with a VM in Azure causing such an effect.
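
In case it helps others reproduce the HTTP vs. HTTPS difference from inside a container and on the host, here is a small sketch that runs the same request over both schemes and reports whether each one answered or timed out. The function names are invented for this example, and the target host is whatever endpoint you normally test against; nothing here is specific to NetBird or Azure.

```python
import socket
import urllib.error
import urllib.request

# Ignore HTTP(S)_PROXY environment settings so requests follow the routing table.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 10.0) -> str:
    """Return 'ok <status>', 'timeout' or 'error <reason>' for one GET request."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return f"ok {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so the network path works even for 4xx/5xx.
        return f"ok {exc.code}"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"error {exc.reason}"
    except OSError as exc:
        return f"error {exc}"


def compare(host: str, timeout: float = 10.0) -> dict:
    """Run the same GET over http:// and https:// and return both results."""
    return {scheme: check(f"{scheme}://{host}/", timeout)
            for scheme in ("http", "https")}
```

Running `compare("<your endpoint>")` once on the Docker host and once inside the container, for each exit node, gives a small table of results that makes it easier to see which combination fails.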


@alexkorotysh commented on GitHub (Apr 28, 2025):

I have the same problem. Exit nodes only started working once I switched to 0.29.4 (I had been using 0.41-0.43); the last two versions don't work. NAT just doesn't work.

Ruleset 0.29.4

table ip netbird {
        set nb0000001 {
                type ipv4_addr
                flags dynamic
                elements = { 0.0.0.0 }
        }

        set nb0000002 {
                type ipv4_addr
                flags dynamic
                elements = { 0.0.0.0 }
        }

        chain netbird-rt-fwd {
                ip saddr 0.0.0.0/0 ip daddr 0.0.0.0/0 counter packets 2870 bytes 2096850 accept
                ip saddr 0.0.0.0/0 ip daddr 0.0.0.0/0 counter packets 0 bytes 0 accept
        }

        chain netbird-rt-nat {
                type nat hook postrouting priority srcnat - 1; policy accept;
                oifname "lo" return
                ip saddr 0.0.0.0/0 ip daddr 0.0.0.0/0 counter packets 120 bytes 24235 masquerade
                ip saddr 0.0.0.0/0 ip daddr 0.0.0.0/0 counter packets 0 bytes 0 masquerade
        }

        chain netbird-acl-input-rules {
                iifname "wt0" accept
        }

        chain netbird-acl-output-rules {
                oifname "wt0" accept
        }

        chain netbird-acl-input-filter {
                type filter hook input priority filter; policy accept;
                iifname "wt0" ip saddr 100.83.0.0/16 ip daddr != 100.83.0.0/16 accept
                iifname "wt0" ip saddr != 100.83.0.0/16 ip daddr 100.83.0.0/16 accept
                iifname "wt0" ip saddr 100.83.0.0/16 ip daddr 100.83.0.0/16 jump netbird-acl-input-rules
                iifname "wt0" drop
        }

        chain netbird-acl-output-filter {
                type filter hook output priority filter; policy accept;
                oifname "wt0" ip saddr != 100.83.0.0/16 ip daddr 100.83.0.0/16 accept
                oifname "wt0" ip saddr 100.83.0.0/16 ip daddr != 100.83.0.0/16 accept
                oifname "wt0" ip saddr 100.83.0.0/16 ip daddr 100.83.0.0/16 jump netbird-acl-output-rules
                oifname "wt0" drop
        }

        chain netbird-acl-forward-filter {
                type filter hook forward priority filter; policy accept;
                iifname "wt0" jump netbird-rt-fwd
                oifname "wt0" jump netbird-rt-fwd
                iifname "wt0" meta mark 0x000007e4 accept
                oifname "wt0" meta mark 0x000007e4 accept
                iifname "wt0" jump netbird-acl-input-rules
                iifname "wt0" drop
        }

        chain netbird-acl-prerouting-filter {
                type filter hook prerouting priority mangle; policy accept;
                iifname "wt0" ip saddr != 100.83.0.0/16 ip daddr 100.83.65.243 meta mark set 0x000007e4
        }
}

Ruleset 0.43

table ip netbird {
        set nb0000001 {
                type ipv4_addr
                flags dynamic
                elements = { 0.0.0.0 }
        }

        chain netbird-rt-fwd {
                ct state established,related counter packets 0 bytes 0 accept
        }

        chain netbird-rt-postrouting {
                type nat hook postrouting priority srcnat - 1; policy accept;
                meta mark 0x0001bd21 oifname != "lo" counter packets 0 bytes 0 masquerade
                meta mark 0x0001bd22 oifname "wt0" counter packets 0 bytes 0 masquerade
        }

        chain netbird-rt-redirect {
                type nat hook prerouting priority dstnat; policy accept;
        }

        chain netbird-mangle-postrouting {
                type filter hook postrouting priority mangle; policy accept;
                oifname "wt0" ct state new ct mark set 0x0001bd11
        }

        chain netbird-mangle-prerouting {
                type filter hook prerouting priority mangle; policy accept;
                iifname != "wt0" ct state new meta mark set 0x0001bd22
                iifname "wt0" ct state new meta mark set 0x0001bd21
                iifname "wt0" ct state new ct mark set 0x0001bd10
                iifname "wt0" fib daddr type local meta mark set 0x0001bd20
        }

        chain netbird-acl-input-rules {
                ct state established,related counter packets 0 bytes 0 accept
                accept
        }

        chain netbird-acl-input-filter {
                type filter hook input priority filter; policy accept;
                iifname "wt0" jump netbird-acl-input-rules
                iifname "wt0" drop
        }

        chain netbird-acl-forward-filter {
                type filter hook forward priority filter; policy accept;
                meta mark 0x0001bd20 accept
                iifname "wt0" jump netbird-rt-fwd
                iifname "wt0" drop
        }
}
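
One quick way to compare the two dumps is to pull out the packet counters of the masquerade rules. The sketch below does that with a regular expression; the helper names are invented for this example, and the two excerpts are copied from the rulesets above. In practice the input would be the output of something like `nft list table ip netbird` on the exit node. Zero counters do not prove a bug on their own, since they only say that no packet matched the rule since the table was loaded.

```python
import re

MASQ_RE = re.compile(r"counter packets (\d+) bytes (\d+) masquerade")
CHAIN_RE = re.compile(r"^\s*chain\s+(\S+)\s*\{", re.MULTILINE)


def masquerade_counters(ruleset: str):
    """Return (packets, bytes) for every masquerade rule that has a counter."""
    return [(int(p), int(b)) for p, b in MASQ_RE.findall(ruleset)]


def chain_names(ruleset: str):
    """Return the chain names in the order they appear in the dump."""
    return CHAIN_RE.findall(ruleset)


# Excerpt from the 0.29.4 ruleset above.
OLD = """
chain netbird-rt-nat {
        type nat hook postrouting priority srcnat - 1; policy accept;
        oifname "lo" return
        ip saddr 0.0.0.0/0 ip daddr 0.0.0.0/0 counter packets 120 bytes 24235 masquerade
        ip saddr 0.0.0.0/0 ip daddr 0.0.0.0/0 counter packets 0 bytes 0 masquerade
}
"""

# Excerpt from the 0.43 ruleset above.
NEW = """
chain netbird-rt-postrouting {
        type nat hook postrouting priority srcnat - 1; policy accept;
        meta mark 0x0001bd21 oifname != "lo" counter packets 0 bytes 0 masquerade
        meta mark 0x0001bd22 oifname "wt0" counter packets 0 bytes 0 masquerade
}
"""

print(chain_names(OLD), masquerade_counters(OLD))
print(chain_names(NEW), masquerade_counters(NEW))
```

On these excerpts the 0.29.4 dump shows one masquerade rule with 120 matched packets, while both masquerade rules in the 0.43 dump are at zero.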

docker-compose.yml

services:
  netbird:
    image: netbirdio/netbird:0.29.4 
    container_name: netbird-exit-node
    hostname: eu.proxy.XXXX.YYYYY
    restart: unless-stopped
    environment:
      NB_SETUP_KEY: "REDACTED"  
      NB_MANAGEMENT_URL: "https://netbird.XXXX.YYYYY:33073"
    volumes:
      - ./netbird:/etc/netbird
    privileged: true
    network_mode: "host"

Reference: SVI/netbird#1318