Performance issues on multi-region but same network backbone #1181

Closed
opened 2025-11-20 05:25:29 -05:00 by saavagebueno · 2 comments

Originally created by @ednxzu on GitHub (Aug 27, 2024).

Describe the problem

Hello, we are currently testing netbird to connect two public cloud regions on our infrastructure.
The setup went smoothly and everything is up and running. However, I am seeing degraded performance pretty much across the board when testing region-to-region bandwidth.

To Reproduce

We have 2 openstack regions, each with their own project, subnets, and routers.

Each region has 2 netbird peers, advertising routes for their region (region-1 10.1.0.0/16, region-2 10.2.0.0/16).

On top of that, there is a keepalived virtual IP assigned to one of the routers in each region, which fails over if a router goes down.

Each region has an openstack router connecting the private subnets to the internet with basic NAT.

This router also has static routes, pointing to the virtual IP, for the CIDR blocks in the other region. Basically, all the traffic for region-2 from region-1 gets forwarded to the VIP, which is located on a netbird peer machine, which then forwards the traffic to the other region.

While doing performance tests on this setup, I noticed that interconnectivity between regions seemed slow.

For reference, here is the iperf3 result from a host in region-1 to a host in region-2:

region-1 to region-2

root@netbird-routers-dc3-a-1:~# iperf3 -c 10.2.132.83 M
Connecting to host 10.2.132.83, port 5201
[  5] local 100.89.235.78 port 46268 connected to 10.2.132.83 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  52.4 MBytes   440 Mbits/sec   12    234 KBytes
[  5]   1.00-2.00   sec  62.7 MBytes   526 Mbits/sec   10    301 KBytes
[  5]   2.00-3.00   sec  54.1 MBytes   454 Mbits/sec    3    311 KBytes
[  5]   3.00-4.00   sec  57.8 MBytes   485 Mbits/sec    0    413 KBytes
[  5]   4.00-5.00   sec  54.9 MBytes   461 Mbits/sec    0    457 KBytes
[  5]   5.00-6.00   sec  58.5 MBytes   491 Mbits/sec    0    342 KBytes
[  5]   6.00-7.00   sec  49.8 MBytes   418 Mbits/sec    1    425 KBytes
[  5]   7.00-8.00   sec  49.0 MBytes   411 Mbits/sec   82    131 KBytes
[  5]   8.00-9.00   sec  48.5 MBytes   407 Mbits/sec    0    282 KBytes
[  5]   9.00-10.00  sec  60.6 MBytes   508 Mbits/sec    0    402 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   548 MBytes   460 Mbits/sec  108             sender
[  5]   0.00-10.04  sec   546 MBytes   456 Mbits/sec                  receiver

iperf Done.

And it gets even worse the other way around.

region-2 to region-1

root@netbird-routers-dc4-a-1:~# iperf3 -c 10.2.128.25 M
Connecting to host 10.2.128.25, port 5201
[  5] local 100.89.245.125 port 55952 connected to 10.2.128.25 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  22.9 MBytes   192 Mbits/sec   11    116 KBytes
[  5]   1.00-2.00   sec  23.1 MBytes   194 Mbits/sec   21    174 KBytes
[  5]   2.00-3.00   sec  25.3 MBytes   212 Mbits/sec   39    188 KBytes
[  5]   3.00-4.00   sec  28.4 MBytes   238 Mbits/sec    5    186 KBytes
[  5]   4.00-5.00   sec  22.2 MBytes   186 Mbits/sec   20    146 KBytes
[  5]   5.00-6.00   sec  23.9 MBytes   200 Mbits/sec    3    119 KBytes
[  5]   6.00-7.00   sec  25.8 MBytes   216 Mbits/sec    2    179 KBytes
[  5]   7.00-8.00   sec  27.7 MBytes   232 Mbits/sec    2    138 KBytes
[  5]   8.00-9.00   sec  23.8 MBytes   200 Mbits/sec    4    145 KBytes
[  5]   9.00-10.00  sec  24.8 MBytes   208 Mbits/sec    1    230 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   248 MBytes   208 Mbits/sec  108             sender
[  5]   0.00-10.05  sec   247 MBytes   206 Mbits/sec                  receiver

iperf Done.

I attached floating public IP addresses to the routers, and the same iperf test over their public IPs works as expected:

root@netbird-routers-dc3-a-2:~# iperf3 -c 37.156.42.162 M
Connecting to host 37.156.42.162, port 5201
[  5] local 10.2.128.79 port 40314 connected to 37.156.42.162 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   290 MBytes  2.44 Gbits/sec  145    722 KBytes
[  5]   1.00-2.00   sec   446 MBytes  3.74 Gbits/sec    0   1.07 MBytes
[  5]   2.00-3.00   sec   436 MBytes  3.66 Gbits/sec   48   1.01 MBytes
[  5]   3.00-4.00   sec   402 MBytes  3.38 Gbits/sec   22   1002 KBytes
[  5]   4.00-5.00   sec   441 MBytes  3.70 Gbits/sec    5    971 KBytes
[  5]   5.00-6.00   sec   438 MBytes  3.67 Gbits/sec   21    737 KBytes
[  5]   6.00-7.00   sec   398 MBytes  3.33 Gbits/sec    0   1.04 MBytes
[  5]   7.00-8.00   sec   399 MBytes  3.34 Gbits/sec  366    714 KBytes
[  5]   8.00-9.00   sec   421 MBytes  3.53 Gbits/sec    0   1.05 MBytes
[  5]   9.00-10.00  sec   409 MBytes  3.43 Gbits/sec   89   1.01 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.98 GBytes  3.42 Gbits/sec  696             sender
[  5]   0.00-10.04  sec  3.98 GBytes  3.41 Gbits/sec                  receiver

iperf Done.

The wireguard kernel modules are loaded correctly on all the peers

root@netbird-routers-dc3-a-2:~# lsmod | grep wireguard
wireguard              94208  0
curve25519_x86_64      36864  1 wireguard
libchacha20poly1305    16384  1 wireguard
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             20480  1 wireguard
libcurve25519_generic    49152  2 curve25519_x86_64,wireguard

I am not sure what the issue could be here, but these VMs are essentially on the same 25 Gbps backbone, so I would expect performance to be much higher than the 200 Mbps I get from site to site.
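To put the gap in numbers, the receiver-side summary lines from the iperf3 reports above can be compared directly. The following is a minimal sketch and not part of the original setup: `receiver_mbps` is a hypothetical helper name, and it assumes the plain-text iperf3 report format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read an iperf3 text report on stdin and print the
# receiver-side bitrate, normalized to Mbits/sec.
receiver_mbps() {
    awk '$NF == "receiver" {
        for (i = 1; i <= NF; i++) {
            if ($i == "Mbits/sec") print $(i - 1)
            if ($i == "Gbits/sec") print $(i - 1) * 1000
        }
    }'
}

# Receiver summary lines copied from the reports above.
tunnel=$(printf '%s\n' \
    '[  5]   0.00-10.05  sec   247 MBytes   206 Mbits/sec                  receiver' \
    | receiver_mbps)
direct=$(printf '%s\n' \
    '[  5]   0.00-10.04  sec  3.98 GBytes  3.41 Gbits/sec                  receiver' \
    | receiver_mbps)

# Tunnel throughput as a percentage of the direct public-IP path.
ratio=$(awk -v t="$tunnel" -v d="$direct" 'BEGIN { printf "%.1f%%", 100 * t / d }')
echo "tunnel=${tunnel} Mbits/sec direct=${direct} Mbits/sec ratio=${ratio}"
```

With the region-2 to region-1 figures, this puts the tunnel at roughly 6% of the direct-path throughput.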

I'm using netbird 0.28.9 on all routers, spun up in docker containers like so

root@netbird-routers-dc3-a-2:~# cat /etc/systemd/system/netbird-router.service
[Unit]
After=docker.service
PartOf=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/default/netbird-router
ExecStartPre=-/usr/bin/docker rm -f %N
ExecStartPre=/usr/sbin/iptables -t nat -A POSTROUTING -o wt0 -j MASQUERADE

ExecStartPre=/usr/sbin/iptables -A FORWARD -d 127.0.0.0/8 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -s 127.0.0.0/8 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -d 10.0.0.0/8 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -s 10.0.0.0/8 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -d 172.16.0.0/12 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -s 172.16.0.0/12 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -d 192.168.0.0/16 -j ACCEPT
ExecStartPre=/usr/sbin/iptables -A FORWARD -s 192.168.0.0/16 -j ACCEPT
ExecStart=/usr/bin/docker run   --name %N   --rm   --network host   --health-cmd '/bin/sh -c \"if ip link show wt0 > /dev/null 2>&1 && ip addr show wt0 | grep -q ''inet ''; then exit 0; else exit 1; fi\"'   --health-interval "5s"   --health-start-interval "10s"   --env-file /etc/default/netbird-router   --volume netbird:/etc/netbird   --cap-add NET_ADMIN   --cap-add SYS_ADMIN   --cap-add SYS_RESOURCE   --cap-add NET_RAW   netbirdio/netbird:latest

ExecStop=/usr/bin/docker stop %N
ExecStopPost=/usr/sbin/iptables -t nat -D POSTROUTING -o wt0 -j MASQUERADE

ExecStopPost=/usr/sbin/iptables -D FORWARD -d 127.0.0.0/8 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -s 127.0.0.0/8 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -d 10.0.0.0/8 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -s 10.0.0.0/8 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -d 172.16.0.0/12 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -s 172.16.0.0/12 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -d 192.168.0.0/16 -j ACCEPT
ExecStopPost=/usr/sbin/iptables -D FORWARD -s 192.168.0.0/16 -j ACCEPT
SyslogIdentifier=%n
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

I do not use a self-hosted controller, just the SaaS platform.
Just wondering if you had any guesses on the issue.

We really like the product overall and would want to advertise it to our customers for multi-region connectivity, but we cannot do it with this performance (a consistent 500+ Mbps would be a minimum requirement).

UPDATE: I tried recreating the infra from scratch, and performance is "random": sometimes I get 200 Mbps, sometimes 10 Mbps, sometimes 600. It's all over the place for what is essentially a consistent setup.

Test repo: https://gitlab.com/cloud-infra-templates/multi-region-connectivity

Thanks !

saavagebueno added the waiting-feedback, triage-needed labels 2025-11-20 05:25:29 -05:00

@nazarewk commented on GitHub (Apr 28, 2025):

Hello @ednxzu,

We're currently reviewing our open issues and would like to verify if this problem still exists in the latest NetBird version (https://github.com/netbirdio/netbird/releases).

Could you please confirm if the issue is still there?

We may close this issue temporarily if we don't hear back from you within 2 weeks, but feel free to reopen it with updated information.

Thanks for your contribution to improving the project!


@ednxzu commented on GitHub (May 3, 2025):

Hello !

It turned out not to be a netbird issue in the end, but a UDP performance issue on our end. It obviously impacted netbird, but netbird wasn't the main culprit. This is no longer relevant, so I will close it.
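The thread does not say how the UDP problem was diagnosed. As a hedged sketch of one way to check for it: WireGuard carries its traffic over UDP, while iperf3 tests TCP unless told otherwise, so the public-IP baseline earlier in the thread would not have exercised the UDP path. A UDP run between the same hosts would. `udp_baseline` and `DRY_RUN` are hypothetical names introduced here; `-u`, `-b`, and `-t` are standard iperf3 client options, and the 2G target bitrate is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: run a 10-second iperf3 UDP test against a server.
# Usage: udp_baseline <server-ip> [target-bitrate]
# Set DRY_RUN to any non-empty value to print the command instead of running it.
udp_baseline() {
    server=$1
    rate=${2:-1G}
    # -u selects UDP, -b sets the target bitrate (iperf3 defaults to only
    # 1 Mbit/sec for UDP), -t sets the test duration in seconds.
    set -- iperf3 -u -c "$server" -b "$rate" -t 10
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example, assuming an iperf3 server is listening on the public IP used above:
#   udp_baseline 37.156.42.162 2G
```

If the reported loss climbs steeply at bitrates well below the TCP figure, the bottleneck is likely in the UDP path rather than in netbird itself.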


Reference: SVI/netbird#1181