ip route conflicts and race conditions when resolving best routes in clientNetwork.getBestRouteFromStatuses() #935

Open
opened 2025-11-20 05:20:13 -05:00 by saavagebueno · 4 comments
Owner

Originally created by @nazarewk on GitHub (May 28, 2024).

Describe the problem

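I noticed the client-side route selection feature, so during today's NetBird workshop for our team I enabled both routes clashing on `10.4/16`. Later today teammates started reporting problems accessing the `primary` system, so I switched the `secondary` off and dug into the code.

The first thing I noticed is that using `ip route metric` was superseded by:

  • the best peer selection algorithm (https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/client.go#L80-L93), which takes `Route.Metric` as the most significant scoring,
  • using the dedicated routing table `7120` aka `netbird` (the one I discovered in https://github.com/netbirdio/netbird/issues/2023),
  • not providing a `metric` parameter to the created `ip route` at all (https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/systemops_linux.go#L258-L263).

The method is scoped to `*clientNetwork`, which I guess does not consider the possibility of another `NetworkID` providing the same CIDR (`10.4/16` in our case), resulting in a conflict/race condition on the created `ip route` entry.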

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

Either of:

  • only the best/highest metric route across all networks is selected
  • create the `ip route` with the `metric` argument (seems to be `netlink.Route.Priority`), which would properly prioritize routes on the `ip route` level even in case of a clash
    • watch out not to deregister the wrong route during updates

Are you using NetBird Cloud?

Yes

NetBird version

0.27.7

NetBird status -d output:

If applicable, add the `netbird status -d` command output.

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

saavagebueno added the bug, routes, networking, cloud labels 2025-11-20 05:20:13 -05:00

@nazarewk commented on GitHub (May 28, 2024):

I've noticed that I was connected to the `primary` network and the `ip route` entry disappeared completely after I disabled the `secondary` network in the web UI.

I had to do `netbird down && netbird up` for the client to re-register the `primary` route.


@nazarewk commented on GitHub (May 28, 2024):

A single `clientNetwork` seems to represent a very specific `<network-id>-<network-range>` grouping:

  1. https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L46-L46
  2. https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/route/hauniqueid.go#L8-L10

This makes it impossible for it to know about another `input.NetID` providing the same `input.Network.String()` at a completely different site/datacenter.


@nazarewk commented on GitHub (May 28, 2024):

Theoretically, using `input.Network.String()` instead of `HAUniqueID` here could help with my issue:
https://github.com/netbirdio/netbird/blob/f176807ebee751289382202569bf6874105e111f/client/internal/routemanager/manager.go#L243-L251
but I am not sure about the wider consequences.


@nazarewk commented on GitHub (May 28, 2024):

> I've noticed that I was connected to the `primary` network and the `ip route` entry disappeared completely after i disabled the `secondary` network in the web ui.
>
> I had to do `netbird down && netbird up` for the client to re-register the `primary` route.

@mlsmaycon noted that this part of the issue (removing routes still in use) will be fixed by https://github.com/netbirdio/netbird/pull/1943. Since that PR is a big refactor of the code linked earlier in this issue, it will be worth analyzing and revisiting this again after #1943 is merged.


Reference: SVI/netbird#935