Deadlock in latest master build when starting service #1212

Closed
opened 2025-11-20 05:26:03 -05:00 by saavagebueno · 6 comments

Originally created by @hurricanehrndz on GitHub (Sep 3, 2024).

Describe the problem

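The following commit (13e7198046) introduced a deadlock: https://github.com/netbirdio/netbird/commit/13e7198046a0d73a9cd91bf8e063fafb3d41885c

Specifically, [this line](https://github.com/netbirdio/netbird/commit/13e7198046a0d73a9cd91bf8e063fafb3d41885c#diff-b9a524a16fa54801148c3bf6f07149721310a5d097f234e839466a751ca900d8R151) (the `WaitGroup` wait in `client/server/server.go`, line 151) never returns. Because of the deadlock, https://github.com/netbirdio/netbird/blob/main/client/cmd/service_controller.go#L62 never gets executed.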

To Reproduce

Steps to reproduce the behavior:
Not sure; it happens intermittently, regardless of the flag.

Expected behavior
The gRPC socket should be responsive and the `netbird status` command should work.

Are you using NetBird Cloud?
no

NetBird version

master

NetBird status -dA output:

NA

I made builds with the wait removed, and `netbird status` worked again. On non-working instances, the log file was missing the following line:

```
started daemon server
```
saavagebueno added the triage-needed label 2025-11-20 05:26:03 -05:00

@hurricanehrndz commented on GitHub (Sep 3, 2024):

Here is the relevant backtrace from when I killed the process:

```
goroutine 19 [semacquire]:
runtime.gopark(0xc0000c23b0?, 0x10?, 0x40?, 0x42?, 0x1b1e2a0?)
	/opt/homebrew/Cellar/go@1.21/1.21.13/libexec/src/runtime/proc.go:398 +0xce fp=0xc00022fa08 sp=0xc00022f9e8 pc=0x103b84e
runtime.goparkunlock(...)
	/opt/homebrew/Cellar/go@1.21/1.21.13/libexec/src/runtime/proc.go:404
runtime.semacquire1(0xc0000c8938, 0x10?, 0x1, 0x0, 0x0?)
	/opt/homebrew/Cellar/go@1.21/1.21.13/libexec/src/runtime/sema.go:160 +0x218 fp=0xc00022fa70 sp=0xc00022fa08 pc=0x104c298
sync.runtime_Semacquire(0x10?)
	/opt/homebrew/Cellar/go@1.21/1.21.13/libexec/src/runtime/sema.go:62 +0x25 fp=0xc00022faa8 sp=0xc00022fa70 pc=0x1067705
sync.(*WaitGroup).Wait(0x1e11298?)
	/opt/homebrew/Cellar/go@1.21/1.21.13/libexec/src/sync/waitgroup.go:116 +0x48 fp=0xc00022fad0 sp=0xc00022faa8 pc=0x1078688
github.com/netbirdio/netbird/client/server.(*Server).Start(0xc0000ea200)
	/Users/chernand/src/yelp/netbird/client/server/server.go:151 +0x8bb fp=0xc00022fe68 sp=0xc00022fad0 pc=0x19ac81b
github.com/netbirdio/netbird/client/cmd.(*program).Start.func1()
	/Users/chernand/src/yelp/netbird/client/cmd/service_controller.go:59 +0x513 fp=0xc00022ffe0 sp=0xc00022fe68 pc=0x19bebd3
runtime.goexit()
	/opt/homebrew/Cellar/go@1.21/1.21.13/libexec/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00022ffe8 sp=0xc00022ffe0 pc=0x106b981
created by github.com/netbirdio/netbird/client/cmd.(*program).Start in goroutine 1
	/Users/chernand/src/yelp/netbird/client/cmd/service_controller.go:47 +0x315
```

@mlsmaycon commented on GitHub (Sep 3, 2024):

Thanks for opening the issue, @hurricanehrndz. I confirmed it and created PR #2528 to fix it.

@hurricanehrndz commented on GitHub (Sep 3, 2024):

@mlsmaycon My pleasure. The PR only partially fixes the issue: with autoconnect disabled there is no longer a deadlock, but there is still one when autoconnect is enabled.

@mlsmaycon commented on GitHub (Sep 3, 2024):

@hurricanehrndz It is possible that the connection exits because of an error before we call `wg.Done()`. Can you enable [trace logs](https://docs.netbird.io/how-to/troubleshooting-client#enabling-debug-logs-on-agent) and check whether there are any events after the `running client connection` entry?

@hurricanehrndz commented on GitHub (Sep 3, 2024):

There is no connection: this is a new client that is not registered with any management endpoint. I will get you the full backtrace and the log.

@hurricanehrndz commented on GitHub (Sep 3, 2024):

I'm going to DM them to you on Slack.

Reference: SVI/netbird#1212