Files
netbird/client/cmd/debug.go
Riccardo Manfrin 2bcea9d582 [client] add MDM configuration profile support (Windows registry + macOS plist) (#6374)
* Initial scaffolding

* Applies MDM override

* Unit tests

* Helpers business logic

* Return error if trying to modify any config that is gated by MDM

* Add ManagedFields to returned config over GetConfig

* Adds initial 101 MDM policy business logic testing

* gRPC MDM changes

* MDM Name scoping for clarity

* Implements windows loading of MDM policy

* Adds missing WGPort config

* Cleanup setupKey to align to linear

* Align split tunnel code

* Adds some log

* Prefix every log with MDM

* Adds debug config cobra command

This can be useful for troubleshooting and checking config
now that its resolution is not trivial

defaults > config > env cars > CLI/UI > MDM

* Adds MDM 1m diff checker & reloader

* Adds also up/start after cancel

* Publishes event for UI to sync upon MDM changes

* Add events to resync UI to actual config

This also provide fixup for UI no aligning to changed config when coming from cli up with config flags.

* UI behavior conflicts relaxation

UI sends full config snapshot with all values. It doesn't
make sense to block it if the values are aligned with the
values constrained by the MDM policy. It's just simplier
to allow values that are compliant. (this goes for the CLI
as well at this point)

* Lock toggle Settngs

* Advanced Settings locking

* Fixup presharedkey

* Apply MDM locks

* Toggle gray in/out for Advanced Settings

* Adds support for disabling of Profiles and UpdateSettings feature flags

* Adds Gate Login as well when --disable-update-settings=true is given to service

This commit tries to settle things with an old PR-4237 which had relaxed
the case where the SetConfig returned an `Unavailable` code error.

Under this circumnstance the PR allowed the upFunc to just emit a warning and
progress further with the login gRPC. Since the login call is consuming
the --management-url coming from the `up` command, it might be possible
to abuse the "Unavailable" code to inject a management URL that is different
from the configured one even though the --disable-update-settings is set
to true (?)

* Evaluate disable-update-settings errors only when there's an actual override

* [UI] Fixup advanced Settings

* [UI] Fixup for preshared key

* [UI] Fixup for profile enable/disable toggle

We need to align the initial state to evaluate the delta in case.

The initial state has to be "true" since the profile starts visible.
Then we receive MDM and transition the cache bool value to the actual
MDM imposed state

* Enforces disable networks

* [UI] Aligns to "enable/disable once on change only"

* Fixup: MDM wins. always

* Removes --disable-advanced-settings

It was a typo in our meetings. the actual thing is --disable-update-settings

* [PROTO] Removes --disable-advanced-settings

* [UI] Removes --disable-advanced-settings

* Pins feat profile retrieval to notif event

* [UI] Fix for "hide" not working when propagating to parent with children

* Adds dep for reading plist files

* Introduces support for darwing plist loading

* Tests MDM config reload via ticker

* [PROVISIONING] ADMX/ADML/PS/bash scripts/templates

* CI fixes

- Add docstrings to `mdm_integration`
- refactor for cognitive complexity
- mod tidy

* Linting

* Add docstrings to `mdm_integration`

* nil,nil is no policy and no error. Allow it

* nil,nil is no policy and no error. Allow it

* exclude MDM profile adminstrated keys data from debug bundle

* Fixes Rosenpass left disable after MDM unlock

* Partial revert coderabbit added docstrings

* Renaming fix

* Avoid locking on clientRunning bool when the connection is aborted for whatever reason

We want to just signal this through the giveUpChan, we will manage the signal from
the waiter side and in case set it to false there. THis way we avoid locking,
which should allow the MDM down+wait_for_term_chan_signal_+up procedure

clientRunning is used to signal two different conditions here:

1. the initialization procedure is over (we have an engine)
2. the connection being up (or being attempted)

Probably these two functionalities should not alias, and the failure of the second condition
(because of any error) should just drive a reconnection (currently it's not happening,
and we silently go idle).
OR, mor probably, the two things are the SAME and there should not exist a case where
we did the "Up" initialization and connection attempt but we are not still attempting it.

* Moves test helper at te very bottom

* Addresses github comments

* No lock no copy

* Prevents engine not stopping within 10 secs from being paired by another instance

We instead juts SKIP updating the policy, so
1. the MDM ticker will kick in 1 minute time,
2. find the policy misaligned,
3. enter the onMDMPolicyChange,
4. find the s.clientRunning == true
   (because it is set to false only in server cleanupConnection,
   and not by s.actCancel())
5. call s.actCancel() again if not nil
6. immediately return from <-s.clientGiveUpChan
7. finally call s.restartEngineForMDMLocked()

* Since we ARE running there should be a config

If the config was cancelled midflight, connect will abort later on

* DisableAutoConnect should not stop a running connection.

DisableAutoConnect should just avoid the connection attempts *when the service starts*.
If we are started and we are up and running, DisableAutoConnect should not kick in.

Another PR will follow about this topic

* Removes unused vars

* Moves callback into Run method arg

* align comment to removal of DisableAutoConnect

DisableAutoConnect should just avoid the connection attempts *when the service starts*.
If we are started and we are up and running, DisableAutoConnect should not kick in

* Removes unused managed_fields data.

This was initially used to drive the UI but approach changed
to reload config/features upon notifications which makes this data redundant.

* Reorder stuff

* Unexport unrequired vars/functions

PoliciesEqual → policiesEqual
AllKeys → allKeys

* Adds list of MDM managed fields in the debug bundle
2026-06-12 12:28:49 +02:00

534 lines
17 KiB
Go

package cmd
import (
"context"
"fmt"
"os/user"
"strings"
"time"
log "github.com/sirupsen/logrus"
"github.com/spf13/cobra"
"google.golang.org/grpc/status"
"google.golang.org/protobuf/encoding/protojson"
"google.golang.org/protobuf/types/known/durationpb"
"github.com/netbirdio/netbird/client/internal"
"github.com/netbirdio/netbird/client/internal/debug"
"github.com/netbirdio/netbird/client/internal/peer"
"github.com/netbirdio/netbird/client/internal/profilemanager"
"github.com/netbirdio/netbird/client/proto"
"github.com/netbirdio/netbird/client/server"
mgmProto "github.com/netbirdio/netbird/shared/management/proto"
"github.com/netbirdio/netbird/upload-server/types"
"github.com/netbirdio/netbird/version"
)
const errCloseConnection = "Failed to close connection: %v"
var (
logFileCount uint32
systemInfoFlag bool
uploadBundleFlag bool
uploadBundleURLFlag string
)
var debugCmd = &cobra.Command{
Use: "debug",
Short: "Debugging commands",
Long: "Commands for debugging and logging within the NetBird daemon.",
}
var debugBundleCmd = &cobra.Command{
Use: "bundle",
Example: " netbird debug bundle",
Short: "Create a debug bundle",
Long: "Generates a compressed archive of the daemon's logs and status for debugging purposes.",
RunE: debugBundle,
}
var logCmd = &cobra.Command{
Use: "log",
Short: "Manage logging for the NetBird daemon",
Long: `Commands to manage logging settings for the NetBird daemon, including ICE, gRPC, and general log levels.`,
}
var logLevelCmd = &cobra.Command{
Use: "level <level>",
Short: "Set the logging level for this session",
Long: `Sets the logging level for the current session. This setting is temporary and will revert to the default on daemon restart.
Available log levels are:
panic: for panic level, highest level of severity
fatal: for fatal level errors that cause the program to exit
error: for error conditions
warn: for warning conditions
info: for informational messages
debug: for debug-level messages
trace: for trace-level messages, which include more fine-grained information than debug`,
Args: cobra.ExactArgs(1),
RunE: setLogLevel,
}
var forCmd = &cobra.Command{
Use: "for <time>",
Short: "Run debug logs for a specified duration and create a debug bundle",
Long: `Sets the logging level to trace, runs for the specified duration, and then generates a debug bundle.`,
Example: " netbird debug for 5m",
Args: cobra.ExactArgs(1),
RunE: runForDuration,
}
var persistenceCmd = &cobra.Command{
Use: "persistence [on|off]",
Short: "Set sync response memory persistence",
Long: `Configure whether the latest sync response should persist in memory. When enabled, the last known sync response will be kept in memory.`,
Example: " netbird debug persistence on",
Args: cobra.ExactArgs(1),
RunE: setSyncResponsePersistence,
}
var debugConfigCmd = &cobra.Command{
Use: "config",
Example: " netbird debug config",
Short: "Dump the effective configuration",
Long: "Prints the daemon's resolved configuration (after applying defaults, file, env, CLI input, and MDM policy overrides) as JSON. Includes the list of MDM-managed fields.",
RunE: debugConfigDump,
}
// debugConfigDump implements `netbird debug config`. It resolves the
// active profile, queries the daemon for the effective configuration
// via GetConfig, and prints the resulting GetConfigResponse as JSON
// (via protojson with EmitUnpopulated=true so the output is stable
// across runs and includes zero-valued fields).
//
// Useful for verifying MDM enforcement end-to-end: the response's
// mDMManagedFields array is the single source of truth for "which
// fields is the daemon currently enforcing from the MDM source", and
// every config field side-by-side with that list confirms the merge
// result. Secrets in the response (e.g. PreSharedKey) are already
// redacted by the daemon-side handler.
func debugConfigDump(cmd *cobra.Command, _ []string) error {
pm := profilemanager.NewProfileManager()
activeProf, err := pm.GetActiveProfile()
if err != nil {
return fmt.Errorf("get active profile: %v", err)
}
currUser, err := user.Current()
if err != nil {
return fmt.Errorf("get current user: %v", err)
}
conn, err := getClient(cmd)
if err != nil {
return err
}
defer func() {
if err := conn.Close(); err != nil {
log.Errorf(errCloseConnection, err)
}
}()
client := proto.NewDaemonServiceClient(conn)
resp, err := client.GetConfig(cmd.Context(), &proto.GetConfigRequest{
ProfileName: activeProf.Name,
Username: currUser.Username,
})
if err != nil {
return fmt.Errorf("failed to get config: %v", status.Convert(err).Message())
}
// Use protojson so well-known fields render correctly; emit defaults so
// the operator sees every field even when zero/empty.
m := protojson.MarshalOptions{Multiline: true, Indent: " ", EmitUnpopulated: true}
out, err := m.Marshal(resp)
if err != nil {
return fmt.Errorf("marshal config: %w", err)
}
cmd.Println(string(out))
return nil
}
// debugBundle requests the daemon to create a debug bundle and prints
// the resulting local file path and, if uploaded, the uploaded file
// key. It uses the package flags (anonymize, system info, log file
// count, CLI version, optional upload URL) to configure the bundle
// request. Returns an error if the RPC fails or if the daemon reports
// an upload failure reason.
func debugBundle(cmd *cobra.Command, _ []string) error {
conn, err := getClient(cmd)
if err != nil {
return err
}
defer func() {
if err := conn.Close(); err != nil {
log.Errorf(errCloseConnection, err)
}
}()
client := proto.NewDaemonServiceClient(conn)
request := &proto.DebugBundleRequest{
Anonymize: anonymizeFlag,
SystemInfo: systemInfoFlag,
LogFileCount: logFileCount,
CliVersion: version.NetbirdVersion(),
}
if uploadBundleFlag {
request.UploadURL = uploadBundleURLFlag
}
resp, err := client.DebugBundle(cmd.Context(), request)
if err != nil {
return fmt.Errorf("failed to bundle debug: %v", status.Convert(err).Message())
}
cmd.Printf("Local file:\n%s\n", resp.GetPath())
if resp.GetUploadFailureReason() != "" {
return fmt.Errorf("upload failed: %s", resp.GetUploadFailureReason())
}
if uploadBundleFlag {
cmd.Printf("Upload file key:\n%s\n", resp.GetUploadedKey())
}
return nil
}
func setLogLevel(cmd *cobra.Command, args []string) error {
conn, err := getClient(cmd)
if err != nil {
return err
}
defer func() {
if err := conn.Close(); err != nil {
log.Errorf(errCloseConnection, err)
}
}()
client := proto.NewDaemonServiceClient(conn)
level := server.ParseLogLevel(args[0])
if level == proto.LogLevel_UNKNOWN {
//nolint
return fmt.Errorf("unknown log level: %s. Available levels are: panic, fatal, error, warn, info, debug, trace\n", args[0])
}
_, err = client.SetLogLevel(cmd.Context(), &proto.SetLogLevelRequest{
Level: level,
})
if err != nil {
return fmt.Errorf("failed to set log level: %v", status.Convert(err).Message())
}
cmd.Println("Log level set successfully to", args[0])
return nil
}
func runForDuration(cmd *cobra.Command, args []string) error {
duration, err := time.ParseDuration(args[0])
if err != nil {
return fmt.Errorf("invalid duration format: %v", err)
}
conn, err := getClient(cmd)
if err != nil {
return err
}
defer func() {
if err := conn.Close(); err != nil {
log.Errorf(errCloseConnection, err)
}
}()
client := proto.NewDaemonServiceClient(conn)
stat, err := client.Status(cmd.Context(), &proto.StatusRequest{ShouldRunProbes: true})
if err != nil {
return fmt.Errorf("failed to get status: %v", status.Convert(err).Message())
}
stateWasDown := stat.Status != string(internal.StatusConnected) && stat.Status != string(internal.StatusConnecting)
initialLogLevel, err := client.GetLogLevel(cmd.Context(), &proto.GetLogLevelRequest{})
if err != nil {
return fmt.Errorf("failed to get log level: %v", status.Convert(err).Message())
}
if stateWasDown {
if _, err := client.Up(cmd.Context(), &proto.UpRequest{}); err != nil {
cmd.PrintErrf("Failed to bring service up: %v\n", status.Convert(err).Message())
} else {
cmd.Println("netbird up")
time.Sleep(time.Second * 10)
}
}
initialLevelTrace := initialLogLevel.GetLevel() >= proto.LogLevel_TRACE
if !initialLevelTrace {
_, err = client.SetLogLevel(cmd.Context(), &proto.SetLogLevelRequest{
Level: proto.LogLevel_TRACE,
})
if err != nil {
return fmt.Errorf("failed to set log level to TRACE: %v", status.Convert(err).Message())
}
cmd.Println("Log level set to trace.")
}
needsRestoreUp := false
if _, err := client.Down(cmd.Context(), &proto.DownRequest{}); err != nil {
cmd.PrintErrf("Failed to bring service down: %v\n", status.Convert(err).Message())
} else {
needsRestoreUp = !stateWasDown
cmd.Println("netbird down")
}
time.Sleep(1 * time.Second)
// Enable sync response persistence before bringing the service up
if _, err := client.SetSyncResponsePersistence(cmd.Context(), &proto.SetSyncResponsePersistenceRequest{
Enabled: true,
}); err != nil {
cmd.PrintErrf("Failed to enable sync response persistence: %v\n", status.Convert(err).Message())
}
if _, err := client.Up(cmd.Context(), &proto.UpRequest{}); err != nil {
cmd.PrintErrf("Failed to bring service up: %v\n", status.Convert(err).Message())
} else {
needsRestoreUp = false
cmd.Println("netbird up")
}
time.Sleep(3 * time.Second)
cpuProfilingStarted := false
if _, err := client.StartCPUProfile(cmd.Context(), &proto.StartCPUProfileRequest{}); err != nil {
cmd.PrintErrf("Failed to start CPU profiling: %v\n", err)
} else {
cpuProfilingStarted = true
defer func() {
if cpuProfilingStarted {
if _, err := client.StopCPUProfile(cmd.Context(), &proto.StopCPUProfileRequest{}); err != nil {
cmd.PrintErrf("Failed to stop CPU profiling: %v\n", err)
}
}
}()
}
captureStarted := false
if wantCapture, _ := cmd.Flags().GetBool("capture"); wantCapture {
captureTimeout := duration + 30*time.Second
const maxBundleCapture = 10 * time.Minute
if captureTimeout > maxBundleCapture {
captureTimeout = maxBundleCapture
}
_, err := client.StartBundleCapture(cmd.Context(), &proto.StartBundleCaptureRequest{
Timeout: durationpb.New(captureTimeout),
})
if err != nil {
cmd.PrintErrf("Failed to start packet capture: %v\n", status.Convert(err).Message())
} else {
captureStarted = true
cmd.Println("Packet capture started.")
// Safety: always stop on exit, even if the normal stop below runs too.
defer func() {
if captureStarted {
stopCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if _, err := client.StopBundleCapture(stopCtx, &proto.StopBundleCaptureRequest{}); err != nil {
cmd.PrintErrf("Failed to stop packet capture: %v\n", err)
}
}
}()
}
}
if waitErr := waitForDurationOrCancel(cmd.Context(), duration, cmd); waitErr != nil {
return waitErr
}
cmd.Println("\nDuration completed")
if captureStarted {
stopCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if _, err := client.StopBundleCapture(stopCtx, &proto.StopBundleCaptureRequest{}); err != nil {
cmd.PrintErrf("Failed to stop packet capture: %v\n", err)
} else {
captureStarted = false
cmd.Println("Packet capture stopped.")
}
}
if cpuProfilingStarted {
if _, err := client.StopCPUProfile(cmd.Context(), &proto.StopCPUProfileRequest{}); err != nil {
cmd.PrintErrf("Failed to stop CPU profiling: %v\n", err)
} else {
cpuProfilingStarted = false
}
}
cmd.Println("Creating debug bundle...")
request := &proto.DebugBundleRequest{
Anonymize: anonymizeFlag,
SystemInfo: systemInfoFlag,
LogFileCount: logFileCount,
CliVersion: version.NetbirdVersion(),
}
if uploadBundleFlag {
request.UploadURL = uploadBundleURLFlag
}
resp, err := client.DebugBundle(cmd.Context(), request)
if err != nil {
return fmt.Errorf("failed to bundle debug: %v", status.Convert(err).Message())
}
if needsRestoreUp {
if _, err := client.Up(cmd.Context(), &proto.UpRequest{}); err != nil {
cmd.PrintErrf("Failed to restore service up state: %v\n", status.Convert(err).Message())
} else {
cmd.Println("netbird up (restored)")
}
}
if stateWasDown {
if _, err := client.Down(cmd.Context(), &proto.DownRequest{}); err != nil {
cmd.PrintErrf("Failed to restore service down state: %v\n", status.Convert(err).Message())
} else {
cmd.Println("netbird down")
}
}
if !initialLevelTrace {
if _, err := client.SetLogLevel(cmd.Context(), &proto.SetLogLevelRequest{Level: initialLogLevel.GetLevel()}); err != nil {
cmd.PrintErrf("Failed to restore log level: %v\n", status.Convert(err).Message())
} else {
cmd.Println("Log level restored to", initialLogLevel.GetLevel())
}
}
cmd.Printf("Local file:\n%s\n", resp.GetPath())
if resp.GetUploadFailureReason() != "" {
return fmt.Errorf("upload failed: %s", resp.GetUploadFailureReason())
}
if uploadBundleFlag {
cmd.Printf("Upload file key:\n%s\n", resp.GetUploadedKey())
}
return nil
}
func setSyncResponsePersistence(cmd *cobra.Command, args []string) error {
conn, err := getClient(cmd)
if err != nil {
return err
}
defer func() {
if err := conn.Close(); err != nil {
log.Errorf(errCloseConnection, err)
}
}()
persistence := strings.ToLower(args[0])
if persistence != "on" && persistence != "off" {
return fmt.Errorf("invalid persistence value: %s. Use 'on' or 'off'", args[0])
}
client := proto.NewDaemonServiceClient(conn)
_, err = client.SetSyncResponsePersistence(cmd.Context(), &proto.SetSyncResponsePersistenceRequest{
Enabled: persistence == "on",
})
if err != nil {
return fmt.Errorf("failed to set sync response persistence: %v", status.Convert(err).Message())
}
cmd.Printf("Sync response persistence set to: %s\n", persistence)
return nil
}
func waitForDurationOrCancel(ctx context.Context, duration time.Duration, cmd *cobra.Command) error {
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()
startTime := time.Now()
done := make(chan struct{})
go func() {
defer close(done)
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
elapsed := time.Since(startTime)
if elapsed >= duration {
return
}
remaining := duration - elapsed
cmd.Printf("\rRemaining time: %s", formatDuration(remaining))
}
}
}()
select {
case <-ctx.Done():
return ctx.Err()
case <-done:
return nil
}
}
func formatDuration(d time.Duration) string {
d = d.Round(time.Second)
h := d / time.Hour
d %= time.Hour
m := d / time.Minute
d %= time.Minute
s := d / time.Second
return fmt.Sprintf("%02d:%02d:%02d", h, m, s)
}
func generateDebugBundle(config *profilemanager.Config, recorder *peer.Status, connectClient *internal.ConnectClient, logFilePath string) {
var syncResponse *mgmProto.SyncResponse
var err error
if connectClient != nil {
syncResponse, err = connectClient.GetLatestSyncResponse()
if err != nil {
log.Warnf("Failed to get latest sync response: %v", err)
}
}
bundleGenerator := debug.NewBundleGenerator(
debug.GeneratorDependencies{
InternalConfig: config,
StatusRecorder: recorder,
SyncResponse: syncResponse,
LogPath: logFilePath,
CPUProfile: nil,
DaemonVersion: version.NetbirdVersion(), // acting as daemon
},
debug.BundleConfig{
IncludeSystemInfo: true,
},
)
path, err := bundleGenerator.Generate()
if err != nil {
log.Errorf("Failed to generate debug bundle: %v", err)
return
}
log.Infof("Generated debug bundle from SIGUSR1 at: %s", path)
}
func init() {
debugBundleCmd.Flags().Uint32VarP(&logFileCount, "log-file-count", "C", 1, "Number of rotated log files to include in debug bundle")
debugBundleCmd.Flags().BoolVarP(&systemInfoFlag, "system-info", "S", true, "Adds system information to the debug bundle")
debugBundleCmd.Flags().BoolVarP(&uploadBundleFlag, "upload-bundle", "U", false, "Uploads the debug bundle to a server")
debugBundleCmd.Flags().StringVar(&uploadBundleURLFlag, "upload-bundle-url", types.DefaultBundleURL, "Service URL to get an URL to upload the debug bundle")
forCmd.Flags().Uint32VarP(&logFileCount, "log-file-count", "C", 1, "Number of rotated log files to include in debug bundle")
forCmd.Flags().BoolVarP(&systemInfoFlag, "system-info", "S", true, "Adds system information to the debug bundle")
forCmd.Flags().BoolVarP(&uploadBundleFlag, "upload-bundle", "U", false, "Uploads the debug bundle to a server")
forCmd.Flags().StringVar(&uploadBundleURLFlag, "upload-bundle-url", types.DefaultBundleURL, "Service URL to get an URL to upload the debug bundle")
forCmd.Flags().Bool("capture", false, "Capture packets during the debug duration and include in bundle")
}