mirror of
https://github.com/netbirdio/netbird.git
synced 2026-03-31 06:24:18 -04:00
* Add client metrics
* Add client metrics system with OpenTelemetry and VictoriaMetrics support

  Implements a comprehensive client metrics system to track peer connection stages and performance. The system supports multiple backend implementations (OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection stage durations from creation through WireGuard handshake.

  Key changes:
  - Add metrics package with pluggable backend implementations
  - Implement OpenTelemetry metrics backend
  - Implement VictoriaMetrics metrics backend
  - Add no-op metrics implementation for disabled state
  - Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake
  - Move WireGuard watcher functionality to conn.go
  - Refactor engine to integrate metrics tracking
  - Add metrics export endpoint in debug server
* Add signaling metrics tracking for initial and reconnection attempts
* Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking
* Delete otel lib from client
* Update unit tests
* Invoke callback on handshake success in WireGuard watcher
* Add Netbird version tracking to client metrics

  Integrate the Netbird version into the VictoriaMetrics backend and metrics labels. Update the `ClientMetrics` constructor and metric name formatting to include version information.
* Add sync duration tracking to client metrics

  Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data.
* Remove no-op metrics implementation and simplify ClientMetrics constructor

  Eliminate the unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes.
* Add total duration tracking for connection attempts

  Calculate the total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update the `Export` method to include Prometheus HELP comments.
* Add metrics push support to VictoriaMetrics integration
* [client] Anchor connection metrics to first signal received
* Remove creation_to_semaphore connection stage metric

  The semaphore queuing stage (Created → SemaphoreAcquired) is no longer tracked. Connection metrics now start from SignalingReceived. Updated docs and Grafana dashboard accordingly.
* [client] Add remote push config for metrics with version-based eligibility

  Introduce remoteconfig.Manager, which fetches a remote JSON config to control the metrics push interval and restrict pushing to a specific agent version range. When NB_METRICS_INTERVAL is set, remote config is bypassed entirely for local override.
* [client] Add WASM-compatible NewClientMetrics implementation

  Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets.
* Add missing file
* Update default case in DeploymentType.String to return "unknown" instead of "selfhosted"
* [client] Rework metrics to use timestamped samples instead of histograms

  Replace cumulative Prometheus histograms with timestamped point-in-time samples that are pushed once and cleared. This fixes metrics for sparse events (connections/syncs that happen once at startup) where rate() and increase() produced incorrect or empty results.

  Changes:
  - Switch from the VictoriaMetrics histogram library to raw Prometheus text format with explicit millisecond timestamps
  - Reset samples after a successful push (no resending stale data)
  - Rename connection_to_handshake → connection_to_wg_handshake
  - Add netbird_peer_connection_count metric for ICE vs Relay tracking
  - Simplify dashboard: point-based scatter plots, donut pie chart
  - Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill
  - Fix deployment_type Unknown returning "selfhosted" instead of "unknown"
  - Fix inverted shouldPush condition in push.go
* [client] Add InfluxDB metrics backend alongside VictoriaMetrics

  Add influxdb.go with timestamped line protocol export for sparse one-shot events. Restore victoria.go to use proper Prometheus histograms. Update Grafana dashboards, add an InfluxDB datasource, and update docs.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [client] Fix metrics issues and update dev docker setup
  - Fix StopPush not clearing push state, preventing restart
  - Fix race condition reading currentConnPriority without lock in recordConnectionMetrics
  - Fix stale comment referencing old metrics server URL
  - Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts
  - Rename docker-compose.victoria.yml to docker-compose.yml
* [client] Add anonymised peer tracking to pushed metrics

  Introduce peer_id and connection_pair_id tags to InfluxDB metrics. Public keys are hashed (truncated SHA-256) for anonymisation. The connection pair ID is deterministic regardless of which side computes it, enabling deduplication of reconnections in the ICE vs Relay dashboard. Also pin Grafana to v11.6.0 for file-based provisioning and fix datasource UID references.
* Remove unused dependencies from go.mod and go.sum
* Refactor InfluxDB ingest pipeline: extract validation logic
  - Move line validation logic to `validateLine` and `validateField` helper functions.
  - Improve error handling with structured validation and a clearer separation of concerns.
  - Add stderr redirection for error messages in `create-tokens.sh`.
* Set non-root user in Dockerfile for Ingest service
* Fix Windows CI: command line too long
* Remove Victoria metrics
* Add hashed peer ID as Authorization header in metrics push
* Revert influxdb in docker compose
* Enable gzip compression and authorization validation for metrics push and ingest
* Reduce code complexity
* Update debug documentation to include metrics.txt description
* Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic
* Refactor deployment type detection to use URL parsing for improved accuracy
* Update readme
* Throttle remote config retries on fetch failure
* Preserve first WG handshake timestamp, ignore rekeys
* Skip adding empty metrics.txt to debug bundle in debug mode
* Update default metrics server URL to https://ingest.netbird.io
* Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls
* Fix doc
* Refactor Push configuration to improve clarity and enforce minimum push interval
* Remove `minPushInterval` and update push interval validation logic
* Revert ExportAndReset, it is acceptable data loss
* Fix metrics review issues: rename env var, remove stale infra, add tests
  - Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that collection is always active (for debug bundles) and only push is opt-in
  - Change default config URL from staging to production (ingest.netbird.io)
  - Delete broken Prometheus dashboard (used non-existent metric names)
  - Delete unused VictoriaMetrics datasource config
  - Replace committed .env with .env.example containing placeholder values
  - Wire Grafana admin credentials through env vars in docker-compose
  - Make metricsStages a pointer to prevent a reset-vs-write race on reconnect
  - Fix typed-nil interface in debug bundle path (GetClientMetrics)
  - Use deterministic field order in InfluxDB Export (sorted keys)
  - Replace Authorization header with X-Peer-ID for metrics push
  - Fix ingest server timeout to use time.Second instead of a float
  - Fix gzip double-close, stale comments, trim log levels
  - Add tests for influxdb.go and MetricsStages
* Add login duration metric, ingest tag validation, and duration bounds
  - Add netbird_login measurement recording login/auth duration to the management server, with a success/failure result tag
  - Validate InfluxDB tags against per-measurement allowlists in the ingest server to prevent arbitrary tag injection
  - Cap all duration fields (*_seconds) at 300s instead of only total_seconds
  - Add ingest server tests for tag/field validation, bounds, and auth
* Add arch tag to all metrics
* Fix Grafana dashboard: add arch to drop columns, add login panels
* Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL
* Address review comments: fix README wording, update stale comments
* Clarify env var precedence does not bypass remote config eligibility
* Remove accidentally committed pprof files

---------

Co-authored-by: Viktor Liu <viktor@netbird.io>
693 lines
21 KiB
Go
package internal

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/netip"
	"runtime"
	"runtime/debug"
	"strings"
	"sync"
	"time"

	"github.com/cenkalti/backoff/v4"
	log "github.com/sirupsen/logrus"
	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
	"google.golang.org/grpc/codes"
	gstatus "google.golang.org/grpc/status"

	"github.com/netbirdio/netbird/client/iface"
	"github.com/netbirdio/netbird/client/iface/device"
	"github.com/netbirdio/netbird/client/iface/netstack"
	"github.com/netbirdio/netbird/client/internal/dns"
	"github.com/netbirdio/netbird/client/internal/listener"
	"github.com/netbirdio/netbird/client/internal/metrics"
	"github.com/netbirdio/netbird/client/internal/peer"
	"github.com/netbirdio/netbird/client/internal/profilemanager"
	"github.com/netbirdio/netbird/client/internal/statemanager"
	"github.com/netbirdio/netbird/client/internal/stdnet"
	"github.com/netbirdio/netbird/client/internal/updater"
	"github.com/netbirdio/netbird/client/internal/updater/installer"
	nbnet "github.com/netbirdio/netbird/client/net"
	cProto "github.com/netbirdio/netbird/client/proto"
	"github.com/netbirdio/netbird/client/ssh"
	sshconfig "github.com/netbirdio/netbird/client/ssh/config"
	"github.com/netbirdio/netbird/client/system"
	mgm "github.com/netbirdio/netbird/shared/management/client"
	mgmProto "github.com/netbirdio/netbird/shared/management/proto"
	"github.com/netbirdio/netbird/shared/relay/auth/hmac"
	relayClient "github.com/netbirdio/netbird/shared/relay/client"
	signal "github.com/netbirdio/netbird/shared/signal/client"
	"github.com/netbirdio/netbird/util"
	"github.com/netbirdio/netbird/version"
)
type ConnectClient struct {
	ctx            context.Context
	config         *profilemanager.Config
	statusRecorder *peer.Status

	engine        *Engine
	engineMutex   sync.Mutex
	clientMetrics *metrics.ClientMetrics
	updateManager *updater.Manager

	persistSyncResponse bool
}

func NewConnectClient(
	ctx context.Context,
	config *profilemanager.Config,
	statusRecorder *peer.Status,
) *ConnectClient {
	return &ConnectClient{
		ctx:            ctx,
		config:         config,
		statusRecorder: statusRecorder,
		engineMutex:    sync.Mutex{},
	}
}

func (c *ConnectClient) SetUpdateManager(um *updater.Manager) {
	c.updateManager = um
}

// Run starts the client with the main connection logic.
func (c *ConnectClient) Run(runningChan chan struct{}, logPath string) error {
	return c.run(MobileDependency{}, runningChan, logPath)
}

// RunOnAndroid starts the client with the main connection logic on Android.
func (c *ConnectClient) RunOnAndroid(
	tunAdapter device.TunAdapter,
	iFaceDiscover stdnet.ExternalIFaceDiscover,
	networkChangeListener listener.NetworkChangeListener,
	dnsAddresses []netip.AddrPort,
	dnsReadyListener dns.ReadyListener,
	stateFilePath string,
) error {
	// on non-Android systems these variables will be nil
	mobileDependency := MobileDependency{
		TunAdapter:            tunAdapter,
		IFaceDiscover:         iFaceDiscover,
		NetworkChangeListener: networkChangeListener,
		HostDNSAddresses:      dnsAddresses,
		DnsReadyListener:      dnsReadyListener,
		StateFilePath:         stateFilePath,
	}
	return c.run(mobileDependency, nil, "")
}

func (c *ConnectClient) RunOniOS(
	fileDescriptor int32,
	networkChangeListener listener.NetworkChangeListener,
	dnsManager dns.IosDnsManager,
	stateFilePath string,
) error {
	// Set GC percent to 5% to reduce memory usage as iOS only allows 50MB of memory for the extension.
	debug.SetGCPercent(5)

	mobileDependency := MobileDependency{
		FileDescriptor:        fileDescriptor,
		NetworkChangeListener: networkChangeListener,
		DnsManager:            dnsManager,
		StateFilePath:         stateFilePath,
	}
	return c.run(mobileDependency, nil, "")
}

func (c *ConnectClient) run(mobileDependency MobileDependency, runningChan chan struct{}, logPath string) error {
	defer func() {
		if r := recover(); r != nil {
			rec := c.statusRecorder
			if rec != nil {
				rec.PublishEvent(
					cProto.SystemEvent_CRITICAL, cProto.SystemEvent_SYSTEM,
					"panic occurred",
					"The Netbird service panicked. Please restart the service and submit a bug report with the client logs.",
					nil,
				)
			}

			log.Panicf("Panic occurred: %v, stack trace: %s", r, string(debug.Stack()))
		}
	}()

	// Stop metrics push on exit
	defer func() {
		if c.clientMetrics != nil {
			c.clientMetrics.StopPush()
		}
	}()

	log.Infof("starting NetBird client version %s on %s/%s", version.NetbirdVersion(), runtime.GOOS, runtime.GOARCH)

	nbnet.Init()

	// Initialize metrics once at startup (always active for debug bundles)
	if c.clientMetrics == nil {
		agentInfo := metrics.AgentInfo{
			DeploymentType: metrics.DeploymentTypeUnknown,
			Version:        version.NetbirdVersion(),
			OS:             runtime.GOOS,
			Arch:           runtime.GOARCH,
		}
		c.clientMetrics = metrics.NewClientMetrics(agentInfo)
		log.Debugf("initialized client metrics")

		// Start metrics push if enabled (uses daemon context, persists across engine restarts)
		if metrics.IsMetricsPushEnabled() {
			c.clientMetrics.StartPush(c.ctx, metrics.PushConfigFromEnv())
		}
	}

	backOff := &backoff.ExponentialBackOff{
		InitialInterval:     time.Second,
		RandomizationFactor: 1,
		Multiplier:          1.7,
		MaxInterval:         15 * time.Second,
		MaxElapsedTime:      3 * 30 * 24 * time.Hour, // 3 months
		Stop:                backoff.Stop,
		Clock:               backoff.SystemClock,
	}
	state := CtxGetState(c.ctx)
	defer func() {
		s, err := state.Status()
		if err != nil || s != StatusNeedsLogin {
			state.Set(StatusIdle)
		}
	}()

	wrapErr := state.Wrap
	myPrivateKey, err := wgtypes.ParseKey(c.config.PrivateKey)
	if err != nil {
		log.Errorf("failed parsing Wireguard key %s: [%s]", c.config.PrivateKey, err.Error())
		return wrapErr(err)
	}

	var mgmTlsEnabled bool
	if c.config.ManagementURL.Scheme == "https" {
		mgmTlsEnabled = true
	}

	publicSSHKey, err := ssh.GeneratePublicKey([]byte(c.config.SSHKey))
	if err != nil {
		return err
	}

	var path string
	if runtime.GOOS == "ios" || runtime.GOOS == "android" {
		// On mobile, use the provided state file path directly
		if !fileExists(mobileDependency.StateFilePath) {
			if err := createFile(mobileDependency.StateFilePath); err != nil {
				log.Errorf("failed to create state file: %v", err)
				// we are not exiting as we can run without the state manager
			}
		}
		path = mobileDependency.StateFilePath
	} else {
		sm := profilemanager.NewServiceManager("")
		path = sm.GetStatePath()
	}
	stateManager := statemanager.New(path)
	stateManager.RegisterState(&sshconfig.ShutdownState{})

	if c.updateManager != nil {
		c.updateManager.CheckUpdateSuccess(c.ctx)
	}

	inst := installer.New()
	if err := inst.CleanUpInstallerFiles(); err != nil {
		log.Errorf("failed to clean up temporary installer file: %v", err)
	}

	defer c.statusRecorder.ClientStop()
	operation := func() error {
		// if the context is cancelled, do not start a new backoff cycle
		if c.ctx.Err() != nil {
			return nil
		}

		state.Set(StatusConnecting)

		engineCtx, cancel := context.WithCancel(c.ctx)
		defer func() {
			_, err := state.Status()
			c.statusRecorder.MarkManagementDisconnected(err)
			c.statusRecorder.CleanLocalPeerState()
			cancel()
		}()

		log.Debugf("connecting to the Management service %s", c.config.ManagementURL.Host)
		mgmClient, err := mgm.NewClient(engineCtx, c.config.ManagementURL.Host, myPrivateKey, mgmTlsEnabled)
		if err != nil {
			return wrapErr(gstatus.Errorf(codes.FailedPrecondition, "failed connecting to Management Service : %s", err))
		}
		mgmNotifier := statusRecorderToMgmConnStateNotifier(c.statusRecorder)
		mgmClient.SetConnStateListener(mgmNotifier)

		// Update metrics with the actual deployment type after connecting
		deploymentType := metrics.DetermineDeploymentType(mgmClient.GetServerURL())
		agentInfo := metrics.AgentInfo{
			DeploymentType: deploymentType,
			Version:        version.NetbirdVersion(),
			OS:             runtime.GOOS,
			Arch:           runtime.GOARCH,
		}
		c.clientMetrics.UpdateAgentInfo(agentInfo, myPrivateKey.PublicKey().String())

		log.Debugf("connected to the Management service %s", c.config.ManagementURL.Host)
		defer func() {
			if err = mgmClient.Close(); err != nil {
				log.Warnf("failed to close the Management service client %v", err)
			}
		}()

		// connect (just a connection, no stream yet) and login to Management Service to get an initial global Netbird config
		loginStarted := time.Now()
		loginResp, err := loginToManagement(engineCtx, mgmClient, publicSSHKey, c.config)
		if err != nil {
			c.clientMetrics.RecordLoginDuration(engineCtx, time.Since(loginStarted), false)
			log.Debug(err)
			if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
				state.Set(StatusNeedsLogin)
				_ = c.Stop()
				return backoff.Permanent(wrapErr(err)) // unrecoverable error
			}
			return wrapErr(err)
		}
		c.clientMetrics.RecordLoginDuration(engineCtx, time.Since(loginStarted), true)
		c.statusRecorder.MarkManagementConnected()

		localPeerState := peer.LocalPeerState{
			IP:              loginResp.GetPeerConfig().GetAddress(),
			PubKey:          myPrivateKey.PublicKey().String(),
			KernelInterface: device.WireGuardModuleIsLoaded() && !netstack.IsEnabled(),
			FQDN:            loginResp.GetPeerConfig().GetFqdn(),
		}
		c.statusRecorder.UpdateLocalPeerState(localPeerState)

		signalURL := fmt.Sprintf("%s://%s",
			strings.ToLower(loginResp.GetNetbirdConfig().GetSignal().GetProtocol().String()),
			loginResp.GetNetbirdConfig().GetSignal().GetUri(),
		)

		c.statusRecorder.UpdateSignalAddress(signalURL)

		c.statusRecorder.MarkSignalDisconnected(nil)
		defer func() {
			_, err := state.Status()
			c.statusRecorder.MarkSignalDisconnected(err)
		}()

		// with the global Netbird config in hand connect (just a connection, no stream yet) Signal
		signalClient, err := connectToSignal(engineCtx, loginResp.GetNetbirdConfig(), myPrivateKey)
		if err != nil {
			log.Error(err)
			return wrapErr(err)
		}
		defer func() {
			err = signalClient.Close()
			if err != nil {
				log.Warnf("failed closing Signal service client %v", err)
			}
		}()

		signalNotifier := statusRecorderToSignalConnStateNotifier(c.statusRecorder)
		signalClient.SetConnStateListener(signalNotifier)

		c.statusRecorder.MarkSignalConnected()

		relayURLs, token := parseRelayInfo(loginResp)
		peerConfig := loginResp.GetPeerConfig()

		engineConfig, err := createEngineConfig(myPrivateKey, c.config, peerConfig, logPath)
		if err != nil {
			log.Error(err)
			return wrapErr(err)
		}

		relayManager := relayClient.NewManager(engineCtx, relayURLs, myPrivateKey.PublicKey().String(), engineConfig.MTU)
		c.statusRecorder.SetRelayMgr(relayManager)
		if len(relayURLs) > 0 {
			if token != nil {
				if err := relayManager.UpdateToken(token); err != nil {
					log.Errorf("failed to update token: %s", err)
					return wrapErr(err)
				}
			}
			log.Infof("connecting to the Relay service(s): %s", strings.Join(relayURLs, ", "))
			if err = relayManager.Serve(); err != nil {
				log.Error(err)
			}
		}

		checks := loginResp.GetChecks()

		c.engineMutex.Lock()
		engine := NewEngine(engineCtx, cancel, engineConfig, EngineServices{
			SignalClient:   signalClient,
			MgmClient:      mgmClient,
			RelayManager:   relayManager,
			StatusRecorder: c.statusRecorder,
			Checks:         checks,
			StateManager:   stateManager,
			UpdateManager:  c.updateManager,
			ClientMetrics:  c.clientMetrics,
		}, mobileDependency)
		engine.SetSyncResponsePersistence(c.persistSyncResponse)
		c.engine = engine
		c.engineMutex.Unlock()

		if err := engine.Start(loginResp.GetNetbirdConfig(), c.config.ManagementURL); err != nil {
			log.Errorf("error while starting Netbird Connection Engine: %s", err)
			return wrapErr(err)
		}

		log.Infof("Netbird engine started, the IP is: %s", peerConfig.GetAddress())
		state.Set(StatusConnected)

		if runningChan != nil {
			select {
			case <-runningChan:
			default:
				close(runningChan)
			}
		}

		<-engineCtx.Done()

		c.engineMutex.Lock()
		c.engine = nil
		c.engineMutex.Unlock()

		// todo: consider removing this condition; it is not thread-safe.
		// We should always call Stop(), but we need to verify that it is idempotent.
		if engine.wgInterface != nil {
			log.Infof("ensuring %s is removed, Netbird engine context cancelled", engine.wgInterface.Name())

			if err := engine.Stop(); err != nil {
				log.Errorf("Failed to stop engine: %v", err)
			}
		}
		c.statusRecorder.ClientTeardown()

		backOff.Reset()

		log.Info("stopped NetBird client")

		if _, err := state.Status(); errors.Is(err, ErrResetConnection) {
			return err
		}

		return nil
	}
	c.statusRecorder.ClientStart()
	err = backoff.Retry(operation, backOff)
	if err != nil {
		log.Debugf("exiting client retry loop due to unrecoverable error: %s", err)
		if s, ok := gstatus.FromError(err); ok && (s.Code() == codes.PermissionDenied) {
			state.Set(StatusNeedsLogin)
			_ = c.Stop()
		}
		return err
	}
	return nil
}

func parseRelayInfo(loginResp *mgmProto.LoginResponse) ([]string, *hmac.Token) {
	relayCfg := loginResp.GetNetbirdConfig().GetRelay()
	if relayCfg == nil {
		return nil, nil
	}

	token := &hmac.Token{
		Payload:   relayCfg.GetTokenPayload(),
		Signature: relayCfg.GetTokenSignature(),
	}

	return relayCfg.GetUrls(), token
}
func (c *ConnectClient) Engine() *Engine {
	if c == nil {
		return nil
	}
	var e *Engine
	c.engineMutex.Lock()
	e = c.engine
	c.engineMutex.Unlock()
	return e
}

// GetLatestSyncResponse returns the latest sync response from the engine.
func (c *ConnectClient) GetLatestSyncResponse() (*mgmProto.SyncResponse, error) {
	engine := c.Engine()
	if engine == nil {
		return nil, errors.New("engine is not initialized")
	}

	syncResponse, err := engine.GetLatestSyncResponse()
	if err != nil {
		return nil, fmt.Errorf("get latest sync response: %w", err)
	}

	if syncResponse == nil {
		return nil, errors.New("sync response is not available")
	}

	return syncResponse, nil
}

// SetLogLevel sets the log level for the firewall manager if the engine is running.
func (c *ConnectClient) SetLogLevel(level log.Level) {
	engine := c.Engine()
	if engine == nil {
		return
	}

	fwManager := engine.GetFirewallManager()
	if fwManager != nil {
		fwManager.SetLogLevel(level)
	}
}

// Status returns the current client status
func (c *ConnectClient) Status() StatusType {
	if c == nil {
		return StatusIdle
	}
	status, err := CtxGetState(c.ctx).Status()
	if err != nil {
		return StatusIdle
	}

	return status
}

func (c *ConnectClient) Stop() error {
	engine := c.Engine()
	if engine != nil {
		if err := engine.Stop(); err != nil {
			return fmt.Errorf("stop engine: %w", err)
		}
	}
	return nil
}

// SetSyncResponsePersistence enables or disables sync response persistence.
// When enabled, the last received sync response will be stored and can be retrieved
// through the Engine's GetLatestSyncResponse method. When disabled, any stored
// sync response will be cleared.
func (c *ConnectClient) SetSyncResponsePersistence(enabled bool) {
	c.engineMutex.Lock()
	c.persistSyncResponse = enabled
	c.engineMutex.Unlock()

	engine := c.Engine()
	if engine != nil {
		engine.SetSyncResponsePersistence(enabled)
	}
}

// createEngineConfig converts configuration received from the Management Service to an EngineConfig
func createEngineConfig(key wgtypes.Key, config *profilemanager.Config, peerConfig *mgmProto.PeerConfig, logPath string) (*EngineConfig, error) {
	nm := false
	if config.NetworkMonitor != nil {
		nm = *config.NetworkMonitor
	}
	engineConf := &EngineConfig{
		WgIfaceName:          config.WgIface,
		WgAddr:               peerConfig.Address,
		IFaceBlackList:       config.IFaceBlackList,
		DisableIPv6Discovery: config.DisableIPv6Discovery,
		WgPrivateKey:         key,
		WgPort:               config.WgPort,
		NetworkMonitor:       nm,
		SSHKey:               []byte(config.SSHKey),
		NATExternalIPs:       config.NATExternalIPs,
		CustomDNSAddress:     config.CustomDNSAddress,
		RosenpassEnabled:     config.RosenpassEnabled,
		RosenpassPermissive:  config.RosenpassPermissive,
		ServerSSHAllowed:     util.ReturnBoolWithDefaultTrue(config.ServerSSHAllowed),
		EnableSSHRoot:        config.EnableSSHRoot,
		EnableSSHSFTP:        config.EnableSSHSFTP,
		EnableSSHLocalPortForwarding:  config.EnableSSHLocalPortForwarding,
		EnableSSHRemotePortForwarding: config.EnableSSHRemotePortForwarding,
		DisableSSHAuth:                config.DisableSSHAuth,
		DNSRouteInterval:              config.DNSRouteInterval,

		DisableClientRoutes: config.DisableClientRoutes,
		DisableServerRoutes: config.DisableServerRoutes || config.BlockInbound,
		DisableDNS:          config.DisableDNS,
		DisableFirewall:     config.DisableFirewall,
		BlockLANAccess:      config.BlockLANAccess,
		BlockInbound:        config.BlockInbound,

		LazyConnectionEnabled: config.LazyConnectionEnabled,

		MTU:     selectMTU(config.MTU, peerConfig.Mtu),
		LogPath: logPath,

		ProfileConfig: config,
	}

	if config.PreSharedKey != "" {
		preSharedKey, err := wgtypes.ParseKey(config.PreSharedKey)
		if err != nil {
			return nil, err
		}
		engineConf.PreSharedKey = &preSharedKey
	}

	port, err := freePort(config.WgPort)
	if err != nil {
		return nil, err
	}
	if port != config.WgPort {
		log.Infof("using %d as wireguard port: %d is in use", port, config.WgPort)
	}
	engineConf.WgPort = port

	return engineConf, nil
}

func selectMTU(localMTU uint16, peerMTU int32) uint16 {
	var finalMTU uint16 = iface.DefaultMTU
	if localMTU > 0 {
		finalMTU = localMTU
	} else if peerMTU > 0 {
		finalMTU = uint16(peerMTU)
	}

	// Set global DNS MTU
	dns.SetCurrentMTU(finalMTU)

	return finalMTU
}
// connectToSignal creates a Signal Service client and establishes a connection
func connectToSignal(ctx context.Context, wtConfig *mgmProto.NetbirdConfig, ourPrivateKey wgtypes.Key) (*signal.GrpcClient, error) {
	sigTLSEnabled := wtConfig.Signal.Protocol == mgmProto.HostConfig_HTTPS

	signalClient, err := signal.NewClient(ctx, wtConfig.Signal.Uri, ourPrivateKey, sigTLSEnabled)
	if err != nil {
		log.Errorf("error while connecting to the Signal Exchange Service %s: %s", wtConfig.Signal.Uri, err)
		return nil, gstatus.Errorf(codes.FailedPrecondition, "failed connecting to Signal Service : %s", err)
	}

	return signalClient, nil
}

// loginToManagement creates a Management Service client, establishes a connection, logs in, and gets a global Netbird config (signal, turn, stun hosts, etc.)
func loginToManagement(ctx context.Context, client mgm.Client, pubSSHKey []byte, config *profilemanager.Config) (*mgmProto.LoginResponse, error) {
	serverPublicKey, err := client.GetServerPublicKey()
	if err != nil {
		return nil, gstatus.Errorf(codes.FailedPrecondition, "failed while getting Management Service public key: %s", err)
	}

	sysInfo := system.GetInfo(ctx)
	sysInfo.SetFlags(
		config.RosenpassEnabled,
		config.RosenpassPermissive,
		config.ServerSSHAllowed,
		config.DisableClientRoutes,
		config.DisableServerRoutes,
		config.DisableDNS,
		config.DisableFirewall,
		config.BlockLANAccess,
		config.BlockInbound,
		config.LazyConnectionEnabled,
		config.EnableSSHRoot,
		config.EnableSSHSFTP,
		config.EnableSSHLocalPortForwarding,
		config.EnableSSHRemotePortForwarding,
		config.DisableSSHAuth,
	)
	loginResp, err := client.Login(*serverPublicKey, sysInfo, pubSSHKey, config.DNSLabels)
	if err != nil {
		return nil, err
	}

	return loginResp, nil
}

func statusRecorderToMgmConnStateNotifier(statusRecorder *peer.Status) mgm.ConnStateNotifier {
	var sri interface{} = statusRecorder
	mgmNotifier, _ := sri.(mgm.ConnStateNotifier)
	return mgmNotifier
}

func statusRecorderToSignalConnStateNotifier(statusRecorder *peer.Status) signal.ConnStateNotifier {
	var sri interface{} = statusRecorder
	notifier, _ := sri.(signal.ConnStateNotifier)
	return notifier
}
// freePort attempts to determine if the provided port is available; if not, it asks the system for a free port.
func freePort(initPort int) (int, error) {
	addr := net.UDPAddr{Port: initPort}

	conn, err := net.ListenUDP("udp", &addr)
	if err == nil {
		returnPort := conn.LocalAddr().(*net.UDPAddr).Port
		closeConnWithLog(conn)
		return returnPort, nil
	}

	// if the port is already in use, ask the system for a free port
	addr.Port = 0
	conn, err = net.ListenUDP("udp", &addr)
	if err != nil {
		return 0, fmt.Errorf("unable to get a free port: %v", err)
	}

	udpAddr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return 0, errors.New("wrong address type when getting a free port")
	}
	closeConnWithLog(conn)
	return udpAddr.Port, nil
}
func closeConnWithLog(conn *net.UDPConn) {
	startClosing := time.Now()
	err := conn.Close()
	if err != nil {
		log.Warnf("closing probe port %d failed: %v. NetBird will still attempt to use this port for connection.", conn.LocalAddr().(*net.UDPAddr).Port, err)
	}
	if time.Since(startClosing) > time.Second {
		log.Warnf("closing the testing port %d took %s. Usually it is safe to ignore, but continuous warnings may indicate a problem.", conn.LocalAddr().(*net.UDPAddr).Port, time.Since(startClosing))
	}
}