netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-08-03 05:00:07 -04:00

Author	SHA1	Message	Date
Viktor Liu	145d82f322	[client] Replace iOS DNS IsPrivate heuristic with route manager check (#5694 ) (tag: v0.67.1)	2026-03-26 18:11:05 +08:00
Viktor Liu	a8b9570700	[client] Enable RPM package signature verification in install script (#5676 )	2026-03-26 09:50:43 +01:00
Viktor Liu	6ff6d84646	[client] Bump go-m1cpu to v0.2.1 to fix segfault on macOS 26 / M5 chips (#5701 )	2026-03-26 09:49:02 +01:00
Viktor Liu	9aaa05e8ea	Replace discontinued LocalStack image with MinIO in S3 test (#5680 )	2026-03-25 15:51:29 +08:00
Bethuel Mmbaga	0af5a0441f	[management] Fix DNS label uniqueness check on peer rename (#5679 )	2026-03-24 20:25:29 +03:00
Viktor Liu	0fc63ea0ba	[management] Allow multiple header auths with same header name (#5678 )	2026-03-24 16:18:21 +01:00
Bethuel Mmbaga	0b329f7881	[management] Replace JumpCloud SDK with direct HTTP calls (#5591 )	2026-03-24 13:21:42 +03:00
Viktor Liu	5b85edb753	[management] Omit proxy_protocol from API response when false (#5656 ) The internal Target model uses a plain bool for ProxyProtocol, which was always serialized to the API response as false even when not configured. Only set the API field when true so it gets omitted via omitempty when unset.	2026-03-23 17:53:17 +01:00
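The fix above relies on Go's `omitempty` behaviour. Below is a minimal sketch. It assumes the API-side field is a `*bool` tagged `json:"proxy_protocol,omitempty"`. The types `Target` and `APITarget` and the helper `toAPI` are hypothetical and are not taken from the NetBird code base. The pointer is only set when the internal flag is true, so an unset or false value drops the key from the JSON entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Target is a hypothetical internal model; ProxyProtocol is a plain bool,
// so "false" and "not configured" are indistinguishable here.
type Target struct {
	ProxyProtocol bool
}

// APITarget is a hypothetical API shape: a nil *bool with omitempty
// is left out of the marshalled JSON.
type APITarget struct {
	ProxyProtocol *bool `json:"proxy_protocol,omitempty"`
}

// toAPI only sets the API field when the internal value is true.
func toAPI(t Target) APITarget {
	var out APITarget
	if t.ProxyProtocol {
		v := true
		out.ProxyProtocol = &v
	}
	return out
}

// marshal converts and serialises a Target for the API response.
func marshal(t Target) string {
	b, err := json.Marshal(toAPI(t))
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(Target{}))                    // {}
	fmt.Println(marshal(Target{ProxyProtocol: true})) // {"proxy_protocol":true}
}
```

With a pointer, an explicit `&false` would still be serialised. The helper avoids that case by never setting the pointer for a false value.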
Maycon Santos	17cfa5fe1e	[misc] Set signing env only if not fork and set license (#5659 ) * Add condition to GPG key decoding to handle pull requests * Add license field to deb and rpm package configurations * Add condition to GPG key decoding for external pull requests	2026-03-23 17:16:23 +01:00
Viktor Liu	2313494e0e	[client] Don't abort debug for command when up/down fails (#5657 )	2026-03-23 14:04:03 +01:00
Viktor Liu	fd9d430334	[client] Simplify entrypoint by running netbird up unconditionally (#5652 ) (tag: v0.67.0)	2026-03-23 09:39:32 +01:00
Zoltan Papp	91f0d5cefd	[client] Feature/client metrics (#5512 )
* Add client metrics
* Add client metrics system with OpenTelemetry and VictoriaMetrics support
  Implements a comprehensive client metrics system to track peer connection stages and performance. The system supports multiple backend implementations (OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection stage durations from creation through WireGuard handshake.
  Key changes:
  - Add metrics package with pluggable backend implementations
  - Implement OpenTelemetry metrics backend
  - Implement VictoriaMetrics metrics backend
  - Add no-op metrics implementation for disabled state
  - Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake
  - Move WireGuard watcher functionality to conn.go
  - Refactor engine to integrate metrics tracking
  - Add metrics export endpoint in debug server
* Add signaling metrics tracking for initial and reconnection attempts
* Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking
* Delete otel lib from client
* Update unit tests
* Invoke callback on handshake success in WireGuard watcher
* Add Netbird version tracking to client metrics
  Integrate Netbird version into VictoriaMetrics backend and metrics labels. Update `ClientMetrics` constructor and metric name formatting to include version information.
* Add sync duration tracking to client metrics
  Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data.
* Remove no-op metrics implementation and simplify ClientMetrics constructor
  Eliminate unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes.
* Add total duration tracking for connection attempts
  Calculate total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update `Export` method to include Prometheus HELP comments.
* Add metrics push support to VictoriaMetrics integration
* [client] anchor connection metrics to first signal received
* Remove creation_to_semaphore connection stage metric
  The semaphore queuing stage (Created → SemaphoreAcquired) is no longer tracked. Connection metrics now start from SignalingReceived. Updated docs and Grafana dashboard accordingly.
* [client] Add remote push config for metrics with version-based eligibility
  Introduce remoteconfig.Manager that fetches a remote JSON config to control metrics push interval and restrict pushing to a specific agent version range. When NB_METRICS_INTERVAL is set, remote config is bypassed entirely for local override.
* [client] Add WASM-compatible NewClientMetrics implementation
  Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets.
* Add missing file
* Update default case in DeploymentType.String to return "unknown" instead of "selfhosted"
* [client] Rework metrics to use timestamped samples instead of histograms
  Replace cumulative Prometheus histograms with timestamped point-in-time samples that are pushed once and cleared. This fixes metrics for sparse events (connections/syncs that happen once at startup) where rate() and increase() produced incorrect or empty results.
  Changes:
  - Switch from VictoriaMetrics histogram library to raw Prometheus text format with explicit millisecond timestamps
  - Reset samples after successful push (no resending stale data)
  - Rename connection_to_handshake → connection_to_wg_handshake
  - Add netbird_peer_connection_count metric for ICE vs Relay tracking
  - Simplify dashboard: point-based scatter plots, donut pie chart
  - Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill
  - Fix deployment_type Unknown returning "selfhosted" instead of "unknown"
  - Fix inverted shouldPush condition in push.go
* [client] Add InfluxDB metrics backend alongside VictoriaMetrics
  Add influxdb.go with timestamped line protocol export for sparse one-shot events. Restore victoria.go to use proper Prometheus histograms. Update Grafana dashboards, add InfluxDB datasource, and update docs.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [client] Fix metrics issues and update dev docker setup
  - Fix StopPush not clearing push state, preventing restart
  - Fix race condition reading currentConnPriority without lock in recordConnectionMetrics
  - Fix stale comment referencing old metrics server URL
  - Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts
  - Rename docker-compose.victoria.yml to docker-compose.yml
* [client] Add anonymised peer tracking to pushed metrics
  Introduce peer_id and connection_pair_id tags to InfluxDB metrics. Public keys are hashed (truncated SHA-256) for anonymisation. The connection pair ID is deterministic regardless of which side computes it, enabling deduplication of reconnections in the ICE vs Relay dashboard. Also pin Grafana to v11.6.0 for file-based provisioning and fix datasource UID references.
* Remove unused dependencies from go.mod and go.sum
* Refactor InfluxDB ingest pipeline: extract validation logic
  - Move line validation logic to `validateLine` and `validateField` helper functions.
  - Improve error handling with structured validation and clearer separation of concerns.
  - Add stderr redirection for error messages in `create-tokens.sh`.
* Set non-root user in Dockerfile for Ingest service
* Fix Windows CI: command line too long
* Remove Victoria metrics
* Add hashed peer ID as Authorization header in metrics push
* Revert influxdb in docker compose
* Enable gzip compression and authorization validation for metrics push and ingest
* Reduce code complexity
* Update debug documentation to include metrics.txt description
* Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic
* Refactor deployment type detection to use URL parsing for improved accuracy
* Update readme
* Throttle remote config retries on fetch failure
* Preserve first WG handshake timestamp, ignore rekeys
* Skip adding empty metrics.txt to debug bundle in debug mode
* Update default metrics server URL to https://ingest.netbird.io
* Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls
* Fix doc
* Refactor Push configuration to improve clarity and enforce minimum push interval
* Remove `minPushInterval` and update push interval validation logic
* Revert ExportAndReset, it is acceptable data loss
* Fix metrics review issues: rename env var, remove stale infra, add tests
  - Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that collection is always active (for debug bundles) and only push is opt-in
  - Change default config URL from staging to production (ingest.netbird.io)
  - Delete broken Prometheus dashboard (used non-existent metric names)
  - Delete unused VictoriaMetrics datasource config
  - Replace committed .env with .env.example containing placeholder values
  - Wire Grafana admin credentials through env vars in docker-compose
  - Make metricsStages a pointer to prevent reset-vs-write race on reconnect
  - Fix typed-nil interface in debug bundle path (GetClientMetrics)
  - Use deterministic field order in InfluxDB Export (sorted keys)
  - Replace Authorization header with X-Peer-ID for metrics push
  - Fix ingest server timeout to use time.Second instead of float
  - Fix gzip double-close, stale comments, trim log levels
  - Add tests for influxdb.go and MetricsStages
* Add login duration metric, ingest tag validation, and duration bounds
  - Add netbird_login measurement recording login/auth duration to management server, with success/failure result tag
  - Validate InfluxDB tags against per-measurement allowlists in ingest server to prevent arbitrary tag injection
  - Cap all duration fields (_seconds) at 300s instead of only total_seconds
  - Add ingest server tests for tag/field validation, bounds, and auth
  Add arch tag to all metrics
* Fix Grafana dashboard: add arch to drop columns, add login panels
* Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL
* Address review comments: fix README wording, update stale comments
* Clarify env var precedence does not bypass remote config eligibility
* Remove accidentally committed pprof files
---------
Co-authored-by: Viktor Liu <viktor@netbird.io>	2026-03-22 12:45:41 +01:00
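The client metrics commit (#5512) adds an anonymised connection pair ID that is the same whichever side computes it. The sketch below shows one way to get that property. It is an illustration under assumptions, not NetBird's implementation. The 16-hex-character truncation, the `|` separator and the helper names `anonymize` and `connectionPairID` are all hypothetical. Sorting the two keys before hashing makes both peers derive the same ID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashLen is an assumed truncation length in hex characters; the commit
// only says the SHA-256 digest is truncated.
const hashLen = 16

// anonymize returns a truncated hex SHA-256 digest of a peer public key.
func anonymize(pubKey string) string {
	sum := sha256.Sum256([]byte(pubKey))
	return hex.EncodeToString(sum[:])[:hashLen]
}

// connectionPairID orders the two keys before hashing, so the local and
// remote peer compute the same ID for their shared connection.
func connectionPairID(localKey, remoteKey string) string {
	a, b := localKey, remoteKey
	if a > b {
		a, b = b, a
	}
	return anonymize(a + "|" + b) // "|" is a hypothetical separator
}

func main() {
	fmt.Println(anonymize("peerA-pubkey"))
	fmt.Println(connectionPairID("peerA-pubkey", "peerB-pubkey") ==
		connectionPairID("peerB-pubkey", "peerA-pubkey")) // true
}
```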
Viktor Liu	82762280ee	[client] Add health check flag to status command and expose daemon status in output (#5650 )	2026-03-22 12:39:40 +01:00
Viktor Liu	b550a2face	[management, proxy] Add require_subdomain capability for proxy clusters (#5628 )	2026-03-20 11:29:50 +01:00
Viktor Liu	ab77508950	[client] Add env var for management gRPC max receive message size (#5622 )	2026-03-19 17:33:50 +01:00
Viktor Liu	b9462f5c6b	[client] Make raw table initialization non-fatal in firewall managers (#5621 )	2026-03-19 17:33:38 +01:00
Viktor Liu	5ffaa5cdd6	[client] Fix duplicate log lines in containers (#5609 )	2026-03-19 15:53:05 +01:00
Pascal Fischer	a1858a9cb7	[management] recover proxies after cleanup if heartbeat is still running (#5617 )	2026-03-18 11:48:38 +01:00
Viktor Liu	212b34f639	[management] Add GET /reverse-proxies/clusters endpoint (#5611 )	2026-03-18 11:15:56 +08:00
Viktor Liu	af8eaa23e2	[client] Restart engine when peer IP address changes (#5614 )	2026-03-17 17:00:24 +01:00
Viktor Liu	f0eed50678	[management] Accept domain target type for L4 reverse proxy services (#5612 )	2026-03-17 16:29:03 +01:00
Wouter van Os	19d94c6158	[client] Allow setting DNSLabels on client embed (#5493 )	2026-03-17 16:12:37 +01:00
Viktor Liu	628eb56073	[client] Update go-m1cpu to v0.2.0 to fix SIGSEGV on macOS Tahoe (#5613 )	2026-03-17 16:10:38 +01:00
eason	a590c38d8b	[client] Fix IPv6 address formatting in DNS address construction (#5603 ) Replace fmt.Sprintf("%s:%d", ip, port) with net.JoinHostPort() to properly handle IPv6 addresses that need bracket wrapping (e.g., [2606:4700:4700::1111]:53 instead of 2606:4700:4700::1111:53). Without this fix, configuring IPv6 nameservers causes "too many colons in address" errors because Go's net.Dial cannot parse the malformed address string. Fixes #5601 Related to #4074 Co-authored-by: easonysliu <easonysliu@tencent.com>	2026-03-17 06:27:47 +01:00
Wesley Gimenes	4e149c9222	[client] update gvisor to build with Go 1.26.x (#5447 ) Building the client with Go 1.26.x fails with errors:
```
[...]
/builder/dl/go-mod-cache/gvisor.dev/gvisor@v0.0.0-20251031020517-ecfcdd2f171c/pkg/sync/runtime_constants_go126.go:22:2: WaitReasonSelect redeclared in this block
/builder/dl/go-mod-cache/gvisor.dev/gvisor@v0.0.0-20251031020517-ecfcdd2f171c/pkg/sync/runtime_constants_go125.go:22:2: other declaration of WaitReasonSelect
/builder/dl/go-mod-cache/gvisor.dev/gvisor@v0.0.0-20251031020517-ecfcdd2f171c/pkg/sync/runtime_constants_go126.go:23:2: WaitReasonChanReceive redeclared in this block
/builder/dl/go-mod-cache/gvisor.dev/gvisor@v0.0.0-20251031020517-ecfcdd2f171c/pkg/sync/runtime_constants_go125.go:23:2: other declaration of WaitReasonChanReceive
/builder/dl/go-mod-cache/gvisor.dev/gvisor@v0.0.0-20251031020517-ecfcdd2f171c/pkg/sync/runtime_constants_go126.go:24:2: WaitReasonSemacquire redeclared in this block
/builder/dl/go-mod-cache/gvisor.dev/gvisor@v0.0.0-20251031020517-ecfcdd2f171c/pkg/sync/runtime_constants_go125.go:24:2: other declaration of WaitReasonSemacquire
[...]
```
Fixes: https://github.com/netbirdio/netbird/issues/5290 ("Does not build with Go 1.26rc3")
Signed-off-by: Wesley Gimenes <wehagy@proton.me>	2026-03-17 06:09:12 +01:00
tham-le	59f5b34280	[client] add MTU option to embed.Options (#5550 ) Expose MTU configuration in the embed package so embedded clients can set the WireGuard tunnel MTU without the config file workaround. This is needed for protocols like QUIC that require larger datagrams than the default MTU of 1280. Validates MTU range via iface.ValidateMTU() at construction time to prevent invalid values from being persisted to config. Closes #5549	2026-03-17 06:03:10 +01:00
n0pashkov	dff06d0898	[misc] Add netbird-tui to community projects (#5568 )	2026-03-17 05:33:13 +01:00
Pascal Fischer	80a8816b1d	[misc] Add image build after merge to main (#5605 )	2026-03-16 18:00:23 +01:00
Viktor Liu	387e374e4b	[proxy, management] Add header auth, access restrictions, and session idle timeout (#5587 )	2026-03-16 15:22:00 +01:00
Viktor Liu	3e6baea405	[management,proxy,client] Add L4 capabilities (TLS/TCP/UDP) (#5530 )	2026-03-13 18:36:44 +01:00
Zoltan Papp	fe9b844511	[client] refactor auto update workflow (#5448 ) Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The UI no longer polls or checks for updates independently. The update manager supports three modes driven by the management server's auto-update policy: No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized) mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file — user still initiates the action mgm forces update: installation proceeds automatically without user interaction updateManager lifecycle is now owned by daemon, giving the daemon server direct control via a new TriggerUpdate RPC Introduces EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4	2026-03-13 17:01:28 +01:00
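The three policy modes above can be summarised as a small decision function. This sketch is hypothetical. The `UpdatePolicy` and `Action` types and `decide` are illustrative names, not the `updatemanager` API. The sketch also assumes that in the enforce mode the manager only notifies the user until the user triggers installation.

```go
package main

import "fmt"

// UpdatePolicy models the management server's auto-update policy (hypothetical names).
type UpdatePolicy int

const (
	PolicyNone    UpdatePolicy = iota // no policy: check GitHub and notify the user
	PolicyEnforce                     // enforced: install when the user triggers it
	PolicyForce                       // forced: install without user interaction
)

// Action is what the hypothetical update manager decides to do.
type Action string

const (
	ActionNone    Action = "none"
	ActionNotify  Action = "notify"
	ActionInstall Action = "install"
)

// decide returns the action for a given policy, whether a newer version
// exists, and whether the user triggered the update (e.g. from "About").
func decide(p UpdatePolicy, newer, userTriggered bool) Action {
	if !newer {
		return ActionNone
	}
	switch p {
	case PolicyForce:
		return ActionInstall
	case PolicyEnforce:
		if userTriggered {
			return ActionInstall
		}
		return ActionNotify // assumption: notify until the user acts
	default:
		return ActionNotify
	}
}

func main() {
	fmt.Println(decide(PolicyNone, true, false))   // notify
	fmt.Println(decide(PolicyEnforce, true, true)) // install
	fmt.Println(decide(PolicyForce, true, false))  // install
}
```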
Pascal Fischer	2e1aa497d2	[proxy] add log-level flag (#5594 )	2026-03-13 15:28:25 +01:00
Viktor Liu	529c0314f8	[client] Fall back to getent/id for SSH user lookup in static builds (#5510 )	2026-03-13 15:22:02 +01:00
Pascal Fischer	d86875aeac	[management] Exclude proxy from peer approval (#5588 )	2026-03-13 15:01:59 +01:00
Zoltan Papp	f80fe506d5	[client] Fix DNS probe thread safety and avoid blocking engine sync (#5576 ) * Fix DNS probe thread safety and avoid blocking engine sync Refactor ProbeAvailability to prevent blocking the engine's sync mutex during slow DNS probes. The probe now derives its context from the server's own context (s.ctx) instead of accepting one from the caller, and uses a mutex to ensure only one probe runs at a time — new calls cancel the previous probe before starting. Also fixes a data race in Stop() when accessing probeCancel without the probe mutex. * Ensure DNS probe thread safety by locking critical sections Add proper locking to prevent data races when accessing shared resources during DNS probe execution and Stop(). Update handlers snapshot logic to avoid conflicts with concurrent writers. * Rename context and remove redundant cancellation * Cancel first and lock * Add locking to ensure thread safety when reactivating upstream servers	2026-03-13 13:22:43 +01:00
Maycon Santos	967c6f3cd3	[misc] Add GPG signing key support for rpm packages (#5581 ) * [misc] Add GPG signing key support for deb and rpm packages * [misc] Improve GPG key management for deb and rpm signing * [misc] Extract GPG key import logic into a reusable script * [misc] Add key fingerprint extraction and targeted export for GPG keys * [misc] Remove passphrase from GPG keys before exporting * [misc] Simplify GPG key management by removing import script * [misc] Bump GoReleaser version to v2.14.3 in release workflow * [misc] Replace GPG passphrase variables with NFPM-prefixed alternatives in workflows and configs * [misc] Update naming conventions for package IDs and passphrase variables in workflows and configs * [misc] Standardize NFPM variable naming in release workflow * [misc] Adjust NFPM variable names for consistency in release workflow * [misc] Remove Debian signing GPG key usage in workflows and configs	2026-03-13 09:47:00 +01:00
Pascal Fischer	e50e124e70	[proxy] Fix domain switching update (#5585 )	2026-03-12 17:12:26 +01:00
Pascal Fischer	c545689448	[proxy] Wildcard certificate support (#5583 )	2026-03-12 16:00:28 +01:00
Vlad	8f389fef19	[management] fix some concurrency potential issues (#5584 )	2026-03-12 15:57:36 +01:00
Pascal Fischer	d3d6a327e0	[proxy] read cert from disk if available instead of cert manager (#5574 ) * New Features * Asynchronous certificate prefetch that races live issuance with periodic on-disk cache checks to surface certificates faster. * Centralized recording and notification when certificates become available. * New on-disk certificate reading and validation to allow immediate use of cached certs. * Bug Fixes & Performance * Optimized retrieval by polling disk while fetching in background to reduce latency. * Added cancellation and timeout handling to fail stalled certificate operations reliably.	2026-03-11 19:18:37 +01:00
Vlad	b5489d4986	[management] set components network map by default and optimize memory usage (#5575 ) * Network map now defaults to compacted mode at startup; environment parsing issues yield clearer warnings and disabling compacted mode is logged. * Bug Fixes * DNS enablement and nameserver selection now correctly respect group membership, reducing incorrect DNS assignments. * Refactor * Internal routing and firewall rule generation streamlined for more consistent rule IDs and safer peer handling. * Performance * Minor memory and slice allocation improvements for peer/group processing. (tag: v0.66.4)	2026-03-11 18:19:17 +01:00
Maycon Santos	7a23c57cf8	[self-hosted] Remove extra proxy domain from getting started (#5573 )	2026-03-11 15:52:42 +01:00
Pascal Fischer	11f891220e	[management] create a shallow copy of the account when buffering (#5572 )	2026-03-11 13:01:13 +01:00
Pascal Fischer	5585adce18	[management] add activity events for domains (#5548 ) * add activity events for domains * fix test * update activity codes * update activity codes (tag: v0.66.3)	2026-03-09 19:04:04 +01:00
Pascal Fischer	f884299823	[proxy] refactor metrics and add usage logs (#5533 ) * New Features * Access logs now include bytes_upload and bytes_download (API and schemas updated, fields required). * Certificate issuance duration is now recorded as a metric. * Refactor * Metrics switched from Prometheus client to OpenTelemetry-backed meters; health endpoint now exposes OpenMetrics via OTLP exporter. * Tests * Metric tests updated to use OpenTelemetry Prometheus exporter and MeterProvider.	2026-03-09 18:45:45 +01:00
Maycon Santos	15aa6bae1b	[client] Fix exit node menu not refreshing on Windows (#5553 ) * [client] Fix exit node menu not refreshing on Windows TrayOpenedCh is not implemented in the systray library on Windows, so exit nodes were never refreshed after the initial connect. Combined with the management sync not having populated routes yet when the Connected status fires, this caused the exit node menu to remain empty permanently after disconnect/reconnect cycles. Add a background poller on Windows that refreshes exit nodes while connected, with fast initial polling to catch routes from management sync followed by a steady 10s interval. On macOS/Linux, TrayOpenedCh continues to handle refreshes on each tray open. Also fix a data race on connectClient assignment in the server's connect() method and add nil checks in CleanState/DeleteState to prevent panics when connectClient is nil. * Remove unused exitNodeIDs * Remove unused exitNodeState struct	2026-03-09 18:39:11 +01:00
Pascal Fischer	11eb725ac8	[management] only count login request duration for successful logins (#5545 )	2026-03-09 14:56:46 +01:00
Pascal Fischer	30c02ab78c	[management] use the cache for the pkce state (#5516 )	2026-03-09 12:23:06 +01:00
Zoltan Papp	3acd86e346	[client] "reset connection" error on wake from sleep (#5522 ) Capture engine reference before actCancel() in cleanupConnection(). After actCancel(), the connectWithRetryRuns goroutine sets engine to nil, causing connectClient.Stop() to skip shutdown. This allows the goroutine to set ErrResetConnection on the shared state after Down() clears it, causing the next Up() to fail.	2026-03-09 10:25:51 +01:00
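The ordering bug is easier to see in code. This sketch is hypothetical, and `Engine`, `ConnectClient` and `cleanupConnection` are stand-ins, not the real types. The engine pointer is read before `actCancel()` runs, so `Stop()` is still called even though the cancelled goroutine clears the shared field. Reading the field after cancellation could observe `nil` and skip shutdown.

```go
package main

import (
	"fmt"
	"sync"
)

// Engine is a hypothetical stand-in that records whether it was stopped.
type Engine struct{ stopped bool }

func (e *Engine) Stop() { e.stopped = true }

// ConnectClient is a hypothetical stand-in holding the shared engine pointer.
type ConnectClient struct {
	mu     sync.Mutex
	engine *Engine
}

func (c *ConnectClient) setEngine(e *Engine) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.engine = e
}

func (c *ConnectClient) getEngine() *Engine {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.engine
}

// cleanupConnection captures the engine before cancelling, because the
// retry goroutine may set c.engine to nil as soon as actCancel runs.
func cleanupConnection(c *ConnectClient, actCancel func()) {
	engine := c.getEngine()
	actCancel()
	if engine != nil {
		engine.Stop()
	}
}

func main() {
	e := &Engine{}
	c := &ConnectClient{engine: e}
	// Simulate the retry goroutine reacting to cancellation by clearing the engine.
	cleanupConnection(c, func() { c.setEngine(nil) })
	fmt.Println(e.stopped) // true
}
```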
Pascal Fischer	5c20f13c48	[management] fix domain uniqueness (#5529 )	2026-03-07 10:46:37 +01:00


2716 Commits