netbird

mirror of https://github.com/netbirdio/netbird.git synced 2026-03-31 06:34:19 -04:00

Author	SHA1	Message	Date
Bethuel Mmbaga	c919ea149e	[misc] Add missing OpenAPI definitions (#5690 )	2026-03-30 11:20:17 +03:00
Pascal Fischer	7e1cce4b9f	[management] add terminated field to service (#5700 )	2026-03-26 16:59:08 +01:00
Bethuel Mmbaga	7be8752a00	[management] Add notification endpoints (#5590 )	2026-03-26 18:26:33 +03:00
Zoltan Papp	91f0d5cefd	[client] Feature/client metrics (#5512 ) * Add client metrics * Add client metrics system with OpenTelemetry and VictoriaMetrics support Implements a comprehensive client metrics system to track peer connection stages and performance. The system supports multiple backend implementations (OpenTelemetry, VictoriaMetrics, and no-op) and tracks detailed connection stage durations from creation through WireGuard handshake. Key changes: - Add metrics package with pluggable backend implementations - Implement OpenTelemetry metrics backend - Implement VictoriaMetrics metrics backend - Add no-op metrics implementation for disabled state - Track connection stages: creation, semaphore, signaling, connection ready, and WireGuard handshake - Move WireGuard watcher functionality to conn.go - Refactor engine to integrate metrics tracking - Add metrics export endpoint in debug server * Add signaling metrics tracking for initial and reconnection attempts * Reset connection stage timestamps during reconnections to exclude unnecessary metrics tracking * Delete otel lib from client * Update unit tests * Invoke callback on handshake success in WireGuard watcher * Add Netbird version tracking to client metrics Integrate Netbird version into VictoriaMetrics backend and metrics labels. Update `ClientMetrics` constructor and metric name formatting to include version information. * Add sync duration tracking to client metrics Introduce `RecordSyncDuration` for measuring sync message processing time. Update all metrics implementations (VictoriaMetrics, no-op) to support the new method. Refactor `ClientMetrics` to use `AgentInfo` for static agent data. * Remove no-op metrics implementation and simplify ClientMetrics constructor Eliminate unused `noopMetrics` and refactor `ClientMetrics` to always use the VictoriaMetrics implementation. Update associated logic to reflect these changes. 
* Add total duration tracking for connection attempts Calculate total duration for both initial connections and reconnections, accounting for different timestamp scenarios. Update `Export` method to include Prometheus HELP comments. * Add metrics push support to VictoriaMetrics integration * [client] anchor connection metrics to first signal received * Remove creation_to_semaphore connection stage metric The semaphore queuing stage (Created → SemaphoreAcquired) is no longer tracked. Connection metrics now start from SignalingReceived. Updated docs and Grafana dashboard accordingly. * [client] Add remote push config for metrics with version-based eligibility Introduce remoteconfig.Manager that fetches a remote JSON config to control metrics push interval and restrict pushing to a specific agent version range. When NB_METRICS_INTERVAL is set, remote config is bypassed entirely for local override. * [client] Add WASM-compatible NewClientMetrics implementation Replace NewClientMetrics in metrics.go with a WASM-specific stub in metrics_js.go, returning nil for compatibility with JS builds. Simplify method usage for WASM targets. * Add missing file * Update default case in DeploymentType.String to return "unknown" instead of "selfhosted" * [client] Rework metrics to use timestamped samples instead of histograms Replace cumulative Prometheus histograms with timestamped point-in-time samples that are pushed once and cleared. This fixes metrics for sparse events (connections/syncs that happen once at startup) where rate() and increase() produced incorrect or empty results. 
Changes: - Switch from VictoriaMetrics histogram library to raw Prometheus text format with explicit millisecond timestamps - Reset samples after successful push (no resending stale data) - Rename connection_to_handshake → connection_to_wg_handshake - Add netbird_peer_connection_count metric for ICE vs Relay tracking - Simplify dashboard: point-based scatter plots, donut pie chart - Add maxStalenessInterval=1m to VictoriaMetrics to prevent forward-fill - Fix deployment_type Unknown returning "selfhosted" instead of "unknown" - Fix inverted shouldPush condition in push.go * [client] Add InfluxDB metrics backend alongside VictoriaMetrics Add influxdb.go with timestamped line protocol export for sparse one-shot events. Restore victoria.go to use proper Prometheus histograms. Update Grafana dashboards, add InfluxDB datasource, and update docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [client] Fix metrics issues and update dev docker setup - Fix StopPush not clearing push state, preventing restart - Fix race condition reading currentConnPriority without lock in recordConnectionMetrics - Fix stale comment referencing old metrics server URL - Update docker-compose for InfluxDB: add scoped tokens, .env config, init scripts - Rename docker-compose.victoria.yml to docker-compose.yml * [client] Add anonymised peer tracking to pushed metrics Introduce peer_id and connection_pair_id tags to InfluxDB metrics. Public keys are hashed (truncated SHA-256) for anonymisation. The connection pair ID is deterministic regardless of which side computes it, enabling deduplication of reconnections in the ICE vs Relay dashboard. Also pin Grafana to v11.6.0 for file-based provisioning and fix datasource UID references. * Remove unused dependencies from go.mod and go.sum * Refactor InfluxDB ingest pipeline: extract validation logic - Move line validation logic to `validateLine` and `validateField` helper functions. 
- Improve error handling with structured validation and clearer separation of concerns. - Add stderr redirection for error messages in `create-tokens.sh`. * Set non-root user in Dockerfile for Ingest service * Fix Windows CI: command line too long * Remove Victoria metrics * Add hashed peer ID as Authorization header in metrics push * Revert influxdb in docker compose * Enable gzip compression and authorization validation for metrics push and ingest * Reduce code complexity * Update debug documentation to include metrics.txt description * Increase `maxBodySize` limit to 50 MB and update gzip reader wrapping logic * Refactor deployment type detection to use URL parsing for improved accuracy * Update README * Throttle remote config retries on fetch failure * Preserve first WG handshake timestamp, ignore rekeys * Skip adding empty metrics.txt to debug bundle in debug mode * Update default metrics server URL to https://ingest.netbird.io * Atomic metrics export-and-reset to prevent sample loss between Export and Reset calls * Fix doc * Refactor Push configuration to improve clarity and enforce minimum push interval * Remove `minPushInterval` and update push interval validation logic * Revert ExportAndReset; the data loss is acceptable * Fix metrics review issues: rename env var, remove stale infra, add tests - Rename NB_METRICS_ENABLED to NB_METRICS_PUSH_ENABLED to clarify that collection is always active (for debug bundles) and only push is opt-in - Change default config URL from staging to production (ingest.netbird.io) - Delete broken Prometheus dashboard (used non-existent metric names) - Delete unused VictoriaMetrics datasource config - Replace committed .env with .env.example containing placeholder values - Wire Grafana admin credentials through env vars in docker-compose - Make metricsStages a pointer to prevent reset-vs-write race on reconnect - Fix typed-nil interface in debug bundle path (GetClientMetrics) - Use deterministic field order in InfluxDB Export
(sorted keys) - Replace Authorization header with X-Peer-ID for metrics push - Fix ingest server timeout to use time.Second instead of float - Fix gzip double-close, stale comments, trim log levels - Add tests for influxdb.go and MetricsStages * Add login duration metric, ingest tag validation, and duration bounds - Add netbird_login measurement recording login/auth duration to management server, with success/failure result tag - Validate InfluxDB tags against per-measurement allowlists in ingest server to prevent arbitrary tag injection - Cap all duration fields (_seconds) at 300s instead of only total_seconds - Add ingest server tests for tag/field validation, bounds, and auth - Add arch tag to all metrics * Fix Grafana dashboard: add arch to drop columns, add login panels * Validate NB_METRICS_SERVER_URL is an absolute HTTP(S) URL * Address review comments: fix README wording, update stale comments * Clarify that env var precedence does not bypass remote config eligibility * Remove accidentally committed pprof files --------- Co-authored-by: Viktor Liu <viktor@netbird.io>	2026-03-22 12:45:41 +01:00
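The anonymised peer tracking in the metrics commit above (public keys hashed with truncated SHA-256, plus a connection pair ID that comes out the same whichever side computes it) can be sketched as follows. This is an illustration under assumptions, not netbird's code: the function names, the 8-byte truncation length, and the `|` separator are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// idBytes is an assumed truncation length; the commit only says the
// SHA-256 digest is truncated.
const idBytes = 8

// anonymisePeerID (hypothetical name) hashes a peer public key and keeps
// only the first idBytes bytes, hex-encoded.
func anonymisePeerID(pubKey string) string {
	sum := sha256.Sum256([]byte(pubKey))
	return hex.EncodeToString(sum[:idBytes])
}

// connectionPairID (hypothetical name) orders the two keys before hashing,
// so both peers derive the same ID for their shared connection. The "|"
// separator is assumed; it does not occur in base64-encoded WireGuard keys.
func connectionPairID(keyA, keyB string) string {
	if keyA > keyB {
		keyA, keyB = keyB, keyA
	}
	sum := sha256.Sum256([]byte(keyA + "|" + keyB))
	return hex.EncodeToString(sum[:idBytes])
}

func main() {
	a, b := "peerA-public-key", "peerB-public-key"
	fmt.Println("peer_id:", anonymisePeerID(a))
	fmt.Println("symmetric:", connectionPairID(a, b) == connectionPairID(b, a))
}
```

Because the pair ID depends only on the unordered pair of keys, repeated reconnections between the same two peers map to one ID, which is what makes deduplication in a dashboard possible.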
Viktor Liu	b550a2face	[management, proxy] Add require_subdomain capability for proxy clusters (#5628 )	2026-03-20 11:29:50 +01:00
Viktor Liu	ab77508950	[client] Add env var for management gRPC max receive message size (#5622 )	2026-03-19 17:33:50 +01:00
Viktor Liu	387e374e4b	[proxy, management] Add header auth, access restrictions, and session idle timeout (#5587 )	2026-03-16 15:22:00 +01:00
Viktor Liu	3e6baea405	[management,proxy,client] Add L4 capabilities (TLS/TCP/UDP) (#5530 )	2026-03-13 18:36:44 +01:00
Zoltan Papp	fe9b844511	[client] refactor auto update workflow (#5448 ) Auto-update logic moved out of the UI into a dedicated updatemanager.Manager service that runs in the connection layer. The UI no longer polls or checks for updates independently. The update manager supports three modes driven by the management server's auto-update policy: - No policy set by mgm: checks GitHub for the latest version and notifies the user (previous behavior, now centralized) - mgm enforces update: the "About" menu triggers installation directly instead of just downloading the file; the user still initiates the action - mgm forces update: installation proceeds automatically without user interaction. The updateManager lifecycle is now owned by the daemon, giving the daemon server direct control via a new TriggerUpdate RPC. Introduces an EngineServices struct to group external service dependencies passed to NewEngine, reducing its argument count from 11 to 4.	2026-03-13 17:01:28 +01:00
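The three policy-driven modes in the auto-update commit above reduce to a small decision table. The sketch below models only that table; the type, constant, and function names are hypothetical and are not taken from updatemanager.Manager.

```go
package main

import "fmt"

// UpdatePolicy is a hypothetical model of the management server's policy.
type UpdatePolicy int

const (
	PolicyNone    UpdatePolicy = iota // no policy: check GitHub, notify the user
	PolicyEnforce                     // enforced: the user starts the install from the menu
	PolicyForce                       // forced: install without user interaction
)

// Action is what the update manager does once a newer version is known.
type Action string

const (
	ActionNone         Action = "none"
	ActionNotify       Action = "notify"
	ActionOfferInstall Action = "offer-install"
	ActionInstallNow   Action = "install-now"
)

// decide maps a policy and update availability to an action.
func decide(p UpdatePolicy, updateAvailable bool) Action {
	if !updateAvailable {
		return ActionNone
	}
	switch p {
	case PolicyForce:
		return ActionInstallNow
	case PolicyEnforce:
		return ActionOfferInstall
	default:
		return ActionNotify
	}
}

func main() {
	for _, p := range []UpdatePolicy{PolicyNone, PolicyEnforce, PolicyForce} {
		fmt.Println(p, decide(p, true))
	}
}
```

Keeping this decision in one daemon-owned component, rather than in the UI, is what lets the daemon act on a forced update even when no UI is running.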
Pascal Fischer	f884299823	[proxy] refactor metrics and add usage logs (#5533 ) * New Features * Access logs now include bytes_upload and bytes_download (API and schemas updated, fields required). * Certificate issuance duration is now recorded as a metric. * Refactor * Metrics switched from Prometheus client to OpenTelemetry-backed meters; health endpoint now exposes OpenMetrics via OTLP exporter. * Tests * Metric tests updated to use OpenTelemetry Prometheus exporter and MeterProvider.	2026-03-09 18:45:45 +01:00
Viktor Liu	e601278117	[management,proxy] Add per-target options to reverse proxy (#5501 )	2026-03-05 10:03:26 +01:00
Viktor Liu	0ca59535f1	[management] Add reverse proxy services REST client (#5454 )	2026-02-28 13:04:58 +08:00
Maycon Santos	63c83aa8d2	[client,management] Feature/client service expose (#5411 ) CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default). Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods. UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.	2026-02-24 10:02:16 +01:00
Pascal Fischer	5ca1b64328	[management] access log sorting (#5378 )	2026-02-20 00:11:55 +01:00
Zoltan Papp	318cf59d66	[relay] reduce QUIC initial packet size to 1280 (IPv6 min MTU) (#5374 ) * [relay] reduce QUIC initial packet size to 1280 (IPv6 min MTU) * adjust QUIC initial packet size to 1232 based on RFC 9000 §14	2026-02-18 10:58:14 +01:00
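The move from 1280 to 1232 in the QUIC commit above is header arithmetic. 1280 bytes is the IPv6 minimum MTU, which counts the whole IP packet, so the UDP payload that QUIC controls must leave room for the IPv6 and UDP headers. RFC 9000 §14 works through the same numbers (1232 bytes for IPv6, 1252 for IPv4) and requires datagrams carrying Initial packets to be at least 1200 bytes. The helper name below is hypothetical.

```go
package main

import "fmt"

const (
	ipv6MinMTU     = 1280 // minimum link MTU guaranteed by IPv6
	ipv6HeaderLen  = 40   // fixed IPv6 header, no extension headers
	ipv4HeaderLen  = 20   // IPv4 header without options
	udpHeaderLen   = 8
	quicMinInitial = 1200 // RFC 9000 §14: minimum size of datagrams carrying Initial packets
)

// maxUDPPayload returns the largest UDP payload that fits in one IP packet
// of the given size without fragmentation.
func maxUDPPayload(ipPacketSize, ipHeaderLen int) int {
	return ipPacketSize - ipHeaderLen - udpHeaderLen
}

func main() {
	v6 := maxUDPPayload(ipv6MinMTU, ipv6HeaderLen)
	v4 := maxUDPPayload(ipv6MinMTU, ipv4HeaderLen)
	fmt.Println(v6, v4, v6 >= quicMinInitial) // 1232 1252 true
}
```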
Pascal Fischer	f53155562f	[management, reverse proxy] Add reverse proxy feature (#5291 ) * implement reverse proxy --------- Co-authored-by: Alisdair MacLeod <git@alisdairmacleod.co.uk> Co-authored-by: mlsmaycon <mlsmaycon@gmail.com> Co-authored-by: Eduard Gert <kontakt@eduardgert.de> Co-authored-by: Viktor Liu <viktor@netbird.io> Co-authored-by: Diego Noguês <diego.sure@gmail.com> Co-authored-by: Diego Noguês <49420+diegocn@users.noreply.github.com> Co-authored-by: Bethuel Mmbaga <bethuelmbaga12@gmail.com> Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com> Co-authored-by: Ashley Mensah <ashleyamo982@gmail.com>	2026-02-13 19:37:43 +01:00
Zoltan Papp	edce11b34d	[client] Refactor/relay conn container (#5271 ) * Fix race condition and ensure correct message ordering in connection establishment Reorder operations in OpenConn to register the connection before waiting for peer availability. This ensures: - Connection is ready to receive messages before peer subscription completes - Transport messages and onconnected events maintain proper ordering - No messages are lost during the connection establishment window - Concurrent OpenConn calls cannot create duplicate connections If peer availability check fails, the pre-registered connection is properly cleaned up. * Handle service shutdown during relay connection initialization Ensure relay connections are properly cleaned up when the service is not running by verifying `serviceIsRunning` and removing stale entries from `c.conns` to prevent unintended behaviors. * Refactor relay client Conn/connContainer ownership and decouple Conn from Client Conn previously held a direct Client pointer and called client methods (writeTo, closeConn, LocalAddr) directly, creating a tight bidirectional coupling. The message channel was also created externally in OpenConn and shared between Conn and connContainer with unclear ownership. Now connContainer fully owns the lifecycle of both the channel and the Conn it wraps: - connContainer creates the channel (sized by connChannelSize const) and the Conn internally via newConnContainer - connContainer feeds messages into the channel (writeMsg), closes and drains it on shutdown (close) - Conn reads from the channel (Read) but never closes it Conn is decoupled from Client by replacing the *Client field with three function closures (writeFn, closeFn, localAddrFn) that are wired by newConnContainer at construction time. Write, Close, and LocalAddr delegate to these closures. 
This removes the direct dependency while keeping the identity-check logic: writeTo and closeConn now compare connContainer pointers instead of Conn pointers to verify the caller is the current active connection for that peer.	2026-02-13 15:48:08 +01:00
Zoltan Papp	841b2d26c6	Add early message buffer for relay client (#5282 ) Add early message buffer to capture transport messages arriving before OpenConn completes, ensuring correct message ordering and no dropped messages.	2026-02-13 15:41:26 +01:00
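An early message buffer like the one in the commit above can be as small as a per-peer FIFO that is emptied when the connection is registered. The sketch below assumes a bounded per-peer queue; the type, its methods, and the limit are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// earlyMsgBuffer (hypothetical) holds transport messages that arrive for a
// peer before OpenConn has registered a connection for it.
type earlyMsgBuffer struct {
	mu      sync.Mutex
	limit   int
	pending map[string][][]byte
}

func newEarlyMsgBuffer(limit int) *earlyMsgBuffer {
	return &earlyMsgBuffer{limit: limit, pending: make(map[string][][]byte)}
}

// put appends a copy of msg in arrival order; it reports false when the
// peer's buffer is full.
func (b *earlyMsgBuffer) put(peer string, msg []byte) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.pending[peer]) >= b.limit {
		return false
	}
	b.pending[peer] = append(b.pending[peer], append([]byte(nil), msg...))
	return true
}

// take returns and forgets the buffered messages for peer, oldest first.
func (b *earlyMsgBuffer) take(peer string) [][]byte {
	b.mu.Lock()
	defer b.mu.Unlock()
	msgs := b.pending[peer]
	delete(b.pending, peer)
	return msgs
}

func main() {
	buf := newEarlyMsgBuffer(2)
	buf.put("peerB", []byte("first"))
	buf.put("peerB", []byte("second"))
	fmt.Println(buf.put("peerB", []byte("third"))) // false: limit reached
	for _, m := range buf.take("peerB") {
		fmt.Println(string(m))
	}
}
```

For ordering to hold in practice, taking the buffered messages and registering the connection must happen atomically with respect to new arrivals; otherwise a message could slip in between and be delivered ahead of older buffered ones.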
Bethuel Mmbaga	d3eeb6d8ee	[misc] Add cloud api spec to public open api with rest client (#5222 )	2026-02-13 15:08:47 +03:00
Misha Bragin	64b849c801	[self-hosted] add netbird server (#5232 ) * Unified NetBird combined server (Management, Signal, Relay, STUN) as a single executable with richer YAML configuration, validation, and defaults. * Official Dockerfile/image for single-container deployment. * Optional in-process profiling endpoint for diagnostics. * Multiplexing to route HTTP/gRPC/WebSocket traffic via one port; runtime hooks to inject custom handlers. * Chores * Updated deployment scripts, compose files, and reverse-proxy templates to target the combined server; added example configs and getting-started updates.	2026-02-12 19:24:43 +01:00
Zoltan Papp	6981fdce7e	[client] Fix race condition and ensure correct message ordering in Relay (#5265 ) * Fix race condition and ensure correct message ordering in connection establishment Reorder operations in OpenConn to register the connection before waiting for peer availability. This ensures: - Connection is ready to receive messages before peer subscription completes - Transport messages and onconnected events maintain proper ordering - No messages are lost during the connection establishment window - Concurrent OpenConn calls cannot create duplicate connections If peer availability check fails, the pre-registered connection is properly cleaned up. * Handle service shutdown during relay connection initialization Ensure relay connections are properly cleaned up when the service is not running by verifying `serviceIsRunning` and removing stale entries from `c.conns` to prevent unintended behaviors.	2026-02-09 11:34:24 +01:00
Misha Bragin	3a0cf230a1	Disable local users for a smooth single-idp mode (#5226 ) Add LocalAuthDisabled option to embedded IdP configuration. This adds the ability to disable local (email/password) authentication when using the embedded Dex identity provider. When disabled, users can only authenticate via external identity providers (Google, OIDC, etc.). This simplifies user login when there is only one external IdP configured: the login page redirects directly to the IdP login page. Key changes: - Added LocalAuthDisabled field to EmbeddedIdPConfig - Added methods to check and toggle local auth: IsLocalAuthEnabled, HasNonLocalConnectors, DisableLocalAuth, EnableLocalAuth - Validation prevents disabling local auth if no external connectors are configured - Existing local users are preserved when disabled and can log in again when re-enabled - Operations are idempotent (disabling already disabled is a no-op)	2026-02-01 14:26:22 +01:00
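The validation and idempotency rules in the commit above fit in a few lines. The method names are taken from the commit message, but their signatures here are guesses, and the `Connectors` field is a hypothetical stand-in for however external connectors are actually configured.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoExternalConnectors = errors.New("cannot disable local auth: no external connectors configured")

// EmbeddedIdPConfig is a reduced sketch. LocalAuthDisabled comes from the
// commit; Connectors is an assumed list of external connector IDs.
type EmbeddedIdPConfig struct {
	LocalAuthDisabled bool
	Connectors        []string
}

func (c *EmbeddedIdPConfig) IsLocalAuthEnabled() bool    { return !c.LocalAuthDisabled }
func (c *EmbeddedIdPConfig) HasNonLocalConnectors() bool { return len(c.Connectors) > 0 }

// DisableLocalAuth is idempotent and refuses to leave users with no way to log in.
func (c *EmbeddedIdPConfig) DisableLocalAuth() error {
	if c.LocalAuthDisabled {
		return nil // already disabled: no-op
	}
	if !c.HasNonLocalConnectors() {
		return errNoExternalConnectors
	}
	c.LocalAuthDisabled = true
	return nil
}

// EnableLocalAuth only flips the flag; this sketch never touches local user
// records, consistent with existing users being preserved.
func (c *EmbeddedIdPConfig) EnableLocalAuth() { c.LocalAuthDisabled = false }

func main() {
	cfg := &EmbeddedIdPConfig{}
	fmt.Println(cfg.DisableLocalAuth()) // refused: no connectors
	cfg.Connectors = append(cfg.Connectors, "google")
	fmt.Println(cfg.DisableLocalAuth(), cfg.IsLocalAuthEnabled()) // <nil> false
}
```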
Viktor Liu	81c11df103	[management] Streamline domain validation (#5211 )	2026-01-29 13:51:44 +01:00
Misha Bragin	7d791620a6	Add user invite link feature for embedded IdP (#5157 )	2026-01-27 09:42:20 +01:00
Maycon Santos	2381e216e4	Fix validator message with warn (#5168 )	2026-01-24 17:49:25 +01:00
Misha Bragin	4888021ba6	Add missing activity events to the API response (#5140 )	2026-01-20 15:12:22 +01:00
Misha Bragin	a0b0b664b6	Local user password change (embedded IdP) (#5132 )	2026-01-20 14:16:42 +01:00
Diego Romar	50da5074e7	[client] change notifyDisconnected call (#5138 ) In handleJobStream, when handling error codes from receiveJobRequest in the switch-case, notifying disconnected in cases that are not disconnections breaks connection status reporting on mobile peers. This commit stops calling it for the Canceled and Unimplemented status codes.	2026-01-20 07:14:33 -03:00
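The rule in the commit above amounts to a switch on the status code. To stay dependency-free, the sketch redeclares a few gRPC code names locally with the same numeric values as `google.golang.org/grpc/codes`; the function name is hypothetical.

```go
package main

import "fmt"

// Code mirrors a few gRPC status codes (same names and values as in
// google.golang.org/grpc/codes), redeclared so the sketch has no deps.
type Code uint32

const (
	Canceled      Code = 1
	Unknown       Code = 2
	Unimplemented Code = 12
	Unavailable   Code = 14
)

// shouldNotifyDisconnected reports whether a job-stream error code should be
// surfaced as a disconnection, per the rule described in the commit.
func shouldNotifyDisconnected(c Code) bool {
	switch c {
	case Canceled, Unimplemented:
		return false
	default:
		return true
	}
}

func main() {
	for _, c := range []Code{Canceled, Unknown, Unimplemented, Unavailable} {
		fmt.Println(c, shouldNotifyDisconnected(c))
	}
}
```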
Zoltan Papp	58daa674ef	[Management/Client] Trigger debug bundle runs from API/Dashboard (#4592 ) (#4832 ) This PR adds the ability to trigger debug bundle generation remotely from the Management API/Dashboard.	2026-01-19 11:22:16 +01:00
Misha Bragin	1ff7abe909	[management, client] Fix SSH server audience validator (#5105 ) * New Features * SSH server JWT validation now accepts multiple audiences with backward-compatible handling of the previous single-audience setting and a guard ensuring at least one audience is configured. * Tests * Test suites updated and new tests added to cover multiple-audience scenarios and compatibility with existing behavior. * Other * Startup logging enhanced to report configured audiences for JWT auth.	2026-01-16 12:28:17 +01:00
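Backward-compatible multi-audience validation, as described in the SSH JWT commit above, usually means merging the old single value into the new list and requiring at least one entry. The struct and field names below are hypothetical; the matching rule (a token is accepted if any value of its `aud` claim is allowed) follows the JWT convention that `aud` may hold several values.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoAudience = errors.New("at least one JWT audience must be configured")

// JWTConfig is hypothetical: Audience stands for the previous
// single-audience setting, Audiences for the new list.
type JWTConfig struct {
	Audience  string
	Audiences []string
}

// effectiveAudiences merges the legacy value with the list, dropping empty
// and duplicate entries, and rejects a configuration with none.
func (c JWTConfig) effectiveAudiences() ([]string, error) {
	seen := make(map[string]bool)
	var out []string
	for _, a := range append([]string{c.Audience}, c.Audiences...) {
		if a == "" || seen[a] {
			continue
		}
		seen[a] = true
		out = append(out, a)
	}
	if len(out) == 0 {
		return nil, errNoAudience
	}
	return out, nil
}

// audienceAllowed reports whether any value of a token's "aud" claim matches.
func audienceAllowed(tokenAud, allowed []string) bool {
	for _, t := range tokenAud {
		for _, a := range allowed {
			if t == a {
				return true
			}
		}
	}
	return false
}

func main() {
	auds, err := JWTConfig{Audience: "legacy", Audiences: []string{"cli", "legacy"}}.effectiveAudiences()
	fmt.Println(auds, err)                              // [legacy cli] <nil>
	fmt.Println(audienceAllowed([]string{"cli"}, auds)) // true
}
```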
Bethuel Mmbaga	067c77e49e	[management] Add custom dns zones (#4849 )	2026-01-16 12:12:05 +03:00
Viktor Liu	b12c084a50	[client] Fall through dns chain for custom dns zones (#5081 )	2026-01-12 13:56:39 +01:00
Zoltan Papp	9c9d8e17d7	Revert "Revert "[relay] Update GO version and QUIC version (#4736 )" (#5055 )" (#5071 ) This reverts commit `24df442198`.	2026-01-08 18:58:22 +01:00
Maycon Santos	24df442198	Revert "[relay] Update GO version and QUIC version (#4736 )" (#5055 ) This reverts commit `8722b79799`.	2026-01-07 19:02:20 +01:00
Zoltan Papp	8722b79799	[relay] Update GO version and QUIC version (#4736 ) - Go 1.25.5 - QUIC 0.55.0	2026-01-07 16:30:29 +01:00
Misha Bragin	e586c20e36	[management, infrastructure, idp] Simplified IdP Management - Embedded IdP (#5008 ) Embed Dex as a built-in IdP to simplify self-hosting setup. Adds an embedded OIDC Identity Provider (Dex) with local user management and optional external IdP connectors (Google/GitHub/OIDC/SAML), plus device-auth flow for CLI login. Introduces instance onboarding/setup endpoints (including owner creation), field-level encryption for sensitive user data, a streamlined self-hosting provisioning script, and expanded APIs + test coverage for IdP management. more at https://github.com/netbirdio/netbird/pull/5008#issuecomment-3718987393	2026-01-07 14:52:32 +01:00
Pascal Fischer	f022e34287	[shared] allow setting a user agent for the rest client (#5037 )	2026-01-06 10:52:36 +01:00
Louis Li	e11970e32e	[client] add reset for management backoff (#4935 ) Reset the management gRPC client backoff after successfully connecting to the management API. Current situation: if a connection has lasted longer than MaxElapsedTime, then once it is interrupted the backoff fails immediately due to the timeout and never actually retries.	2025-12-30 08:37:49 +01:00
Zoltan Papp	67f7b2404e	[client, management] Feature/ssh fine grained access (#4969 ) Add fine-grained SSH access control with authorized users/groups	2025-12-29 12:50:41 +01:00
Zoltan Papp	011cc81678	[client, management] auto-update (#4732 )	2025-12-19 19:57:39 +01:00
Bethuel Mmbaga	031ab11178	[client] Remove select account prompt (#4912 ) Signed-off-by: bcmmbaga <bethuelmbaga12@gmail.com>	2025-12-04 14:57:29 +01:00
Pascal Fischer	7193bd2da7	[management] Refactor network map controller (#4789 )	2025-12-02 12:34:28 +01:00
Fahri Shihab	4b77359042	[management] Groups API with name query parameter (#4831 )	2025-12-01 16:57:42 +01:00
Zoltan Papp	387d43bcc1	[client, management] Add OAuth select_account prompt support to PKCE flow (#4880 ) * Add OAuth select_account prompt support to PKCE flow Extends LoginFlag enum with select_account options to enable multi-account selection during authentication. This allows users to choose which account to use when multiple accounts have active sessions with the identity provider. The new flags are backward compatible - existing LoginFlag values (0=prompt login, 1=max_age=0) retain their original behavior.	2025-12-01 14:25:52 +01:00
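The backward-compatibility point in the commit above is that new enum values are added after the existing ones, so 0 and 1 keep their meaning. `prompt=login`, `prompt=select_account`, and `max_age` are standard OpenID Connect authorization request parameters. The values 0 and 1 below follow the commit; `LoginFlagSelectAccount`, its value, and the method name are hypothetical stand-ins for the new select_account options.

```go
package main

import (
	"fmt"
	"net/url"
)

// LoginFlag values 0 and 1 follow the commit message; the third constant
// is a hypothetical stand-in for the new select_account options.
type LoginFlag uint8

const (
	LoginFlagPromptLogin   LoginFlag = 0 // prompt=login
	LoginFlagMaxAge0       LoginFlag = 1 // max_age=0
	LoginFlagSelectAccount LoginFlag = 2 // hypothetical: prompt=select_account
)

// authParams returns the extra OIDC authorization request parameters.
func (f LoginFlag) authParams() url.Values {
	v := url.Values{}
	switch f {
	case LoginFlagPromptLogin:
		v.Set("prompt", "login")
	case LoginFlagMaxAge0:
		v.Set("max_age", "0")
	case LoginFlagSelectAccount:
		v.Set("prompt", "select_account")
	}
	return v
}

func main() {
	for _, f := range []LoginFlag{LoginFlagPromptLogin, LoginFlagMaxAge0, LoginFlagSelectAccount} {
		fmt.Println(f, f.authParams().Encode())
	}
}
```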
Maycon Santos	20973063d8	[client] Support disable search domain for custom zones (#4826 ) Two new boolean flags, SearchDomainDisabled and SkipPTRProcess, are added to CustomZone and its protobuf; they are propagated through the engine to DNS host logic. Host matching now uses SearchDomainDisabled directly, and PTR collection skips zones with SkipPTRProcess; reverse zones are initialized with SearchDomainDisabled: true.	2025-11-24 17:50:08 +01:00
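The two flags in the commit above act as independent filters: one decides whether a zone is offered as a search domain, the other whether its records feed PTR generation. Only `SearchDomainDisabled` and `SkipPTRProcess` come from the commit; the `Record` type, the other fields, and the function names are hypothetical.

```go
package main

import "fmt"

// Record is a hypothetical A record used only for this illustration.
type Record struct {
	Name string
	IP   string
}

// CustomZone keeps the two flags named in the commit; Domain and Records
// are assumed fields.
type CustomZone struct {
	Domain               string
	SearchDomainDisabled bool
	SkipPTRProcess       bool
	Records              []Record
}

// searchDomains lists zones that may be used as DNS search domains.
func searchDomains(zones []CustomZone) []string {
	var out []string
	for _, z := range zones {
		if !z.SearchDomainDisabled {
			out = append(out, z.Domain)
		}
	}
	return out
}

// ptrCandidates collects records for PTR generation, skipping opted-out zones.
func ptrCandidates(zones []CustomZone) []Record {
	var out []Record
	for _, z := range zones {
		if z.SkipPTRProcess {
			continue
		}
		out = append(out, z.Records...)
	}
	return out
}

func main() {
	zones := []CustomZone{
		{Domain: "corp.example.", Records: []Record{{"db.corp.example.", "10.0.0.5"}}},
		{Domain: "lab.example.", SkipPTRProcess: true, Records: []Record{{"x.lab.example.", "10.0.1.7"}}},
		{Domain: "0.10.in-addr.arpa.", SearchDomainDisabled: true}, // reverse zone
	}
	fmt.Println(searchDomains(zones)) // [corp.example. lab.example.]
	fmt.Println(ptrCandidates(zones)) // [{db.corp.example. 10.0.0.5}]
}
```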
Pascal Fischer	3351b38434	[management] pass config to controller (#4807 )	2025-11-19 11:52:18 +01:00
Viktor Liu	d71a82769c	[client,management] Rewrite the SSH feature (#4015 )	2025-11-17 17:10:41 +01:00
Pascal Fischer	cc97cffff1	[management] move network map logic into new design (#4774 )	2025-11-13 12:09:46 +01:00
Pascal Fischer	48475ddc05	[management] add pat rate limiting (#4741 )	2025-11-07 15:50:18 +01:00
Viktor Liu	43c9a51913	[client] Migrate deprecated grpc client code (#4687 )	2025-10-30 10:14:27 +01:00


73 Commits