* Fix DNS probe thread safety and avoid blocking engine sync
Refactor ProbeAvailability to prevent blocking the engine's sync mutex
during slow DNS probes. The probe now derives its context from the
server's own context (s.ctx) instead of accepting one from the caller,
and uses a mutex to ensure only one probe runs at a time — new calls
cancel the previous probe before starting. Also fixes a data race in
Stop() when accessing probeCancel without the probe mutex.
* Ensure DNS probe thread safety by locking critical sections
Add proper locking to prevent data races when accessing shared resources during DNS probe execution and Stop(). Update handlers snapshot logic to avoid conflicts with concurrent writers.
* Rename context and remove redundant cancellation
* Cancel first and lock
* Add locking to ensure thread safety when reactivating upstream servers
* [misc] Add GPG signing key support for deb and rpm packages
* [misc] Improve GPG key management for deb and rpm signing
* [misc] Extract GPG key import logic into a reusable script
* [misc] Add key fingerprint extraction and targeted export for GPG keys
* [misc] Remove passphrase from GPG keys before exporting
* [misc] Simplify GPG key management by removing import script
* [misc] Bump GoReleaser version to v2.14.3 in release workflow
* [misc] Replace GPG passphrase variables with NFPM-prefixed alternatives in workflows and configs
* [misc] Update naming conventions for package IDs and passphrase variables in workflows and configs
* [misc] Standardize NFPM variable naming in release workflow
* [misc] Adjust NFPM variable names for consistency in release workflow
* [misc] Remove Debian signing GPG key usage in workflows and configs
* **New Features**
* Asynchronous certificate prefetch that races live issuance with periodic on-disk cache checks to surface certificates faster.
* Centralized recording and notification when certificates become available.
* New on-disk certificate reading and validation to allow immediate use of cached certs.
* **Bug Fixes & Performance**
* Optimized retrieval by polling disk while fetching in background to reduce latency.
* Added cancellation and timeout handling to fail stalled certificate operations reliably.
* Network map now defaults to compacted mode at startup; environment parsing issues yield clearer warnings and disabling compacted mode is logged.
* **Bug Fixes**
* DNS enablement and nameserver selection now correctly respect group membership, reducing incorrect DNS assignments.
* **Refactor**
* Internal routing and firewall rule generation streamlined for more consistent rule IDs and safer peer handling.
* **Performance**
* Minor memory and slice allocation improvements for peer/group processing.
* **New Features**
* Access logs now include bytes_upload and bytes_download (API and schemas updated, fields required).
* Certificate issuance duration is now recorded as a metric.
* **Refactor**
* Metrics switched from Prometheus client to OpenTelemetry-backed meters; health endpoint now exposes OpenMetrics via OTLP exporter.
* **Tests**
* Metric tests updated to use OpenTelemetry Prometheus exporter and MeterProvider.
* [client] Fix exit node menu not refreshing on Windows
TrayOpenedCh is not implemented in the systray library on Windows,
so exit nodes were never refreshed after the initial connect. Combined
with the management sync not having populated routes yet when the
Connected status fires, this caused the exit node menu to remain empty
permanently after disconnect/reconnect cycles.
Add a background poller on Windows that refreshes exit nodes while
connected, with fast initial polling to catch routes from management
sync followed by a steady 10s interval. On macOS/Linux, TrayOpenedCh
continues to handle refreshes on each tray open.
Also fix a data race on connectClient assignment in the server's connect()
method and add nil checks in CleanState/DeleteState to prevent panics
when connectClient is nil.
* Remove unused exitNodeIDs
* Remove unused exitNodeState struct
Capture engine reference before actCancel() in cleanupConnection().
After actCancel(), the connectWithRetryRuns goroutine sets engine to nil,
causing connectClient.Stop() to skip shutdown. This allows the goroutine
to set ErrResetConnection on the shared state after Down() clears it,
causing the next Up() to fail.
The combined server was using the hostname from exposedAddress for both
singleAccountModeDomain and dnsDomain, causing fresh installs to get
the wrong domain and existing installs to break if the config changed.
Add resolveDomains() to BaseServer that reads domain from the store:
- Fresh install (0 accounts): uses "netbird.selfhosted" default
- Existing install: reads persisted domain from the account in DB
- Store errors: falls back to default safely
The combined server opts in via AutoResolveDomains flag, while the
standalone management server is unaffected.
* **Bug Fixes**
* Fixed domain configuration handling in single account mode to properly retrieve and apply domain settings from account data.
* Improved error handling when account data is unavailable with fallback to configured default domain.
* **Tests**
* Added comprehensive test coverage for single account mode domain configuration scenarios, including edge cases for missing or unavailable account data.
The expose tracker used sync.Map for in-memory TTL tracking of active expose sessions, which broke and lost all sessions on restart.
Replace with SQL-backed operations that reuse the existing meta_last_renewed_at column:
- Add store methods: RenewEphemeralService, GetExpiredEphemeralServices, CountEphemeralServicesByPeer, EphemeralServiceExists
- Move duplicate/limit checks inside a transaction with row-level locking (SELECT ... FOR UPDATE) to prevent concurrent bypass
- Reaper re-checks expiry under row lock to avoid deleting a just-renewed service and prevent duplicate event emission
- Add composite index on (source, source_peer) for efficient queries
- Batch-limit and column-select the reaper query to avoid DB/GC spikes
- Filter out malformed rows with empty source_peer
Increase DefaultJWTMaxTokenAge from 5 to 10 minutes to accommodate
identity providers like Azure Entra ID that backdate the iat claim
by up to 5 minutes, causing tokens to be immediately rejected.
Fixes#5449
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap peerStateUpdate send in a nested select to prevent goroutine
blocking when the consumer has exited, which could fill the
subscription buffer and deadlock the Status mutex.
Up() acquired s.mutex with a deferred unlock, then called waitForUp()
while still holding the lock. waitForUp() blocks for up to 50 seconds
waiting on clientRunningChan/clientGiveUpChan, starving all concurrent
gRPC calls that require the same mutex (Status, ListProfiles, etc.).
Replace the deferred unlock with explicit s.mutex.Unlock() on every
early-return path and immediately before waitForUp(), matching the
pattern already used by the clientRunning==true branch.
- Automatic Unix daemon address discovery: if the default socket is missing, the client can find and use a single available socket.
- Client startup now resolves daemon addresses more robustly while preserving non-Unix behavior.
Consolidate all expose business logic (validation, permission checks, TTL tracking, reaping) into the manager layer, making the gRPC layer a pure transport adapter that only handles proto conversion and authentication.
- Add ExposeServiceRequest/ExposeServiceResponse domain types with validation in the reverseproxy package
- Move expose tracker (TTL tracking, reaping, per-peer limits) from gRPC server into manager/expose_tracker.go
- Internalize tracking in CreateServiceFromPeer, RenewServiceFromPeer, and new StopServiceFromPeer so callers don't manage tracker state
- Untrack ephemeral services in DeleteService/DeleteAllServices to keep tracker in sync when services are deleted via API
- Simplify gRPC expose handlers to parse, auth, convert, delegate
- Remove tracker methods from Manager interface (internal detail)
In netstack (proxy) mode, the process lacks permission to create
/var/run/wireguard, making the UAPI listener unnecessary and causing
a misleading error log. Introduce NewUSPConfigurerNoUAPI and use it
for the netstack device to avoid attempting to open the UAPI socket
entirely. Also consolidate UAPI error logging to a single call site.
CLI: new expose command to publish a local port with flags for PIN, password, user groups, custom domain, name prefix and protocol (HTTP default).
Management/API: create/renew/stop expose sessions (streamed status), automatic naming/domain, TTL renewals, background expiration, new management RPCs and client methods.
UI/API: account settings now include peer_expose_enabled and peer_expose_groups; new activity codes for peer expose events.
could interleave with a sleep/wake event causing out-of-order state
transitions. The mutex now covers the full duration of each handler
including the status check, the Up/Down call, and the flag update.
Note: if Up or Down commands are triggered in parallel with sleep/wake
events, the overall ordering of up/down/sleep/wake operations is still
not guaranteed beyond what the mutex provides within the handler itself.