26 Commits

Author SHA1 Message Date
Maycon Santos
eb422a5cd3 [management,proxy] Add per-provider skip_tls_verification for agent-network (#6630)
* [management,proxy] Add per-provider skip_tls_verification for agent-network

Let agent-network providers opt into skipping upstream TLS verification for
self-hosted / internal gateways behind a private or self-signed cert.

- provider: add SkipTLSVerification (persisted via AutoMigrate) with
  request/response mapping (nil on update preserves, explicit false clears).
- openapi: skip_tls_verification on the provider request + response; types
  regenerated.
- synthesizer: carry the flag into the llm_router route config so it reaches
  the proxy.
- proxy: llm_router sets it on the UpstreamRewrite mutation, and the reverse
  proxy applies roundtrip.WithSkipTLSVerify per selected route when forwarding
  upstream (the router dials per provider, so a per-target flag alone wouldn't
  cover it).
- tests: synthesizer route config carries the flag, router rewrite propagates
  it, and the request/response round-trip incl. update semantics.

* [e2e] Validate per-provider skip_tls_verification end to end

Add a self-signed HTTPS upstream (nginx) to the harness and a test that
provisions two providers on that same upstream — one with
skip_tls_verification=true, one false — behind one proxy + client. The
skip=true provider's chat reaches the upstream (200); the skip=false
provider's fails the TLS handshake (5xx). Same upstream, opposite outcome,
which proves the flag is honoured per provider (a single target-level flag
could not, since all of an account's providers share one synthesised
target).

* [e2e] WaitProxyPeer: require >=1 connected peer, not exact 1/1

Each proxy container registers a fresh WireGuard key and its peer is not
removed on teardown, so proxy peers from earlier tests linger in the
account as disconnected. WaitProxyPeer matched the exact string
"1/1 Connected", which failed once a second proxy-using test ran in the
same package (status "1/2"). Parse the "Peers count: X/Y Connected" line
and wait for X>=1 instead: only the live proxy can be connected, and the
caller's subsequent chat is the real end-to-end assertion. Fixes the CI
failure of TestProviderSkipTLSVerification (runs after TestProvidersMatrix).
2026-07-01 20:43:15 +02:00
Maycon Santos
92a66cdd20 [management,proxy,client] 0.74.0 version (#6563)
* [management,proxy] Agent network: per-account LLM gateway (policy, metering, multi-provider) (#6555)

* [agent-network] Shared proto, OpenAPI schema, and generated types

* [agent-network] Management: store, manager, synthesizer, policy engine, provider catalog, HTTP/gRPC API

Adds the account-scoped agent-network module: provider/policy/budget CRUD and
store, the reverse-proxy service synthesizer, policy selection + limit
enforcement, the provider catalog (incl. Vertex AI and AWS Bedrock entries),
and the management HTTP + proxy gRPC surfaces.

* [management] Fix agent-network proxy-peer fan-out on affected-peer recompute

The affected-peers resolver loaded only persisted reverse-proxy services, but
agent-network services are synthesized on demand and never persisted. As a
result the embedded proxy peer was never folded into the affected set when a
client's group changed, so the proxy received no network-map update for a newly
authorised client and rejected its handshake until a full resync (restart).

loadProxyServices now merges the synthesized agent-network services (injected
via a registration hook to avoid an import cycle), so proxy peers learn newly
authorised clients immediately.

* [proxy] Reverse-proxy middleware framework, chain, and request plumbing

The per-target middleware chain (slots, dispatcher, mutation gate, metadata
merger), body capture, access-log terminal sink, and the proxy wiring that
builds + runs chains for synthesized agent-network services.

* [proxy] LLM parsers, pricing, and builtin middlewares (OpenAI, Anthropic, Vertex AI, AWS Bedrock)

Request/response parsers and SSE/event-stream metering, the embedded pricing
table, and the builtin middleware set: request parser, router, policy
limit-check/record, cost meter, guardrail, identity inject, response parser.
Includes the path-routed providers — Google Vertex AI (keyfile:: service-account
OAuth minting) and AWS Bedrock (bearer auth, invoke/converse/streaming, optional
/bedrock prefix) — plus the Models allowlist and unmeterable-publisher deny.

* [proxy] IPv6 in-place apply and TCP accept-loop hardening on netstack listeners

* [agent-network] End-to-end test suite, module docs, and deployment preset

* [agent-network] Fix codespell typos and exclude false positives

- labelgen word pool: vermillion -> vermilion, racoon -> raccoon.
- codespell ignore list: add flate (Go compress/flate package), recordin
  (a test-local identifier), and unparseable (a valid alternative spelling used
  consistently across identifiers + a metadata-value constant).

* [management] Set LastSeen on injected proxy peer in realstack test (MySQL strict-mode)

The injected embedded proxy peer had a PeerStatus with a zero LastSeen, which
serializes to '0000-00-00' and is rejected by MySQL in strict mode (SQLite
tolerates it). Set LastSeen to a valid time so SaveAccount succeeds on both
engines.

* [agent-network] Remove e2e shell-script suite from this branch

The end-to-end shell scripts under scripts/e2e/ are maintained in a separate
testing suite and are not part of this change set.

* [agent-network] Polish module docs: remove internal review scaffolding, fix links, verify diagrams

Strip PR-review framing, commit references, absolute paths, and stale internal
references from the agent-network module docs; fix broken relative links; verify
all diagrams against the current architecture. Remove the internal AI-reviewer
prompt file.

* [management] Refine session expiration handling to support 3-state encoding for SSO deadlines

* [agent-network] Relocate agentnetwork package to internals/modules

Move management/server/agentnetwork (and its catalog/, labelgen/, types/
subpackages) to management/internals/modules/agentnetwork, alongside the
reverse-proxy module, and rewrite all importers. Pure relocation: package names,
the synthesizer + affectedpeers registration hook, and store access (shared
store.Store) are unchanged, so no import cycle is introduced (affectedpeers
still depends only on the agentnetwork/types leaf).

* [agent-network] Co-locate HTTP handlers in the module (RegisterEndpoints)

Move the agent-network HTTP handlers from server/http/handlers/agentnetwork into
the module at internals/modules/agentnetwork/handlers (package handlers) and
rename the entrypoint AddEndpoints -> RegisterEndpoints, matching the
reverse-proxy module convention. Wiring in http/handler.go updated accordingly.

* Update getting started to point to rc when agent network enabled

* Add a reference to a commercial license

* Fix docs localhost link

* Fix docs localhost link

* Add private services domain note

* [management] Add agent-network telemetry metrics (#6561)

Surface agent-network adoption and usage in the self-hosted metrics
worker: distinct accounts, providers, policies, budget rules, accounts
with log collection enabled, and aggregated input/output tokens plus
cost.

Tokens and cost are summed from agent_network_request_usage (the
always-written per-request ledger) so the figures are accurate
regardless of the log-collection toggle and carry no double-counting.
All values come from a handful of indexed aggregate queries run only on
the worker's periodic tick.

Adds store.AgentNetworkMetrics with GetAgentNetworkMetrics on the Store
interface, the SqlStore implementation, and a zero-valued FileStore stub.

* Update NetBird server and proxy image versions to 0.74.0-rc.2

* [management,proxy] Reduce agent-network cognitive complexity (#6566)

Address the SonarCloud quality-gate findings in new agent-network code
by extracting focused helpers. No behavior change.

- synthesizer.go: split buildIdentityInjectConfigJSON into per-shape
  rule builders; extract mergeGuardrail from mergeGuardrails to cut
  nesting depth.
- llm_identity_inject: extract injectionEmitsAnything validation
  predicate from New.
- llm_response_parser/streaming.go: extract applyOpenAIStreamUsage and
  applyAnthropicStreamUsage (via a named anthropicStreamUsage type) and
  simplify the OpenAI scanner loop.
- reverseproxy.go: decompose ServeHTTP into serveRouteError,
  buildTargetContext, serveDirect, serveWithChain, captureRequestForChain,
  serveDeny, newResponseWriter, observeResponse, and forwardUpstream,
  preserving the defer ordering so response observation still reads the
  captured writer before it is released.

* [management] Move agent-network access-log ingest into the agentnetwork module (#6568)

The agent-network access-log ingest path (metaKey wire contract, flatten,
usage derivation, and the dual-write of the usage ledger + settings-gated
full row) lived in the reverseproxy accesslogs manager, even though the
agentnetwork module already owns the rest of that domain — types, read
(ListAccessLogs / GetUsageOverview), the budget-counter writes, and
retention cleanup.

Move it next to the rest: a stateless agentnetwork.IngestAccessLog(ctx,
store, entry) that the reverseproxy SaveAccessLog delegates to when the
entry is agent-network. Removes the agentNetworkTypes import from the
reverseproxy manager. No behavior change; the write/read table separation
is unchanged.

Adds real-store coverage for the disable->enable log-collection toggle
(usage ledger always written, full row gated) plus the metadata parse and
group-dedup helpers, which previously had no dedicated tests.

* Add session view support in the access log

* [management,proxy] Container-based agent-network e2e harness (#6577)

* [e2e] Add container-based agent-network e2e harness (Pillar 1)

Introduce a self-contained, OIDC-free e2e harness that stands up NetBird
in containers, so suites no longer depend on the hand-maintained Tilt
stack or a real IdP.

- harness brings up the combined server (management + signal + relay +
  STUN + embedded IdP) in a single container built from
  combined/Dockerfile.multistage, and mints an admin PAT through the
  unauthenticated /api/setup bootstrap (NB_SETUP_PAT_ENABLED). API access
  goes through the existing shared/management/client/rest typed client.
- the image is built via the docker CLI (BuildKit) so the Dockerfile's
  cache mounts are honored; testcontainers then runs the tagged image.
- everything is behind the `e2e` build tag so normal builds and unit
  tests never pull in testcontainers.

Adds BuildKit cache mounts to combined/Dockerfile.multistage so source
changes recompile incrementally rather than from scratch.

Pillar 1 proven by TestCombinedBootstrap: server builds, boots, mints a
PAT, and the PAT authenticates a real management API call.

* [e2e] Add management-side agent-network scenarios (Pillar 2)

Port the API-driven agent-network scenarios from the bash suites to Go,
sharing one combined server per package run (TestMain) with each test
owning its resource cleanup. Drives the /api/agent-network/* endpoints
through the shared REST client's NewRequest primitive with the generated
api types.

Scenarios:
- provider lifecycle (create/get/list/delete + 404 after delete)
- provider validation (missing api_key, unknown catalog id → 4xx)
- settings collection-toggle round-trip with cluster/subdomain immutability
- policy window floor (reject <60s enabled limit, accept at 60s)
- consumption read endpoint returns an array

All deterministic and dependency-free (dummy provider keys; no upstream
calls), so they run headless in CI.

* [e2e] Add live chat-through-proxy scenario (Pillar 3)

Stand up the full agent-network data path in containers and drive a real
chat-completion through the gateway:

- harness: a shared docker network (combined server reachable by alias),
  a proxy container built from the published reverse-proxy image
  (NB_PROXY_PRIVATE, NB_PROXY_ALLOW_INSECURE, NB_RELAY_TRANSPORT=ws to match
  the combined server's WS-multiplexed relay) with a generated self-signed
  wildcard cert, and a netbird client container that joins via a setup key.
- the combined image, proxy image, and client image default to the
  published rc.2 releases (overridable via NB_E2E_*_IMAGE; a bare local tag
  is built from source instead). Geolocation download is disabled so the
  server starts without external fetches.
- one shared domain is used for the management exposed address, the proxy
  domain, and the agent-network cluster; the proxy token is minted via the
  server CLI (global) to match the manual install.

TestChatCompletionThroughProxy provisions provider+policy+group+setup key,
runs proxy+client, drives an OpenAI chat-completion through the tunnel, and
asserts a 200 plus the ingested access-log row. Requires OPENAI_TOKEN
(skips otherwise). The provider must be created with enabled=true explicitly
— the create default is false despite the API doc.

* [e2e] Run the live chat scenario across a provider matrix

Replace the single-provider chat test with a data-driven matrix that runs
the same scenario through every provider whose credentials are present in
the environment (keys/URLs sourced from ~/.llm-keys locally, Actions
secrets in CI):

- OpenAI (chat), Anthropic (messages), Vercel, OpenRouter, Cloudflare
  (OpenAI-compatible gateways), and Bedrock (path-routed, bearer, via the
  messages shape) — covering both wire shapes and the gateway routing.
- all providers are created enabled with a unique model string so the
  proxy's connect-time snapshot carries them all and model->provider
  routing is unambiguous (provider toggles after connect don't reconcile
  to a connected proxy).
- the client supports both wire shapes (/v1/chat/completions and
  /v1/messages); Cloudflare gets the openai provider segment appended to
  its gateway URL.

Each provider must return 200 through the tunnel and produce an ingested
access-log row. Vertex is intentionally excluded from the uniform matrix:
it needs a bespoke rawPredict request shape rather than the shared
chat/messages path, so it warrants a dedicated scenario.

* [ci] Add manual workflow for the agent-network e2e suite

The e2e suite (build tag `e2e`) stands up the combined server + proxy +
client in Docker and drives live chat-completions, so it is slow and needs
provider credentials. Gate it out of normal CI (it already is, via the
build tag) and run it on demand via workflow_dispatch. Provider scenarios
skip when their secret is unset, so it degrades gracefully.

* [e2e] Add Vertex to the provider matrix; run e2e on ubuntu-latest

Vertex (Anthropic-on-Vertex) doesn't share the chat/messages wire shapes:
the model travels in a rawPredict path and the proxy mints the service
account's OAuth token. Add a Vertex client method that posts
/v1/projects/<project>/locations/<region>/publishers/anthropic/models/<model>:rawPredict
with the Vertex anthropic_version body, and wire it into the matrix as a
path-routed provider (created without a models array). It is keyed off
GOOGLE_VERTEX_SA_BASE64 + GOOGLE_VERTEX_PROJECT (region defaults to
"global", model to a pinned claude snapshot, both overridable).

Also bump the e2e workflow runner to ubuntu-latest and add the Vertex
secrets.

* Add docker/docker and docker/go-connections as direct dependencies in go.mod

* [ci] Trigger agent-network e2e workflow on push to main and pull requests

* [e2e] Fix proxy cert permission denied on Linux CI runners

The proxy bind-mounts a temp dir of self-signed certs. MkdirTemp creates
it 0700 and the key was 0600, which Docker Desktop on macOS ignores but a
non-root proxy container on Linux runners cannot traverse/read, so the
cert watcher failed with "open /certs/tls.crt: permission denied" and the
container exited. Widen the cert dir to 0755 and write the throwaway key
0644 so the proxy uid can read the bind-mounted material.

* [e2e] Build images from source by default instead of pulling rc.2

The agent-network code under test lives in this branch, so the e2e should
exercise it rather than a frozen published release. Flip the harness
default: combined/proxy/client are now built from their in-repo
Dockerfiles (combined/Dockerfile.multistage, proxy/Dockerfile.multistage,
e2e/harness/Dockerfile.client) under local tags. Pulling a published image
stays available by setting NB_E2E_*_IMAGE to a registry reference.

Builds now go through buildx --load so the Dockerfile cache mounts are
honored and the result is loaded for testcontainers. The CI workflow adds
a container-driver builder and a local layer cache (NB_E2E_BUILDX_CACHE)
persisted via actions/cache, which caches the base/apt/dep-download layers
across runs. The Go compile still re-runs each time, as BuildKit mount
caches cannot be exported to the GitHub cache.

* [e2e] Cover real providers in lifecycle + assert real consumption metering

- TestProviderLifecycle now runs per available real provider (create → get →
  list → delete → 404) instead of a single dummy provider, exercising each
  catalog's create and field round-trip. Create is offline, so it stays fast
  and burns no provider quota; falls back to a synthetic OpenAI provider when
  no keys are set.
- TestProvidersMatrix attaches a token limit (high caps, 60s window) to its
  policy, which switches on usage metering, and asserts consumption rows are
  recorded with positive token counts after the live traffic. Consumption is
  account-scoped (keyed by source group / user and window, not per provider),
  so the assertion is aggregate.
- TestProviderValidation gains invalid-upstream and blank-name cases. Create
  validation is uniform across catalogs (no per-provider required-field rules),
  so per-provider rejection cases would be redundant.

* [e2e] Assert session id propagates per provider

Each matrix request now sends a unique session id as the universal
x-session-id header and asserts it round-trips into that provider's
access-log row. This guards the session-grouping contract end to end for
every provider (header extraction runs in llm_request_parser ahead of the
parser-specific body extraction, so it is provider-agnostic).

* [e2e] Drop accidentally committed sync-phases dashboard

netbird-sync-phases.json was swept into the Pillar 1 commit by a broad
git add; it belongs to the unrelated sync-phases metrics work, not this
e2e harness. Remove it from the branch so the PR diff is scoped to the
e2e changes.

* [e2e] Revert accidentally committed sync-phase ingest spec

The netbird_sync_phase measurement spec in metrics ingest was swept into
the Pillar 1 commit; it belongs to the unrelated sync-phases metrics work,
not this e2e harness. Its emission side never landed here, so the spec was
orphaned anyway. Restore ingest/main.go to its origin/main state.

* Fix golint issues

* Fix sonar

* Add access log session test

* Fix access log tests

---------

Co-authored-by: braginini <bangvalo@gmail.com>
Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com>
2026-07-01 12:45:14 +02:00
Pascal Fischer
13200265d8 [proxy] Add no-blocking mapping updates (#6369) 2026-06-09 13:57:17 +02:00
Maycon Santos
fa1e241aea [management, client, proxy] Follow-up fixes for private reverse-proxy services (#6268)
* fix(proxy): gate tunnel-peer fast-path on inbound listener marker

forwardWithTunnelPeer previously accepted any RFC1918 / ULA / CGNAT
source IP, so a public client whose address happened to fall in those
ranges could bypass the configured operator auth scheme by colliding
with a known tunnel IP. The fast-path is now gated on
TunnelLookupFromContext(r.Context()) being present — that context value
is attached only by the per-account inbound (overlay) listener, so the
host-facing listener never enters this branch.

Tests updated to reflect the new requirement: requests that don't
carry the inbound marker now fall through to the regular auth flow.

* fix(proxy): harden inbound listener resource + startup-ctx handling

Three correctness fixes on the per-account inbound path, with tests:

- Close the logrus ErrorLog PipeWriter on tearDown. WriterLevel hands
  back an *io.PipeWriter backed by a pipe + scanner goroutine that the
  caller owns; the two writers per account (https + plain) were never
  closed, leaking the pipe and goroutine on every teardown.
- Run the post-Start hooks on context.Background(). runClientStartup
  is launched in a goroutine from AddPeer and was inheriting the
  caller's request-scoped ctx, so a cancelled request could abort the
  inbound bring-up or fail the management status notification. The
  tail is split into notifyClientReady so the contract is testable.

Tests cover the PipeWriter close behaviour and assert the readyHandler
+ NotifyStatus calls receive a non-cancelled background context.

* feat(proxy): short-circuit peer-own-target loops with 421

When a peer that hosts the target of a private service dials its own
service URL the request was being looped through the proxy and back
over WireGuard to the same peer — twice the WG round-trip for no
benefit, with no signal to the caller that something was wrong.

Add isSelfTargetLoop to ReverseProxy.ServeHTTP: when the request
arrived on the per-account overlay listener (IsOverlayOrigin) and the
source tunnel IP matches the target host, refuse the request with 421
Misdirected Request and a body pointing the operator at the backend
directly.

The gate is scoped to overlay origin so requests on the public
listener that happen to share a source IP with the target host are
forwarded normally.

* fix(management): private-service validation + tunnel-IP lookup semantics

- Require an explicit port for L4 cluster targets. validateL4Target
  exempted TargetTypeCluster from the port check, but buildPathMappings
  serializes every L4 target via net.JoinHostPort(host, port) — port=0
  shipped a ":0" upstream. Cluster targets use the same Host/Port
  fields, so the same requirement applies.
- GetPeerByIP returns NotFound on a tunnel-IP miss instead of mapping
  every error to Internal. The proxy's ValidateTunnelPeer probes IPs
  that legitimately aren't in the roster; the miss is expected and now
  distinguishable from a real store failure.
- Thread ctx into getClusterCapability's gorm query so a cancelled
  request doesn't keep the store busy.

Tests updated for the L4-cluster port requirement and the GetPeerByIP
NotFound path.

* fix(client): include offlinePeers in PeerStateByIP lookup

ReplaceOfflinePeers moves peers into d.offlinePeers but PeerStateByIP
only scanned d.peers. Callers (the local DNS filter via
localPeerConnectivity, embed.Client.IdentityForIP used by the
proxy's tunnel-peer validator) were treating known-but-offline peers
as unknown, which:

- causes the DNS filter to keep returning records pointing at peers
  that have no live tunnel, AND
- makes the proxy's local-roster check deny a request from such a
  peer rather than letting the cached management RPC carry the
  authorisation decision.

Search both slices in PeerStateByIP. Adds a unit test for the IPv4
and IPv6 offline-match paths.

* fix(rest): reject empty Delete path params in reverse-proxy clients

ReverseProxyClustersAPI.Delete and ReverseProxyTokensAPI.Delete passed
the path parameter into url.PathEscape without an empty check.
PathEscape("") returns "" which collapses the request onto the
collection endpoint ("/api/reverse-proxies/clusters/" /
"/api/reverse-proxies/proxy-tokens/"), so a caller bug delete with no
id reached a routable URL with surprising semantics (typically 405).

Short-circuit with a typed error before the request is built. Tests
mount a handler on the collection path that fails the test if hit, so
the regression is impossible to reintroduce silently.

* chore(api,ci,docs,test): private-service schema, proto-check, fixups

Non-functional cleanups and contract/CI hardening around the
private-service work:

API schema (openapi.yml):
- Require a non-empty access_groups and mode=http when private=true,
  on both Service and ServiceRequest, mirroring
  validatePrivateRequirements. mode stays optional-but-constrained
  (empty defaults to http server-side), matching runtime.

CI (proto-version-check.yml):
- Cover renamed .pb.go files (read base via previous_filename).
- Match protoc-gen-go-grpc version headers (optional "- " prefix and
  -gen-go-grpc suffix) so grpc-generated files are in scope.

Docs / comments:
- Reword Config field docs to say defaults are applied at Server.Start
  (initDefaults), not New.
- Rename the obsolete --private-inbound flag to --private across
  comments and the proto doc.

Pre-existing test fixups surfaced by review:
- Repair the integration-tagged validate_session_test.go (SignToken
  signature growth + new Manager interface methods).
- Fix the CI-skip boolean precedence so Windows isn't skipped
  unconditionally.
- Guard the router.HTTPListener type assertion with comma-ok.

* fix(proxy): background ctx for already-started AddPeer notification

The earlier ctx fix covered the async runClientStartup path but missed
the synchronous branch: when a service is added to an already-started
client, AddPeer called NotifyStatus with the caller's request-scoped
ctx. A cancelled request/stream could drop the connected notification
to management. Use context.Background() here too, matching
notifyClientReady.

Extends TestNetBird_AddPeer_ExistingStartedClient_NotifiesStatus to
pass a pre-cancelled caller ctx and assert the notification still ran
on a non-cancelled context.

* use the cmd context for roundtripper
2026-06-02 13:40:09 +02:00
Viktor Liu
e89b1e0596 [proxy, client] Bound embed client WireGuard per-Device memory (#5962) 2026-05-26 11:51:53 +02:00
Maycon Santos
7aebdd69dd [management, client, proxy] add expose NetBird-only services over tunnel peers (#6226)
Adds a new "private" service mode for the reverse proxy: services reachable exclusively over the embedded WireGuard tunnel, gated by per-peer group membership instead of operator auth schemes.

Wire contract
- ProxyMapping.private (field 13): the proxy MUST call ValidateTunnelPeer and fail closed; operator schemes are bypassed.
- ProxyCapabilities.private (4) + supports_private_service (5): capability gate. Management never streams private mappings to proxies that don't claim the capability; the broadcast path applies the same filter via filterMappingsForProxy.
- ValidateTunnelPeer RPC: resolves an inbound tunnel IP to a peer, checks the peer's groups against service.AccessGroups, and mints a session JWT on success. checkPeerGroupAccess fails closed when a private service has empty AccessGroups.
- ValidateSession/ValidateTunnelPeer responses now carry peer_group_ids + peer_group_names so the proxy can authorise policy-aware middlewares without an extra management round-trip.
- ProxyInboundListener + SendStatusUpdate.inbound_listener: per-account inbound listener state surfaced to dashboards.
- PathTargetOptions.direct_upstream (11): bypass the embedded NetBird client and dial the target via the proxy host's network stack for upstreams reachable without WireGuard.

Data model
- Service.Private (bool) + Service.AccessGroups ([]string, JSON- serialised). Validate() rejects bearer auth on private services. Copy() deep-copies AccessGroups. pgx getServices loads the columns.
- DomainConfig.Private threaded into the proxy auth middleware. Request handler routes private services through forwardWithTunnelPeer and returns 403 on validation failure.
- Account-level SynthesizePrivateServiceZones (synthetic DNS) and injectPrivateServicePolicies (synthetic ACL) gate on len(svc.AccessGroups) > 0.

Proxy
- /netbird proxy --private (embedded mode) flag; Config.Private in proxy/lifecycle.go.
- Per-account inbound listener (proxy/inbound.go) binding HTTP/HTTPS on the embedded NetBird client's WireGuard tunnel netstack.
- proxy/internal/auth/tunnel_cache: ValidateTunnelPeer response cache with single-flight de-duplication and per-account eviction.
- Local peerstore short-circuit: when the inbound IP isn't in the account roster, deny fast without an RPC.
- proxy/server.go reports SupportsPrivateService=true and redacts the full ProxyMapping JSON from info logs (auth_token + header-auth hashed values now only at debug level).

Identity forwarding
- ValidateSessionJWT returns user_id, email, method, groups, group_names. sessionkey.Claims carries Email + Groups + GroupNames so the proxy can stamp identity onto upstream requests without an extra management round-trip on every cookie-bearing request.
- CapturedData carries userEmail / userGroups / userGroupNames; the proxy stamps X-NetBird-User and X-NetBird-Groups on r.Out from the authenticated identity (strips client-supplied values first to prevent spoofing).
- AccessLog.UserGroups: access-log enrichment captures the user's group memberships at write time so the dashboard can render group context without reverse-resolving stale memberships.

OpenAPI/dashboard surface
- ReverseProxyService gains private + access_groups; ReverseProxyCluster gains private + supports_private. ReverseProxyTarget target_type enum gains "cluster". ServiceTargetOptions gains direct_upstream. ProxyAccessLog gains user_groups.
2026-05-25 17:41:50 +02:00
Pascal Fischer
6137a1fcc5 [proxy] concurrent proxy snapshot apply (#6207) 2026-05-20 18:21:22 +02:00
Viktor Liu
a852b3bd34 [client, proxy] Harden uspfilter conntrack and share TCP relay (#5936) 2026-05-11 09:59:13 +02:00
Viktor Liu
205ebcfda2 [management, client] Add IPv6 overlay support (#5631) 2026-05-07 11:33:37 +02:00
Viktor Liu
057d651d2e [client, proxy] Add packet capture to debug bundle and CLI (#5891) 2026-05-04 11:28:56 +02:00
alsruf36
c07c726ea7 [proxy] Set session cookie path to root (#5915) 2026-04-23 18:20:54 +02:00
Pascal Fischer
46fc8c9f65 [proxy] direct redirect to SSO (#5874) 2026-04-14 13:47:02 +02:00
Viktor Liu
0a30b9b275 [management, proxy] Add CrowdSec IP reputation integration for reverse proxy (#5722) 2026-04-14 12:14:58 +02:00
Viktor Liu
0fc63ea0ba [management] Allow multiple header auths with same header name (#5678) 2026-03-24 16:18:21 +01:00
Viktor Liu
82762280ee [client] Add health check flag to status command and expose daemon status in output (#5650) 2026-03-22 12:39:40 +01:00
Viktor Liu
387e374e4b [proxy, management] Add header auth, access restrictions, and session idle timeout (#5587) 2026-03-16 15:22:00 +01:00
Viktor Liu
3e6baea405 [management,proxy,client] Add L4 capabilities (TLS/TCP/UDP) (#5530) 2026-03-13 18:36:44 +01:00
Pascal Fischer
c545689448 [proxy] Wildcard certificate support (#5583) 2026-03-12 16:00:28 +01:00
Pascal Fischer
d3d6a327e0 [proxy] read cert from disk if available instead of cert manager (#5574)
* **New Features**
  * Asynchronous certificate prefetch that races live issuance with periodic on-disk cache checks to surface certificates faster.
  * Centralized recording and notification when certificates become available.
  * New on-disk certificate reading and validation to allow immediate use of cached certs.

* **Bug Fixes & Performance**
  * Optimized retrieval by polling disk while fetching in background to reduce latency.
  * Added cancellation and timeout handling to fail stalled certificate operations reliably.
2026-03-11 19:18:37 +01:00
Pascal Fischer
f884299823 [proxy] refactor metrics and add usage logs (#5533)
* **New Features**
  * Access logs now include bytes_upload and bytes_download (API and schemas updated, fields required).
  * Certificate issuance duration is now recorded as a metric.

* **Refactor**
  * Metrics switched from Prometheus client to OpenTelemetry-backed meters; health endpoint now exposes OpenMetrics via OTLP exporter.

* **Tests**
  * Metric tests updated to use OpenTelemetry Prometheus exporter and MeterProvider.
2026-03-09 18:45:45 +01:00
Viktor Liu
e601278117 [management,proxy] Add per-target options to reverse proxy (#5501) 2026-03-05 10:03:26 +01:00
Pascal Fischer
d7c8e37ff4 [management] Store connected proxies in DB (#5472)
Co-authored-by: mlsmaycon <mlsmaycon@gmail.com>
2026-03-03 18:39:46 +01:00
Viktor Liu
bbe5ae2145 [client] Flush buffer immediately to support gprc (#5469) 2026-03-02 15:17:08 +01:00
Pascal Fischer
9d123ec059 [proxy] add pre-shared key support (#5377) 2026-02-23 16:31:29 +01:00
Zoltan Papp
1bd7190954 [proxy] Support WebSocket (#5312)
* Fix WebSocket support by implementing Hijacker interface

Add responsewriter.PassthroughWriter to preserve optional HTTP interfaces
(Hijacker, Flusher, Pusher) when wrapping http.ResponseWriter in middleware.

Without this delegation:
 - WebSocket connections fail (can't hijack the connection)
 - Streaming breaks (can't flush buffers)
 - HTTP/2 push doesn't work

* Add HijackTracker to manage hijacked connections during graceful shutdown

* Refactor HijackTracker to use middleware for tracking hijacked connections

* Refactor server handler chain setup for improved readability and maintainability
2026-02-17 12:53:34 +01:00
Pascal Fischer
f53155562f [management, reverse proxy] Add reverse proxy feature (#5291)
* implement reverse proxy


---------

Co-authored-by: Alisdair MacLeod <git@alisdairmacleod.co.uk>
Co-authored-by: mlsmaycon <mlsmaycon@gmail.com>
Co-authored-by: Eduard Gert <kontakt@eduardgert.de>
Co-authored-by: Viktor Liu <viktor@netbird.io>
Co-authored-by: Diego Noguês <diego.sure@gmail.com>
Co-authored-by: Diego Noguês <49420+diegocn@users.noreply.github.com>
Co-authored-by: Bethuel Mmbaga <bethuelmbaga12@gmail.com>
Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com>
Co-authored-by: Ashley Mensah <ashleyamo982@gmail.com>
2026-02-13 19:37:43 +01:00