mirror of
https://github.com/netbirdio/netbird.git
synced 2026-07-03 04:32:27 -04:00
* [management,proxy] Agent network: per-account LLM gateway (policy, metering, multi-provider) (#6555)

* [agent-network] Shared proto, OpenAPI schema, and generated types

* [agent-network] Management: store, manager, synthesizer, policy engine, provider catalog, HTTP/gRPC API

Adds the account-scoped agent-network module: provider/policy/budget CRUD and store, the reverse-proxy service synthesizer, policy selection + limit enforcement, the provider catalog (incl. Vertex AI and AWS Bedrock entries), and the management HTTP + proxy gRPC surfaces.

* [management] Fix agent-network proxy-peer fan-out on affected-peer recompute

The affected-peers resolver loaded only persisted reverse-proxy services, but agent-network services are synthesized on demand and never persisted. As a result the embedded proxy peer was never folded into the affected set when a client's group changed, so the proxy received no network-map update for a newly authorised client and rejected its handshake until a full resync (restart).

loadProxyServices now merges the synthesized agent-network services (injected via a registration hook to avoid an import cycle), so proxy peers learn newly authorised clients immediately.

* [proxy] Reverse-proxy middleware framework, chain, and request plumbing

The per-target middleware chain (slots, dispatcher, mutation gate, metadata merger), body capture, access-log terminal sink, and the proxy wiring that builds + runs chains for synthesized agent-network services.

* [proxy] LLM parsers, pricing, and builtin middlewares (OpenAI, Anthropic, Vertex AI, AWS Bedrock)

Request/response parsers and SSE/event-stream metering, the embedded pricing table, and the builtin middleware set: request parser, router, policy limit-check/record, cost meter, guardrail, identity inject, response parser.
Includes the path-routed providers, Google Vertex AI (keyfile:: service-account OAuth minting) and AWS Bedrock (bearer auth, invoke/converse/streaming, optional /bedrock prefix), plus the Models allowlist and unmeterable-publisher deny.

* [proxy] IPv6 in-place apply and TCP accept-loop hardening on netstack listeners

* [agent-network] End-to-end test suite, module docs, and deployment preset

* [agent-network] Fix codespell typos and exclude false positives

- labelgen word pool: vermillion -> vermilion, racoon -> raccoon.
- codespell ignore list: add flate (Go compress/flate package), recordin (a test-local identifier), and unparseable (a valid alternative spelling used consistently across identifiers + a metadata-value constant).

* [management] Set LastSeen on injected proxy peer in realstack test (MySQL strict-mode)

The injected embedded proxy peer had a PeerStatus with a zero LastSeen, which serializes to '0000-00-00' and is rejected by MySQL in strict mode (SQLite tolerates it). Set LastSeen to a valid time so SaveAccount succeeds on both engines.

* [agent-network] Remove e2e shell-script suite from this branch

The end-to-end shell scripts under scripts/e2e/ are maintained in a separate testing suite and are not part of this change set.

* [agent-network] Polish module docs: remove internal review scaffolding, fix links, verify diagrams

Strip PR-review framing, commit references, absolute paths, and stale internal references from the agent-network module docs; fix broken relative links; verify all diagrams against the current architecture. Remove the internal AI-reviewer prompt file.

* [management] Refine session expiration handling to support 3-state encoding for SSO deadlines

* [agent-network] Relocate agentnetwork package to internals/modules

Move management/server/agentnetwork (and its catalog/, labelgen/, types/ subpackages) to management/internals/modules/agentnetwork, alongside the reverse-proxy module, and rewrite all importers.
Pure relocation: package names, the synthesizer + affectedpeers registration hook, and store access (shared store.Store) are unchanged, so no import cycle is introduced (affectedpeers still depends only on the agentnetwork/types leaf).

* [agent-network] Co-locate HTTP handlers in the module (RegisterEndpoints)

Move the agent-network HTTP handlers from server/http/handlers/agentnetwork into the module at internals/modules/agentnetwork/handlers (package handlers) and rename the entrypoint AddEndpoints -> RegisterEndpoints, matching the reverse-proxy module convention. Wiring in http/handler.go updated accordingly.

* Update getting started to point to rc when agent network enabled

* Add a reference to a commercial license

* Fix docs localhost link

* Fix docs localhost link

* Add private services domain note

* [management] Add agent-network telemetry metrics (#6561)

Surface agent-network adoption and usage in the self-hosted metrics worker: distinct accounts, providers, policies, budget rules, accounts with log collection enabled, and aggregated input/output tokens plus cost.

Tokens and cost are summed from agent_network_request_usage (the always-written per-request ledger) so the figures are accurate regardless of the log-collection toggle and carry no double-counting. All values come from a handful of indexed aggregate queries run only on the worker's periodic tick.

Adds store.AgentNetworkMetrics with GetAgentNetworkMetrics on the Store interface, the SqlStore implementation, and a zero-valued FileStore stub.

* Update NetBird server and proxy image versions to 0.74.0-rc.2

* [management,proxy] Reduce agent-network cognitive complexity (#6566)

Address the SonarCloud quality-gate findings in new agent-network code by extracting focused helpers. No behavior change.

- synthesizer.go: split buildIdentityInjectConfigJSON into per-shape rule builders; extract mergeGuardrail from mergeGuardrails to cut nesting depth.
- llm_identity_inject: extract injectionEmitsAnything validation predicate from New.
- llm_response_parser/streaming.go: extract applyOpenAIStreamUsage and applyAnthropicStreamUsage (via a named anthropicStreamUsage type) and simplify the OpenAI scanner loop.
- reverseproxy.go: decompose ServeHTTP into serveRouteError, buildTargetContext, serveDirect, serveWithChain, captureRequestForChain, serveDeny, newResponseWriter, observeResponse, and forwardUpstream, preserving the defer ordering so response observation still reads the captured writer before it is released.

* [management] Move agent-network access-log ingest into the agentnetwork module (#6568)

The agent-network access-log ingest path (metaKey wire contract, flatten, usage derivation, and the dual-write of the usage ledger + settings-gated full row) lived in the reverseproxy accesslogs manager, even though the agentnetwork module already owns the rest of that domain: types, read (ListAccessLogs / GetUsageOverview), the budget-counter writes, and retention cleanup.

Move it next to the rest: a stateless agentnetwork.IngestAccessLog(ctx, store, entry) that the reverseproxy SaveAccessLog delegates to when the entry is agent-network. Removes the agentNetworkTypes import from the reverseproxy manager. No behavior change; the write/read table separation is unchanged.

Adds real-store coverage for the disable->enable log-collection toggle (usage ledger always written, full row gated) plus the metadata parse and group-dedup helpers, which previously had no dedicated tests.

* Add session view support in the access log

* [management,proxy] Container-based agent-network e2e harness (#6577)

* [e2e] Add container-based agent-network e2e harness (Pillar 1)

Introduce a self-contained, OIDC-free e2e harness that stands up NetBird in containers, so suites no longer depend on the hand-maintained Tilt stack or a real IdP.
- harness brings up the combined server (management + signal + relay + STUN + embedded IdP) in a single container built from combined/Dockerfile.multistage, and mints an admin PAT through the unauthenticated /api/setup bootstrap (NB_SETUP_PAT_ENABLED). API access goes through the existing shared/management/client/rest typed client.
- the image is built via the docker CLI (BuildKit) so the Dockerfile's cache mounts are honored; testcontainers then runs the tagged image.
- everything is behind the `e2e` build tag so normal builds and unit tests never pull in testcontainers.

Adds BuildKit cache mounts to combined/Dockerfile.multistage so source changes recompile incrementally rather than from scratch.

Pillar 1 proven by TestCombinedBootstrap: server builds, boots, mints a PAT, and the PAT authenticates a real management API call.

* [e2e] Add management-side agent-network scenarios (Pillar 2)

Port the API-driven agent-network scenarios from the bash suites to Go, sharing one combined server per package run (TestMain) with each test owning its resource cleanup. Drives the /api/agent-network/* endpoints through the shared REST client's NewRequest primitive with the generated api types.

Scenarios:
- provider lifecycle (create/get/list/delete + 404 after delete)
- provider validation (missing api_key, unknown catalog id → 4xx)
- settings collection-toggle round-trip with cluster/subdomain immutability
- policy window floor (reject <60s enabled limit, accept at 60s)
- consumption read endpoint returns an array

All deterministic and dependency-free (dummy provider keys; no upstream calls), so they run headless in CI.
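
The "policy window floor" scenario above can be pictured with a small validation sketch. This is not the module's code: `validateLimitWindow`, `minLimitWindowSeconds`, and `errWindowTooShort` are hypothetical names, and treating a disabled limit as always acceptable is an assumption drawn only from the wording "reject <60s enabled limit".

```go
package main

import (
	"errors"
	"fmt"
)

// minLimitWindowSeconds is the 60s floor named in the scenario list.
const minLimitWindowSeconds = 60

// errWindowTooShort is a hypothetical sentinel error for this sketch.
var errWindowTooShort = errors.New("limit window is below the 60s floor")

// validateLimitWindow (hypothetical) rejects an enabled limit whose window is
// shorter than the floor. Skipping the check for disabled limits is an
// assumption of this sketch, not confirmed behavior.
func validateLimitWindow(enabled bool, windowSeconds int) error {
	if enabled && windowSeconds < minLimitWindowSeconds {
		return fmt.Errorf("%w: got %ds", errWindowTooShort, windowSeconds)
	}
	return nil
}

func main() {
	for _, w := range []int{59, 60} {
		fmt.Printf("enabled limit, window=%ds: err=%v\n", w, validateLimitWindow(true, w))
	}
}
```

An API handler built on such a check would presumably map the error to a 4xx response, which is the outcome the e2e scenario asserts.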
* [e2e] Add live chat-through-proxy scenario (Pillar 3)

Stand up the full agent-network data path in containers and drive a real chat-completion through the gateway:

- harness: a shared docker network (combined server reachable by alias), a proxy container built from the published reverse-proxy image (NB_PROXY_PRIVATE, NB_PROXY_ALLOW_INSECURE, NB_RELAY_TRANSPORT=ws to match the combined server's WS-multiplexed relay) with a generated self-signed wildcard cert, and a netbird client container that joins via a setup key.
- the combined image, proxy image, and client image default to the published rc.2 releases (overridable via NB_E2E_*_IMAGE; a bare local tag is built from source instead). Geolocation download is disabled so the server starts without external fetches.
- one shared domain is used for the management exposed address, the proxy domain, and the agent-network cluster; the proxy token is minted via the server CLI (global) to match the manual install.

TestChatCompletionThroughProxy provisions provider+policy+group+setup key, runs proxy+client, drives an OpenAI chat-completion through the tunnel, and asserts a 200 plus the ingested access-log row. Requires OPENAI_TOKEN (skips otherwise). The provider must be created with enabled=true explicitly: the create default is false despite the API doc.

* [e2e] Run the live chat scenario across a provider matrix

Replace the single-provider chat test with a data-driven matrix that runs the same scenario through every provider whose credentials are present in the environment (keys/URLs sourced from ~/.llm-keys locally, Actions secrets in CI):

- OpenAI (chat), Anthropic (messages), Vercel, OpenRouter, Cloudflare (OpenAI-compatible gateways), and Bedrock (path-routed, bearer, via the messages shape), covering both wire shapes and the gateway routing.
- all providers are created enabled with a unique model string so the proxy's connect-time snapshot carries them all and model->provider routing is unambiguous (provider toggles after connect don't reconcile to a connected proxy).
- the client supports both wire shapes (/v1/chat/completions and /v1/messages); Cloudflare gets the openai provider segment appended to its gateway URL.

Each provider must return 200 through the tunnel and produce an ingested access-log row. Vertex is intentionally excluded from the uniform matrix: it needs a bespoke rawPredict request shape rather than the shared chat/messages path, so it warrants a dedicated scenario.

* [ci] Add manual workflow for the agent-network e2e suite

The e2e suite (build tag `e2e`) stands up the combined server + proxy + client in Docker and drives live chat-completions, so it is slow and needs provider credentials. Gate it out of normal CI (it already is, via the build tag) and run it on demand via workflow_dispatch. Provider scenarios skip when their secret is unset, so it degrades gracefully.

* [e2e] Add Vertex to the provider matrix; run e2e on ubuntu-latest

Vertex (Anthropic-on-Vertex) doesn't share the chat/messages wire shapes: the model travels in a rawPredict path and the proxy mints the service account's OAuth token. Add a Vertex client method that posts /v1/projects/<project>/locations/<region>/publishers/anthropic/models/<model>:rawPredict with the Vertex anthropic_version body, and wire it into the matrix as a path-routed provider (created without a models array). It is keyed off GOOGLE_VERTEX_SA_BASE64 + GOOGLE_VERTEX_PROJECT (region defaults to "global", model to a pinned claude snapshot, both overridable).

Also bump the e2e workflow runner to ubuntu-latest and add the Vertex secrets.
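
To make the rawPredict path shape described above concrete, here is a minimal sketch of how such a path could be assembled. `vertexRawPredictPath` is a hypothetical helper, not the harness's client method, and the project and model values in `main` are placeholders; only the path template and the "global" region default come from the commit text.

```go
package main

import "fmt"

// vertexRawPredictPath (hypothetical) fills the path template quoted in the
// commit message. An empty region falls back to "global", mirroring the
// default the message describes.
func vertexRawPredictPath(project, region, model string) string {
	if region == "" {
		region = "global"
	}
	return fmt.Sprintf(
		"/v1/projects/%s/locations/%s/publishers/anthropic/models/%s:rawPredict",
		project, region, model,
	)
}

func main() {
	// Placeholder values; a real run would take them from the environment.
	fmt.Println(vertexRawPredictPath("example-project", "", "example-model"))
}
```

Because the model name travels in the path rather than in the JSON body, a provider like this is routed by path, which is consistent with the matrix creating it without a models array.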
* Add docker/docker and docker/go-connections as direct dependencies in go.mod

* [ci] Trigger agent-network e2e workflow on push to main and pull requests

* [e2e] Fix proxy cert permission denied on Linux CI runners

The proxy bind-mounts a temp dir of self-signed certs. MkdirTemp creates it 0700 and the key was 0600, which Docker Desktop on macOS ignores but a non-root proxy container on Linux runners cannot traverse/read, so the cert watcher failed with "open /certs/tls.crt: permission denied" and the container exited. Widen the cert dir to 0755 and write the throwaway key 0644 so the proxy uid can read the bind-mounted material.

* [e2e] Build images from source by default instead of pulling rc.2

The agent-network code under test lives in this branch, so the e2e should exercise it rather than a frozen published release. Flip the harness default: combined/proxy/client are now built from their in-repo Dockerfiles (combined/Dockerfile.multistage, proxy/Dockerfile.multistage, e2e/harness/Dockerfile.client) under local tags. Pulling a published image stays available by setting NB_E2E_*_IMAGE to a registry reference.

Builds now go through buildx --load so the Dockerfile cache mounts are honored and the result is loaded for testcontainers. The CI workflow adds a container-driver builder and a local layer cache (NB_E2E_BUILDX_CACHE) persisted via actions/cache, which caches the base/apt/dep-download layers across runs. The Go compile still re-runs each time, as BuildKit mount caches cannot be exported to the GitHub cache.

* [e2e] Cover real providers in lifecycle + assert real consumption metering

- TestProviderLifecycle now runs per available real provider (create → get → list → delete → 404) instead of a single dummy provider, exercising each catalog's create and field round-trip. Create is offline, so it stays fast and burns no provider quota; falls back to a synthetic OpenAI provider when no keys are set.
- TestProvidersMatrix attaches a token limit (high caps, 60s window) to its policy, which switches on usage metering, and asserts consumption rows are recorded with positive token counts after the live traffic. Consumption is account-scoped (keyed by source group / user and window, not per provider), so the assertion is aggregate.
- TestProviderValidation gains invalid-upstream and blank-name cases. Create validation is uniform across catalogs (no per-provider required-field rules), so per-provider rejection cases would be redundant.

* [e2e] Assert session id propagates per provider

Each matrix request now sends a unique session id as the universal x-session-id header and asserts it round-trips into that provider's access-log row. This guards the session-grouping contract end to end for every provider (header extraction runs in llm_request_parser ahead of the parser-specific body extraction, so it is provider-agnostic).

* [e2e] Drop accidentally committed sync-phases dashboard

netbird-sync-phases.json was swept into the Pillar 1 commit by a broad git add; it belongs to the unrelated sync-phases metrics work, not this e2e harness. Remove it from the branch so the PR diff is scoped to the e2e changes.

* [e2e] Revert accidentally committed sync-phase ingest spec

The netbird_sync_phase measurement spec in metrics ingest was swept into the Pillar 1 commit; it belongs to the unrelated sync-phases metrics work, not this e2e harness. Its emission side never landed here, so the spec was orphaned anyway. Restore ingest/main.go to its origin/main state.

* Fix golint issues

* Fix sonar

* Add access log session test

* Fix access log tests

---------

Co-authored-by: braginini <bangvalo@gmail.com>
Co-authored-by: Zoltan Papp <zoltan.pmail@gmail.com>
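
The cert-permission fix described in the commit message ("Fix proxy cert permission denied on Linux CI runners") can be sketched as follows. This is an illustration under stated assumptions, not the harness code: `writeCertDir`, `certDir`, `permOf`, the temp-dir pattern, and the `tls.key` file name are hypothetical (only `tls.crt` appears in the quoted error). The explicit `os.Chmod` calls are there so the process umask cannot narrow the modes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// certDir (hypothetical) records where the throwaway material was written.
type certDir struct {
	Dir, CertPath, KeyPath string
}

// writeCertDir creates a temp dir, widens it from MkdirTemp's 0700 to 0755,
// and writes cert and key as 0644 so a non-root container uid can traverse
// the bind mount and read both files. The caller removes the directory.
func writeCertDir(certPEM, keyPEM []byte) (certDir, error) {
	dir, err := os.MkdirTemp("", "e2e-certs-*")
	if err != nil {
		return certDir{}, fmt.Errorf("create cert dir: %w", err)
	}
	if err := os.Chmod(dir, 0o755); err != nil {
		return certDir{}, fmt.Errorf("widen cert dir: %w", err)
	}
	out := certDir{
		Dir:      dir,
		CertPath: filepath.Join(dir, "tls.crt"),
		KeyPath:  filepath.Join(dir, "tls.key"), // file name is an assumption
	}
	files := map[string][]byte{out.CertPath: certPEM, out.KeyPath: keyPEM}
	for path, data := range files {
		if err := os.WriteFile(path, data, 0o644); err != nil {
			return certDir{}, fmt.Errorf("write %s: %w", path, err)
		}
		// WriteFile's mode is filtered by the umask; set it explicitly.
		if err := os.Chmod(path, 0o644); err != nil {
			return certDir{}, fmt.Errorf("chmod %s: %w", path, err)
		}
	}
	return out, nil
}

// permOf returns the permission bits of path, or 0 if it cannot be stat'ed.
func permOf(path string) os.FileMode {
	info, err := os.Stat(path)
	if err != nil {
		return 0
	}
	return info.Mode().Perm()
}

func main() {
	cd, err := writeCertDir([]byte("dummy cert"), []byte("dummy key"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(cd.Dir)
	fmt.Println(permOf(cd.Dir), permOf(cd.CertPath), permOf(cd.KeyPath))
}
```

A world-readable private key is only acceptable here because the key is a throwaway generated for the test run, as the commit message notes.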
148 lines
7.4 KiB
Go
package http

import (
	"context"
	"fmt"
	"net/http"
	"net/netip"

	"github.com/gorilla/mux"
	"github.com/rs/cors"
	log "github.com/sirupsen/logrus"

	"github.com/netbirdio/netbird/management/internals/modules/reverseproxy/domain/manager"

	"github.com/netbirdio/netbird/management/server/types"

	"github.com/netbirdio/netbird/management/internals/modules/reverseproxy/accesslogs"
	"github.com/netbirdio/netbird/management/internals/modules/reverseproxy/proxytoken"
	"github.com/netbirdio/netbird/management/internals/modules/reverseproxy/service"
	reverseproxymanager "github.com/netbirdio/netbird/management/internals/modules/reverseproxy/service/manager"

	nbgrpc "github.com/netbirdio/netbird/management/internals/shared/grpc"
	idpmanager "github.com/netbirdio/netbird/management/server/idp"

	"github.com/netbirdio/netbird/management/internals/controllers/network_map"
	"github.com/netbirdio/netbird/management/internals/modules/agentnetwork"
	agentnetworkhandlers "github.com/netbirdio/netbird/management/internals/modules/agentnetwork/handlers"
	"github.com/netbirdio/netbird/management/internals/modules/zones"
	zonesManager "github.com/netbirdio/netbird/management/internals/modules/zones/manager"
	"github.com/netbirdio/netbird/management/internals/modules/zones/records"
	recordsManager "github.com/netbirdio/netbird/management/internals/modules/zones/records/manager"
	"github.com/netbirdio/netbird/management/server/account"
	"github.com/netbirdio/netbird/management/server/settings"

	"github.com/netbirdio/netbird/management/server/permissions"

	"github.com/netbirdio/netbird/management/server/http/handlers/proxy"

	"github.com/netbirdio/netbird/management/server/auth"
	"github.com/netbirdio/netbird/management/server/geolocation"
	nbgroups "github.com/netbirdio/netbird/management/server/groups"
	"github.com/netbirdio/netbird/management/server/http/handlers/accounts"
	"github.com/netbirdio/netbird/management/server/http/handlers/dns"
	"github.com/netbirdio/netbird/management/server/http/handlers/events"
	"github.com/netbirdio/netbird/management/server/http/handlers/groups"
	"github.com/netbirdio/netbird/management/server/http/handlers/idp"
	"github.com/netbirdio/netbird/management/server/http/handlers/instance"
	"github.com/netbirdio/netbird/management/server/http/handlers/networks"
	"github.com/netbirdio/netbird/management/server/http/handlers/peers"
	"github.com/netbirdio/netbird/management/server/http/handlers/policies"
	"github.com/netbirdio/netbird/management/server/http/handlers/routes"
	"github.com/netbirdio/netbird/management/server/http/handlers/setup_keys"
	"github.com/netbirdio/netbird/management/server/http/handlers/users"
	"github.com/netbirdio/netbird/management/server/http/middleware"
	"github.com/netbirdio/netbird/management/server/http/middleware/bypass"
	nbinstance "github.com/netbirdio/netbird/management/server/instance"
	nbnetworks "github.com/netbirdio/netbird/management/server/networks"
	"github.com/netbirdio/netbird/management/server/networks/resources"
	"github.com/netbirdio/netbird/management/server/networks/routers"
	"github.com/netbirdio/netbird/management/server/telemetry"
)

// NewAPIHandler creates the Management service HTTP API handler registering all the available endpoints.
func NewAPIHandler(ctx context.Context, router *mux.Router, accountManager account.Manager, networksManager nbnetworks.Manager, resourceManager resources.Manager, routerManager routers.Manager, groupsManager nbgroups.Manager, LocationManager geolocation.Geolocation, authManager auth.Manager, appMetrics telemetry.AppMetrics, permissionsManager permissions.Manager, settingsManager settings.Manager, zManager zones.Manager, rManager records.Manager, networkMapController network_map.Controller, idpManager idpmanager.Manager, serviceManager service.Manager, reverseProxyDomainManager *manager.Manager, reverseProxyAccessLogsManager accesslogs.Manager, proxyGRPCServer *nbgrpc.ProxyServiceServer, trustedHTTPProxies []netip.Prefix, rateLimiter *middleware.APIRateLimiter, isValidChildAccount middleware.IsValidChildAccountFunc, agentNetworkManager agentnetwork.Manager) (http.Handler, error) {

	// Register bypass paths for unauthenticated endpoints
	if err := bypass.AddBypassPath("/api/instance"); err != nil {
		return nil, fmt.Errorf("failed to add bypass path: %w", err)
	}
	if err := bypass.AddBypassPath("/api/setup"); err != nil {
		return nil, fmt.Errorf("failed to add bypass path: %w", err)
	}
	// Public invite endpoints (tokens start with nbi_)
	if err := bypass.AddBypassPath("/api/users/invites/nbi_*"); err != nil {
		return nil, fmt.Errorf("failed to add bypass path: %w", err)
	}
	if err := bypass.AddBypassPath("/api/users/invites/nbi_*/accept"); err != nil {
		return nil, fmt.Errorf("failed to add bypass path: %w", err)
	}
	// OAuth callback for proxy authentication
	if err := bypass.AddBypassPath(types.ProxyCallbackEndpointFull); err != nil {
		return nil, fmt.Errorf("failed to add bypass path: %w", err)
	}

	if rateLimiter == nil {
		log.Warn("NewAPIHandler: nil rate limiter, rate limiting disabled")
		rateLimiter = middleware.NewAPIRateLimiter(nil)
		rateLimiter.SetEnabled(false)
	}

	authMiddleware := middleware.NewAuthMiddleware(
		authManager,
		accountManager.GetAccountIDFromUserAuth,
		accountManager.SyncUserJWTGroups,
		accountManager.GetUserFromUserAuth,
		rateLimiter,
		appMetrics.GetMeter(),
		isValidChildAccount,
	)

	corsMiddleware := cors.AllowAll()

	metricsMiddleware := appMetrics.HTTPMiddleware()

	router.Use(metricsMiddleware.Handler, corsMiddleware.Handler, authMiddleware.Handler)

	instanceManager, err := nbinstance.NewManager(ctx, accountManager.GetStore(), idpManager)
	if err != nil {
		return nil, fmt.Errorf("failed to create instance manager: %w", err)
	}

	accounts.AddEndpoints(accountManager, settingsManager, router)
	peers.AddEndpoints(accountManager, router, networkMapController, permissionsManager)
	users.AddEndpoints(accountManager, router)
	users.AddInvitesEndpoints(accountManager, router)
	users.AddPublicInvitesEndpoints(accountManager, router)
	setup_keys.AddEndpoints(accountManager, router)
	policies.AddEndpoints(accountManager, LocationManager, router)
	policies.AddPostureCheckEndpoints(accountManager, LocationManager, router)
	policies.AddLocationsEndpoints(accountManager, LocationManager, permissionsManager, router)
	groups.AddEndpoints(accountManager, router)
	routes.AddEndpoints(accountManager, router)
	dns.AddEndpoints(accountManager, router)
	events.AddEndpoints(accountManager, router)
	networks.AddEndpoints(networksManager, resourceManager, routerManager, groupsManager, accountManager, router)
	zonesManager.RegisterEndpoints(router, zManager)
	recordsManager.RegisterEndpoints(router, rManager)
	idp.AddEndpoints(accountManager, router)
	if agentNetworkManager != nil {
		agentnetworkhandlers.RegisterEndpoints(agentNetworkManager, router)
	}
	instance.AddEndpoints(instanceManager, accountManager, router)
	instance.AddVersionEndpoint(instanceManager, router)
	if serviceManager != nil && reverseProxyDomainManager != nil {
		reverseproxymanager.RegisterEndpoints(serviceManager, *reverseProxyDomainManager, reverseProxyAccessLogsManager, permissionsManager, router)
	}

	proxytoken.RegisterEndpoints(accountManager.GetStore(), permissionsManager, router)

	// Register OAuth callback handler for proxy authentication
	if proxyGRPCServer != nil {
		oauthHandler := proxy.NewAuthCallbackHandler(proxyGRPCServer, trustedHTTPProxies)
		oauthHandler.RegisterEndpoints(router)
	}

	return router, nil
}
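
NewAPIHandler registers the agent-network, reverse-proxy, and OAuth-callback endpoints only when their dependencies are non-nil. The following standalone sketch illustrates that nil-guarded registration pattern using only the standard library. It is not NetBird code: `agentNetworkManager`, `stubManager`, `newAPIHandler`, `statusFor`, and the `/api/agent-network/providers` path are hypothetical stand-ins, and `http.ServeMux` replaces gorilla/mux to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// agentNetworkManager is a hypothetical stand-in for an optional dependency.
type agentNetworkManager interface {
	ProviderCount() int
}

// stubManager is a trivial implementation used only by this sketch.
type stubManager struct{}

func (stubManager) ProviderCount() int { return 0 }

// newAPIHandler always registers the core route and registers the optional
// route only when its manager is present, so a deployment without the
// feature answers 404 instead of dereferencing a nil manager.
func newAPIHandler(anm agentNetworkManager) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/instance", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	if anm != nil {
		mux.HandleFunc("/api/agent-network/providers", func(w http.ResponseWriter, _ *http.Request) {
			fmt.Fprintf(w, "%d", anm.ProviderCount())
		})
	}
	return mux
}

// statusFor issues an in-memory GET and returns the response status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(newAPIHandler(nil), "/api/agent-network/providers"))           // 404
	fmt.Println(statusFor(newAPIHandler(stubManager{}), "/api/agent-network/providers")) // 200
}
```

One caveat with this pattern in Go: a nil pointer stored in an interface value compares as non-nil, so callers that want the endpoints skipped need to pass a literal nil interface rather than a typed nil pointer.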