Signal Service - Peers are reconnecting every minute #1118

Closed
opened 2025-11-20 05:24:21 -05:00 by saavagebueno · 5 comments
Owner

Originally created by @da-wilky on GitHub (Jul 31, 2024).

Hello,

Describe the problem

Im using docker to host my own netbird management service behind traefik reverse-proxy. Everything is working, but my signal service is permently logging reconnects of the peers. Im curious if this is the expected behavior.

Example logs

signal-1  | 2024-07-31T08:44:33Z INFO signal/server/signal.go:110: peer connected [<client1>] [streamID 1722415473781694381] 
signal-1  | 2024-07-31T08:44:38Z INFO signal/server/signal.go:110: peer connected [<client2>] [streamID 1722415478424239799] 
signal-1  | 2024-07-31T08:44:56Z INFO signal/server/signal.go:110: peer connected [<client3>] [streamID 1722415496841138613] 
signal-1  | 2024-07-31T08:44:59Z INFO signal/server/signal.go:110: peer connected [<client4>] [streamID 1722415499259544723] 
signal-1  | 2024-07-31T08:44:59Z INFO signal/server/signal.go:110: peer connected [<client5>] [streamID 1722415499922935186] 
signal-1  | 2024-07-31T08:45:01Z INFO signal/server/signal.go:110: peer connected [<client6>] [streamID 1722415501822241363] 
signal-1  | 2024-07-31T08:45:33Z INFO signal/server/signal.go:95: peer disconnected [<client1>] [streamID 1722415473781694381] 
signal-1  | 2024-07-31T08:45:34Z INFO signal/server/signal.go:110: peer connected [<client1>] [streamID 1722415534361482785] 
signal-1  | 2024-07-31T08:45:38Z INFO signal/server/signal.go:95: peer disconnected [<client2>] [streamID 1722415478424239799] 
signal-1  | 2024-07-31T08:45:39Z INFO signal/server/signal.go:110: peer connected [<client2>] [streamID 1722415539494496594] 
signal-1  | 2024-07-31T08:45:56Z INFO signal/server/signal.go:95: peer disconnected [<client3>] [streamID 1722415496841138613] 
signal-1  | 2024-07-31T08:45:57Z INFO signal/server/signal.go:110: peer connected [<client3>] [streamID 1722415557714556130] 
signal-1  | 2024-07-31T08:45:59Z INFO signal/server/signal.go:95: peer disconnected [<client4>] [streamID 1722415499259544723] 
signal-1  | 2024-07-31T08:45:59Z INFO signal/server/signal.go:95: peer disconnected [<client5>] [streamID 1722415499922935186] 
signal-1  | 2024-07-31T08:46:00Z INFO signal/server/signal.go:110: peer connected [<client5>] [streamID 1722415560435622333] 
signal-1  | 2024-07-31T08:46:00Z INFO signal/server/signal.go:110: peer connected [<client4>] [streamID 1722415560465990002] 
signal-1  | 2024-07-31T08:46:01Z INFO signal/server/signal.go:95: peer disconnected [<client6>] [streamID 1722415501822241363] 
signal-1  | 2024-07-31T08:46:03Z INFO signal/server/signal.go:110: peer connected [<client6>] [streamID 1722415563203704936] 
signal-1  | 2024-07-31T08:46:34Z INFO signal/server/signal.go:95: peer disconnected [<client1>] [streamID 1722415534361482785] 
signal-1  | 2024-07-31T08:46:35Z INFO signal/server/signal.go:110: peer connected [<client1>] [streamID 1722415595862371424]

On the clients the following logs are produced:

2024-07-31T10:45:33+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:45:34+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-07-31T10:46:34+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:46:35+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-07-31T10:47:35+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:47:37+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-07-31T10:48:37+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:48:38+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-07-31T10:49:38+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:49:40+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-07-31T10:50:40+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:50:40+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream
2024-07-31T10:51:40+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers
2024-07-31T10:51:40+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream

So it reconnects every minute because of an rpc error. But just by googling I cant figure out why.
Again - everything is still working fine - at least I cant recognize any errors apart those logs.

To Reproduce

Steps to reproduce the behavior:

  1. Host your own netbird with traefik proxy.
version: "3"
services:
  #UI dashboard
  dashboard:
    image: netbirdio/dashboard:latest
    restart: unless-stopped
    #ports:
    #  - 80:80
    #  - 443:443
    environment:
      # Endpoints
      - NETBIRD_MGMT_API_ENDPOINT=https://mydomain.com:443
      - NETBIRD_MGMT_GRPC_API_ENDPOINT=https://mydomain.com:443
      # OIDC
      - AUTH_AUDIENCE=netbird-client
      - AUTH_CLIENT_ID=netbird-client
      - AUTH_CLIENT_SECRET=
      - AUTH_AUTHORITY=https://mykeycloak.com/realms/netbird
      - USE_AUTH0=false
      - AUTH_SUPPORTED_SCOPES=openid profile email offline_access api
      - AUTH_REDIRECT_URI=
      - AUTH_SILENT_REDIRECT_URI=
      - NETBIRD_TOKEN_SOURCE=accessToken
      # SSL
      - NGINX_SSL_PORT=443
    networks:
      default:
      traefik_net:
    labels:
      - traefik.enable=true
      - traefik.http.routers.netbird-dashboard.entrypoints=http
      - traefik.http.routers.netbird-dashboard.rule=Host(`mydomain.com`)
      - traefik.http.routers.netbird-dashboard.middlewares=to-https@file
      - traefik.http.routers.netbird-dashboard-secure.entrypoints=https
      - traefik.http.routers.netbird-dashboard-secure.rule=Host(`mydomain.com`)
      - traefik.http.routers.netbird-dashboard-secure.service=netbird-dashboard
      - traefik.http.services.netbird-dashboard.loadbalancer.server.port=80

  # Signal
  signal:
    image: netbirdio/signal:latest
    restart: unless-stopped
    volumes:
      - netbird-signal:/var/lib/netbird
    #ports:
    #  - 10000:80
    ## port and command for Let's Encrypt validation
    #  - 443:443
    #  command: ["--letsencrypt-domain", "", "--log-file", "console"]
    networks:
      default:
      traefik_net:
    labels:
    - traefik.enable=true
    - traefik.http.routers.netbird-signal.entrypoints=http
    - traefik.http.routers.netbird-signal.rule=Host(`mydomain.com`) && PathPrefix(`/signalexchange.SignalExchange/`)
    - traefik.http.routers.netbird-signal.middlewares=to-https@file
    - traefik.http.routers.netbird-signal-secure.entrypoints=https
    - traefik.http.routers.netbird-signal-secure.rule=Host(`mydomain.com`) && PathPrefix(`/signalexchange.SignalExchange`)
    - traefik.http.routers.netbird-signal-secure.service=netbird-signal
    - traefik.http.services.netbird-signal.loadbalancer.server.port=80
    - traefik.http.services.netbird-signal.loadbalancer.server.scheme=h2c

  # Management
  management:
    image: netbirdio/management:latest
    restart: unless-stopped
    depends_on:
      - dashboard
    volumes:
      - netbird-mgmt:/var/lib/netbird
      - ./management.json:/etc/netbird/management.json
    #ports:
    #  - 443:443 #API port
  #    # command for Let's Encrypt validation without dashboard container
  #    command: ["--letsencrypt-domain", "", "--log-file", "console"]
    command: [
      "--port", "443",
      "--log-file", "console",
      "--disable-anonymous-metrics=true",
      "--single-account-mode-domain=mydomain.com",
      "--dns-domain=netbird.selfhosted",
      "--user-delete-from-idp" # Added myself so netbird can handle users on its own
      ]
    networks:
      default:
      traefik_net:
    labels:
    - traefik.enable=true
    - traefik.http.routers.netbird-api.entrypoints=http
    - traefik.http.routers.netbird-api.rule=Host(`mydomain.com`) && PathPrefix(`/api`)
    - traefik.http.routers.netbird-api.middlewares=to-https@file
    - traefik.http.routers.netbird-api.service=netbird-api
    - traefik.http.routers.netbird-api-secure.entrypoints=https
    - traefik.http.routers.netbird-api-secure.rule=Host(`mydomain.com`) && PathPrefix(`/api`)
    - traefik.http.routers.netbird-api-secure.service=netbird-api
    - traefik.http.services.netbird-api.loadbalancer.server.port=443

    - traefik.http.routers.netbird-management.entrypoints=http
    - traefik.http.routers.netbird-management.rule=Host(`mydomain.com`) && PathPrefix(`/management.ManagementService/`)
    - traefik.http.routers.netbird-management.middlewares=to-https@file
    - traefik.http.routers.netbird-management.service=netbird-management
    - traefik.http.routers.netbird-management-secure.entrypoints=https
    - traefik.http.routers.netbird-management-secure.rule=Host(`mydomain.com`) && PathPrefix(`/management.ManagementService/`)
    - traefik.http.routers.netbird-management-secure.service=netbird-management
    - traefik.http.services.netbird-management.loadbalancer.server.port=443
    - traefik.http.services.netbird-management.loadbalancer.server.scheme=h2c

  # Coturn
  coturn:
    image: coturn/coturn:latest
    restart: unless-stopped
    domainname: mydomain.com
    volumes:
      - ./turnserver.conf:/etc/turnserver.conf:ro
    #      - ./privkey.pem:/etc/coturn/private/privkey.pem:ro
    #      - ./cert.pem:/etc/coturn/certs/cert.pem:ro
    network_mode: host
    command:
      - -c /etc/turnserver.conf

volumes:
  netbird-mgmt:
  netbird-signal:

networks:
  traefik_net:
    external: true
  1. Run the services
  2. Connect clients (I have clients using versions: 0.28.4, 0.28.5 and 0.28.6 with all the same errors.)

Expected behavior

I expected it to be working without logs. It's indeed working, but with those logs that I dont know if they are problem. But guess it is one if they are produced - at least some misconfiguration.

NetBird version

Tested on 0.28.4, 0.28.5 and 0.28.6

NetBird status -d output:

Peers detail:
 peer1.netbird.selfhosted:
  NetBird IP: <peer-1-ip>/32
  Public key: BJjacf7HSPwCzkxFkUlFlWMBbXiFQyLzhc2z6YrBd3E=
  Status: Disconnected
  -- detail --
  Connection type: P2P
  Direct: false
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:62082
  Last connection update: 20 minutes, 7 seconds ago
  Last WireGuard handshake: 2 minutes, 26 seconds ago
  Transfer status (received/sent) 2.1 KiB/3.8 KiB
  Quantum resistance: false
  Routes: -
  Latency: 0s

 peer2.netbird.selfhosted:
  NetBird IP: <peer-2-ip>/32
  Public key: FGy/FK+e8p3qTFulkif4V+hI81U9I+QFKqttzBx7qE4=
  Status: Disconnected
  -- detail --
  Connection type: P2P
  Direct: false
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:62082
  Last connection update: 20 minutes, 7 seconds ago
  Last WireGuard handshake: 2 minutes, 26 seconds ago
  Transfer status (received/sent) 2.1 KiB/3.8 KiB
  Quantum resistance: false
  Routes: -
  Latency: 0s

 peer3.netbird.selfhosted:
  NetBird IP: <peer-3-ip>
  Public key: qXy69ZUHM8IQJIapY/5z54AGTa2UyXPSefDISbzyByA=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): srflx/srflx
  ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:51820
  Last connection update: 34 minutes, 36 seconds ago
  Last WireGuard handshake: 1 minute, 57 seconds ago
  Transfer status (received/sent) 3.0 MiB/25.1 MiB
  Quantum resistance: false
  Routes: -
  Latency: 12.297417ms

 peer4.netbird.selfhosted:
  NetBird IP: <peer-4-ip>
  Public key: MvDBVV63GFtK+MdEy3lV9/73Fuw5eUaRmNfQTMEi3DQ=
  Status: Connected
  -- detail --
  Connection type: P2P
  Direct: true
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:62082
  Last connection update: 34 minutes, 38 seconds ago
  Last WireGuard handshake: 2 minutes, 26 seconds ago
  Transfer status (received/sent) 2.1 KiB/3.8 KiB
  Quantum resistance: false
  Routes: -
  Latency: 12.580678ms

OS: linux/amd64
Daemon version: 0.28.4
CLI version: 0.28.4
Management: Connected to https://mydomain.com:443
Signal: Connected to https://mydomain.com:443
Relays: 
  [stun:mydomain.com:3478] is Available
  [turn:mydomain.com:3478?transport=udp] is Available
Nameservers: 
  [1.1.1.1:53, 1.0.0.1:53] for [.] is Available
FQDN: peer5.netbird.selfhosted
NetBird IP: <peer-5-ip>/16
Interface type: Kernel
Quantum resistance: false
Routes: -
Peers count: 2/4 Connected

Glad if anyone has an idea. If any information is missing please request it.

Have a good one!

Originally created by @da-wilky on GitHub (Jul 31, 2024). Hello, **Describe the problem** Im using docker to host my own netbird management service behind traefik reverse-proxy. Everything is working, but my signal service is permently logging reconnects of the peers. Im curious if this is the expected behavior. Example logs ``` signal-1 | 2024-07-31T08:44:33Z INFO signal/server/signal.go:110: peer connected [<client1>] [streamID 1722415473781694381] signal-1 | 2024-07-31T08:44:38Z INFO signal/server/signal.go:110: peer connected [<client2>] [streamID 1722415478424239799] signal-1 | 2024-07-31T08:44:56Z INFO signal/server/signal.go:110: peer connected [<client3>] [streamID 1722415496841138613] signal-1 | 2024-07-31T08:44:59Z INFO signal/server/signal.go:110: peer connected [<client4>] [streamID 1722415499259544723] signal-1 | 2024-07-31T08:44:59Z INFO signal/server/signal.go:110: peer connected [<client5>] [streamID 1722415499922935186] signal-1 | 2024-07-31T08:45:01Z INFO signal/server/signal.go:110: peer connected [<client6>] [streamID 1722415501822241363] signal-1 | 2024-07-31T08:45:33Z INFO signal/server/signal.go:95: peer disconnected [<client1>] [streamID 1722415473781694381] signal-1 | 2024-07-31T08:45:34Z INFO signal/server/signal.go:110: peer connected [<client1>] [streamID 1722415534361482785] signal-1 | 2024-07-31T08:45:38Z INFO signal/server/signal.go:95: peer disconnected [<client2>] [streamID 1722415478424239799] signal-1 | 2024-07-31T08:45:39Z INFO signal/server/signal.go:110: peer connected [<client2>] [streamID 1722415539494496594] signal-1 | 2024-07-31T08:45:56Z INFO signal/server/signal.go:95: peer disconnected [<client3>] [streamID 1722415496841138613] signal-1 | 2024-07-31T08:45:57Z INFO signal/server/signal.go:110: peer connected [<client3>] [streamID 1722415557714556130] signal-1 | 2024-07-31T08:45:59Z INFO signal/server/signal.go:95: peer disconnected [<client4>] [streamID 1722415499259544723] signal-1 | 2024-07-31T08:45:59Z INFO signal/server/signal.go:95: peer disconnected [<client5>] [streamID 1722415499922935186] signal-1 | 2024-07-31T08:46:00Z INFO signal/server/signal.go:110: peer connected [<client5>] [streamID 1722415560435622333] signal-1 | 2024-07-31T08:46:00Z INFO signal/server/signal.go:110: peer connected [<client4>] [streamID 1722415560465990002] signal-1 | 2024-07-31T08:46:01Z INFO signal/server/signal.go:95: peer disconnected [<client6>] [streamID 1722415501822241363] signal-1 | 2024-07-31T08:46:03Z INFO signal/server/signal.go:110: peer connected [<client6>] [streamID 1722415563203704936] signal-1 | 2024-07-31T08:46:34Z INFO signal/server/signal.go:95: peer disconnected [<client1>] [streamID 1722415534361482785] signal-1 | 2024-07-31T08:46:35Z INFO signal/server/signal.go:110: peer connected [<client1>] [streamID 1722415595862371424] ``` On the clients the following logs are produced: ``` 2024-07-31T10:45:33+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:45:34+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream 2024-07-31T10:46:34+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:46:35+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream 2024-07-31T10:47:35+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:47:37+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream 2024-07-31T10:48:37+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:48:38+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream 2024-07-31T10:49:38+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:49:40+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream 2024-07-31T10:50:40+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:50:40+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream 2024-07-31T10:51:40+02:00 WARN signal/client/grpc.go:160: disconnected from the Signal service but will retry silently. Reason: rpc error: code = Internal desc = server closed the stream without sending trailers 2024-07-31T10:51:40+02:00 INFO signal/client/grpc.go:147: connected to the Signal Service stream ``` So it reconnects every minute because of an rpc error. But just by googling I cant figure out why. Again - everything is still working fine - at least I cant recognize any errors apart those logs. **To Reproduce** Steps to reproduce the behavior: 1. Host your own netbird with traefik proxy. ```yml version: "3" services: #UI dashboard dashboard: image: netbirdio/dashboard:latest restart: unless-stopped #ports: # - 80:80 # - 443:443 environment: # Endpoints - NETBIRD_MGMT_API_ENDPOINT=https://mydomain.com:443 - NETBIRD_MGMT_GRPC_API_ENDPOINT=https://mydomain.com:443 # OIDC - AUTH_AUDIENCE=netbird-client - AUTH_CLIENT_ID=netbird-client - AUTH_CLIENT_SECRET= - AUTH_AUTHORITY=https://mykeycloak.com/realms/netbird - USE_AUTH0=false - AUTH_SUPPORTED_SCOPES=openid profile email offline_access api - AUTH_REDIRECT_URI= - AUTH_SILENT_REDIRECT_URI= - NETBIRD_TOKEN_SOURCE=accessToken # SSL - NGINX_SSL_PORT=443 networks: default: traefik_net: labels: - traefik.enable=true - traefik.http.routers.netbird-dashboard.entrypoints=http - traefik.http.routers.netbird-dashboard.rule=Host(`mydomain.com`) - traefik.http.routers.netbird-dashboard.middlewares=to-https@file - traefik.http.routers.netbird-dashboard-secure.entrypoints=https - traefik.http.routers.netbird-dashboard-secure.rule=Host(`mydomain.com`) - traefik.http.routers.netbird-dashboard-secure.service=netbird-dashboard - traefik.http.services.netbird-dashboard.loadbalancer.server.port=80 # Signal signal: image: netbirdio/signal:latest restart: unless-stopped volumes: - netbird-signal:/var/lib/netbird #ports: # - 10000:80 ## port and command for Let's Encrypt validation # - 443:443 # command: ["--letsencrypt-domain", "", "--log-file", "console"] networks: default: traefik_net: labels: - traefik.enable=true - traefik.http.routers.netbird-signal.entrypoints=http - traefik.http.routers.netbird-signal.rule=Host(`mydomain.com`) && PathPrefix(`/signalexchange.SignalExchange/`) - traefik.http.routers.netbird-signal.middlewares=to-https@file - traefik.http.routers.netbird-signal-secure.entrypoints=https - traefik.http.routers.netbird-signal-secure.rule=Host(`mydomain.com`) && PathPrefix(`/signalexchange.SignalExchange`) - traefik.http.routers.netbird-signal-secure.service=netbird-signal - traefik.http.services.netbird-signal.loadbalancer.server.port=80 - traefik.http.services.netbird-signal.loadbalancer.server.scheme=h2c # Management management: image: netbirdio/management:latest restart: unless-stopped depends_on: - dashboard volumes: - netbird-mgmt:/var/lib/netbird - ./management.json:/etc/netbird/management.json #ports: # - 443:443 #API port # # command for Let's Encrypt validation without dashboard container # command: ["--letsencrypt-domain", "", "--log-file", "console"] command: [ "--port", "443", "--log-file", "console", "--disable-anonymous-metrics=true", "--single-account-mode-domain=mydomain.com", "--dns-domain=netbird.selfhosted", "--user-delete-from-idp" # Added myself so netbird can handle users on its own ] networks: default: traefik_net: labels: - traefik.enable=true - traefik.http.routers.netbird-api.entrypoints=http - traefik.http.routers.netbird-api.rule=Host(`mydomain.com`) && PathPrefix(`/api`) - traefik.http.routers.netbird-api.middlewares=to-https@file - traefik.http.routers.netbird-api.service=netbird-api - traefik.http.routers.netbird-api-secure.entrypoints=https - traefik.http.routers.netbird-api-secure.rule=Host(`mydomain.com`) && PathPrefix(`/api`) - traefik.http.routers.netbird-api-secure.service=netbird-api - traefik.http.services.netbird-api.loadbalancer.server.port=443 - traefik.http.routers.netbird-management.entrypoints=http - traefik.http.routers.netbird-management.rule=Host(`mydomain.com`) && PathPrefix(`/management.ManagementService/`) - traefik.http.routers.netbird-management.middlewares=to-https@file - traefik.http.routers.netbird-management.service=netbird-management - traefik.http.routers.netbird-management-secure.entrypoints=https - traefik.http.routers.netbird-management-secure.rule=Host(`mydomain.com`) && PathPrefix(`/management.ManagementService/`) - traefik.http.routers.netbird-management-secure.service=netbird-management - traefik.http.services.netbird-management.loadbalancer.server.port=443 - traefik.http.services.netbird-management.loadbalancer.server.scheme=h2c # Coturn coturn: image: coturn/coturn:latest restart: unless-stopped domainname: mydomain.com volumes: - ./turnserver.conf:/etc/turnserver.conf:ro # - ./privkey.pem:/etc/coturn/private/privkey.pem:ro # - ./cert.pem:/etc/coturn/certs/cert.pem:ro network_mode: host command: - -c /etc/turnserver.conf volumes: netbird-mgmt: netbird-signal: networks: traefik_net: external: true ``` 2. Run the services 3. Connect clients (I have clients using versions: `0.28.4`, `0.28.5` and `0.28.6` with all the same errors.) **Expected behavior** I expected it to be working without logs. It's indeed working, but with those logs that I dont know if they are problem. But guess it is one if they are produced - at least some misconfiguration. **NetBird version** Tested on `0.28.4`, `0.28.5` and `0.28.6` **NetBird status -d output:** ``` Peers detail: peer1.netbird.selfhosted: NetBird IP: <peer-1-ip>/32 Public key: BJjacf7HSPwCzkxFkUlFlWMBbXiFQyLzhc2z6YrBd3E= Status: Disconnected -- detail -- Connection type: P2P Direct: false ICE candidate (Local/Remote): host/srflx ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:62082 Last connection update: 20 minutes, 7 seconds ago Last WireGuard handshake: 2 minutes, 26 seconds ago Transfer status (received/sent) 2.1 KiB/3.8 KiB Quantum resistance: false Routes: - Latency: 0s peer2.netbird.selfhosted: NetBird IP: <peer-2-ip>/32 Public key: FGy/FK+e8p3qTFulkif4V+hI81U9I+QFKqttzBx7qE4= Status: Disconnected -- detail -- Connection type: P2P Direct: false ICE candidate (Local/Remote): host/srflx ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:62082 Last connection update: 20 minutes, 7 seconds ago Last WireGuard handshake: 2 minutes, 26 seconds ago Transfer status (received/sent) 2.1 KiB/3.8 KiB Quantum resistance: false Routes: - Latency: 0s peer3.netbird.selfhosted: NetBird IP: <peer-3-ip> Public key: qXy69ZUHM8IQJIapY/5z54AGTa2UyXPSefDISbzyByA= Status: Connected -- detail -- Connection type: P2P Direct: true ICE candidate (Local/Remote): srflx/srflx ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:51820 Last connection update: 34 minutes, 36 seconds ago Last WireGuard handshake: 1 minute, 57 seconds ago Transfer status (received/sent) 3.0 MiB/25.1 MiB Quantum resistance: false Routes: - Latency: 12.297417ms peer4.netbird.selfhosted: NetBird IP: <peer-4-ip> Public key: MvDBVV63GFtK+MdEy3lV9/73Fuw5eUaRmNfQTMEi3DQ= Status: Connected -- detail -- Connection type: P2P Direct: true ICE candidate (Local/Remote): host/srflx ICE candidate endpoints (Local/Remote): <management-server-ip>:51820/<ip-I-dont-know>:62082 Last connection update: 34 minutes, 38 seconds ago Last WireGuard handshake: 2 minutes, 26 seconds ago Transfer status (received/sent) 2.1 KiB/3.8 KiB Quantum resistance: false Routes: - Latency: 12.580678ms OS: linux/amd64 Daemon version: 0.28.4 CLI version: 0.28.4 Management: Connected to https://mydomain.com:443 Signal: Connected to https://mydomain.com:443 Relays: [stun:mydomain.com:3478] is Available [turn:mydomain.com:3478?transport=udp] is Available Nameservers: [1.1.1.1:53, 1.0.0.1:53] for [.] is Available FQDN: peer5.netbird.selfhosted NetBird IP: <peer-5-ip>/16 Interface type: Kernel Quantum resistance: false Routes: - Peers count: 2/4 Connected ``` Glad if anyone has an idea. If any information is missing please request it. Have a good one!
saavagebueno added the management-service label 2025-11-20 05:24:21 -05:00
Author
Owner

@Sleepingme00 commented on GitHub (Aug 1, 2024):

I also have the same issue, appreciate anyone give a suggestion.

@Sleepingme00 commented on GitHub (Aug 1, 2024): I also have the same issue, appreciate anyone give a suggestion.
Author
Owner

@lixmal commented on GitHub (Aug 1, 2024):

Do you have any proxy in front that cuts the grpc connection after some duration? Could be traefik itself

@lixmal commented on GitHub (Aug 1, 2024): Do you have any proxy in front that cuts the grpc connection after some duration? Could be traefik itself
Author
Owner

@da-wilky commented on GitHub (Aug 1, 2024):

You are completly right, thank you for the hint! I had some research before in the direction of traefik, but didnt find anything, so I thought its about the signaling service, even tho it's indeed a traefik misconfiguration and not related to netbird.

Traefik has a default readTimeout of 60s. (https://doc.traefik.io/traefik/routing/entrypoints/#respondingtimeouts)
This is to protect traefik from DoS Attacks, see the following resources:

Currently its not possible to set this timeout on an per router/service level with traefik (resource 1, resource 2).
Hoepfully they plan to implement this feature.

So the current possibilities are:

  1. Live with the logs - it's still working fine and will be reconnecting immediatly after traefik cuts the connection. You might also increase the default duration by a few seconds to minimize the logs. Your decision if this is worth it - as said, its working fine and will be automatically reconnecting. You would just get a few less logs.
  2. Disable readTimeout by setting it to 0 with the huge drawback of exposing your traefik to the vulnerability.
  3. Switch from traefik to e.g. NGINX, that supports setting a custom read timeout per server/location.

Hope that helps somebody. Thanks to @lixmal for pointing the direction!

Cheers

@da-wilky commented on GitHub (Aug 1, 2024): You are completly right, thank you for the hint! I had some research before in the direction of `traefik`, but didnt find anything, so I thought its about the signaling service, even tho it's indeed a **traefik misconfiguration** and not related to netbird. Traefik has a default `readTimeout` of `60s`. (https://doc.traefik.io/traefik/routing/entrypoints/#respondingtimeouts) This is to protect traefik from DoS Attacks, see the following resources: - [Default Read Timeout got introduced](https://github.com/traefik/traefik/issues/10652) - [Disable Read Timeout makes traefik vulnerable to CVE-2024-28869](https://github.com/traefik/traefik/issues/10622) - [CVE-2024-28869](https://github.com/traefik/traefik/security/advisories/GHSA-4vwx-54mw-vqfw) Currently its not possible to set this timeout on an per router/service level with traefik ([resource 1](https://community.traefik.io/t/setting-timeout-per-service-router/5650/1), [resource 2](https://github.com/traefik/traefik/issues/10962)). Hoepfully they plan to implement this feature. So the current possibilities are: 1. Live with the logs - it's still working fine and will be reconnecting immediatly after traefik cuts the connection. You might also increase the default duration by a few seconds to minimize the logs. Your decision if this is worth it - as said, its working fine and will be automatically reconnecting. You would just get a few less logs. 2. Disable `readTimeout` by setting it to `0` with the huge drawback of exposing your traefik to the vulnerability. 3. Switch from traefik to e.g. NGINX, that supports setting a custom read timeout per server/location. Hope that helps somebody. Thanks to @lixmal for pointing the direction! Cheers
Author
Owner

@Sleepingme00 commented on GitHub (Aug 1, 2024):

You are completly right, thank you for the hint! I had some research before in the direction of traefik, but didnt find anything, so I thought its about the signaling service, even tho it's indeed a traefik misconfiguration and not related to netbird.

Traefik has a default readTimeout of 60s. (https://doc.traefik.io/traefik/routing/entrypoints/#respondingtimeouts) This is to protect traefik from DoS Attacks, see the following resources:

Currently its not possible to set this timeout on an per router/service level with traefik (resource 1, resource 2). Hoepfully they plan to implement this feature.

So the current possibilities are:

  1. Live with the logs - it's still working fine and will be reconnecting immediatly after traefik cuts the connection. You might also increase the default duration by a few seconds to minimize the logs. Your decision if this is worth it - as said, its working fine and will be automatically reconnecting. You would just get a few less logs.
  2. Disable readTimeout by setting it to 0 with the huge drawback of exposing your traefik to the vulnerability.
  3. Switch from traefik to e.g. NGINX, that supports setting a custom read timeout per server/location.

Hope that helps somebody. Thanks to @lixmal for pointing the direction!

Cheers

Refer to @da-wilky 's suggestion, i also has updated the Traefik config as following.

entryPoints:
  http:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
    transport:
      respondingTimeouts:
        readTimeout: 0

after retarting the sercies, peers are reconnecting every minute was gone! The corresponding risk also follows, of course, this balance can only be measured by myself. Such a great reminder, appreciate @da-wilky @lixmal 's guidance!

signal-1-log

@Sleepingme00 commented on GitHub (Aug 1, 2024): > You are completly right, thank you for the hint! I had some research before in the direction of `traefik`, but didnt find anything, so I thought its about the signaling service, even tho it's indeed a **traefik misconfiguration** and not related to netbird. > > Traefik has a default `readTimeout` of `60s`. (https://doc.traefik.io/traefik/routing/entrypoints/#respondingtimeouts) This is to protect traefik from DoS Attacks, see the following resources: > > * [Default Read Timeout got introduced](https://github.com/traefik/traefik/issues/10652) > * [Disable Read Timeout makes traefik vulnerable to CVE-2024-28869](https://github.com/traefik/traefik/issues/10622) > * [CVE-2024-28869](https://github.com/traefik/traefik/security/advisories/GHSA-4vwx-54mw-vqfw) > > Currently its not possible to set this timeout on an per router/service level with traefik ([resource 1](https://community.traefik.io/t/setting-timeout-per-service-router/5650/1), [resource 2](https://github.com/traefik/traefik/issues/10962)). Hoepfully they plan to implement this feature. > > So the current possibilities are: > > 1. Live with the logs - it's still working fine and will be reconnecting immediatly after traefik cuts the connection. You might also increase the default duration by a few seconds to minimize the logs. Your decision if this is worth it - as said, its working fine and will be automatically reconnecting. You would just get a few less logs. > 2. Disable `readTimeout` by setting it to `0` with the huge drawback of exposing your traefik to the vulnerability. > 3. Switch from traefik to e.g. NGINX, that supports setting a custom read timeout per server/location. > > Hope that helps somebody. Thanks to @lixmal for pointing the direction! > > Cheers Refer to @da-wilky 's suggestion, i also has updated the Traefik config as following. ``` entryPoints: http: address: ":80" http: redirections: entryPoint: to: https scheme: https https: address: ":443" transport: respondingTimeouts: readTimeout: 0 ``` after retarting the sercies, peers are reconnecting every minute was gone! The corresponding risk also follows, of course, this balance can only be measured by myself. Such a great reminder, appreciate @da-wilky @lixmal 's guidance! ![signal-1-log](https://github.com/user-attachments/assets/79387717-a949-4bc0-9902-db1f2bd035d6)
Author
Owner

@thewickedengineer commented on GitHub (Oct 23, 2025):

@da-wilky @lixmal ,
We plan to use 10 minutes of read timeout and since there is no update on per service timeout, we are keeping the setting of 10 m for entire network.
What risks do you see with this setting, is there any other setting that can block the DDos attack possible in those 10 minutes

@thewickedengineer commented on GitHub (Oct 23, 2025): @da-wilky @lixmal , We plan to use 10 minutes of read timeout and since there is no update on per service timeout, we are keeping the setting of 10 m for entire network. What risks do you see with this setting, is there any other setting that can block the DDos attack possible in those 10 minutes
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: SVI/netbird#1118