Newt stopped handling Private resources with EOF UDP relay error #271

@mprokopiev

Description

Describe the Bug

Hi team,
I have recently had two outages with Newt handling private resources. All private resources went down and did not recover until Newt was restarted. Newt kept handling public resources without any problems; only the private resources associated with the failed Newt instance stopped working.

Previous incident (GMT+2):

Feb 25 01:01:48 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:48 Ping attempt 2/2 failed: failed to read ICMP packet: i/o timeout
Feb 25 01:01:48 container.ro.internal docker/newt/e95bf3d3f088[608089]: WARN: 2026/02/24 23:01:48 Periodic ping failed (4 consecutive failures): all 2 ping attempts failed, last error: failed to read ICMP packet: i/o timeout
Feb 25 01:01:48 container.ro.internal docker/newt/e95bf3d3f088[608089]: WARN: 2026/02/24 23:01:48 Connection to server lost after 4 failures. Continuous reconnection attempts will be made.
Feb 25 01:01:48 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:48 Sending message: newt/wg/register, data: map[backwardsCompatible:true publicKey:uFHIEv1mkfUdOIt3KDVqQqcdaByxKOEc1ibsSnxXIlA=]
Feb 25 01:01:48 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:48 Increased ping check interval to 3.9s due to consecutive failures
....
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Received Docker container fetch request
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Docker container list sent, count: 68
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Received registration message
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: INFO: 2026/02/24 23:01:49 Stopping ping check
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 Device closing
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 Routine: receive incoming v4 - stopped
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 Routine: receive incoming v6 - stopped
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 peer(QFYN…mVAs) - Stopping
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 peer(QFYN…mVAs) - Routine: sequential sender - stopped
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 Routine: event worker - stopped
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 Routine: TUN reader - stopped
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: gerbil-wireguard: 2026/02/24 23:01:49 peer(QFYN…mVAs) - Routine: sequential receiver - stopped
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Direct UDP relay read error: EOF
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Direct UDP relay read error: EOF
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Direct UDP relay read error: EOF
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Direct UDP relay read error: EOF
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Direct UDP relay read error: EOF
Feb 25 01:01:49 container.ro.internal docker/newt/e95bf3d3f088[608089]: DEBUG: 2026/02/24 23:01:49 Direct UDP relay read error: EOF

There were nearly 40k "Direct UDP relay read error: EOF" log lines. Looking at the logs, I suspected it could be related to Gerbil, but the Gerbil logs had already been rotated, so I waited for the next occurrence, which happened recently: it started 20 seconds after the Newt container was recreated due to a label change. This time there were none of the connectivity errors I had seen during the previous incident (GMT+2):

Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Sent UDP hole punch to 204.x.x.x:21820: {"ephemeralPublicKey":"+NhVtOTzFqrkp4AbutlIQR2JzLR5X4UjCCBwukyrcQ4=","nonce":"9g58gUBaT+Lmh3mT","ciphertext":"M+3V3rzWEs8jpTxkGxheJ0zioGyjF9M4X0SFJDHv0sw9NUqiZ8ofnQmO8Jfbc3PKnQHlbgdai2TWsKJRpK7Y7Qbd53JENaRgIKeZFJ7UECBLaBxGpBKOWjkU9NOGZwKfwVKl7sO3dx0l2nz75rF318gQSkaGtA/gnu1zbFPaMfIRnrMYELe1SOxl4gKg4btLPk0="}
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Increased hole punch interval to 2s
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Target 34: health check passed (status: 200)
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: INFO: 2026/03/12 20:37:37 Target 34 initial status: healthy
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Target 34: initial check interval set to 30s
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Health check status update for 26 targets
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Docker container list sent, count: 67
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Received registration message
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: INFO: 2026/03/12 20:37:37 Stopping ping check
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Target 26: health check passed (status: 200)
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: INFO: 2026/03/12 20:37:37 Target 26 initial status: healthy
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Target 26: initial check interval set to 30s
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Health check status update for 26 targets
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: gerbil-wireguard: 2026/03/12 20:37:37 Device closing
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF
Mar 12 20:37:37 container.ro.internal docker/newt/85cf2658115e[577300]: DEBUG: 2026/03/12 20:37:37 Direct UDP relay read error: EOF

This time there were ~20k of the same UDP relay error lines.
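
My guess at the shape of the problem, sketched in Go below; this is hypothetical, not Newt's actual source, and runRelay, relayStream, and the dialer are names I made up. If the relay read loop just logs the error and retries on the same stream, a torn-down stream returns EOF on every read, which would produce exactly this flood of identical DEBUG lines and never recover; breaking out and re-dialing with backoff is what I would expect instead. Notably, a plain *net.UDPConn returns "use of closed network connection" rather than EOF when closed, so the EOF hints that the relay reads from a stream-style wrapper.

package main

import (
	"errors"
	"io"
	"log"
	"time"
)

// relayStream is a stand-in for whatever the relay actually reads from.
type relayStream interface {
	io.ReadCloser
}

// runRelay demonstrates the recovery pattern: on EOF, drop the dead stream
// and re-dial with backoff, instead of retrying the same read forever.
func runRelay(dial func() (relayStream, error)) {
	backoff := time.Second
	for {
		s, err := dial()
		if err != nil {
			log.Printf("relay dial failed: %v; retrying in %s", err, backoff)
			time.Sleep(backoff)
			if backoff < 30*time.Second {
				backoff *= 2
			}
			continue
		}
		backoff = time.Second
		buf := make([]byte, 65535)
		for {
			n, err := s.Read(buf)
			if err != nil {
				// Logging and retrying the same read here, instead of
				// breaking out, would spin and emit tens of thousands of
				// identical EOF lines: the observed symptom.
				if errors.Is(err, io.EOF) {
					log.Print("relay stream closed (EOF); reconnecting")
				} else {
					log.Printf("relay read error: %v; reconnecting", err)
				}
				s.Close()
				break // back to the outer dial loop
			}
			_ = buf[:n] // forward the datagram payload here
		}
	}
}

func main() {
	// Wiring with a dummy dialer just so the sketch compiles and runs.
	runRelay(func() (relayStream, error) {
		return nil, errors.New("dialer not implemented in this sketch")
	})
}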

Gerbil reported this during the same time window (UTC); 100.89.128.32 is presumably the WG IP of the failed Newt instance:

Mar 12 18:37:36 edge.us.internal docker/gerbil[2719088]: INFO: 2026/03/12 18:37:36 Clearing connections for added peer with WG IP: 100.89.128.32
Mar 12 18:37:36 edge.us.internal docker/gerbil[2719088]: DEBUG: 2026/03/12 18:37:36 Error reading response from 100.89.128.32:52517: read udp 100.89.128.1:32857->100.89.128.32:52517: use of closed network connection
Mar 12 18:37:36 edge.us.internal docker/gerbil[2719088]: DEBUG: 2026/03/12 18:37:36 Closing connection for WG IP 100.89.128.32: 100.89.128.32:52517-216.246.119.122:50699
Mar 12 18:37:36 edge.us.internal docker/gerbil[2719088]: DEBUG: 2026/03/12 18:37:36 Error reading response from 100.89.128.32:52517: read udp 100.89.128.1:37410->100.89.128.32:52517: use of closed network connection
Mar 12 18:37:36 edge.us.internal docker/gerbil[2719088]: DEBUG: 2026/03/12 18:37:36 Closing connection for WG IP 100.89.128.32: 100.89.128.32:52517-35.207.18.217:52366
Mar 12 18:37:36 edge.us.internal docker/gerbil[2719088]: INFO: 2026/03/12 18:37:36 Cleared 2 connections for WG IP: 100.89.128.32
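
As far as I can tell, "use of closed network connection" is just what Go returns for any read on a net.UDPConn that has already been closed, so these Gerbil lines look like expected teardown noise after it cleared the peer's connections, not a root cause. A minimal sketch that reproduces the exact message:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Bind an ephemeral UDP socket, close it, then read from it: Go reports
	// "use of closed network connection", the exact text in the Gerbil log.
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		panic(err)
	}
	conn.Close() // comparable to Gerbil clearing connections for the re-registered peer

	buf := make([]byte, 1500)
	_, _, err = conn.ReadFromUDP(buf)
	fmt.Println(err) // read udp 127.0.0.1:NNNNN: use of closed network connection
}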

Newt is running in Docker:

services:
  newt:
    image: docker.io/fosrl/newt:1.10.2
    container_name: newt
    environment:
      TZ: ${TZ}
      LOG_LEVEL: debug
      PANGOLIN_ENDPOINT: https://${NEWT_PANGOLIN_DOMAIN}
      NEWT_ID: ${NEWT_ID}
      NEWT_SECRET: ${NEWT_SECRET}
      CONFIG_FILE: /run/newt.json
      # Rootless socket
      DOCKER_SOCKET: /var/run/docker.sock
    user: ${PUID}:${PGID}
    cap_drop:
      - ALL
    read_only: true
    security_opt:
      - no-new-privileges:true
    volumes:
      - socket-proxy:/var/run
    tmpfs:
      - /tmp:uid=${PUID},gid=${PGID}
      - /run:uid=${PUID},gid=${PGID}
    networks:
      newt:
        gw_priority: 10
    restart: unless-stopped
    depends_on:
      socket-proxy:
        condition: service_healthy
        restart: true
    stop_signal: SIGTERM
    stop_grace_period: 30s
    cpu_count: 2
    mem_limit: 4096m
    labels:
      cadvisor.monitor: true

networks:
  newt:
    name: newt

What could be the reasons behind this? After the error, public resources kept working fine.
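
Until the root cause is found, the only workaround I can think of is a crude host-side watchdog: count the relay EOF errors in the recent container logs and restart Newt when they pile up. A sketch in Go, assuming the docker CLI is on PATH; the one-minute window and 100-line threshold are arbitrary, and "newt" is the container_name from the compose file above:

package main

import (
	"bytes"
	"log"
	"os/exec"
	"time"
)

const container = "newt" // container_name from the compose file above

func main() {
	for range time.Tick(time.Minute) {
		// Pull the last minute of container output (docker logs interleaves
		// the container's stdout and stderr).
		out, err := exec.Command("docker", "logs", "--since", "1m", container).CombinedOutput()
		if err != nil {
			log.Printf("docker logs failed: %v", err)
			continue
		}
		// 100 lines/minute is an arbitrary threshold; the real incidents
		// produced tens of thousands of these lines.
		if n := bytes.Count(out, []byte("Direct UDP relay read error: EOF")); n > 100 {
			log.Printf("%d relay EOF errors in the last minute; restarting %s", n, container)
			if err := exec.Command("docker", "restart", container).Run(); err != nil {
				log.Printf("docker restart failed: %v", err)
			}
		}
	}
}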

Environment

  • OS Type & Version: Fedora 43
  • Pangolin Version: 1.16.2
  • Gerbil Version: 1.3.0
  • Traefik Version: 3.6.10
  • Newt Version: 1.10.2
  • Olm Version: 1.4.2

To Reproduce

I found no specific way to reproduce this. I wasn't using private resources much until last month; it seems to be just a matter of time before it happens again.

Expected Behavior

Newt should reconnect after the failure and start handling private resources again.
