Skip to content

Fix notification-delivery crash under load (thread-unsafe gethostbyname)#1937

Merged
kzangeli merged 4 commits intodevelopfrom
hardening/notify-crash-under-load
Apr 16, 2026
Merged

Fix notification-delivery crash under load (thread-unsafe gethostbyname)#1937
kzangeli merged 4 commits intodevelopfrom
hardening/notify-crash-under-load

Conversation

@kzangeli
Copy link
Copy Markdown
Collaborator

orionldServerConnect used gethostbyname, which on glibc returns a pointer into a single process-wide static hostent buffer. Concurrent notification delivery from multiple MHD threads raced on that shared buffer and could segfault — reproduced reliably by the NOTIFY perf scenario against -mongocOnly.

Replace with getaddrinfo, which is POSIX-mandated thread-safe and returns per-call heap storage freed by freeaddrinfo. Behaviour is otherwise equivalent (IPv4, SOCK_STREAM, numeric port via AI_NUMERICSERV).

orionldServerConnect used gethostbyname, which on glibc returns a pointer
into a single process-wide static hostent buffer. Concurrent notification
delivery from multiple MHD threads raced on that shared buffer and could
segfault — reproduced reliably by the NOTIFY perf scenario against
-mongocOnly.

Replace with getaddrinfo, which is POSIX-mandated thread-safe and returns
per-call heap storage freed by freeaddrinfo. Behaviour is otherwise
equivalent (IPv4, SOCK_STREAM, numeric port via AI_NUMERICSERV).
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 15, 2026

CLA Assistant Lite bot All contributors have signed the CLA ✍️

kzangeli and others added 3 commits April 16, 2026 11:01
The functional-test workflow ran eclipse-mosquitto:1.6, which only
speaks MQTT 3.1.1 and older. Tests that request MQTT-Version "mqtt5.0"
(five of them, including ngsild_subCache-with-mongoc-on-startup) get
MQTTClient_connect error -15 and an E: trace from mqttConnect.

For most of those tests the MQTT connect happens after broker startup,
so the harness doesn't care. But ngsild_subCache-with-mongoc-on-startup
restarts the broker at step 07, the subCache reloads the MQTT sub, and
the connect fails during startup — roughly 2 ms before the listening
port opens. Since b83961a the harness's startup gate treats any ^E:
on stderr as a dead broker, and GA runner jitter in that 2 ms window
turns it into a flake.

eclipse-mosquitto:2.0 supports MQTT 3.1.1 and 5.0, which removes the
connect failure, the E: trace, and the race.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two small updates to doc/manuals-ld/installation-guide-*:

- Ubuntu 18.04 / 20.04 / 22.04: section headers still said "Postgres 12"
  while the install commands below them already installed postgresql-17
  (and CI runs postgis/postgis:17-3.5-alpine). Bump the headers to 17
  so the page is internally consistent.

- All six guides (Ubuntu 18.04/20.04/22.04, Debian 9/10, CentOS 7): add
  a one-line note that Orion-LD's MQTT subscription support includes
  MQTT 5.0, which requires Mosquitto 2.0 or later. On Ubuntu 22.04 the
  distro package is already 2.x; older distros need the upstream
  PPA / mosquitto.org repo if MQTT 5.0 is wanted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit bumped the mosquitto service container from
eclipse-mosquitto:1.6 to 2.0 to unblock MQTT 5.0 tests. But the 2.0
Docker image defaults to allow_anonymous=false, so every test that
creates an MQTT subscription now fails with MQTTClient_connect
error -1 (broker closes the unauthenticated connection).

Fix by dropping the service container and starting plain MQTT in the
existing MQTTS step instead, where we already control the config:

  - Install mosquitto via apt (Ubuntu 24.04 ships 2.0.18 — supports
    MQTT 5.0).
  - Stop the systemd-launched instance (its default config also denies
    anonymous).
  - Launch on 1883 with allow_anonymous=true.
  - Keep the existing 8883 MQTTS instance alongside it.

Rename the step from "Start MQTTS broker" to "Start MQTT + MQTTS
brokers" to reflect what it now does.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@kzangeli kzangeli merged commit b8dda72 into develop Apr 16, 2026
13 checks passed
@kzangeli kzangeli deleted the hardening/notify-crash-under-load branch April 16, 2026 12:39
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant