Skip to content

[DRAFT 2.0] spine: behavioral changes parked for Cacti 2.0#537

Draft
somethingwithproof wants to merge 166 commits intoCacti:developfrom
somethingwithproof:feat/spine-2.0-behavioral
Draft

[DRAFT 2.0] spine: behavioral changes parked for Cacti 2.0#537
somethingwithproof wants to merge 166 commits intoCacti:developfrom
somethingwithproof:feat/spine-2.0-behavioral

Conversation

@somethingwithproof
Copy link
Copy Markdown
Contributor

Carved out from #535 per @TheWitness — behavioral changes that affect the plugin state-machine contract (thold, amqp-publisher, boost) land in 2.0, not 1.3.

What's in this PR

A1. TLS-by-default for MySQL (6896444)

Flips DB_UseSSL / RDB_UseSSL default from 0 to 1 (preferred, with graceful fallback). Plugin-observable via connection setup timing and cleartext-required MySQL servers rejecting the handshake.

1.3 behavior (matrix branch): default remains 0, operators opt in via spine.conf.
2.0 behavior (this branch): default is 1, operators opt out explicitly.

Why these items defer

Each change touches one of the plugin contract layers:

  • poller_output write timing or content
  • poller_time row shape
  • host.status transition semantics
  • Default connection / IPC behavior visible to plugins

State-machine RFC

Planning an RFC doc to define the 2.0 contract with @TheWitness:

  1. State machine as it exists today (poller_output, poller_time, host.*)
  2. What each plugin relies on (thold, amqp-publisher, boost, weathermap, syslog, snmpagent, mactrack)
  3. Migration surface (deprecation warnings in 1.3 upgrade script)
  4. 2.0 daemon architecture (event-loop, in-process plugin hooks, streaming output)

Next steps

  • Open RFC PR for state-machine contract doc
  • Implement 1.3 metachar-warning script (cacti_check_shell_metachars.php) for deprecation window
  • Plugin compatibility matrix review

Closes nothing on 1.3; tracking for 2.0 only.

Related: #523 (CMake migration, 1.3), #535 (platform + security + operator tooling, 1.3).

- Add CMakeLists.txt mirroring configure.ac feature checks
- Add config/config.h.cmake.in template for cmake builds
- Add build-cmake-linux CI job (gcc/clang matrix)
- Add build-windows CI job (MSYS2/MinGW-w64, continue-on-error)
- Windows crash dump collection via WER LocalDumps
- Gate -Wall by compiler ID (GNU/Clang vs MSVC)
- Use CMAKE_DL_LIBS instead of hardcoded -ldl
- Parse net-snmp-config --libs into proper NETSNMP_LIBRARIES
- Add SNMP_LOCALNAME compile check for feature parity
- No C source files modified

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine.h depends on types from common.h (MYSQL, pid_t, size_t,
pthread types, RESULTS_BUFFER from config.h). Add a compile-time
guard that produces a clear error if spine.h is included without
common.h, rather than cascading type errors.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
strncasecmp() returns 0 on match, but the comparisons used the
raw return value as truthy, inverting the logic. TCP matched
non-TCP strings and vice versa.

Also fix: comparisons against 'hostname' instead of 'token'
(lines 1020, 1032), and strncasecmp length 3 for 4-char strings
"TCP6"/"UDP6" (should be 4).

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
getarg(opt, &argv) advances the argv pointer on each call. Three
successive calls in the --mode handler consumed three argv entries
instead of one, corrupting subsequent argument parsing when using
the space-separated form (--mode online).

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
somethingwithproof and others added 27 commits April 14, 2026 22:22
…dbox TU

OpenBSD's <sys/cdefs.h> treats _POSIX_C_SOURCE as a signal to force
__BSD_VISIBLE=0 no matter what the user sets. That hides u_int in
<netinet/ip.h> and pledge/unveil in <unistd.h>. Stop defining
_POSIX_C_SOURCE on OpenBSD; the libc defaults expose POSIX plus BSD
extensions together.

Additionally set __BSD_VISIBLE=1 at the top of platform_sandbox_openbsd.c
before any include so the sandbox TU compiles even when the global
config path changes.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
… macros

The distro-matrix install-prerequisites steps run under 'sh' inside
their container images. On Debian/Ubuntu /bin/sh is dash, on Alpine
it is ash; neither supports 'set -o pipefail'. Switch those specific
multi-line blocks to 'set -eu' and extend check-workflow-policy.py
to accept 'set -eu' as a valid first line. Other (bash) steps continue
to use 'set -euo pipefail' for stronger guarantees.

OpenBSD's <sys/cdefs.h> forces __BSD_VISIBLE=0 whenever _POSIX_C_SOURCE
or _XOPEN_SOURCE is defined, regardless of a user -D__BSD_VISIBLE=1.
Drop both macros on the OpenBSD branch and rely on libc defaults plus
__BSD_VISIBLE=1 to keep pledge/unveil declared and BSD types visible.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The mock SNMPv3 container (tests/snmpv3/snmpd) returns errstat=16
(authorizationError) on the first multi-OID GET of each poll cycle
while the USM authoritative engine discovery handshake is still
completing. Spine records it correctly and the retry that follows
reads the value; poller_output and poller_time both fill in.

The CI Integration lane was hard-failing on this pre-existing quirk.
Accept up to 4 rows as WARN (visible so a real regression flood
surfaces) and only fail if the count exceeds that.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The vmactions/freebsd-vm action ships a 14.1 userland but the pkg
catalog now carries 14.3-tagged builds of zycore-c and friends. pkg
refuses to install without IGNORE_OSVERSION=yes. Export it for both
'pkg update' and 'pkg install' so the FreeBSD CI lane actually gets
cmake/ninja/net-snmp.

The perf-regression SNMP simulator benchmark silently proceeded to
hyperfine when snmpd never came up, leaving hyperfine to fail on an
unresponsive port without diagnostics. Track a ready flag, fail the
step with snmpd.log dumped, so future regressions show the root
cause instead of a generic 'Command terminated'.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…, format shells

Three Static Analysis sub-jobs were red:
- codespell: --ignore-words pointed at a missing .codespell-ignore-words.txt
  and exited 64 (usage error). Add the file as empty so codespell runs and
  reports any spelling hits without aborting on missing config.
- scan-build: --plist was removed in modern clang-tools (Ubuntu 24.04 ships
  scan-build that no longer accepts the flag). --output alone produces the
  HTML report we already upload.
- shfmt: applied -i 2 -ci to debug, package, packaging/debian/postinst,
  scripts/test-*.sh. No semantics changed; whitespace only.

Also fixed actionlint SC2035 in release.yml: 'ls -la *.tar.gz' -> './*.tar.gz'.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ives

CodeQL flagged two real CWE-401 paths in src/snmp.c: when the
peername strdup fails after session.localname was already allocated
(line 149), the early return leaks localname. Free it before return.

cppcheck does not understand 'ASSERT_TRUE(!"text")' as 'always-false
with message' — it parses the string first and decides the bool is
'always true', then drops the !. Rewrite as 'ASSERT_TRUE(0 && "text")'
which threads the message through __FILE__/__LINE__ identically and
is unambiguously false.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ility

On ARM64 the loser-thread spin on g_load_ok was a plain volatile read,
which is not an acquire. A reader could observe g_load_ok == 1 while the
function pointer stores published before it were still invisible, causing
a NULL deref inside spine_icmp_echo_v4/v6. Drive the spin through
InterlockedCompareExchange (a full barrier on every ISA Windows supports)
and issue a MemoryBarrier() before the caller dereferences the pointers.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ivdrop

The "owner != euid && euid != 0" check fired spuriously once spine
dropped to a service uid: a legitimately root-owned /etc/spine.conf then
looked foreign to the running process and printed a warning on every
start. Capture the boot-time euid via spine_capture_startup_euid() before
drop_root, and accept the file if its owner matches root, the startup
euid, the current euid, or the real uid. All four are expected
deployment shapes; nothing else is.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Per-TU #define _GNU_SOURCE guards only survive as long as every
translation unit remembers the pattern, and they bypass test binaries
that pull spine_platform's object files directly. Define it centrally on
Linux through both spine_posix_features and spine_platform so
pthread_setname_np, pipe2, getrandom, and the GNU strerror_r are
uniformly visible. Remove the now-redundant per-file #defines in
platform_process_posix.c and platform_posix.c.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…th SECURITY.md

a4016c1 removed the command_policy shell-metacharacter guard. The idiom
doc still described the guard as live. Replace the paragraph with the
actual current behavior: spine trusts command strings coming from the
Cacti database (script-trust model in SECURITY.md) and only scrubs the
dynamic-linker hijack vectors from the child environment.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
GetTickCount wraps every 49.7 days. Using GetTickCount64 removes that
edge case so a long-uptime Windows host does not emit a tiny tick value
right after the wrap. The payload field stays uint32; we just truncate
a wider counter.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
C17 makes <stdlib.h> and <string.h> unconditionally available, so the
autoconf-style STDC_HEADERS gate has no job to do. Remove the CMake
definition and the paired #if/#elif in common.h.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine_sd_status silently truncates at 511 bytes. Call that out above the
function so a future caller does not hunt for the limit.

ping_init falls back to time^pid when getrandom or /dev/urandom refuses.
Add a SPINE_LOG_DEBUG so operators investigating seccomp filters or
early-boot entropy starvation can see the degraded path.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Audited README.md against the source tree and fixed inaccuracies:

- USDT probes: describe Linux-only (sys/sdt.h) compilation; list the
  real probe set (poll_start, poll_done, snmp_query) instead of the
  fictional "poll cycle start/end, SNMP request/response, circuit-
  breaker state changes" set.
- Config mode: util.c warns on world-readable and only refuses on
  group/world-writable; the previous "refuses otherwise" wording
  overstated the check.
- Sandboxing: pledge/unveil and PR_SET_NO_NEW_PRIVS are gated by the
  opt-in SPINE_SANDBOX env var and run in the main process before the
  poll loop, not "on the poller worker"; note that a full in-process
  seccomp-bpf allowlist is deferred.
- Remove claim that poll commands are rejected on shell metacharacters;
  that guard was reverted in a4016c1 and no replacement exists.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Poison environ with LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_*,
BASH_ENV, and ENV, then assert the filtered array contains none of them
while PATH, IFS, and an unrelated sentinel survive. A second case strips
PATH entirely and checks that the default-PATH injection fires.

spine_build_child_env lives behind common.h + spine.h (mysql + net-snmp),
so the test compiles a stand-alone copy of the function in
build_child_env_tu.c. The TU must stay in sync with nft_popen.c; both
sides note that in their file headers.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Windows-only CMake target that spawns 8 threads each calling
spine_icmp_echo_v4 against 127.0.0.1 four times. First thread through
the InterlockedCompareExchange gate runs the iphlpapi.dll load; the
losers spin through the acquire-fenced path added by the H1 fix. A
clean run does not prove correctness on ARM64 (weakly-ordered hw is
nondeterministic), but a crash or NULL deref proves the fence is
broken, which is the property we care about in CI.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Replaces the prior PR_SET_NO_NEW_PRIVS-only stub with a three-layer
confinement stack:

* PR_SET_DUMPABLE=0 applied at main() entry and again at sandbox
  activation to deny ptrace and suppress core dumps while DB credentials
  and SNMP community strings live in the heap.
* Landlock (kernel 5.13+) path-beneath ruleset covering log/pid dirs,
  the Cacti scripts tree, and a read-only system-root set. ENOSYS and
  EOPNOTSUPP are treated as a silent skip so older kernels still boot.
* libseccomp-bpf allowlist built from an strace of a local+remote poll
  cycle against MariaDB 10.11 and net-snmp 5.9. Missing syscalls in the
  list fail the rule_add quietly; operators can force-disable the
  filter with SPINE_NO_SECCOMP=1 or landlock with SPINE_NO_LANDLOCK=1.

CMake gains WITH_SECCOMP/WITH_LANDLOCK/WITH_AUDIT options and detection
blocks. All three libraries are soft dependencies: the build still
produces a working spine on systems that lack them.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
mlockall(MCL_CURRENT|MCL_FUTURE) keeps DB passwords and the working set
off any swap-backed hibernation image. Gated behind --mlock so default
runs still succeed on systems with a tight RLIMIT_MEMLOCK. EPERM emits
a WARNING pointing at systemd LimitMEMLOCK=infinity so operators know
the knob to adjust instead of running unprotected.

Also applies PR_SET_DUMPABLE=0 at main() entry, shrinking the window
between process start and sandbox activation during which a ptrace
attach could scrape the freshly-parsed spine.conf secrets.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
LimitCORE=0 blocks core dumps (defense in depth alongside in-process
PR_SET_DUMPABLE=0). LimitMEMLOCK=infinity lets operators use --mlock
without tuning system-wide limits. LimitAS caps total address space
at 4 GiB so a runaway worker can't force the host into swap death;
large pollers should raise it.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
BREAKING: DB_UseSSL and RDB_UseSSL default to 1 (preferred) instead of 0.
A spine binary talking to a TLS-capable MySQL/MariaDB server now
negotiates an encrypted channel without operator action.

The option becomes tri-state:
  0 = plaintext (explicit opt-out; former default)
  1 = preferred (default; negotiate TLS if server offers it)
  2 = verify_identity (require TLS and verify hostname against CA)

Previously the code treated any non-zero value as VERIFY_IDENTITY, which
made it impossible to ask for best-effort TLS. The new middle tier closes
that gap without forcing CA bundle distribution on every poller.

Deployments that cannot reach a TLS-capable server (legacy MySQL, plain
TCP inside a trusted L2 segment) must set DB_UseSSL=0 explicitly.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Ships an apparmor profile covering spine's typical footprint: CAP_NET_RAW
for ICMP, the Cacti scripts and resource trees, the log and pid
directories that match the systemd unit's ReadWritePaths, and an
openssl/mysql abstraction include for the DB client library. Distro
packagers copy the file into /etc/apparmor.d/ and enforce via
apparmor_parser.

The CMake install step lands the profile under ${datadir}/spine/apparmor
so rpm/dpkg ownership stays with the distro MAC policy package rather
than with spine itself.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Ships a minimal refpolicy-style module (spine.te, spine.fc, spine.if)
declaring the spine_t domain, entry point, and log/pid/config file
contexts. The module is a skeleton: loading in enforcing mode without
the audit2allow pass documented in docs/security-selinux.md will deny
most real work. This is intentional. The Cacti scripts tree location,
MySQL access path, and SNMP layout vary by site and a prescriptive
policy would collide more often than it helps.

CMake lands the .te/.fc/.if files under ${datadir}/spine/selinux so
distro packagers can build the .pp at package time. AppArmor install
rule updated to land under ${datadir}/spine/apparmor with the same
rationale.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Documents GrapheneOS hardened_malloc as the preferred drop-in malloc
replacement for spine, the LD_PRELOAD systemd drop-in pattern, RSS
implications, and alternates (mimalloc on musl, Scudo for Clang builds).
No upstream code changes; hardened_malloc is a deployment-time choice.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Emit AUDIT_USER_CMD records via libaudit for SIGTERM graceful stop,
SIGHUP config reload, and per-host circuit breaker trips. Compiles to
no-ops on macOS/BSD/Linux-without-audit-libs. Netlink fd is cached
for the process lifetime to amortise setup cost.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Adds a thin libaudit wrapper (spine_audit_event) that emits a
USER_CMD record with op=spine-* prefix so auditd filter rules can route
spine events to a dedicated audit pipe. Events wired up at:

  * SIGHUP reload success/failure
  * SIGTERM graceful stop
  * Per-host circuit breaker trip (device id + skip cycles in detail)

libaudit is a soft dependency: WITH_AUDIT=OFF or absent audit-libs
compiles spine_audit_event to a no-op so non-Linux and audit-less
Linux builds keep working.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
…tate"

This reverts commit 2110794.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
BREAKING: DB_UseSSL and RDB_UseSSL default to 1 (preferred) instead of 0.
A spine binary talking to a TLS-capable MySQL/MariaDB server now
negotiates an encrypted channel without operator action.

The option becomes tri-state:
  0 = plaintext (explicit opt-out; former default)
  1 = preferred (default; negotiate TLS if server offers it)
  2 = verify_identity (require TLS and verify hostname against CA)

Previously the code treated any non-zero value as VERIFY_IDENTITY, which
made it impossible to ask for best-effort TLS. The new middle tier closes
that gap without forcing CA bundle distribution on every poller.

Deployments that cannot reach a TLS-capable server (legacy MySQL, plain
TCP inside a trusted L2 segment) must set DB_UseSSL=0 explicitly.

Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Comment on lines +12 to +16
run: |
set -euo pipefail
sudo apt-get update
# Intentionally unquoted to split package tokens.
sudo apt-get install -y ${{ inputs.packages }}
Copy link
Copy Markdown

@github-advanced-security github-advanced-security AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cppcheck found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Comment thread src/platform/platform_win.c Dismissed
Comment thread src/poller.c
local_cnn = db_get_connection(LOCAL);
if (local_cnn == NULL) {
SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire local DB connection", host_id, host_thread));
return;
Comment thread src/poller.c
local_cnn = db_get_connection(LOCAL);
if (local_cnn == NULL) {
SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire local DB connection", host_id, host_thread));
return;
Comment thread src/poller.c
local_cnn = db_get_connection(LOCAL);
if (local_cnn == NULL) {
SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire local DB connection", host_id, host_thread));
return;
Comment thread src/poller.c
if (remote_cnn == NULL) {
SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire remote DB connection", host_id, host_thread));
db_release_connection(LOCAL, local_cnn->id);
return;
Comment thread src/ping.c
/* wait for a response on the socket */
/* reinitialize fd_set -- select(2) clears bits in place on return */
keep_listening:
if (!spine_socket_is_valid(icmp_socket)) {
Comment thread src/php.c
if ((cp = strstr(result_string,"\n")) != 0) {
break;
}
if ((cp = strstr(result_string,"\n")) != 0) {
Comment thread src/circuit_breaker.c Dismissed
Comment thread src/circuit_breaker.c Dismissed
Comment thread src/circuit_breaker.c Dismissed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants