[DRAFT 2.0] spine: behavioral changes parked for Cacti 2.0#537
Draft
somethingwithproof wants to merge 166 commits intoCacti:developfrom
Draft
[DRAFT 2.0] spine: behavioral changes parked for Cacti 2.0#537somethingwithproof wants to merge 166 commits intoCacti:developfrom
somethingwithproof wants to merge 166 commits intoCacti:developfrom
Conversation
- Add CMakeLists.txt mirroring configure.ac feature checks - Add config/config.h.cmake.in template for cmake builds - Add build-cmake-linux CI job (gcc/clang matrix) - Add build-windows CI job (MSYS2/MinGW-w64, continue-on-error) - Windows crash dump collection via WER LocalDumps - Gate -Wall by compiler ID (GNU/Clang vs MSVC) - Use CMAKE_DL_LIBS instead of hardcoded -ldl - Parse net-snmp-config --libs into proper NETSNMP_LIBRARIES - Add SNMP_LOCALNAME compile check for feature parity - No C source files modified Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine.h depends on types from common.h (MYSQL, pid_t, size_t, pthread types, RESULTS_BUFFER from config.h). Add a compile-time guard that produces a clear error if spine.h is included without common.h, rather than cascading type errors. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
strncasecmp() returns 0 on match, but the comparisons used the raw return value as truthy, inverting the logic. TCP matched non-TCP strings and vice versa. Also fix: comparisons against 'hostname' instead of 'token' (lines 1020, 1032), and strncasecmp length 3 for 4-char strings "TCP6"/"UDP6" (should be 4). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
getarg(opt, &argv) advances the argv pointer on each call. Three successive calls in the --mode handler consumed three argv entries instead of one, corrupting subsequent argument parsing when using the space-separated form (--mode online). Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…dbox TU OpenBSD's <sys/cdefs.h> treats _POSIX_C_SOURCE as a signal to force __BSD_VISIBLE=0 no matter what the user sets. That hides u_int in <netinet/ip.h> and pledge/unveil in <unistd.h>. Stop defining _POSIX_C_SOURCE on OpenBSD; the libc defaults expose POSIX plus BSD extensions together. Additionally set __BSD_VISIBLE=1 at the top of platform_sandbox_openbsd.c before any include so the sandbox TU compiles even when the global config path changes. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
… macros The distro-matrix install-prerequisites steps run under 'sh' inside their container images. On Debian/Ubuntu /bin/sh is dash, on Alpine it is ash; neither supports 'set -o pipefail'. Switch those specific multi-line blocks to 'set -eu' and extend check-workflow-policy.py to accept 'set -eu' as a valid first line. Other (bash) steps continue to use 'set -euo pipefail' for stronger guarantees. OpenBSD's <sys/cdefs.h> forces __BSD_VISIBLE=0 whenever _POSIX_C_SOURCE or _XOPEN_SOURCE is defined, regardless of a user -D__BSD_VISIBLE=1. Drop both macros on the OpenBSD branch and rely on libc defaults plus __BSD_VISIBLE=1 to keep pledge/unveil declared and BSD types visible. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The mock SNMPv3 container (tests/snmpv3/snmpd) returns errstat=16 (authorizationError) on the first multi-OID GET of each poll cycle while the USM authoritative engine discovery handshake is still completing. Spine records it correctly and the retry that follows reads the value; poller_output and poller_time both fill in. The CI Integration lane was hard-failing on this pre-existing quirk. Accept up to 4 rows as WARN (visible so a real regression flood surfaces) and only fail if the count exceeds that. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The vmactions/freebsd-vm action ships a 14.1 userland but the pkg catalog now carries 14.3-tagged builds of zycore-c and friends. pkg refuses to install without IGNORE_OSVERSION=yes. Export it for both 'pkg update' and 'pkg install' so the FreeBSD CI lane actually gets cmake/ninja/net-snmp. The perf-regression SNMP simulator benchmark silently proceeded to hyperfine when snmpd never came up, leaving hyperfine to fail on an unresponsive port without diagnostics. Track a ready flag, fail the step with snmpd.log dumped, so future regressions show the root cause instead of a generic 'Command terminated'. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…, format shells Three Static Analysis sub-jobs were red: - codespell: --ignore-words pointed at a missing .codespell-ignore-words.txt and exited 64 (usage error). Add the file as empty so codespell runs and reports any spelling hits without aborting on missing config. - scan-build: --plist was removed in modern clang-tools (Ubuntu 24.04 ships scan-build that no longer accepts the flag). --output alone produces the HTML report we already upload. - shfmt: applied -i 2 -ci to debug, package, packaging/debian/postinst, scripts/test-*.sh. No semantics changed; whitespace only. Also fixed actionlint SC2035 in release.yml: 'ls -la *.tar.gz' -> './*.tar.gz'. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ives CodeQL flagged two real CWE-401 paths in src/snmp.c: when the peername strdup fails after session.localname was already allocated (line 149), the early return leaks localname. Free it before return. cppcheck does not understand 'ASSERT_TRUE(!"text")' as 'always-false with message' — it parses the string first and decides the bool is 'always true', then drops the !. Rewrite as 'ASSERT_TRUE(0 && "text")' which threads the message through __FILE__/__LINE__ identically and is unambiguously false. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ility On ARM64 the loser-thread spin on g_load_ok was a plain volatile read, which is not an acquire. A reader could observe g_load_ok == 1 while the function pointer stores published before it were still invisible, causing a NULL deref inside spine_icmp_echo_v4/v6. Drive the spin through InterlockedCompareExchange (a full barrier on every ISA Windows supports) and issue a MemoryBarrier() before the caller dereferences the pointers. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…ivdrop The "owner != euid && euid != 0" check fired spuriously once spine dropped to a service uid: a legitimately root-owned /etc/spine.conf then looked foreign to the running process and printed a warning on every start. Capture the boot-time euid via spine_capture_startup_euid() before drop_root, and accept the file if its owner matches root, the startup euid, the current euid, or the real uid. All four are expected deployment shapes; nothing else is. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Per-TU #define _GNU_SOURCE guards only survive as long as every translation unit remembers the pattern, and they bypass test binaries that pull spine_platform's object files directly. Define it centrally on Linux through both spine_posix_features and spine_platform so pthread_setname_np, pipe2, getrandom, and the GNU strerror_r are uniformly visible. Remove the now-redundant per-file #defines in platform_process_posix.c and platform_posix.c. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
…th SECURITY.md a4016c1 removed the command_policy shell-metacharacter guard. The idiom doc still described the guard as live. Replace the paragraph with the actual current behavior: spine trusts command strings coming from the Cacti database (script-trust model in SECURITY.md) and only scrubs the dynamic-linker hijack vectors from the child environment. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
GetTickCount wraps every 49.7 days. Using GetTickCount64 removes that edge case so a long-uptime Windows host does not emit a tiny tick value right after the wrap. The payload field stays uint32; we just truncate a wider counter. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
C17 makes <stdlib.h> and <string.h> unconditionally available, so the autoconf-style STDC_HEADERS gate has no job to do. Remove the CMake definition and the paired #if/#elif in common.h. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
spine_sd_status silently truncates at 511 bytes. Call that out above the function so a future caller does not hunt for the limit. ping_init falls back to time^pid when getrandom or /dev/urandom refuses. Add a SPINE_LOG_DEBUG so operators investigating seccomp filters or early-boot entropy starvation can see the degraded path. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Audited README.md against the source tree and fixed inaccuracies: - USDT probes: describe Linux-only (sys/sdt.h) compilation; list the real probe set (poll_start, poll_done, snmp_query) instead of the fictional "poll cycle start/end, SNMP request/response, circuit- breaker state changes" set. - Config mode: util.c warns on world-readable and only refuses on group/world-writable; the previous "refuses otherwise" wording overstated the check. - Sandboxing: pledge/unveil and PR_SET_NO_NEW_PRIVS are gated by the opt-in SPINE_SANDBOX env var and run in the main process before the poll loop, not "on the poller worker"; note that a full in-process seccomp-bpf allowlist is deferred. - Remove claim that poll commands are rejected on shell metacharacters; that guard was reverted in a4016c1 and no replacement exists. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Poison environ with LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_*, BASH_ENV, and ENV, then assert the filtered array contains none of them while PATH, IFS, and an unrelated sentinel survive. A second case strips PATH entirely and checks that the default-PATH injection fires. spine_build_child_env lives behind common.h + spine.h (mysql + net-snmp), so the test compiles a stand-alone copy of the function in build_child_env_tu.c. The TU must stay in sync with nft_popen.c; both sides note that in their file headers. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Windows-only CMake target that spawns 8 threads each calling spine_icmp_echo_v4 against 127.0.0.1 four times. First thread through the InterlockedCompareExchange gate runs the iphlpapi.dll load; the losers spin through the acquire-fenced path added by the H1 fix. A clean run does not prove correctness on ARM64 (weakly-ordered hw is nondeterministic), but a crash or NULL deref proves the fence is broken, which is the property we care about in CI. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Replaces the prior PR_SET_NO_NEW_PRIVS-only stub with a three-layer confinement stack: * PR_SET_DUMPABLE=0 applied at main() entry and again at sandbox activation to deny ptrace and suppress core dumps while DB credentials and SNMP community strings live in the heap. * Landlock (kernel 5.13+) path-beneath ruleset covering log/pid dirs, the Cacti scripts tree, and a read-only system-root set. ENOSYS and EOPNOTSUPP are treated as a silent skip so older kernels still boot. * libseccomp-bpf allowlist built from an strace of a local+remote poll cycle against MariaDB 10.11 and net-snmp 5.9. Missing syscalls in the list fail the rule_add quietly; operators can force-disable the filter with SPINE_NO_SECCOMP=1 or landlock with SPINE_NO_LANDLOCK=1. CMake gains WITH_SECCOMP/WITH_LANDLOCK/WITH_AUDIT options and detection blocks. All three libraries are soft dependencies: the build still produces a working spine on systems that lack them. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
mlockall(MCL_CURRENT|MCL_FUTURE) keeps DB passwords and the working set off any swap-backed hibernation image. Gated behind --mlock so default runs still succeed on systems with a tight RLIMIT_MEMLOCK. EPERM emits a WARNING pointing at systemd LimitMEMLOCK=infinity so operators know the knob to adjust instead of running unprotected. Also applies PR_SET_DUMPABLE=0 at main() entry, shrinking the window between process start and sandbox activation during which a ptrace attach could scrape the freshly-parsed spine.conf secrets. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
LimitCORE=0 blocks core dumps (defense in depth alongside in-process PR_SET_DUMPABLE=0). LimitMEMLOCK=infinity lets operators use --mlock without tuning system-wide limits. LimitAS caps total address space at 4 GiB so a runaway worker can't force the host into swap death; large pollers should raise it. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
BREAKING: DB_UseSSL and RDB_UseSSL default to 1 (preferred) instead of 0. A spine binary talking to a TLS-capable MySQL/MariaDB server now negotiates an encrypted channel without operator action. The option becomes tri-state: 0 = plaintext (explicit opt-out; former default) 1 = preferred (default; negotiate TLS if server offers it) 2 = verify_identity (require TLS and verify hostname against CA) Previously the code treated any non-zero value as VERIFY_IDENTITY, which made it impossible to ask for best-effort TLS. The new middle tier closes that gap without forcing CA bundle distribution on every poller. Deployments that cannot reach a TLS-capable server (legacy MySQL, plain TCP inside a trusted L2 segment) must set DB_UseSSL=0 explicitly. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Ships an apparmor profile covering spine's typical footprint: CAP_NET_RAW
for ICMP, the Cacti scripts and resource trees, the log and pid
directories that match the systemd unit's ReadWritePaths, and an
openssl/mysql abstraction include for the DB client library. Distro
packagers copy the file into /etc/apparmor.d/ and enforce via
apparmor_parser.
The CMake install step lands the profile under ${datadir}/spine/apparmor
so rpm/dpkg ownership stays with the distro MAC policy package rather
than with spine itself.
Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Ships a minimal refpolicy-style module (spine.te, spine.fc, spine.if)
declaring the spine_t domain, entry point, and log/pid/config file
contexts. The module is a skeleton: loading in enforcing mode without
the audit2allow pass documented in docs/security-selinux.md will deny
most real work. This is intentional. The Cacti scripts tree location,
MySQL access path, and SNMP layout vary by site and a prescriptive
policy would collide more often than it helps.
CMake lands the .te/.fc/.if files under ${datadir}/spine/selinux so
distro packagers can build the .pp at package time. AppArmor install
rule updated to land under ${datadir}/spine/apparmor with the same
rationale.
Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Documents GrapheneOS hardened_malloc as the preferred drop-in malloc replacement for spine, the LD_PRELOAD systemd drop-in pattern, RSS implications, and alternates (mimalloc on musl, Scudo for Clang builds). No upstream code changes; hardened_malloc is a deployment-time choice. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Emit AUDIT_USER_CMD records via libaudit for SIGTERM graceful stop, SIGHUP config reload, and per-host circuit breaker trips. Compiles to no-ops on macOS/BSD/Linux-without-audit-libs. Netlink fd is cached for the process lifetime to amortise setup cost. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Adds a thin libaudit wrapper (spine_audit_event) that emits a USER_CMD record with op=spine-* prefix so auditd filter rules can route spine events to a dedicated audit pipe. Events wired up at: * SIGHUP reload success/failure * SIGTERM graceful stop * Per-host circuit breaker trip (device id + skip cycles in detail) libaudit is a soft dependency: WITH_AUDIT=OFF or absent audit-libs compiles spine_audit_event to a no-op so non-Linux and audit-less Linux builds keep working. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
…tate" This reverts commit 2110794. Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
BREAKING: DB_UseSSL and RDB_UseSSL default to 1 (preferred) instead of 0. A spine binary talking to a TLS-capable MySQL/MariaDB server now negotiates an encrypted channel without operator action. The option becomes tri-state: 0 = plaintext (explicit opt-out; former default) 1 = preferred (default; negotiate TLS if server offers it) 2 = verify_identity (require TLS and verify hostname against CA) Previously the code treated any non-zero value as VERIFY_IDENTITY, which made it impossible to ask for best-effort TLS. The new middle tier closes that gap without forcing CA bundle distribution on every poller. Deployments that cannot reach a TLS-capable server (legacy MySQL, plain TCP inside a trusted L2 segment) must set DB_UseSSL=0 explicitly. Signed-off-by: Thomas Vincent <thomas@atconsulting.ie>
Comment on lines
+12
to
+16
| run: | | ||
| set -euo pipefail | ||
| sudo apt-get update | ||
| # Intentionally unquoted to split package tokens. | ||
| sudo apt-get install -y ${{ inputs.packages }} |
There was a problem hiding this comment.
cppcheck found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
| local_cnn = db_get_connection(LOCAL); | ||
| if (local_cnn == NULL) { | ||
| SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire local DB connection", host_id, host_thread)); | ||
| return; |
| local_cnn = db_get_connection(LOCAL); | ||
| if (local_cnn == NULL) { | ||
| SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire local DB connection", host_id, host_thread)); | ||
| return; |
| local_cnn = db_get_connection(LOCAL); | ||
| if (local_cnn == NULL) { | ||
| SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire local DB connection", host_id, host_thread)); | ||
| return; |
| if (remote_cnn == NULL) { | ||
| SPINE_LOG(("FATAL: Device[%i] HT[%i] Unable to acquire remote DB connection", host_id, host_thread)); | ||
| db_release_connection(LOCAL, local_cnn->id); | ||
| return; |
| /* wait for a response on the socket */ | ||
| /* reinitialize fd_set -- select(2) clears bits in place on return */ | ||
| keep_listening: | ||
| if (!spine_socket_is_valid(icmp_socket)) { |
| if ((cp = strstr(result_string,"\n")) != 0) { | ||
| break; | ||
| } | ||
| if ((cp = strstr(result_string,"\n")) != 0) { |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Carved out from #535 per @TheWitness — behavioral changes that affect the plugin state-machine contract (thold, amqp-publisher, boost) land in 2.0, not 1.3.
What's in this PR
A1. TLS-by-default for MySQL (6896444)
Flips
DB_UseSSL/RDB_UseSSLdefault from0to1(preferred, with graceful fallback). Plugin-observable via connection setup timing and cleartext-required MySQL servers rejecting the handshake.1.3 behavior (matrix branch): default remains
0, operators opt in via spine.conf.2.0 behavior (this branch): default is
1, operators opt out explicitly.Why these items defer
Each change touches one of the plugin contract layers:
poller_outputwrite timing or contentpoller_timerow shapehost.statustransition semanticsState-machine RFC
Planning an RFC doc to define the 2.0 contract with @TheWitness:
poller_output,poller_time,host.*)Next steps
cacti_check_shell_metachars.php) for deprecation windowCloses nothing on 1.3; tracking for 2.0 only.
Related: #523 (CMake migration, 1.3), #535 (platform + security + operator tooling, 1.3).