Overview
Four clusters of open issues point to related architectural improvements for the polling engine. This issue tracks them as a coordinated roadmap rather than isolated fixes.
1. Resilient Poller Engine
The poller is fragile under load. Overruns cascade into parallel cycles, advisory locks stall without timeout, stale data causes doubled RRD writes, and DDL operations thrash the DB during routine polling.
Goal: a poller engine that fails gracefully under contention.
Key changes:
- Abort on overrun instead of deleting active process locks and spawning parallel cycles
- Advisory locks with hard retry limits (fail-closed, retry next cycle)
- Transaction-safe poller_output processing to prevent stale row doubling
- Deterministic child process lifecycle (no orphaned zombies on timeout)
Related issues: #6569, #6738, #6777, #6781, #7024, #7030
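The fail-closed lock behavior above can be sketched as follows. This is an illustrative Python model, not the project's PHP code: `try_lock`, `MAX_LOCK_RETRIES`, and `LOCK_TIMEOUT_S` are hypothetical names, and an in-process `threading.Lock` stands in for a database advisory lock such as MySQL's `GET_LOCK`.

```python
import threading
import time

MAX_LOCK_RETRIES = 3   # hypothetical hard limit; would be configurable
LOCK_TIMEOUT_S = 2     # bounded per-attempt wait instead of stalling forever

def acquire_advisory_lock(try_lock, retries=MAX_LOCK_RETRIES, timeout=LOCK_TIMEOUT_S):
    """Attempt a bounded number of timed lock acquisitions.

    `try_lock(timeout)` stands in for e.g. GET_LOCK(name, timeout): it
    returns True on success, False on timeout. On exhaustion we fail
    closed: the caller aborts this cycle and retries on the next one,
    instead of deleting the lock and spawning a parallel cycle.
    """
    for _ in range(retries):
        if try_lock(timeout):
            return True
    return False  # fail closed: skip this cycle rather than stall or cascade

# Demo with an in-process lock standing in for the DB advisory lock.
db_lock = threading.Lock()

def try_lock(timeout):
    return db_lock.acquire(timeout=timeout)

held = acquire_advisory_lock(try_lock)                    # lock free: acquired
db_lock.release()
db_lock.acquire()                                         # simulate a stuck holder
stalled = acquire_advisory_lock(try_lock, timeout=0.01)   # times out, fails closed
```

The key property is that a contended cycle ends in a bounded, known state (skipped) rather than an unbounded wait or a second concurrent poller.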
2. Efficient Poll Cycle
Each poll cycle wastes resources through unclosed SNMP sessions, per-data-source script server spawning, and unbatched SNMP GETs for realtime graphs.
Goal: 50% reduction in per-cycle process spawns and connection overhead.
Key changes:
- Reuse SNMP sessions across OIDs per host, close deterministically at end of cycle
- Batch script server queries per host (spawn once, feed all queries, tear down once)
- Runtime detection and bypass of php-snmp when net-snmp binaries are faster
- Batched SNMP GET for realtime graph polling
Related issues: #6669, #6722, #6735, #7025
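The batching idea can be sketched as grouping poll items per host before any session is opened. This is a hedged illustration in Python, not the poller's actual code: `batch_by_host` is a hypothetical helper, and the OIDs are arbitrary examples.

```python
from collections import defaultdict

def batch_by_host(requests):
    """Group (host, oid) poll items so each host gets one multi-OID GET.

    A real implementation would hand each batch to a single reused SNMP
    session per host (one GetRequest PDU carrying all varbinds), rather
    than opening a session and sending a PDU per OID, then close the
    session deterministically at end of cycle.
    """
    batches = defaultdict(list)
    for host, oid in requests:
        batches[host].append(oid)
    return dict(batches)

requests = [
    ("sw1", "1.3.6.1.2.1.2.2.1.10.1"),
    ("sw1", "1.3.6.1.2.1.2.2.1.16.1"),
    ("sw2", "1.3.6.1.2.1.1.3.0"),
]
batches = batch_by_host(requests)
# Two sessions/requests instead of three: sw1 gets both OIDs in one PDU.
```

The same grouping applies to the script server: spawn one process per host, feed it the whole batch, tear it down once.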
3. Multi-Protocol Data Input
SNMP is insufficient for modern network devices. Vendors are investing in gRPC/OpenConfig telemetry and REST APIs. Device-specific SNMP quirks (e.g., Juniper trailing-zero OID padding) consume disproportionate maintenance effort.
Goal: pluggable data input abstraction that feeds the existing poller_output/RRD pipeline.
Key changes:
- Define a data input interface contract (collect, format, write to poller_output)
- gRPC/gNMI collector (sidecar or native) for Juniper JTI, Cisco MDT, Arista OpenConfig
- REST/API-based availability checks as an alternative to ICMP/SNMP ping
- Adaptive polling backoff for consistently failing devices
- Auto-detection of SNMP OID index quirks (trailing-zero padding, etc.)
Related issues: #5919, #6108, #6787, #7032, #7033
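The adaptive backoff item could work roughly like this minimal sketch, assuming a capped exponential schedule; the function name, base interval, and cap are illustrative, not a committed design.

```python
BASE_INTERVAL = 60   # assumed normal poll interval in seconds
MAX_BACKOFF = 8      # cap the multiplier so failing devices are still retried

def next_interval(consecutive_failures, base=BASE_INTERVAL, cap=MAX_BACKOFF):
    """Exponential backoff for consistently failing devices.

    Zero failures polls at the base interval; each further failure
    doubles the interval, capped so a recovered device is noticed
    within a bounded delay instead of being effectively abandoned.
    """
    multiplier = min(2 ** consecutive_failures, cap)
    return base * multiplier

# next_interval(0) == 60, next_interval(1) == 120, next_interval(5) == 480
```

The cap matters: without it, a device down for a weekend would earn an interval longer than the outage itself.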
4. Zero-DDL Polling Pipeline
The boost subsystem and poller_output table use CREATE/DROP/RENAME/ALTER TABLE as runtime operations every poll cycle. This causes metadata lock contention, query cache invalidation, and InnoDB dictionary mutex stalls at scale.
Goal: no schema mutations during normal polling.
Key changes:
- Replace MEMORY table DDL-per-cycle with persistent InnoDB tables and TRUNCATE
- Row-level expiry for boost cache instead of table-level DROP/CREATE
- Bounded boost_max_records enforcement to prevent unbounded growth
Related issues: #6775, #6786, #7030
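The row-level-expiry change can be illustrated with a small sketch. SQLite stands in here purely for a runnable demo (Cacti targets MySQL/MariaDB, where the per-cycle cleanup would be TRUNCATE or an indexed DELETE); the table layout is a simplified stand-in for the boost cache.

```python
import sqlite3
import time

# Persistent table created once at install time, not per poll cycle.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE poller_output_boost (
    local_data_id INTEGER, rrd_name TEXT, time REAL, output TEXT)""")

now = time.time()
rows = [(1, "traffic_in",  now - 7200, "100"),   # expired
        (1, "traffic_in",  now - 30,   "120"),
        (2, "traffic_out", now - 30,   "80")]
conn.executemany("INSERT INTO poller_output_boost VALUES (?,?,?,?)", rows)

# Row-level expiry each cycle: plain DML, so no metadata locks are taken
# and no DROP/CREATE/RENAME churns the InnoDB data dictionary.
RETENTION_S = 3600
conn.execute("DELETE FROM poller_output_boost WHERE time < ?",
             (now - RETENTION_S,))
remaining = conn.execute(
    "SELECT COUNT(*) FROM poller_output_boost").fetchone()[0]
# remaining == 2
```

The point of the sketch is the shape of the operation: per-cycle cleanup becomes DML against a persistent table, which is exactly what "no schema mutations during normal polling" requires.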
Priority
Cluster 1 (stability) and Cluster 4 (storage) are prerequisites for large-scale deployments. Cluster 2 (performance) provides measurable improvement for existing users. Cluster 3 (multi-protocol) is forward-looking and can be developed in parallel as a plugin before core integration.