Description
When poller.php detects that processes from a previous cycle are still running
(poller_time rows with end_time = 0000-00-00), it logs a warning but then
unconditionally executes DELETE FROM poller_time. This wipes the state
locks of the still-running processes and launches a parallel polling cycle.
The old and new pollers then query the same devices, write to the same DB
rows, and pipe updates to the same RRD files at the same time, causing MySQL
deadlocks (Error 1213) and RRD header corruption. The added load makes
subsequent cron runs overrun as well, and each of them spawns yet another
parallel cycle, creating a cascading death spiral.
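The flawed sequence can be reproduced in miniature. This is a minimal sketch in Python against an in-memory SQLite table, not the project's actual PHP/MySQL code. The simplified poller_time schema, the zero-datetime sentinel string, and the helper names make_db and start_cycle_buggy are hypothetical, chosen only to show why a warning followed by an unconditional delete destroys the running cycle's locks.

```python
import sqlite3

# Assumed sentinel for "cycle not finished yet" (a zero DATETIME, as in the report).
UNFINISHED = "0000-00-00 00:00:00"


def make_db():
    """Create a hypothetical, simplified poller_time table in memory."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE poller_time "
        "(pid INTEGER, poller_id INTEGER, start_time TEXT, end_time TEXT)"
    )
    return db


def start_cycle_buggy(db, pid, now):
    """Mimic the reported flow: warn on overrun, then wipe poller_time anyway."""
    running = db.execute(
        "SELECT COUNT(*) FROM poller_time WHERE end_time = ?", (UNFINISHED,)
    ).fetchone()[0]
    if running:
        # The overrun is detected and logged...
        print(f"WARNING: {running} process(es) from the previous cycle still running")
    # ...but the delete runs regardless, erasing the live cycle's lock rows.
    db.execute("DELETE FROM poller_time")
    db.execute(
        "INSERT INTO poller_time (pid, poller_id, start_time, end_time) "
        "VALUES (?, 1, ?, ?)",
        (pid, now, UNFINISHED),
    )
    return True  # a new, parallel cycle always starts
```

With a row for a still-running process (say pid 100) already in the table, calling start_cycle_buggy for pid 200 prints the warning, yet afterwards only pid 200's row remains, even though process 100 is still polling.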
Remediation
Fail-closed on overrun: if processes from the previous cycle are still
running, abort the current cron execution with exit(1) rather than
deleting active locks and spawning parallel processes.
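The fail-closed check can be sketched the same way. Again this is a hypothetical Python/SQLite illustration rather than a patch to poller.php: the schema, the sentinel string, and the names make_db and start_cycle_fail_closed are assumptions. The function returns an exit status that a caller would pass to sys.exit(), mirroring exit(1) in the remediation, and it leaves poller_time untouched whenever an unfinished row exists.

```python
import sqlite3
import sys

# Assumed sentinel for "cycle not finished yet" (a zero DATETIME, as in the report).
UNFINISHED = "0000-00-00 00:00:00"


def make_db():
    """Create a hypothetical, simplified poller_time table in memory."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE poller_time "
        "(pid INTEGER, poller_id INTEGER, start_time TEXT, end_time TEXT)"
    )
    return db


def start_cycle_fail_closed(db, pid, now):
    """Return 1 (abort) on overrun without touching locks; otherwise start and return 0."""
    running = db.execute(
        "SELECT COUNT(*) FROM poller_time WHERE end_time = ?", (UNFINISHED,)
    ).fetchone()[0]
    if running:
        print(
            f"ERROR: {running} process(es) from the previous cycle still running; aborting",
            file=sys.stderr,
        )
        return 1  # fail closed: keep the live cycle's rows and spawn nothing
    # Only finished rows remain, so clearing them cannot disturb a live poller.
    db.execute("DELETE FROM poller_time")
    db.execute(
        "INSERT INTO poller_time (pid, poller_id, start_time, end_time) "
        "VALUES (?, 1, ?, ?)",
        (pid, now, UNFINISHED),
    )
    return 0
```

A cron entry point would call sys.exit(start_cycle_fail_closed(db, pid, now)). Note that a separate check followed by a delete still leaves a small window if two cron runs start at the same instant; an atomic database lock (for example MySQL's GET_LOCK()) would close it, but that goes beyond what the remediation above specifies.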