
Drain LibMR threads to a safe point before fork() (MOD-15307) #97

Draft

gabsow wants to merge 1 commit into master from mod-15307-prefork-drain


gabsow commented Jun 10, 2026


Summary

Adds MR_DrainForFork() / MR_ResumeAfterFork(). Called from the embedding module's pre-fork handler (main thread), MR_DrainForFork():

  • parks the event-loop thread at a between-tasks safe point via a posted task, and
  • bounded-waits the worker pool to idle,

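so no LibMR thread holds a libc lock at fork(). The child gets a copy of only the forking thread, so a lock held by any other thread stays locked forever in the child, and the child's next attempt to take it hangs (a "ghost lock"). MR_ResumeAfterFork() releases the parked event-loop thread and is called after the fork, on success or cancel.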
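The park/resume handshake for the event-loop thread can be illustrated with plain pthreads. This is a minimal sketch, not LibMR's code: gate_arm(), gate_wait_parked(), gate_release() and park_task() are hypothetical names, and park_task() is shaped as a pthread start routine only so it can be exercised without a real event loop. The point it shows is that the parking happens inside an ordinary posted task, so the loop thread blocks between tasks rather than in the middle of one, and that the main thread's wait is bounded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical gate shared by the main thread and the event-loop thread. */
static pthread_mutex_t gate_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cv = PTHREAD_COND_INITIALIZER;
static bool parked = false;   /* event loop is blocked at the safe point */
static bool released = true;  /* main thread lets the event loop continue */

/* Posted to the event loop like any other task, so it runs *between* tasks,
 * when the loop thread is not inside malloc/stdio. Blocks until released. */
static void *park_task(void *arg) {
    (void)arg;
    pthread_mutex_lock(&gate_mu);
    parked = true;
    pthread_cond_broadcast(&gate_cv);   /* tell the main thread we are parked */
    while (!released)
        pthread_cond_wait(&gate_cv, &gate_mu);
    parked = false;
    pthread_mutex_unlock(&gate_mu);
    return NULL;
}

/* Main thread, before posting park_task: arm the gate. */
static void gate_arm(void) {
    pthread_mutex_lock(&gate_mu);
    released = false;
    pthread_mutex_unlock(&gate_mu);
}

/* Main thread: wait up to timeout_ms for the loop to park.
 * Fail-open: returns false on timeout and the caller forks anyway. */
static bool gate_wait_parked(long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&gate_mu);
    int rc = 0;
    while (!parked && rc == 0)  /* rc == 0 also covers spurious wakeups */
        rc = pthread_cond_timedwait(&gate_cv, &gate_mu, &deadline);
    bool ok = parked;
    pthread_mutex_unlock(&gate_mu);
    return ok;
}

/* Main thread, after the fork succeeds or is cancelled: release the loop. */
static void gate_release(void) {
    pthread_mutex_lock(&gate_mu);
    released = true;
    pthread_cond_broadcast(&gate_cv);
    pthread_mutex_unlock(&gate_mu);
}
```

In this sketch a timed-out drain still ends with gate_release(), so if the park task runs late it finds `released` already set and returns immediately instead of blocking the loop.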

Cooperative, bounded, fail-open: on timeout the drain gives up and the fork proceeds. This deliberately does not use the existing mr_thpool_pause (SIGUSR2), which can freeze a worker mid-malloc while it holds the arena lock: exactly the ghost lock this change prevents. A worker still blocked acquiring the GIL is already malloc-safe, so a drain timeout there is benign.
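A cooperative, bounded, fail-open wait on the worker pool can be sketched with a busy counter and a timed condition wait. This is an illustration under assumptions, not mr_thpool's API: job_begin(), job_end(), pool_wait_idle() and the counter are hypothetical, and the sketch does not stop new jobs from starting after the wait returns. Unlike a signal-based pause, nothing here interrupts a worker; the main thread only waits for workers to finish their current job on their own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical busy-worker counter (not mr_thpool's real API). */
static pthread_mutex_t pool_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_idle_cv = PTHREAD_COND_INITIALIZER;
static int pool_busy = 0;

/* Each worker brackets every job it runs with job_begin()/job_end(). */
static void job_begin(void) {
    pthread_mutex_lock(&pool_mu);
    pool_busy++;
    pthread_mutex_unlock(&pool_mu);
}

static void job_end(void) {
    pthread_mutex_lock(&pool_mu);
    if (--pool_busy == 0)
        pthread_cond_broadcast(&pool_idle_cv);  /* wake a waiting drain */
    pthread_mutex_unlock(&pool_mu);
}

/* Absolute CLOCK_REALTIME deadline, as pthread_cond_timedwait expects. */
static struct timespec deadline_after_ms(long ms) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Main thread: bounded wait for the pool to go idle. Returns how many workers
 * are still busy (0 means idle). Fail-open: on timeout nothing is signalled or
 * interrupted; the caller can log the count and fork anyway. */
static int pool_wait_idle(long timeout_ms) {
    struct timespec deadline = deadline_after_ms(timeout_ms);
    pthread_mutex_lock(&pool_mu);
    int rc = 0;
    while (pool_busy > 0 && rc == 0)
        rc = pthread_cond_timedwait(&pool_idle_cv, &pool_mu, &deadline);
    int still_busy = pool_busy;
    pthread_mutex_unlock(&pool_mu);
    return still_busy;
}
```

A nonzero return is the "drain timeout" case described above; per the PR, when the straggler is a worker blocked acquiring the GIL, forking anyway is safe.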

Why / depends on

Fixes the RedisTimeSeries ASM-migration nightly hangs (MOD-14615 valgrind, MOD-14239 sanitizer). The embedding module wires this to redis core's new FORK_CHILD_PRE subevent (redis/redis#15327).
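The call order the embedding module has to follow can be sketched in plain POSIX C. This is a minimal sketch under stated assumptions: drain_for_fork() and resume_after_fork() are hypothetical stubs standing in for MR_DrainForFork()/MR_ResumeAfterFork() (their exact signatures are not shown here), and fork_with_drain() stands in for Redis core's fork path, where the real drain would run from the FORK_CHILD_PRE handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for MR_DrainForFork()/MR_ResumeAfterFork(). */
static int drain_calls = 0, resume_calls = 0;
static bool drain_for_fork(void) { drain_calls++; return true; }  /* true == fully drained */
static void resume_after_fork(void) { resume_calls++; }

/* Hypothetical fork wrapper: drain on the main thread, fork, then resume in the
 * parent on every path. The child never resumes: only the forking thread exists
 * there, so there is no parked event-loop thread to release. */
static pid_t fork_with_drain(void (*child_main)(void)) {
    bool drained = drain_for_fork();
    (void)drained;             /* fail-open: fork even if the drain timed out */
    pid_t pid = fork();
    if (pid == 0) {            /* child */
        child_main();
        _exit(0);
    }
    resume_after_fork();       /* parent: pid > 0 (forked) or -1 (fork failed) */
    return pid;
}

/* Placeholder for the child's work (e.g. writing a snapshot). */
static void child_noop(void) {}
```

The design point is that resume is unconditional in the parent: if it were skipped on a failed or cancelled fork, the event-loop thread would stay parked indefinitely.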

Pre-merge note

The two RedisModule_Log(..., "notice", ...) lines in MR_DrainForFork/MR_ResumeAfterFork should be downgraded to debug (kept at notice to confirm the drain fires during CI validation).

🤖 Generated with Claude Code

Add MR_DrainForFork()/MR_ResumeAfterFork(). On the main thread (from the module's
FORK_CHILD_PRE handler) park the event-loop thread at a between-tasks safe point via
a posted task and bounded-wait the worker pool to idle, so no LibMR thread holds a
libc lock at fork() (ghost-lock). Bounded + fail-open; cooperative (not the SIGUSR2
mr_thpool_pause, which can freeze a worker mid-malloc). Resume releases the parked
event-loop thread.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>