Stop orphaned jobs #1374

Merged
giurgiur99 merged 1 commit into main from fix/stop-orphaned-build-pull-jobs on May 18, 2026

@giurgiur99 (Contributor) commented on May 18, 2026

Fixes #1373

Changes proposed in this PR:

Problem

  • Jobs could stay stuck indefinitely in BuildImage / PullImage after a node restart, because processJob had no branch for these statuses.
  • stopRequested was ignored during build/pull; only the RunningAlgorithm branch checked it.

Fix

  • A new activeBuildAborts map, in which buildImage / pullImage register the AbortController of each running job
  • A new processJob branch for BuildImage / PullImage:
    • Controller present and stop requested → call abort()
    • No controller → the job is orphaned → mark it Failed
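A minimal TypeScript sketch of the Fix bullets above, assuming a simplified job model. activeBuildAborts, processJob, stopRequested, and the status names come from this PR; the Job shape, the JobStatus union, and the helper processBuildOrPullJob are hypothetical and are not the node's actual code.

```typescript
// Hypothetical simplified types; the real node's job model is richer.
type JobStatus = "PullImage" | "BuildImage" | "RunningAlgorithm" | "Failed";

interface Job {
  jobId: string;
  status: JobStatus;
  stopRequested: boolean;
}

// Job id -> AbortController of the in-flight build/pull.
// buildImage / pullImage would add an entry when they start and delete it when they finish.
const activeBuildAborts = new Map<string, AbortController>();

type BranchOutcome = "aborted" | "orphaned" | "running";

// Hypothetical helper: the BuildImage / PullImage branch of processJob.
function processBuildOrPullJob(job: Job): BranchOutcome {
  const controller = activeBuildAborts.get(job.jobId);
  if (controller === undefined) {
    // No live build/pull owns this job (e.g. it survived a node restart),
    // so fail it instead of leaving it in BuildImage / PullImage forever.
    job.status = "Failed";
    return "orphaned";
  }
  if (job.stopRequested) {
    // Signal the running build/pull; it is expected to watch controller.signal and stop.
    controller.abort();
    return "aborted";
  }
  // Still running normally; the next processJob pass checks again.
  return "running";
}

// Sketch of the dispatch: only the new branch is shown.
function processJob(job: Job): BranchOutcome | "unhandled" {
  switch (job.status) {
    case "BuildImage":
    case "PullImage":
      return processBuildOrPullJob(job);
    default:
      // RunningAlgorithm and other statuses are handled elsewhere in the real code.
      return "unhandled";
  }
}
```

Assuming the map lives only in process memory, it is empty after a node restart, so any job still marked BuildImage or PullImage has no controller and is failed on the next pass rather than waiting forever.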

@giurgiur99 (Contributor, Author) commented:

/run-security-scan

@alexcos20 (Member) left a comment

Automated AI code review (Gemini 3).

Overall risk: low

Summary:
The PR implements a mechanism to track and abort pull and build jobs, and to identify orphaned ones, using an AbortController per job and a map of active abort controllers. It prevents resource leaks and hung processes when jobs are terminated or the node restarts, and it correctly clears its timeout timers. LGTM!

Comments:
• [INFO][bug] Consider a potential race-condition edge case: if checkJob is invoked by a periodic timer just after a job's status is updated in the database to PullImage / BuildImage, but before the asynchronous pullImage or buildImage execution adds the controller to this.activeBuildAborts, it might wrongly flag a fresh job as an orphan. Calling activeBuildAborts.set(...) before, or immediately alongside, the database status update minimizes this risk.
• [INFO][style] Good implementation replacing AbortSignal.timeout with a custom AbortController and setTimeout. This elegantly allows you to handle explicit abort requests (via stopRequested) while preserving the pull timeout limit safety constraint.
• [INFO][performance] Excellent use of the finally block to clear the timer and remove the entry from activeBuildAborts. This guarantees there are no hanging references or memory leaks regardless of whether the Docker process succeeded, failed, or timed out.

LGTM!
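The pattern the review comments describe (a single AbortController driven both by a timer and by explicit stop requests, registered in activeBuildAborts, and cleaned up in a finally block) can be sketched in TypeScript as follows. runAbortable, PULL_TIMEOUT_MS, and the work callback that stands in for the Docker pull or build are hypothetical names, and the timeout value is a placeholder because the PR does not state it.

```typescript
// Hypothetical placeholder; the PR does not state the node's actual pull timeout.
const PULL_TIMEOUT_MS = 10 * 60 * 1000;

// Same role as in the PR: job id -> controller of the in-flight build/pull.
const activeBuildAborts = new Map<string, AbortController>();

// Hypothetical wrapper around a build/pull step. `work` stands in for the
// Docker call and must honour the AbortSignal it is given.
async function runAbortable<T>(
  jobId: string,
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = PULL_TIMEOUT_MS,
): Promise<T> {
  // One controller serves both triggers: the timeout below and an explicit
  // abort() from processJob when stopRequested is set. A signal from
  // AbortSignal.timeout() on its own cannot be aborted on demand.
  const controller = new AbortController();
  // Registered synchronously, before the first await, so the job is visible
  // to orphan detection as soon as this function is called.
  activeBuildAborts.set(jobId, controller);
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await work(controller.signal);
  } finally {
    // Runs on success, failure, timeout, or stop: no stray timer, no stale entry.
    clearTimeout(timer);
    activeBuildAborts.delete(jobId);
  }
}
```

Because an async function runs synchronously up to its first await, the set(...) call happens the moment runAbortable is invoked. Following the first review comment, a caller would invoke it before, or immediately alongside, the database status update so that checkJob never sees a fresh job without a controller.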

@giurgiur99 marked this pull request as ready for review on May 18, 2026 at 11:37
@giurgiur99 merged commit 372c013 into main on May 18, 2026
18 of 19 checks passed
@giurgiur99 deleted the fix/stop-orphaned-build-pull-jobs branch on May 18, 2026 at 13:21
Development

Successfully merging this pull request may close these issues.

Fix building algorithm stall

3 participants