
[Build]estimated compile parallel jobs to avoid OOM#219

Open
jikunshang wants to merge 1 commit intovllm-project:mainfrom
jikunshang:kunshang/jobs_compute

Conversation

@jikunshang
Collaborator

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Many folks have complained about compile failures or killed compile processes (OOM). This PR adds logic to estimate a safe number of parallel compile jobs.

Test Plan

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Copilot AI review requested due to automatic review settings March 24, 2026 10:06
@jikunshang jikunshang changed the title [Build]compute estimated jobs to avoid OOM [Build]estimated compile parallel jobs to avoid OOM Mar 24, 2026
Contributor

Copilot AI left a comment


Pull request overview

Improve build reliability by auto-estimating a safe parallel compilation job count to reduce OOM/killed builds on memory-constrained machines.

Changes:

  • Add memory-based job estimation using total system RAM and an ~8GB-per-compile heuristic.
  • Clamp parallel jobs to min(cpu_cores, mem_based_jobs) with logging for auto-detected values.
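The clamping described above can be sketched as a stand-alone function. This is a hypothetical illustration, not the PR's actual code: the names `estimate_jobs` and `gb_per_job` are invented here, and it uses only the stdlib (the PR itself reads memory via psutil):

```python
import os

def estimate_jobs(gb_per_job: int = 8) -> int:
    """Hypothetical sketch: clamp parallel compile jobs to
    min(cpu_cores, total_ram // 8GB), never dropping below 1."""
    cpu_jobs = os.cpu_count() or 1
    try:
        # POSIX: total physical memory = page size * number of pages.
        mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError):
        # No memory info available; fall back to CPU-based detection.
        return cpu_jobs
    mem_jobs = max(1, mem_bytes // (gb_per_job * 1024**3))
    return max(1, min(cpu_jobs, int(mem_jobs)))
```

With the default 8GB-per-job assumption, a 16GB machine is capped at 2 jobs regardless of core count.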


Comment on lines +103 to +105
import psutil
mem_bytes = psutil.virtual_memory().total


Copilot AI Mar 24, 2026


Unconditionally importing psutil inside setup.py can break installs/builds where psutil is not present in the build environment (raising ImportError before any fallback logic runs). Consider wrapping the import in try/except ImportError and falling back to CPU-based detection, or avoid the external dependency entirely by using stdlib /proc parsing where applicable.

Suggested change
- import psutil
- mem_bytes = psutil.virtual_memory().total
+ try:
+     import psutil
+ except ImportError:
+     psutil = None  # type: ignore[assignment]
+ mem_bytes = None
+ if psutil is not None:
+     mem_bytes = psutil.virtual_memory().total
+ else:
+     # Fallback: try to estimate memory using /proc/meminfo on Linux.
+     if sys.platform.startswith("linux"):
+         try:
+             with open("/proc/meminfo", "r") as f:
+                 for line in f:
+                     if line.startswith("MemTotal:"):
+                         parts = line.split()
+                         if len(parts) >= 2:
+                             kbytes = int(parts[1])
+                             mem_bytes = kbytes * 1024
+                         break
+         except OSError:
+             mem_bytes = None

# of memory, so we limit jobs to avoid OOM on memory-constrained
# machines.
import psutil
mem_bytes = psutil.virtual_memory().total

Copilot AI Mar 24, 2026


Using total system memory (virtual_memory().total) can significantly overestimate usable memory in containers/cgroup-limited environments, which may still lead to OOM despite the cap. Prefer a cgroup-aware limit (if available) or at least virtual_memory().available/a conservative fraction of total to better reflect real headroom during compilation.

Suggested change
- mem_bytes = psutil.virtual_memory().total
+ vm = psutil.virtual_memory()
+ # Prefer available memory (more cgroup/container-aware) and fall
+ # back to total if available is not provided.
+ mem_bytes = getattr(vm, "available", None) or vm.total

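A cgroup-aware limit like the one this review comment suggests could look roughly like the following. This is a hypothetical helper (the function name and the decision to return None when unlimited are invented here); the paths cover cgroup v2's memory.max and cgroup v1's memory.limit_in_bytes:

```python
def cgroup_memory_limit_bytes():
    """Return the container memory limit in bytes, or None if
    unlimited/undetectable. Hypothetical sketch of a cgroup-aware check."""
    candidates = (
        "/sys/fs/cgroup/memory.max",                    # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
    )
    for path in candidates:
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file absent: try the next cgroup layout
        if raw == "max":  # cgroup v2 reports "max" when no limit is set
            return None
        limit = int(raw)
        # cgroup v1 reports a huge sentinel value when unlimited.
        return limit if limit < (1 << 60) else None
    return None
```

A caller would then take min(cgroup limit, total RAM) before dividing by the per-job estimate.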
Comment on lines +113 to +116
if mem_bytes is not None:
    # Assume each compile process may require ~8GB.
    mem_jobs = max(1, mem_bytes // (8 * 1024**3))
    num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))

Copilot AI Mar 24, 2026


Using total system memory (virtual_memory().total) can significantly overestimate usable memory in containers/cgroup-limited environments, which may still lead to OOM despite the cap. Prefer a cgroup-aware limit (if available) or at least virtual_memory().available/a conservative fraction of total to better reflect real headroom during compilation.

Comment on lines +115 to +119
mem_jobs = max(1, mem_bytes // (8 * 1024**3))
num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
logger.info(
    "Auto-detected: cpu core: %d, memory_limit: %d, using: %d",
    cpu_jobs,

Copilot AI Mar 24, 2026


The log label memory_limit is misleading: the value logged (mem_jobs) is a job-count derived from memory, not a memory limit (bytes/GB). Consider renaming the label (e.g., memory_based_jobs) and/or logging both the detected memory (in GB) and the computed job cap for clarity.

Suggested change
- mem_jobs = max(1, mem_bytes // (8 * 1024**3))
- num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
- logger.info(
-     "Auto-detected: cpu core: %d, memory_limit: %d, using: %d",
-     cpu_jobs,
+ mem_gb = mem_bytes / (1024**3)
+ mem_jobs = max(1, mem_bytes // (8 * 1024**3))
+ num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
+ logger.info(
+     "Auto-detected: cpu_cores: %d, system_memory_gb: %.2f, "
+     "memory_based_jobs_cap: %d, using_jobs: %d",
+     cpu_jobs,
+     mem_gb,

    num_jobs = int(num_jobs)
    logger.info("Using MAX_JOBS=%d as the number of jobs.", num_jobs)
else:
    # Estimate the number of jobs. Each compile process may take ~8GB

Copilot AI Mar 24, 2026


The PR description checklist fields for Test Plan and Test Result are still empty. Please add the command(s) used (e.g., pip install/build invocation) and observed results so reviewers can validate the change.


if mem_bytes is not None:
    # Assume each compile process may require ~8GB.
    mem_jobs = max(1, mem_bytes // (8 * 1024**3))


I have seen a job use 35GB+.

Collaborator Author


I think we no longer have such a compile process after #126.
Please double-confirm.

Collaborator


Correct. After splitting by dtype, it is reduced to ~2GB.

Collaborator Author


It's possible that the final link job takes 35GB, but each standalone template build does use <8GB now.
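For intuition on the numbers in this thread, the PR's clamp can be evaluated for a hypothetical 64GB, 32-core machine (the helper name `job_cap` is illustrative): the 8GB-per-job estimate caps the build at 8 jobs, while the ~2GB post-#126 figure would allow all 32 cores.

```python
def job_cap(total_mem_gb: int, cpu_cores: int, gb_per_job: int) -> int:
    # Same clamp shape as the PR: min(cpu, memory-derived jobs), never below 1.
    mem_jobs = max(1, total_mem_gb // gb_per_job)
    return max(1, min(cpu_cores, mem_jobs))

print(job_cap(64, 32, gb_per_job=8))  # → 8
print(job_cap(64, 32, gb_per_job=2))  # → 32
```

Note the clamp never reserves headroom for a single oversized link step, which is why a 35GB spike can still OOM a small machine even at 1 job.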


4 participants