[Build] Estimate compile parallel jobs to avoid OOM #219
jikunshang wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Pull request overview
Improve build reliability by auto-estimating a safe parallel compilation job count to reduce OOM/killed builds on memory-constrained machines.
Changes:
- Add memory-based job estimation using total system RAM and an ~8GB-per-compile heuristic.
- Clamp parallel jobs to `min(cpu_cores, mem_based_jobs)`, with logging for the auto-detected values.
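The clamping described above can be sketched as a small standalone helper. This is an illustrative reimplementation of the PR's heuristic, not the actual `setup.py` code; the function and parameter names are made up for the example, and the 8GB-per-job figure comes from the PR's own assumption.

```python
import os

GB = 1024**3


def estimate_jobs(total_mem_bytes, cpu_jobs=None, mem_per_job_bytes=8 * GB):
    """Clamp parallel compile jobs to min(cpu_cores, memory-based cap)."""
    if cpu_jobs is None:
        cpu_jobs = os.cpu_count() or 1
    # Assume each compile process may need ~8GB; always allow at least 1 job.
    mem_jobs = max(1, total_mem_bytes // mem_per_job_bytes)
    return max(1, min(cpu_jobs, int(mem_jobs)))


print(estimate_jobs(64 * GB, cpu_jobs=32))  # 64GB / 8GB -> 8 jobs
print(estimate_jobs(4 * GB, cpu_jobs=16))   # under 8GB -> clamped to 1 job
```

On a 64GB/32-core box this caps the build at 8 parallel compiles even though 32 cores are available, which is exactly the OOM-avoidance behavior the PR is after.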
```python
import psutil
mem_bytes = psutil.virtual_memory().total
```
Unconditionally importing psutil inside setup.py can break installs/builds where psutil is not present in the build environment (raising ImportError before any fallback logic runs). Consider wrapping the import in try/except ImportError and falling back to CPU-based detection, or avoid the external dependency entirely by using stdlib /proc parsing where applicable.
```diff
-import psutil
-mem_bytes = psutil.virtual_memory().total
+try:
+    import psutil
+except ImportError:
+    psutil = None  # type: ignore[assignment]
+    mem_bytes = None
+if psutil is not None:
+    mem_bytes = psutil.virtual_memory().total
+else:
+    # Fallback: try to estimate memory using /proc/meminfo on Linux.
+    if sys.platform.startswith("linux"):
+        try:
+            with open("/proc/meminfo", "r") as f:
+                for line in f:
+                    if line.startswith("MemTotal:"):
+                        parts = line.split()
+                        if len(parts) >= 2:
+                            kbytes = int(parts[1])
+                            mem_bytes = kbytes * 1024
+                            break
+        except OSError:
+            mem_bytes = None
```
```python
# of memory, so we limit jobs to avoid OOM on memory-constrained
# machines.
import psutil
mem_bytes = psutil.virtual_memory().total
```
Using total system memory (`virtual_memory().total`) can significantly overestimate usable memory in containers/cgroup-limited environments, which may still lead to OOM despite the cap. Prefer a cgroup-aware limit (if available), or at least `virtual_memory().available` or a conservative fraction of total, to better reflect real headroom during compilation.
```diff
-mem_bytes = psutil.virtual_memory().total
+vm = psutil.virtual_memory()
+# Prefer available memory (more cgroup/container-aware) and fall
+# back to total if available is not provided.
+mem_bytes = getattr(vm, "available", None) or vm.total
```
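For the cgroup-aware option mentioned in the review, a minimal probe could read the standard cgroup files directly. This is a sketch under the assumption of the usual Linux cgroup v2/v1 mount paths; it is not code from the PR, and the function name is illustrative.

```python
def cgroup_memory_limit():
    """Return the cgroup memory limit in bytes, or None if unlimited/unknown."""
    paths = (
        "/sys/fs/cgroup/memory.max",                     # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",   # cgroup v1
    )
    for path in paths:
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file absent on this host; try the next layout
        if raw == "max":  # cgroup v2 sentinel for "no limit"
            return None
        limit = int(raw)
        # cgroup v1 reports a huge number (~PiB) when unlimited.
        if limit >= 1 << 60:
            return None
        return limit
    return None
```

A caller could then take `min(cgroup_memory_limit() or total, total)` as the memory figure fed into the job estimation, so containerized builds respect their actual budget.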
```python
if mem_bytes is not None:
    # Assume each compile process may require ~8GB.
    mem_jobs = max(1, mem_bytes // (8 * 1024**3))
    num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
```
```python
mem_jobs = max(1, mem_bytes // (8 * 1024**3))
num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
logger.info(
    "Auto-detected: cpu core: %d, memory_limit: %d, using: %d",
    cpu_jobs,
```
The log label `memory_limit` is misleading: the value logged (`mem_jobs`) is a job count derived from memory, not a memory limit in bytes or GB. Consider renaming the label (e.g. `memory_based_jobs`) and/or logging both the detected memory (in GB) and the computed job cap for clarity.
```diff
-mem_jobs = max(1, mem_bytes // (8 * 1024**3))
-num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
-logger.info(
-    "Auto-detected: cpu core: %d, memory_limit: %d, using: %d",
-    cpu_jobs,
+mem_gb = mem_bytes / (1024**3)
+mem_jobs = max(1, mem_bytes // (8 * 1024**3))
+num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
+logger.info(
+    "Auto-detected: cpu_cores: %d, system_memory_gb: %.2f, "
+    "memory_based_jobs_cap: %d, using_jobs: %d",
+    cpu_jobs,
+    mem_gb,
```
```python
    num_jobs = int(num_jobs)
    logger.info("Using MAX_JOBS=%d as the number of jobs.", num_jobs)
else:
    # Estimate the number of jobs. Each compile process may take ~8GB
```
The PR description checklist fields for Test Plan and Test Result are still empty. Please add the command(s) used (e.g., pip install/build invocation) and observed results so reviewers can validate the change.
```python
if mem_bytes is not None:
    # Assume each compile process may require ~8GB.
    mem_jobs = max(1, mem_bytes // (8 * 1024**3))
```
I think we no longer have such a compile process after #126.
Please double-confirm.
Correct. After splitting the dtype, per-process memory is reduced to ~2GB.
It's possible that the last link job takes ~35GB, but each standalone template build does use <8GB now.
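Given the discussion above (per-template compiles now under 8GB, but a final link step that can spike to ~35GB), one possible refinement would be to reserve headroom for that single large link job before dividing the remainder among compile jobs. This is purely a hypothetical sketch prompted by the reviewers' numbers; the function name, parameters, and figures are illustrative, not part of the PR.

```python
GB = 1024**3


def estimate_jobs_with_link_headroom(total_mem, cpu_jobs,
                                     per_compile=8 * GB, link_peak=35 * GB):
    """Cap compile jobs after reserving memory for one large link job."""
    # Keep enough memory free for the single ~35GB link step, but never
    # drop the compile budget below one job's worth of memory.
    budget = max(per_compile, total_mem - link_peak)
    mem_jobs = max(1, budget // per_compile)
    return max(1, min(cpu_jobs, int(mem_jobs)))


print(estimate_jobs_with_link_headroom(64 * GB, 32))  # (64-35)//8 -> 3 jobs
```

Whether the extra conservatism is worth it depends on whether the link step actually overlaps with compiles in this build; if it runs strictly last, the simpler per-compile cap in the PR may already be enough.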
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.
Purpose
A lot of folks complain about compiles failing or being killed; this adds some logic to estimate a safe number of parallel compile jobs.
Test Plan
Test Result
(Optional) Documentation Update
BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)