[Build] Estimate compile parallel jobs to avoid OOM #219
jikunshang wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Pull request overview
Improve build reliability by auto-estimating a safe parallel compilation job count to reduce OOM/killed builds on memory-constrained machines.
Changes:
- Add memory-based job estimation using total system RAM and an ~8GB-per-compile heuristic.
- Clamp parallel jobs to `min(cpu_cores, mem_based_jobs)`, with logging for the auto-detected values.
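The clamping described above can be sketched as a small standalone helper. This is an illustrative reimplementation of the PR's heuristic, not the actual `setup.py` code; the function and parameter names are made up for the example, and the 8GB-per-job figure comes from the PR's own assumption.

```python
import os

GB = 1024**3


def estimate_jobs(total_mem_bytes, cpu_jobs=None, mem_per_job_bytes=8 * GB):
    """Clamp parallel compile jobs to min(cpu_cores, memory-based cap)."""
    if cpu_jobs is None:
        cpu_jobs = os.cpu_count() or 1
    # Assume each compile process may need ~8GB; always allow at least 1 job.
    mem_jobs = max(1, total_mem_bytes // mem_per_job_bytes)
    return max(1, min(cpu_jobs, int(mem_jobs)))


print(estimate_jobs(64 * GB, cpu_jobs=32))  # 64GB / 8GB -> 8 jobs
print(estimate_jobs(4 * GB, cpu_jobs=16))   # under 8GB -> clamped to 1 job
```

On a 64GB/32-core box this caps the build at 8 parallel compiles even though 32 cores are available, which is exactly the OOM-avoidance behavior the PR is after.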
```python
import psutil
mem_bytes = psutil.virtual_memory().total
```
Unconditionally importing psutil inside setup.py can break installs/builds where psutil is not present in the build environment (raising ImportError before any fallback logic runs). Consider wrapping the import in try/except ImportError and falling back to CPU-based detection, or avoid the external dependency entirely by using stdlib /proc parsing where applicable.
```diff
-import psutil
-mem_bytes = psutil.virtual_memory().total
+try:
+    import psutil
+except ImportError:
+    psutil = None  # type: ignore[assignment]
+    mem_bytes = None
+if psutil is not None:
+    mem_bytes = psutil.virtual_memory().total
+else:
+    # Fallback: try to estimate memory using /proc/meminfo on Linux.
+    if sys.platform.startswith("linux"):
+        try:
+            with open("/proc/meminfo", "r") as f:
+                for line in f:
+                    if line.startswith("MemTotal:"):
+                        parts = line.split()
+                        if len(parts) >= 2:
+                            kbytes = int(parts[1])
+                            mem_bytes = kbytes * 1024
+                            break
+        except OSError:
+            mem_bytes = None
```
```python
# of memory, so we limit jobs to avoid OOM on memory-constrained
# machines.
import psutil
mem_bytes = psutil.virtual_memory().total
```
Using total system memory (`virtual_memory().total`) can significantly overestimate usable memory in containers/cgroup-limited environments, which may still lead to OOM despite the cap. Prefer a cgroup-aware limit (if available), or at least `virtual_memory().available` or a conservative fraction of total, to better reflect real headroom during compilation.
```diff
-mem_bytes = psutil.virtual_memory().total
+vm = psutil.virtual_memory()
+# Prefer available memory (more cgroup/container-aware) and fall
+# back to total if available is not provided.
+mem_bytes = getattr(vm, "available", None) or vm.total
```
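For the cgroup-aware option mentioned in the review, a minimal probe could read the standard cgroup files directly. This is a sketch under the assumption of the usual Linux cgroup v2/v1 mount paths; it is not code from the PR, and the function name is illustrative.

```python
def cgroup_memory_limit():
    """Return the cgroup memory limit in bytes, or None if unlimited/unknown."""
    paths = (
        "/sys/fs/cgroup/memory.max",                     # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",   # cgroup v1
    )
    for path in paths:
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file absent on this host; try the next layout
        if raw == "max":  # cgroup v2 sentinel for "no limit"
            return None
        limit = int(raw)
        # cgroup v1 reports a huge number (~PiB) when unlimited.
        if limit >= 1 << 60:
            return None
        return limit
    return None
```

A caller could then take `min(cgroup_memory_limit() or total, total)` as the memory figure fed into the job estimation, so containerized builds respect their actual budget.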
```python
if mem_bytes is not None:
    # Assume each compile process may require ~8GB.
    mem_jobs = max(1, mem_bytes // (8 * 1024**3))
    num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
```
```python
mem_jobs = max(1, mem_bytes // (8 * 1024**3))
num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
logger.info(
    "Auto-detected: cpu core: %d, memory_limit: %d, using: %d",
    cpu_jobs,
```
The log label `memory_limit` is misleading: the value logged (`mem_jobs`) is a job count derived from memory, not a memory limit in bytes or GB. Consider renaming the label (e.g. `memory_based_jobs`) and/or logging both the detected memory (in GB) and the computed job cap for clarity.
```diff
-mem_jobs = max(1, mem_bytes // (8 * 1024**3))
-num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
-logger.info(
-    "Auto-detected: cpu core: %d, memory_limit: %d, using: %d",
-    cpu_jobs,
+mem_gb = mem_bytes / (1024**3)
+mem_jobs = max(1, mem_bytes // (8 * 1024**3))
+num_jobs = max(1, min(cpu_jobs, int(mem_jobs)))
+logger.info(
+    "Auto-detected: cpu_cores: %d, system_memory_gb: %.2f, "
+    "memory_based_jobs_cap: %d, using_jobs: %d",
+    cpu_jobs,
+    mem_gb,
```
```python
    num_jobs = int(num_jobs)
    logger.info("Using MAX_JOBS=%d as the number of jobs.", num_jobs)
else:
    # Estimate the number of jobs. Each compile process may take ~8GB
```
The PR description checklist fields for Test Plan and Test Result are still empty. Please add the command(s) used (e.g., pip install/build invocation) and observed results so reviewers can validate the change.
```python
if mem_bytes is not None:
    # Assume each compile process may require ~8GB.
    mem_jobs = max(1, mem_bytes // (8 * 1024**3))
```
I think we no longer have such a compile process after #126.
Please double-confirm.
Correct. After splitting the dtype, per-process memory is reduced to ~2GB.
It's possible that the last link job takes ~35GB, but each standalone template build does use <8GB now.
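Given the discussion above (per-template compiles now under 8GB, but a final link step that can spike to ~35GB), one possible refinement would be to reserve headroom for that single large link job before dividing the remainder among compile jobs. This is purely a hypothetical sketch prompted by the reviewers' numbers; the function name, parameters, and figures are illustrative, not part of the PR.

```python
GB = 1024**3


def estimate_jobs_with_link_headroom(total_mem, cpu_jobs,
                                     per_compile=8 * GB, link_peak=35 * GB):
    """Cap compile jobs after reserving memory for one large link job."""
    # Keep enough memory free for the single ~35GB link step, but never
    # drop the compile budget below one job's worth of memory.
    budget = max(per_compile, total_mem - link_peak)
    mem_jobs = max(1, budget // per_compile)
    return max(1, min(cpu_jobs, int(mem_jobs)))


print(estimate_jobs_with_link_headroom(64 * GB, 32))  # (64-35)//8 -> 3 jobs
```

Whether the extra conservatism is worth it depends on whether the link step actually overlaps with compiles in this build; if it runs strictly last, the simpler per-compile cap in the PR may already be enough.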
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.
Purpose
A lot of folks complain about compiles failing or being killed; this adds some logic to estimate a safe number of parallel compile jobs.
Test Plan
Test Result
(Optional) Documentation Update
BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)