build-parallel: Optimize max_workers based on dependency graph parallelism

**Problem**

The `build-parallel` command currently falls back to Python's `ThreadPoolExecutor` default for `max_workers` when it is not specified, which is `min(32, os.cpu_count() + 4)`. However, this doesn't account for how much parallelism the dependency graph actually allows.
For example:

- If a dependency graph only allows 2 packages to build in parallel (due to dependency constraints)
- But the system has 8 CPU cores (resulting in 12 default workers)
- Then 10 workers will sit idle most of the time
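The arithmetic in this example can be checked directly. The sketch below plugs the example's numbers into `ThreadPoolExecutor`'s default formula. `default_thread_pool_workers` is a hypothetical helper name used only for illustration.

```python
def default_thread_pool_workers(cpu_count: int) -> int:
    # ThreadPoolExecutor's default max_workers formula: min(32, CPU count + 4).
    return min(32, cpu_count + 4)


cpus = 8               # CPU cores in the example above
graph_parallelism = 2  # most packages the graph lets build at once

workers = default_thread_pool_workers(cpus)
idle = workers - graph_parallelism
print(workers, idle)  # 12 10
```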

**Proposed Solution**

Calculate max_workers based on the maximum parallelism available in the dependency graph:
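```python
import os

topo = graph.get_build_topology(context=wkctx)
# Size of the largest batch of packages that can build at the same time.
# default=1 keeps max() from raising ValueError on an empty graph.
max_parallelism = max((len(batch) for batch in topo.static_batches()), default=1)

if max_workers is None:
    # Mirror ThreadPoolExecutor's own default...
    cpu_default = min(32, (os.cpu_count() or 1) + 4)
    # ...but don't start more workers than the graph can keep busy.
    max_workers = min(cpu_default, max_parallelism)
    logger.info("graph allows max %d parallel builds, using %d workers", max_parallelism, max_workers)
```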
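To check the selection logic without building a real graph, it could be factored into a small pure function. This is only a sketch: `choose_max_workers` and its parameters are hypothetical names rather than existing fromager code, and `cpu_count` is injectable purely so the behavior is easy to test.

```python
import os
from typing import Optional


def choose_max_workers(
    max_parallelism: int,
    requested: Optional[int] = None,
    cpu_count: Optional[int] = None,
) -> int:
    """Hypothetical helper: cap the CPU-based default at the graph's parallelism."""
    if requested is not None:
        # An explicit value from the caller is used as-is.
        return requested
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    cpu_default = min(32, cpus + 4)
    # Keep at least one worker, since ThreadPoolExecutor rejects max_workers <= 0.
    return max(1, min(cpu_default, max_parallelism))


print(choose_max_workers(2, cpu_count=8))   # 2: the graph is the bottleneck
print(choose_max_workers(50, cpu_count=8))  # 12: the CPU default is the bottleneck
print(choose_max_workers(2, requested=16))  # 16: an explicit value wins
```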
The `static_batches()` method already exists in `TrackingTopologicalSorter` and can be used to determine the maximum number of packages that can be built concurrently at any point.
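As a rough illustration of what "maximum parallelism" means here, the sketch below groups a toy dependency graph into levels with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Each level contains the packages whose dependencies all sit in earlier levels. `static_batches_sketch` and the package names are hypothetical, and fromager's `TrackingTopologicalSorter.static_batches()` may compute its batches differently.

```python
from graphlib import TopologicalSorter


def static_batches_sketch(deps: dict[str, set[str]]) -> list[list[str]]:
    """Hypothetical helper: group packages into levels of mutually independent builds."""
    sorter = TopologicalSorter(deps)  # maps each package to the packages it depends on
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()  # every package whose dependencies are all done
        batches.append(sorted(ready))
        sorter.done(*ready)  # finish the whole level before asking for the next one
    return batches


deps = {
    "app": {"lib-a", "lib-b"},
    "lib-a": {"base"},
    "lib-b": {"base"},
    "base": set(),
}
batches = static_batches_sketch(deps)
max_parallelism = max((len(b) for b in batches), default=1)
print(batches)          # [['base'], ['lib-a', 'lib-b'], ['app']]
print(max_parallelism)  # 2
```

Here `lib-a` and `lib-b` both depend only on `base`, so they form the widest batch and the cap would be 2. One caveat: if packages are submitted as soon as their own dependencies finish rather than level by level, more packages than the widest static batch can sometimes be ready at the same time, so the value works best as a heuristic cap.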

Related code:

- `src/fromager/commands/build.py` - `build_parallel()` function
- `src/fromager/dependency_graph.py` - `TrackingTopologicalSorter.static_batches()`

build-parallel: Optimize max_workers based on dependency graph parallelism #880
