build-parallel: Optimize max_workers based on dependency graph parallelism #880

@LalatenduMohanty

Problem

When max_workers is not specified, the build-parallel command currently falls back to Python's ThreadPoolExecutor default, which is min(32, cpu_count + 4). This default doesn't account for the actual parallelism available in the dependency graph.
For example:

  • The dependency graph allows only 2 packages to build in parallel (due to dependency constraints).
  • The system has 8 CPU cores, so the default is min(32, 8 + 4) = 12 workers.
  • As a result, at least 10 of the 12 workers sit idle at any given time.
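
To make the arithmetic above concrete, here is a small sketch. The helper names default_thread_workers and idle_workers are hypothetical and exist only for illustration; they are not part of fromager.

```python
def default_thread_workers(cpu_count):
    """Mirror the default described above: min(32, cpu_count + 4)."""
    # Treat an unknown CPU count (None) as 1, like `os.cpu_count() or 1`.
    return min(32, (cpu_count or 1) + 4)


def idle_workers(cpu_count, max_parallelism):
    """Workers that can never be busy if the graph caps concurrency."""
    return max(0, default_thread_workers(cpu_count) - max_parallelism)


# 8 cores and a graph that allows 2 concurrent builds
workers = default_thread_workers(8)
idle = idle_workers(8, 2)
print(f"{workers} workers, {idle} always idle")
```

With 8 cores and a graph that allows 2 parallel builds, this reports 12 workers, of which 10 can never be busy.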

Proposed Solution

Calculate max_workers based on the maximum parallelism available in the dependency graph:

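import os

# The widest batch is the most packages that can build at the same time.
topo = graph.get_build_topology(context=wkctx)
max_parallelism = max((len(batch) for batch in topo.static_batches()), default=1)

if max_workers is None:
    # Same formula as the ThreadPoolExecutor default, capped by the graph.
    cpu_default = min(32, (os.cpu_count() or 1) + 4)
    max_workers = max(1, min(cpu_default, max_parallelism))  # never fewer than 1 worker
    logger.info(f"graph allows max {max_parallelism} parallel builds, using {max_workers} workers")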

The static_batches() method already exists in TrackingTopologicalSorter and can be used to determine the maximum number of packages that can be built concurrently at any point.
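
For readers unfamiliar with the idea, here is a rough, self-contained sketch of how batch widths can be derived from a dependency graph. It uses the standard library's graphlib.TopologicalSorter, not fromager's TrackingTopologicalSorter. The static_batches and max_parallelism functions below and the example package names are assumptions for illustration, and the real static_batches() may behave differently.

```python
from graphlib import TopologicalSorter


def static_batches(deps):
    """Group nodes into batches whose dependencies are all in earlier batches.

    `deps` maps each package to the set of packages it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()  # every node whose dependencies are done
        batches.append(set(ready))
        sorter.done(*ready)         # mark the whole batch as built
    return batches


def max_parallelism(deps):
    """Width of the widest batch; 1 for an empty graph."""
    return max((len(batch) for batch in static_batches(deps)), default=1)


# Hypothetical graph: app needs libA and libB, which both need setuptools.
example = {
    "app": {"libA", "libB"},
    "libA": {"setuptools"},
    "libB": {"setuptools"},
    "setuptools": set(),
}
print(static_batches(example))
print(max_parallelism(example))
```

In this example only libA and libB can build side by side, so the widest batch has 2 packages and that would become the cap on max_workers.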

Related code:
src/fromager/commands/build.py - build_parallel() function
src/fromager/dependency_graph.py - TrackingTopologicalSorter.static_batches()
