[SYCL][UR] Move max_global_work_groups query to UR #21840
uditagarwal97 wants to merge 8 commits into sycl
Pull request overview
This PR moves SYCL’s ext::oneapi::experimental::info::device::max_global_work_groups query from being handled in the SYCL runtime to being surfaced as a Unified Runtime (UR) urDeviceGetInfo property (UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS) and wires it through adapters, printing utilities, and conformance tooling.
Changes:
- Adds `UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS` to UR API surface (YAML + generated header) and to UR printing / urinfo output.
- Implements `UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS` handling across multiple UR adapters (OpenCL, CUDA, HIP, Level Zero, Offload; NativeCPU reports unsupported).
- Updates SYCL device trait mapping + return type mapping and switches SYCL `device_impl` to query UR for `max_global_work_groups`.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| unified-runtime/tools/urinfo/urinfo.hpp | Adds printing of UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS in urinfo. |
| unified-runtime/test/conformance/device/urDeviceGetInfo.cpp | Adds conformance test coverage for the new device info query. |
| unified-runtime/source/adapters/opencl/device.cpp | Adds adapter support for UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS. |
| unified-runtime/source/adapters/offload/device.cpp | Adds adapter support for UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS and includes <limits>. |
| unified-runtime/source/adapters/native_cpu/device.cpp | Marks UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS as unsupported. |
| unified-runtime/source/adapters/level_zero/device.cpp | Adds adapter support for UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS. |
| unified-runtime/source/adapters/hip/device.cpp | Adds adapter support for UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS. |
| unified-runtime/source/adapters/cuda/device.cpp | Adds adapter support for UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS. |
| unified-runtime/scripts/core/device.yml | Adds the new device info enumerator to the UR spec YAML. |
| unified-runtime/include/unified-runtime/ur_print.hpp | Adds enum-to-string and tagged printing support for the new device info. |
| unified-runtime/include/unified-runtime/ur_api.h | Adds UR_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS to ur_device_info_t. |
| sycl/source/detail/ur_device_info_ret_types.inc | Maps the new UR device info to size_t for SYCL’s UR return-type machinery. |
| sycl/source/detail/device_impl.hpp | Switches SYCL to query UR for max_global_work_groups and adjusts max_work_groups<3> behavior. |
| sycl/include/sycl/info/ext_oneapi_device_traits.def | Maps max_global_work_groups to the new UR device info enumerator. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
iclsrc left a comment
This PR moves the max_global_work_groups device query out of the SYCL RT into UR adapters, which is a nice cleanup. The approach is sound, but there is one compilation-breaking typo in the offload adapter that must be fixed before merging. A few smaller issues noted below.
Co-authored-by: iclsrc <iclsrc@intel.com>
This PR makes the following changes:
- Moves `max_global_work_groups` to UR. This query has been implemented in SYCL RT because there's no backend support for the `max_global_work_groups` query. However, it was recently decided that OpenCL will add a corresponding query. See CMPLRLLVM-73572 for more info.
- Changes `max_global_work_groups` from `INT_MAX` to `SIZE_MAX` for all backends. For the CUDA, HIP, OFFLOAD, and L0 adapters, we calculate the value of `max_global_work_groups` by taking the minimum of `SIZE_MAX` and the product of the per-dimension max group sizes.
- Adjusts `max_work_groups<3>` so that `max_global_work_groups` no longer limits the per-dimension max work group size.
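The "minimum of `SIZE_MAX` and the product" rule cannot be evaluated by multiplying first and clamping afterwards, because the product itself may wrap around in `size_t`. A minimal sketch of one way to do it is below. The helper name `saturatingGroupProduct` and its signature are hypothetical and are not taken from the adapters in this PR; it only illustrates the arithmetic.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical helper (not from the PR): multiplies the three per-dimension
// limits and saturates at SIZE_MAX instead of wrapping around on overflow.
inline std::size_t
saturatingGroupProduct(const std::array<std::size_t, 3> &maxGroupsPerDim) {
  constexpr std::size_t kMax = std::numeric_limits<std::size_t>::max();

  // A zero in any dimension makes the whole product zero, regardless of
  // how large the other dimensions are, so check for it up front.
  for (std::size_t dim : maxGroupsPerDim)
    if (dim == 0)
      return 0;

  std::size_t total = 1;
  for (std::size_t dim : maxGroupsPerDim) {
    // total * dim would exceed kMax exactly when total > kMax / dim.
    if (total > kMax / dim)
      return kMax;
    total *= dim;
  }
  return total;
}
```

Checking `total > kMax / dim` before each multiplication keeps every intermediate value representable, so the result equals `min(SIZE_MAX, x * y * z)` computed as if with unbounded integers.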