[SYCL][Doc] Updates to the "root group" extension #21838
Open
gmlueck wants to merge 4 commits into intel:sycl from
* Change the `use_root_sync` property from a "kernel property" to a "kernel launch property". This is necessary because we want it to be possible to determine at runtime, on a per-launch basis, whether a kernel is launched in the special way that allows root-group synchronization. Kernels are allowed to statically contain a call to `group_barrier` with `root_group` even if they are not launched this way. However, the kernel must only dynamically call `group_barrier` with `root_group` if it is launched in the special way. This behavior is not possible if `use_root_sync` is a "kernel property" because kernel properties are immutable from launch to launch.
* No longer depend on "sycl_ext_oneapi_launch_queries" for the query that tells the maximum number of work-groups when using root-group synchronization. Instead, add a new kernel information descriptor `max_num_work_groups_sync` and new overloads of `kernel::get_info` that provide this information. We decided that the generality of "sycl_ext_oneapi_launch_queries" was overkill.
* Change the query so that it takes the set of "launch properties". This is necessary because some kernel launch properties like `cache_config` can affect the result of the query.
* Add shortcut functions that allow an application to query `max_num_work_groups_sync` without first getting a kernel bundle. This is similar to the existing shortcuts we already provide via "sycl_ext_oneapi_get_kernel_info". Add these shortcuts both for kernels defined with a type-name and for kernels defined as free-function kernels.
According to the Level Zero team, launch properties like `cache_config` can also affect the maximum number of work-groups that are allowed when doing root-group synchronization. Therefore, add a `LaunchProperties` parameter to the query, and require the application to pass the list of kernel launch properties.
When an application uses "sycl_ext_oneapi_work_group_scratch_memory" to allocate its dynamic work-group local memory, it can pass the size of that memory more conveniently via `props`. Add wording to clarify that this is allowed. When applications do this, `bytes` is normally zero. In order to make application code less verbose in this case, switch the parameter order so that `bytes` is last. This way, applications can allow it to be defaulted, rather than passing an explicit `0`.