
[SYCL][Doc] Updates to the "root group" extension #21838

Open
gmlueck wants to merge 4 commits into intel:sycl from gmlueck:me/root-group-sync


@gmlueck (Contributor) commented Apr 21, 2026

  • Change the `use_root_sync` property from a "kernel property" to a "kernel launch property". This is necessary because we want it to be possible to determine at runtime, on a per-launch basis, whether a kernel is launched in the special way that allows root-group synchronization. Kernels are allowed to statically contain a call to `group_barrier` with `root_group` even if they are not launched this way. However, the kernel must call `group_barrier` with `root_group` at runtime only if it is launched in the special way. This behavior is not possible if `use_root_sync` is a "kernel property" because kernel properties are immutable from launch to launch.

  • No longer depend on "sycl_ext_oneapi_launch_queries" for the query that tells the maximum number of work-groups when using root-group synchronization. Instead, add a new kernel information descriptor `max_num_work_groups_sync` and new overloads of `kernel::get_info` that provide this information. We decided that the generality of "sycl_ext_oneapi_launch_queries" was overkill.

  • Change the query so that it takes the set of kernel launch properties. This is necessary because some kernel launch properties, like `cache_config`, can affect the result of the query.

  • Add shortcut functions that allow an application to query `max_num_work_groups_sync` without first getting a kernel bundle. This is similar to the existing shortcuts already provided by "sycl_ext_oneapi_get_kernel_info". Add these shortcuts both for kernels defined with a type-name and for kernels defined as free-function kernels.
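The first point can be modeled in plain C++. This is not SYCL code, and every name below is hypothetical: `LaunchProps` stands in for a per-launch property set, `RootGroupModel::barrier()` stands in for `group_barrier` with `root_group`, and forwarding the choice to the kernel as an ordinary argument is an assumption about how an application might tell the kernel at runtime whether it was launched with `use_root_sync`.

```cpp
#include <cassert>

// Hypothetical model, NOT the SYCL API: a per-launch property set in which
// `use_root_sync` may differ between two launches of the same kernel.
struct LaunchProps {
  bool use_root_sync = false;
};

// Hypothetical stand-in for group_barrier(root_group); it only counts calls.
struct RootGroupModel {
  int barrier_calls = 0;
  void barrier() { ++barrier_calls; }
};

// The kernel body statically contains the root-group barrier, but executes
// it only when the current launch enabled root-group synchronization.
void kernel_body(RootGroupModel& rg, bool root_sync_enabled) {
  // ... first phase of work ...
  if (root_sync_enabled) {
    rg.barrier();
  }
  // ... second phase of work ...
}

// Hypothetical launcher: the property is picked per launch, and the same
// choice is forwarded to the kernel as an ordinary runtime argument.
// Returns how many root-group barriers this launch executed.
int launch(const LaunchProps& props) {
  RootGroupModel rg;
  kernel_body(rg, props.use_root_sync);
  return rg.barrier_calls;
}
```

Here `launch(LaunchProps{true})` executes one barrier and `launch(LaunchProps{false})` executes none, although `kernel_body` is identical in both launches. A property fixed in the kernel's definition could not vary like this, which is the motivation given above for making `use_root_sync` a launch property.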

@gmlueck requested a review from a team as a code owner April 21, 2026 21:41
gmlueck added 3 commits April 22, 2026 08:52
According to the Level Zero team, launch properties like `cache_config`
can also affect the maximum number of work-groups that are allowed when
doing root-group synchronization.  Therefore, add a `LaunchProperties`
parameter to the query, and require the application to pass the list of
kernel launch properties.
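A simplified occupancy model can show why the query needs the same launch properties as the launch itself. This is a hypothetical sketch, not the SYCL or Level Zero API: `CacheConfig`, `DeviceModel`, `max_groups_sync_model`, and `can_launch_with_root_sync` are invented names, and the rule it encodes (a root-group barrier generally requires all work-groups to be resident at once, so the limit is how many fit in local memory per compute unit) is an assumption for illustration. The real limit is computed by the runtime and backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model, NOT the SYCL or Level Zero API. All names are
// assumptions chosen to show why the query depends on launch properties.
enum class CacheConfig { large_slm, large_data };

// Models the launch properties passed to both the query and the launch.
struct LaunchProps {
  CacheConfig cache = CacheConfig::large_data;
};

// Hypothetical device description supplied by the caller.
struct DeviceModel {
  std::size_t compute_units;
  std::size_t max_groups_per_cu;  // hardware limit per compute unit
  std::size_t slm_large_slm;      // local memory per CU under large_slm
  std::size_t slm_large_data;     // local memory per CU under large_data
};

// Simplified occupancy model: every work-group must be resident at once,
// so the limit is how many groups fit on the device simultaneously.
std::size_t max_groups_sync_model(const DeviceModel& dev,
                                  const LaunchProps& props,
                                  std::size_t local_mem_per_group) {
  const std::size_t slm = props.cache == CacheConfig::large_slm
                              ? dev.slm_large_slm
                              : dev.slm_large_data;
  std::size_t per_cu = dev.max_groups_per_cu;
  if (local_mem_per_group > 0) {
    per_cu = std::min(per_cu, slm / local_mem_per_group);
  }
  return dev.compute_units * per_cu;
}

// Hypothetical check: the same `props` must be used for the query and the
// launch, otherwise the returned limit may not apply to this launch.
bool can_launch_with_root_sync(const DeviceModel& dev,
                               const LaunchProps& props,
                               std::size_t local_mem_per_group,
                               std::size_t num_groups) {
  return num_groups <= max_groups_sync_model(dev, props, local_mem_per_group);
}
```

With a made-up device of 4 compute units, at most 8 groups per unit, and 128 KiB or 64 KiB of local memory per unit under `large_slm` or `large_data`, a kernel using 32 KiB per group gets a limit of 16 under `large_slm` but only 8 under `large_data`. A launch of 12 work-groups would be valid under one configuration and invalid under the other.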
When an application uses "sycl_ext_oneapi_work_group_scratch_memory" to
allocate its dynamic work-group local memory, it can pass the size of
that memory more conveniently via `props`.  Add wording to clarify that
this is allowed.

When applications do this, `bytes` is normally zero.  To make
application code less verbose in this case, switch the parameter order
so that `bytes` is last.  This way, applications can omit it and rely
on its default value instead of passing an explicit `0`.
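The parameter-order change follows from a C++ rule: in a function declaration, a parameter with a default argument may only be followed by parameters that also have defaults, so `bytes` can be defaulted only if it comes last. The sketch below contrasts the two orders; it is not the SYCL API, `Props`, `query_bytes_first`, and `query_bytes_last` are invented names, and summing the two memory sizes is an assumption made only so the functions return something checkable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, NOT the SYCL API. `Props` stands in for the launch
// properties; `scratch_size` models a work-group scratch-memory property.
struct Props {
  std::size_t scratch_size = 0;
};

// Assumption for this sketch: the query accounts for dynamic work-group
// memory from both sources, so it simply adds them.
std::size_t dynamic_local_bytes(const Props& props, std::size_t bytes) {
  return props.scratch_size + bytes;
}

// Previous order: `bytes` precedes `props`, so `bytes` cannot have a
// default unless `props` has one too; callers must spell out `0`.
std::size_t query_bytes_first(std::size_t bytes, const Props& props) {
  return dynamic_local_bytes(props, bytes);
}

// New order: `bytes` is last and can default to 0.
std::size_t query_bytes_last(const Props& props, std::size_t bytes = 0) {
  return dynamic_local_bytes(props, bytes);
}
```

An application that sizes its scratch memory through the properties can write `query_bytes_last(props)` where the old order forced `query_bytes_first(0, props)`.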