Skip to content

[clang][SYCL] Disable address space optimization for kernel args#21781

Merged
KornevNikita merged 5 commits intointel:syclfrom
Fznamznon:disable-as-type-punning
Apr 23, 2026
Merged

[clang][SYCL] Disable address space optimization for kernel args#21781
KornevNikita merged 5 commits intointel:syclfrom
Fznamznon:disable-as-type-punning

Conversation

@Fznamznon
Copy link
Copy Markdown
Contributor

For struct/class SYCL kernel arguments with pointer fields clang generated a new type with the same field types except all pointers have global address space instead of default. A parameter of new generated type appeared within kernel parameters. Then in kernel body some level of type punning was applied to reuse original AST without heavily modifying it in order to initialize kernel object with an argument of the generated type. This was implemented 5+ years back in order to achieve better performance. The current patch disables this optimization by default for two reasons:

  • We now believe that address space optimizations should be done by IGC
  • IGC is already capable of performing address space optimizations

This PR also introduces a cc1 option that allows to roll back to the old behavior in case there is performance regressions after this change.

Assisted-by: claude in test cases updating

For struct/class SYCL kernel agruments with pointer fields clang
now generates a new type with the same field types except all pointers
have global address space instead of default. A parameter of new
generated type appears within kernel parameters. Then in kernel body some
level of type punning is applied to reuse original AST without heavily
modifying it in order to initialize kernel object with an argument of
the generated type. This was implemented 5+ years back in order to achieve
better performace. The current patch disables this optimization for two
reasons:
- We now believe that address space optimizations should be done by IGC
- IGC is already capable of performing address space optimizations

This PR also introduces a cc1 option that allows to roll back to the old
behavior in case there is performace regressions after this change.
@Fznamznon Fznamznon requested a review from a team as a code owner April 15, 2026 14:00
@Fznamznon Fznamznon requested a review from bader April 15, 2026 14:00
Copy link
Copy Markdown
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thanks!

Copy link
Copy Markdown
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!

Copy link
Copy Markdown
Contributor

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall. I suggested one minor change to avoid any possible confusion regarding "glob" and "global" and a couple of questions regarding tests that don't perform both the GLOB-AS and GEN-AS checks.

Comment on lines +3 to +4
// RUN: %clang_cc1 -internal-isystem %S/Inputs -fsycl-is-device -triple spir64 \
// RUN: -fsycl-force-glob-as-in-kernel-args -emit-llvm %s -o - | FileCheck %s --check-prefix=GLOB-AS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In -fsycl-force-glob-as-in-kernel-args mode, the CHECK lines won't be validated. Is that intentional/ok? That could lead to spurious matches.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that is not intentional. Great catch, thanks. Fixed in 1cf995d .

@@ -1,4 +1,4 @@
// RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
// RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm -fsycl-force-glob-as-in-kernel-args %s -o - | FileCheck %s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this test be exercised in non -fsycl-force-glob-as-in-kernel-args mode as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done intentionally. The test is named generated-types-initialization.cpp and seems to be intended to check newly generated types that contain pointers with fixed address spaces. Since there is no generated types in non -fsycl-force-glob-as-in-kernel-args , the test doesn't make sense in this mode.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, that makes sense.

h.single_task<class Pointer>([=]() { return SimpleStructWithPtr.i; });
});
// CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (__generated_StructWithPtr) __attribute__((device_kernel))'
// CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (StructWithPtr) __attribute__((device_kernel))'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for not exercising this test both with and without -fsycl-force-glob-as-in-kernel-args?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test seemed to be focused on accessor and other special types and decomposition mechanism. Since pointers don't trigger decomposition, I reduced PR size by not exercising non fsycl-force-glob-as-in-kernel-args mode. This seems to be covered well by other tests in this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense. Thanks!

Comment thread clang/include/clang/Options/Options.td Outdated
Comment on lines 9636 to 9639
def fsycl_force_glob_as_in_kernel_args : Flag<["-"], "fsycl-force-glob-as-in-kernel-args">,
HelpText<"Force global address space for USM pointers within SYCL kernel arguments">,
MarshallingInfoFlag<LangOpts<"SYCLForceGlobASInKernelArgs">>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the option name is already kind of long and that "glob" has alternative interpretations, I suggest we spell out global to avoid any confusion.

Suggested change
def fsycl_force_glob_as_in_kernel_args : Flag<["-"], "fsycl-force-glob-as-in-kernel-args">,
HelpText<"Force global address space for USM pointers within SYCL kernel arguments">,
MarshallingInfoFlag<LangOpts<"SYCLForceGlobASInKernelArgs">>;
def fsycl_force_global_as_in_kernel_args : Flag<["-"], "fsycl-force-global-as-in-kernel-args">,
HelpText<"Force global address space for USM pointers within SYCL kernel arguments">,
MarshallingInfoFlag<LangOpts<"SYCLForceGlobalASInKernelArgs">>;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, renamed in 47ea2a6 with test fixes in d51534e .

@Fznamznon
Copy link
Copy Markdown
Contributor Author

@tahonermann , thanks for the feedback, I replied to all comments.

@Fznamznon Fznamznon requested a review from tahonermann April 22, 2026 12:41
Copy link
Copy Markdown
Contributor

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks @Fznamznon!

h.single_task<class Pointer>([=]() { return SimpleStructWithPtr.i; });
});
// CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (__generated_StructWithPtr) __attribute__((device_kernel))'
// CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (StructWithPtr) __attribute__((device_kernel))'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense. Thanks!

@@ -1,4 +1,4 @@
// RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
// RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm -fsycl-force-glob-as-in-kernel-args %s -o - | FileCheck %s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, that makes sense.

@Fznamznon
Copy link
Copy Markdown
Contributor Author

@intel/llvm-gatekeepers , this is ready to merge.

@KornevNikita KornevNikita merged commit 437f95c into intel:sycl Apr 23, 2026
36 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants