[clang][SYCL] Disable address space optimization for kernel args#21781
[clang][SYCL] Disable address space optimization for kernel args#21781KornevNikita merged 5 commits intointel:syclfrom
Conversation
For struct/class SYCL kernel agruments with pointer fields clang now generates a new type with the same field types except all pointers have global address space instead of default. A parameter of new generated type appears within kernel parameters. Then in kernel body some level of type punning is applied to reuse original AST without heavily modifying it in order to initialize kernel object with an argument of the generated type. This was implemented 5+ years back in order to achieve better performace. The current patch disables this optimization for two reasons: - We now believe that address space optimizations should be done by IGC - IGC is already capable of performing address space optimizations This PR also introduces a cc1 option that allows to roll back to the old behavior in case there is performace regressions after this change.
tahonermann
left a comment
There was a problem hiding this comment.
Looks great overall. I suggested one minor change to avoid any possible confusion regarding "glob" and "global" and a couple of questions regarding tests that don't perform both the GLOB-AS and GEN-AS checks.
| // RUN: %clang_cc1 -internal-isystem %S/Inputs -fsycl-is-device -triple spir64 \ | ||
| // RUN: -fsycl-force-glob-as-in-kernel-args -emit-llvm %s -o - | FileCheck %s --check-prefix=GLOB-AS |
There was a problem hiding this comment.
In -fsycl-force-glob-as-in-kernel-args mode, the CHECK lines won't be validated. Is that intentional/ok? That could lead to spurious matches.
There was a problem hiding this comment.
No, that is not intentional. Great catch, thanks. Fixed in 1cf995d .
| @@ -1,4 +1,4 @@ | |||
| // RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s | |||
| // RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm -fsycl-force-glob-as-in-kernel-args %s -o - | FileCheck %s | |||
There was a problem hiding this comment.
Shouldn't this test be exercised in non -fsycl-force-glob-as-in-kernel-args mode as well?
There was a problem hiding this comment.
This is done intentionally. The test is named generated-types-initialization.cpp and seems to be intended to check newly generated types that contain pointers with fixed address spaces. Since there is no generated types in non -fsycl-force-glob-as-in-kernel-args , the test doesn't make sense in this mode.
There was a problem hiding this comment.
Ok, thanks, that makes sense.
| h.single_task<class Pointer>([=]() { return SimpleStructWithPtr.i; }); | ||
| }); | ||
| // CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (__generated_StructWithPtr) __attribute__((device_kernel))' | ||
| // CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (StructWithPtr) __attribute__((device_kernel))' |
There was a problem hiding this comment.
What is the reason for not exercising this test both with and without -fsycl-force-glob-as-in-kernel-args?
There was a problem hiding this comment.
The test seemed to be focused on accessor and other special types and decomposition mechanism. Since pointers don't trigger decomposition, I reduced PR size by not exercising non fsycl-force-glob-as-in-kernel-args mode. This seems to be covered well by other tests in this PR.
There was a problem hiding this comment.
Ok, that makes sense. Thanks!
| def fsycl_force_glob_as_in_kernel_args : Flag<["-"], "fsycl-force-glob-as-in-kernel-args">, | ||
| HelpText<"Force global address space for USM pointers within SYCL kernel arguments">, | ||
| MarshallingInfoFlag<LangOpts<"SYCLForceGlobASInKernelArgs">>; | ||
|
|
There was a problem hiding this comment.
Given that the option name is already kind of long and that "glob" has alternative interpretations, I suggest we spell out global to avoid any confusion.
| def fsycl_force_glob_as_in_kernel_args : Flag<["-"], "fsycl-force-glob-as-in-kernel-args">, | |
| HelpText<"Force global address space for USM pointers within SYCL kernel arguments">, | |
| MarshallingInfoFlag<LangOpts<"SYCLForceGlobASInKernelArgs">>; | |
| def fsycl_force_global_as_in_kernel_args : Flag<["-"], "fsycl-force-global-as-in-kernel-args">, | |
| HelpText<"Force global address space for USM pointers within SYCL kernel arguments">, | |
| MarshallingInfoFlag<LangOpts<"SYCLForceGlobalASInKernelArgs">>; |
|
@tahonermann , thanks for the feedback, I replied to all comments. |
tahonermann
left a comment
There was a problem hiding this comment.
Looks great, thanks @Fznamznon!
| h.single_task<class Pointer>([=]() { return SimpleStructWithPtr.i; }); | ||
| }); | ||
| // CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (__generated_StructWithPtr) __attribute__((device_kernel))' | ||
| // CHECK: FunctionDecl {{.*}}Pointer{{.*}} 'void (StructWithPtr) __attribute__((device_kernel))' |
There was a problem hiding this comment.
Ok, that makes sense. Thanks!
| @@ -1,4 +1,4 @@ | |||
| // RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s | |||
| // RUN: %clang_cc1 -fno-sycl-force-inline-kernel-lambda -fsycl-is-device -internal-isystem %S/Inputs -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm -fsycl-force-glob-as-in-kernel-args %s -o - | FileCheck %s | |||
There was a problem hiding this comment.
Ok, thanks, that makes sense.
|
@intel/llvm-gatekeepers , this is ready to merge. |
For struct/class SYCL kernel arguments with pointer fields clang generated a new type with the same field types except all pointers have global address space instead of default. A parameter of new generated type appeared within kernel parameters. Then in kernel body some level of type punning was applied to reuse original AST without heavily modifying it in order to initialize kernel object with an argument of the generated type. This was implemented 5+ years back in order to achieve better performance. The current patch disables this optimization by default for two reasons:
This PR also introduces a cc1 option that allows to roll back to the old behavior in case there is performance regressions after this change.
Assisted-by: claude in test cases updating