Potential bug for CUDA reduction #16

In the CUDA model's reduction, we omit `__syncthreads()` at the warp level: https://github.com/UoB-HPC/CloverLeaf/blob/3e10ff9268d3704a744e5a2a4ea3c62831078082/src/cuda/context.h#L220

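I suspect that this is incorrect on Volta and later GPUs, where independent thread scheduling means the threads of a warp are no longer guaranteed to execute in lockstep, so at least a `__syncwarp()` might still be needed. We would probably need something like: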
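```c++
// `offset` is uniform across the block, so all threads take the same branch.
if (offset > warpSize / 2)
  __syncthreads();  // this step still combines values across warps
else
  __syncwarp();     // remaining steps stay within a single warp
```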

This would also address portability issues when running the code through, e.g., AdaptiveCpp PCUDA on OpenCL or CPU devices, where the warp size might be smaller than 32.
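To make the suggestion concrete, here is a minimal sketch of how the barrier choice could sit inside a shared-memory tree reduction. It is not the code from `context.h`: the names `needs_block_barrier` and `block_reduce_sum` are hypothetical, and the sketch assumes a power-of-two block size that is a multiple of the warp size, with one value per thread already stored in `scratch`. The predicate is kept separate from the kernel code so that the warp size is a parameter rather than a hard-coded 32.

```c++
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE  // plain C++ build: only the predicate below is compiled
#endif

// Hypothetical helper: true when the reduction step with stride `offset`
// reads values written by a different warp, so a block-wide barrier is
// needed before it. For offset <= warp_size / 2, all indices touched
// (0 .. 2 * offset - 1) belong to the first warp.
HOST_DEVICE constexpr bool needs_block_barrier(int offset, int warp_size) {
  return offset > warp_size / 2;
}

#ifdef __CUDACC__
// Hypothetical block-level sum over `scratch` (shared memory, one value per
// thread). Assumes blockDim.x is a power of two and a multiple of warpSize.
__device__ float block_reduce_sum(float *scratch) {
  for (int offset = blockDim.x / 2; offset > 0; offset /= 2) {
    // `offset` is uniform across the block, so every thread takes the
    // same branch and reaches the same barrier.
    if (needs_block_barrier(offset, warpSize))
      __syncthreads();
    else
      __syncwarp();
    if (threadIdx.x < offset)
      scratch[threadIdx.x] += scratch[threadIdx.x + offset];
  }
  __syncthreads();  // make the final result visible to the whole block
  return scratch[0];
}
#endif
```

With a warp size of 32, the steps with `offset` of 32 and above use `__syncthreads()` and the steps from 16 down use `__syncwarp()`. With a smaller warp size, the switch-over simply happens later in the loop.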
