Closed
Description
reference: comment
The launch_l2_norm_batch function can attempt to launch an invalid CUDA kernel when num_samples exceeds CUDA_MAX_GRID_DIM_1D (65535).
Root Cause
When num_samples > 65535, even with blocks_per_sample = 1, the calculated gridSize = num_samples * 1 = num_samples still exceeds the CUDA 1D grid dimension limit (65535), leading to an invalid kernel launch.
The existing code attempts to reduce blocks_per_sample when gridSize > max_grid:
mahout/qdp/qdp-kernels/src/amplitude.cu
Lines 613 to 620 in ef00f92

```cpp
const size_t max_grid = CUDA_MAX_GRID_DIM_1D; // CUDA grid dimension limit for 1D launch
if (gridSize > max_grid) {
    blocks_per_sample = max_grid / num_samples;
    if (blocks_per_sample == 0) {
        blocks_per_sample = 1;
    }
    gridSize = num_samples * blocks_per_sample;
}
```
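However, when num_samples > max_grid, the integer division max_grid / num_samples evaluates to 0, so the fallback sets blocks_per_sample = 1. The recomputed gridSize = num_samples * 1 = num_samples still exceeds the limit, causing a CUDA error.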
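To make the arithmetic concrete, here is a minimal host-side sketch. `current_grid_size` mirrors the clamp quoted from amplitude.cu. `LaunchPlan` and `plan_launch` are hypothetical names that illustrate one possible fix (capping the samples handled per launch and issuing several launches); they are not the project's API or the fix that was actually merged. The value of `CUDA_MAX_GRID_DIM_1D` is assumed to be 65535, as stated in this issue.

```cpp
#include <cstddef>

// Assumed to match the project's constant named in this issue.
constexpr std::size_t CUDA_MAX_GRID_DIM_1D = 65535;

// Mirrors the existing clamp (amplitude.cu, lines 613 to 620).
std::size_t current_grid_size(std::size_t num_samples,
                              std::size_t blocks_per_sample) {
    std::size_t gridSize = num_samples * blocks_per_sample;
    const std::size_t max_grid = CUDA_MAX_GRID_DIM_1D;
    if (gridSize > max_grid) {
        blocks_per_sample = max_grid / num_samples;  // 0 when num_samples > max_grid
        if (blocks_per_sample == 0) {
            blocks_per_sample = 1;
        }
        gridSize = num_samples * blocks_per_sample;  // can still exceed max_grid
    }
    return gridSize;
}

// Hypothetical launch plan, for illustration only.
struct LaunchPlan {
    std::size_t grid_size;           // blocks per launch, never above the limit
    std::size_t blocks_per_sample;   // blocks assigned to each sample
    std::size_t samples_per_launch;  // samples covered by a single launch
    std::size_t num_launches;        // launches needed to cover every sample
};

// One possible approach: cap the samples per launch at the grid limit and
// let the caller loop over num_launches, offsetting the sample index.
LaunchPlan plan_launch(std::size_t num_samples,
                       std::size_t blocks_per_sample) {
    const std::size_t max_grid = CUDA_MAX_GRID_DIM_1D;
    if (num_samples == 0) {
        return {0, 0, 0, 0};
    }
    if (blocks_per_sample == 0) {
        blocks_per_sample = 1;
    }
    const std::size_t samples_per_launch =
        num_samples < max_grid ? num_samples : max_grid;
    // At least 1, because samples_per_launch <= max_grid.
    std::size_t bps = max_grid / samples_per_launch;
    if (bps > blocks_per_sample) {
        bps = blocks_per_sample;
    }
    const std::size_t num_launches =
        (num_samples + samples_per_launch - 1) / samples_per_launch;
    return {samples_per_launch * bps, bps, samples_per_launch, num_launches};
}
```

With num_samples = 70000 and blocks_per_sample = 4, `current_grid_size` returns 70000, which is above the limit, while `plan_launch` keeps each launch at 65535 blocks and reports that two launches are needed. An alternative that avoids multiple launches would be a kernel that strides over samples, which would require changes to the kernel itself.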