[Bug] Batch amplitude and norm kernels can issue misaligned vector loads for odd-length samples #1107

@viiccwen

Description

What

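The batch amplitude encoding and batched L2 norm CUDA kernels assume that each sample base is aligned for double2 / float2 vector loads. That assumption breaks when sample_len is odd: even if the batch base itself is vector-aligned, any sample with sample_idx > 0 whose element offset sample_idx * sample_len is odd (that is, every odd sample_idx) starts at an address that is not a multiple of the vector size.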

In those cases:

  • input_batch + sample_idx * sample_len is only naturally aligned to the scalar type
  • reinterpreting that address as double2* or float2* can produce misaligned accesses
  • CUDA may surface this as CUDA_ERROR_MISALIGNED_ADDRESS
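
The address arithmetic behind the points above can be checked on the host. The following is a minimal sketch, not code from the affected kernels: `sample_is_vec2_aligned` and `kBase` are hypothetical names, and the batch base is assumed to be vector-aligned already (allocations from `cudaMalloc` normally are).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the affected kernels.
// Returns true when the base address of sample `sample_idx` is aligned for a
// two-element vector load of T (16 bytes for double2, 8 bytes for float2).
template <typename T>
constexpr bool sample_is_vec2_aligned(std::uintptr_t batch_base,
                                      std::size_t sample_idx,
                                      std::size_t sample_len) {
    return (batch_base + sample_idx * sample_len * sizeof(T)) % (2 * sizeof(T)) == 0;
}

// Assumed batch base address: already a multiple of 16 bytes.
constexpr std::uintptr_t kBase = 0x1000;

// sample_len = 3 (odd), element type double:
static_assert(sample_is_vec2_aligned<double>(kBase, 0, 3),
              "sample 0 starts at the batch base");
static_assert(!sample_is_vec2_aligned<double>(kBase, 1, 3),
              "byte offset 24 is not a multiple of 16");
static_assert(sample_is_vec2_aligned<double>(kBase, 2, 3),
              "byte offset 48 is a multiple of 16 again");

// The same pattern holds for float / float2 (8-byte vector alignment):
static_assert(!sample_is_vec2_aligned<float>(kBase, 1, 3),
              "byte offset 12 is not a multiple of 8");

// An even sample_len keeps every sample base aligned:
static_assert(sample_is_vec2_aligned<double>(kBase, 1, 4),
              "byte offset 32 is a multiple of 16");
```

Under these assumptions, only odd-indexed samples of an odd-length batch land on a misaligned base, which may explain why the failure does not show up for every sample.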

Affected kernels

  • amplitude_encode_batch_kernel
  • l2_norm_batch_kernel
  • l2_norm_batch_kernel_f32
