[Bug] Batch amplitude and norm kernels can issue misaligned vector loads for odd-length samples #1107

### What

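The batch amplitude encoding and batched L2 norm CUDA kernels assume each sample base is aligned for `double2` / `float2` vector loads (16 and 8 bytes, respectively). That assumption does not hold in general when `sample_len` is odd and `sample_idx > 0`: the element offset `sample_idx * sample_len` is then odd for every odd `sample_idx`.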

In those cases:

- `input_batch + sample_idx * sample_len` is only naturally aligned to the scalar type
- reinterpreting that address as `double2*` or `float2*` can produce misaligned accesses
- CUDA may surface this as `CUDA_ERROR_MISALIGNED_ADDRESS`
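
The arithmetic behind the bullets above can be checked on the host. The following is a minimal sketch, not code from the repository: the helper names `sample_byte_offset` and `sample_base_aligned` are hypothetical, and it assumes the batch base pointer itself is suitably aligned and that samples are tightly packed.

```cpp
#include <cstddef>

// Alignment requirements of the CUDA vector types named in this issue:
// double2 is 16-byte aligned, float2 is 8-byte aligned.
constexpr std::size_t kDouble2Align = 16;
constexpr std::size_t kFloat2Align = 8;

// Hypothetical helper: byte offset of a sample base relative to the batch base.
constexpr std::size_t sample_byte_offset(std::size_t sample_idx,
                                         std::size_t sample_len,
                                         std::size_t elem_size) {
    return sample_idx * sample_len * elem_size;
}

// Hypothetical helper: true when the sample base is aligned for the vector
// type, ASSUMING the batch base pointer is itself aligned to vec_align.
constexpr bool sample_base_aligned(std::size_t sample_idx,
                                   std::size_t sample_len,
                                   std::size_t elem_size,
                                   std::size_t vec_align) {
    return sample_byte_offset(sample_idx, sample_len, elem_size) % vec_align == 0;
}

// sample_len = 3 doubles: sample 1 starts at byte 24, not a multiple of 16.
static_assert(!sample_base_aligned(1, 3, sizeof(double), kDouble2Align), "");
// Even sample indices land on an even element offset and stay aligned.
static_assert(sample_base_aligned(2, 3, sizeof(double), kDouble2Align), "");
// The same parity argument applies to float / float2 (byte 12 vs. 8-byte alignment).
static_assert(!sample_base_aligned(1, 3, sizeof(float), kFloat2Align), "");
// Even sample lengths never hit the problem.
static_assert(sample_base_aligned(1, 4, sizeof(double), kDouble2Align), "");
```

Under these assumptions, only odd `sample_idx` values combined with an odd `sample_len` produce a misaligned base.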

### Affected kernels

- `amplitude_encode_batch_kernel`
- `l2_norm_batch_kernel`
- `l2_norm_batch_kernel_f32`
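
One possible mitigation, shown here only as a hedged host-side sketch and not as the project's actual fix, is to peel a leading scalar element whenever the sample base falls on an odd element offset, so that the paired loads start on an aligned address. The names `LoadPlan`, `plan_sample_loads`, and `sample_sum_sq` are hypothetical, and the sketch assumes a packed batch whose base pointer is aligned for two-element vectors.

```cpp
#include <cstddef>

// Hypothetical description of how one sample could be traversed.
struct LoadPlan {
    std::size_t head;   // leading elements read with scalar loads
    std::size_t pairs;  // two-element loads that start on an aligned address
    std::size_t tail;   // trailing elements read with scalar loads
};

// ASSUMPTION: the batch base is aligned for two-element vectors and samples
// are tightly packed, so alignment depends only on the parity of the offset.
inline LoadPlan plan_sample_loads(std::size_t sample_idx, std::size_t sample_len) {
    const std::size_t first = sample_idx * sample_len;  // element offset of the sample base
    const std::size_t head = (first % 2 != 0 && sample_len > 0) ? 1 : 0;
    const std::size_t rest = sample_len - head;
    return LoadPlan{head, rest / 2, rest % 2};
}

// Sum of squares for one sample, following the plan. In a device kernel the
// middle loop is where double2 / float2 loads would be safe to use.
inline double sample_sum_sq(const double* batch, std::size_t sample_idx,
                            std::size_t sample_len) {
    const LoadPlan p = plan_sample_loads(sample_idx, sample_len);
    const double* s = batch + sample_idx * sample_len;
    double acc = 0.0;
    std::size_t i = 0;
    for (; i < p.head; ++i) {
        acc += s[i] * s[i];                              // scalar head
    }
    for (std::size_t k = 0; k < p.pairs; ++k, i += 2) {
        acc += s[i] * s[i] + s[i + 1] * s[i + 1];        // aligned pair
    }
    for (std::size_t k = 0; k < p.tail; ++k, ++i) {
        acc += s[i] * s[i];                              // scalar tail
    }
    return acc;
}
```

A simpler alternative would be to fall back to scalar loads for the whole sample whenever its base is misaligned; the peeled form keeps vector loads for most of the data at the cost of a slightly more complex loop.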

