
[QDP] Add zero-copy amplitude batch encoding from float32 GPU tensors#1029

Open
viiccwen wants to merge 2 commits into apache:main from viiccwen:add-batch-f32-amplitude-encoding

Conversation

@viiccwen
Contributor

@viiccwen viiccwen commented Feb 7, 2026

Purpose of PR

This PR adds batch float32 amplitude encoding support in QDP core and kernels.

It extends the existing float32 GPU-pointer amplitude path from single-sample encoding to batched encoding, and refactors batch state allocation so the output precision is selected explicitly at allocation time.

What changed

Core

  • Added QdpEngine::encode_batch_from_gpu_ptr_f32
  • Added QdpEngine::encode_batch_from_gpu_ptr_f32_with_stream
  • Added the corresponding float32 batch amplitude entry point in AmplitudeEncoder
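
To make the semantics of the new batch entry points concrete, here is a minimal CPU reference sketch of what batched amplitude encoding computes: each sample (row) of the input is scaled to unit L2 norm and written into a state vector of 2^n_qubits amplitudes, zero-padded past the input length. The function name is hypothetical; the actual GPU implementation lives in the Rust/CUDA code this PR touches.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference sketch (not the PR's implementation): batched amplitude
// encoding normalizes each sample to unit L2 norm and zero-pads it into a
// 2^n_qubits-amplitude state vector.
std::vector<double> amplitude_encode_batch_ref(const std::vector<double>& input,
                                               size_t batch, size_t input_len,
                                               size_t n_qubits) {
    const size_t state_len = size_t{1} << n_qubits;
    assert(input.size() == batch * input_len && input_len <= state_len);
    std::vector<double> out(batch * state_len, 0.0);
    for (size_t s = 0; s < batch; ++s) {
        double sq = 0.0;
        for (size_t i = 0; i < input_len; ++i) {
            const double v = input[s * input_len + i];
            sq += v * v;  // per-sample L2 norm accumulation
        }
        assert(sq > 0.0);  // zero-norm samples are invalid inputs here
        const double inv_norm = 1.0 / std::sqrt(sq);
        for (size_t i = 0; i < input_len; ++i)
            out[s * state_len + i] = input[s * input_len + i] * inv_norm;
    }
    return out;
}
```

The float32 path described in this PR follows the same shape, with `float` inputs and a float32 output buffer selected at allocation time.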

Kernels

  • Added launch_amplitude_encode_batch_f32
  • Added the float32 batch amplitude CUDA kernel wiring in qdp-kernels
  • Reused the existing float32 batched L2 norm reduction for batch normalization

Allocation

  • Refactored GpuStateVector::new_batch to accept an explicit Precision
  • Updated all batch allocation call sites to pass the correct output precision
    • amplitude batch uses Float64 or Float32 as appropriate
    • angle, basis, IQP, and the streaming pipeline continue to request Float64
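
The core decision the refactored allocation has to make is simple: the byte size of the batch state buffer now depends on the requested output precision rather than being hard-coded to double. A minimal sketch, with names mirroring the PR's `Precision` and `new_batch` but not taken from its actual code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of precision-aware batch buffer sizing. The real
// allocation happens in GpuStateVector::new_batch on the GPU; this only
// illustrates the size computation an explicit Precision parameter drives.
enum class Precision { Float32, Float64 };

size_t batch_state_bytes(size_t batch, size_t n_qubits, Precision p) {
    const size_t elem = (p == Precision::Float32) ? sizeof(float)
                                                  : sizeof(double);
    const size_t amplitudes_per_sample = size_t{1} << n_qubits;
    return batch * amplitudes_per_sample * elem;
}
```

Passing the precision explicitly at every call site (rather than inferring it) is what lets the amplitude batch path request Float32 while angle, basis, IQP, and the streaming pipeline keep requesting Float64.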

Python

  • Kept float32 + amplitude batch encoding unsupported in Python bindings for now
  • Existing Python-facing error behavior remains unchanged and can be handled in a follow-up PR

Tests

  • Added Float32 batch DLPack shape coverage
  • Added core GPU-pointer tests for float32 batch amplitude success and validation cases

Related Issues or PRs

closes #1028

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@ryankert01
Member

Please resolve the conflict. Thanks for the contribution.

@ryankert01 ryankert01 added this to the Qumat 0.6.0 milestone Feb 8, 2026
@viiccwen
Contributor Author

viiccwen commented Feb 9, 2026

Solved!

@viiccwen viiccwen force-pushed the add-batch-f32-amplitude-encoding branch from db93ede to b6a4f4f Compare February 9, 2026 06:42
@rich7420
Contributor

Sorry for the late reply.
Overall LGTM; I'll review more deeply later.


@rich7420 rich7420 left a comment


@viiccwen thanks for the patch!
left some comments:
I think this new f32 batch GPU‑pointer API and related pipelines are not fully covered by tests.

Existing launchers: launch_amplitude_encode, launch_amplitude_encode_batch, launch_l2_norm, launch_l2_norm_batch, launch_l2_norm_f32

After this PR: launch_amplitude_encode, launch_amplitude_encode_batch, launch_amplitude_encode_batch_f32, launch_l2_norm, launch_l2_norm_batch, launch_l2_norm_batch_f32, launch_l2_norm_f32


In this file around lines 251–276 and 351–376, amplitude_encode_batch_kernel and its _f32 variant compute input_base = sample_idx * input_len and then dereference reinterpret_cast<const double2*>(input_batch + input_base) + elem_pair (float2 in the f32 kernel). For odd input_len and sample_idx > 0, this base pointer is only 8-byte (double) or 4-byte (float) aligned, not 16-byte aligned, so the double2/float2 vector loads are potentially misaligned. This alignment pattern already existed in the original f64 batch kernel, and this PR copies it into the new f32 batch path. Please either enforce even input_len at the Rust call site, or rework the kernels to index from a properly aligned double2*/float2* base pointer with a scalar fallback.
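
The arithmetic behind this hazard can be checked on the CPU. The helper below (hypothetical, for illustration only) mirrors the kernel's offset computation and tests whether a sample's base pointer would satisfy the 16-byte alignment that double2 vector loads require:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the alignment check implied by the review comment: the batch
// kernel computes input_base = sample_idx * input_len (an element index) and
// reinterprets that position as a double2*. double2 loads require 16-byte
// alignment, so the byte offset from the (aligned) buffer start must be a
// multiple of 16.
bool double2_base_aligned(size_t sample_idx, size_t input_len) {
    const size_t input_base  = sample_idx * input_len;        // element index
    const size_t byte_offset = input_base * sizeof(double);   // 8 bytes/elem
    return byte_offset % 16 == 0;
}
```

With odd input_len, every odd-indexed sample lands 8 bytes off a 16-byte boundary, which is exactly the case an even-input_len check at the Rust call site (or a scalar fallback in the kernel) would have to cover.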

@viiccwen viiccwen force-pushed the add-batch-f32-amplitude-encoding branch from b6a4f4f to ce9d876 Compare February 12, 2026 03:59
@ryankert01
Copy link
Member

ryankert01 commented Feb 21, 2026

Please help resolve conflicts


Copilot AI left a comment


Pull request overview

This PR adds comprehensive float32 amplitude batch encoding support to the QDP engine, enabling zero-copy GPU encoding from PyTorch float32 CUDA tensors.

Changes:

  • Added f32 batch amplitude encoding kernels and GPU pointer APIs in core
  • Refactored GpuStateVector::new_batch to accept precision parameter for flexible buffer allocation
  • Updated all batch encoding call sites to specify output precision

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Summary per file:

  • qdp/qdp-kernels/src/amplitude.cu — Added f32 batch amplitude kernel (amplitude_encode_batch_kernel_f32) and launcher, plus f32 batch L2 norm support
  • qdp/qdp-kernels/src/lib.rs — Added FFI declarations for f32 batch kernels and stubs for non-CUDA builds
  • qdp/qdp-core/src/lib.rs — Added encode_batch_from_gpu_ptr_f32 public API with validation and added precision() accessor
  • qdp/qdp-core/src/gpu/memory.rs — Refactored new_batch to accept Precision parameter for f32 or f64 buffer allocation
  • qdp/qdp-core/src/gpu/encodings/mod.rs — Added default encode_batch_from_gpu_ptr_f32 trait method
  • qdp/qdp-core/src/gpu/encodings/amplitude.rs — Implemented f32 batch encoding with norm validation
  • qdp/qdp-core/src/gpu/encodings/{angle,basis,iqp}.rs — Updated new_batch calls to pass Precision::Float64
  • qdp/qdp-core/src/encoding/mod.rs — Updated streaming pipeline to use engine.precision()
  • qdp/qdp-python/src/lib.rs — Added validation for f32/f64 amplitude tensors; 1D f32 supported, 2D f32 returns clear error
  • qdp/qdp-core/tests/gpu_ptr_encoding.rs — Comprehensive f32 batch tests covering success and error paths
  • qdp/qdp-core/tests/dlpack.rs — Updated to pass precision to new_batch
  • testing/qdp/test_bindings.py — Added tests for f32 input with both f32 and f64 engine precision
  • qdp/qdp-python/tests/test_dlpack_validation.py — Updated to verify 1D f32 works and 2D f32 gives clear error


@viiccwen viiccwen force-pushed the add-batch-f32-amplitude-encoding branch from ce9d876 to 3d8485f Compare March 2, 2026 17:38

Development

Successfully merging this pull request may close issue #1028.

4 participants