Thanks for taking the time. A few notes to keep review loops short.
Build and run the parity tests following the
Build section of the
README. The parity binaries under tools/parity/ are the correctness
gate:
aes_parity,xs_parity,t1_parity,t2_parity,t3_parity— bit-exact CPU vs GPU per-phase agreement with pos2-chip's reference.sycl_sort_parity,sycl_g_x_parity,sycl_bucket_offsets_parity— the SYCL/AdaptiveCpp backends vs the CUDA reference, so AMD/Intel breakage is caught on NVIDIA hardware too.plot_file_parity— writer + reader round-trip on the final.plot2.
Any change that touches a kernel, the sort path, or the plot file format must keep the parity tests passing at k=22 (quick) and at k=28 (slow — the realistic production k). Output bytes are specified to be identical to the pos2-chip CPU reference; this is the hard invariant.
After a functional change, spot-check one real batch end-to-end with
xchplot2 verify <plot> — zero proofs over 100 random challenges is
a regression even if all parity tests pass.
Short imperative subjects, lowercase scope prefix, no trailing period:
gpu: split xs-sort keys_a to d_storage tail — drops pool VRAM min ~1.3 GB
docs: tighten streaming peak (~7.3 GB measured), add AMD row
CMakeLists: re-enable -O3 for SYCL TUs
Body paragraphs explain why (what invariant was wrong, what the measurement was, what alternative was considered and why it was rejected). The what is in the diff.
- Keep unrelated refactors out of correctness or performance commits.
- Performance changes should cite before/after numbers on a named GPU
at a specified
k. - New runtime knobs go in
README.md's Environment variables table so users can discover them.
The main branch carries the SYCL/AdaptiveCpp port; the
cuda-only
branch is the original CUDA-only path, preserved as the most-tested
NVIDIA configuration. A PR that only helps NVIDIA may still land on
main, but don't regress parity on AMD (gfx1031) along the way.
Open an issue with:
- Exact command line and the full stderr output.
- GPU vendor + model + VRAM (
nvidia-smi -L/rocminfo | grep gfx). - Build flavor: container (service name +
ACPP_GFX/CUDA_ARCH), nativescripts/install-deps.sh, orcargo install. - Whether parity tests pass on your build.