Releases: rapidsai/raft
v26.02.00
What's Changed
🚨 Breaking Changes
- Use CCCL's mdspan implementation by @bdice in #2836
- Default to static linking of libcudart by @bdice in #2890
- Remove `neighbors/`, `cluster/`, `distance/`, `spatial/`, `sparse/neighbors/` apis by @aamijar in #2885
- Remove cutlass and cuco dependencies by @divyegala in #2916
🐛 Bug Fixes
- Include `<thrust/for_each.h>` where it is used by @bdice in #2883
- Include CTest module in CMakeLists.txt by @bdice in #2895
- Fix Lanczos Determinism by @aamijar in #2894
- Change compile-time assertion to runtime assertion on is_strided by @bdice in #2909
- Set memory pool through RMM by @viclafargue in #2866
📖 Documentation
🚀 New Features
- Tile Policy for Uint8 Input (Pairwise) by @tarang-jain in #2770
- Add copy_vectorized to RAFT by @lowener in #2900
🛠️ Improvements
- Use strict priority in CI conda tests by @bdice in #2879
- Use strict priority in CI conda tests by @bdice in #2884
- Remove alpha specs from non-RAPIDS dependencies by @bdice in #2886
- Enable merge barriers by @KyleFromNVIDIA in #2889
- Fix is_exhaustive, no longer constexpr by @bdice in #2888
- Add devcontainer fallback for C++ test location by @bdice in #2893
- `eigsh` optional seed by @aamijar in #2899
- Empty commit to trigger a build by @bdice in #2904
- Update to C++20 by @divyegala in #2908
- Use SPDX license identifiers in pyproject.toml, bump build dependency floors by @jameslamb in #2910
- Remove `neighbors/detail/faiss_select` by @aamijar in #2902
- Remove `sparse/distance` by @aamijar in #2905
- Add CUDA 13.1 support by @bdice in #2896
- Fix CCCL 3.2 mdspan constexpr issues by @bdice in #2911
- build and test against CUDA 13.1.0 by @jameslamb in #2912
- Laplacian Kernel for COO inputs by @aamijar in #2891
- Empty commit to trigger a build by @jameslamb in #2919
- Use main shared-workflows branch by @jameslamb in #2921
- Fix update-version.sh incorrectly replacing main() function names by @AyodeAwe in #2923
- Lanczos remove dead code by @aamijar in #2918
- wheel builds: react to changes in pip's handling of build constraints by @mmccarty in #2927
- fix(build): build package on merge to `release/*` branch by @gforsyth in #2929
New Contributors
Full Changelog: v26.02.00a...v26.02.00
v25.12.00
What's Changed
🚨 Breaking Changes
- More consistent container policies & host memory resource by @achirkin in #2835
- Require CUDA 12.2+ by @jakirkham in #2850
🐛 Bug Fixes
- Correct tagging in the `irecv` function of the STD communicator by @viclafargue in #2829
- Fix copyright hook file exclusion by @KyleFromNVIDIA in #2840
- Properly guard usage of openmp function calls by @robertmaynard in #2839
- Fix reduce mdspan API by @lowener in #2853
- Fix for STD comm waitall function by @viclafargue in #2852
- Pin Cython pre-3.2.0 and PyTest pre-9 by @jakirkham in #2864
- refactored update-version.sh to handle new branching strategy by @rockhowse in #2863
- Fix laplacian scaling coefficients by @aamijar in #2871
- Revert "Remove Deprecated API (#2813)" by @csadorf in #2881
📖 Documentation
🚀 New Features
- BENCH_PRIMS: convenience reporting of benchmark parameters and read throughput by @achirkin in #2824
🛠️ Improvements
- Update to rapids-logger 0.2 by @bdice in #2828
- Enable `sccache-dist` connection pool by @trxcllnt in #2837
- Use main in RAPIDS_BRANCH by @bdice in #2842
- Use main shared-workflows branch by @bdice in #2844
- Use SPDX for all copyright headers by @KyleFromNVIDIA in #2845
- Use ruff-check, ruff-format instead of black, flake8, isort by @KyleFromNVIDIA in #2855
- Remove shims for CCCL < 3.1 compatibility by @bdice in #2858
- Always convert warnings to errors by @jakirkham in #2857
- Lanczos Solver with COO input and cusparse wrappers by @aamijar in #2851
- COO support in sparse matrix utilities by @aamijar in #2861
- Update RMM includes from `<rmm/mr/device/*>` to `<rmm/mr/*>` by @bdice in #2867
- Use `sccache-dist` build cluster for conda and wheel builds by @trxcllnt in #2859
- Remove Deprecated API by @jnke2016 in #2813
New Contributors
- @rockhowse made their first contribution in #2863
Full Changelog: v25.12.00a...v25.12.00
v25.10.00
🐛 Bug Fixes
- Workaround for an illegal memory access on SM 120 devices (#2821) @achirkin
- Fix sparse select_k: don't write beyond min(input_len, k) (#2814) @achirkin
- [BUG] Fix compilation error in matrix/detail/gather.cuh (#2811) @enp1s0
- Fix select_k for negative bfloat16 (#2799) @apivovarov
- Fix index types for coo kernels (#2793) @aamijar
- Fix the GEMM pointer mode setting (#2777) @achirkin
- Fix `host_vector_policy` issue (#2739) @viclafargue
📖 Documentation
- Fix UCX-Py mention to UCXX in docstring (#2804) @pentschev
🚀 New Features
- Update cutlass to a version that supports CUDA 13 (#2774) @robertmaynard
🛠️ Improvements
- Fix missed deps in `update-version.sh` (#2826) @AyodeAwe
- Empty commit to trigger a build (#2816) @msarahan
- Make warpsort kernels use the IEEE 754 bit representation for ordering (#2807) @achirkin
- Configure repo for automatic release notes generation (#2806) @AyodeAwe
- Support < 2 element arrays in `rand_index`/`adjusted_rand_index` (#2805) @jcrist
- update dependencies: use cuda-toolkit wheels (#2802) @jameslamb
- Use branch-25.10 again (#2800) @jameslamb
- Remove CMake find UCX package (#2798) @pentschev
- use dask-cuda[cu12, cu13] extras for wheel dependencies (#2797) @jameslamb
- Remove UCX-Py (#2791) @pentschev
- Update rapids-dependency-file-generator (#2790) @KyleFromNVIDIA
- Build and test with CUDA 13.0.0 (#2787) @jameslamb
- Fix template arg passing in `adjusted_rand_index` (#2785) @jinsolp
- Use build cluster in devcontainers (#2781) @trxcllnt
- Use rapids_cuda_enable_fatbin_compression (#2780) @robertmaynard
- Increase Dask tests verbosity in CI (#2779) @pentschev
- Update rapids_config to handle user defined branch name (#2778) @robertmaynard
- [REVIEW] Fix: skip default_allocation_limit() if unnecessary (#2775) @i-Pear
- Update rapids-build-backend to 0.4.1 (#2773) @KyleFromNVIDIA
- ci(labeler): update labeler action to @v5 (#2772) @gforsyth
- Register bfloat16/bfloat162 in util/vectorized.cuh (#2769) @apivovarov
- Use mdspan::index_type to Only Instantiate Specific Kernels (#2767) @tarang-jain
- Allow latest OS in devcontainers (#2759) @bdice
- Update build infra to support new branching strategy (#2751) @robertmaynard
- Use GCC 14 in conda builds. (#2708) @vyasr
[NIGHTLY] v25.12.00
🔗 Links
🐛 Bug Fixes
- Fix copyright hook file exclusion (#2840) @KyleFromNVIDIA
- Properly guard usage of openmp function calls (#2839) @robertmaynard
- Correct tagging in the `irecv` function of the STD communicator (#2829) @viclafargue
🛠️ Improvements
v25.08.00
🚨 Breaking Changes
- `MatrixLinewiseOp` compile-time-invocation (#2701) @aamijar
- Remove CUDA 11 from dependencies.yaml (#2695) @KyleFromNVIDIA
- stop uploading packages to downloads.rapids.ai (#2688) @jameslamb
- Reduce instantiations of `Reduction` kernels (#2679) @divyegala
🐛 Bug Fixes
- Fix stream sync for Copy2DAsync test (#2744) @lowener
- Several small fixes to make Raft compile with LLVM. (#2735) @vitor1001
- Add missing header (#2734) @vitor1001
- [REVIEW] Fix static initialization order fiasco in `lanczos.cu` (#2733) @legrosbuffle
- [REVIEW] Fix assertion in `fill_indices_by_rows_kernel`. (#2732) @legrosbuffle
- libucx: consider post-releases in wheel builds (#2729) @jameslamb
- Fix laplacian cast (#2725) @aamijar
- Fix excess_subsample (#2723) @mfoerste4
- Fix the constructor for `coordinate_structure` for non-zero `nnz`. (#2717) @legrosbuffle
- [REVIEW] Fix compile error when using `mdbuffer` with all-static extents. (#2716) @legrosbuffle
- Fix unsafe cast `coo_remove_scalar` (#2713) @aamijar
- Fix laplacian self-loops (#2712) @aamijar
- [REVIEW] Fix a few memory leaks. (#2710) @legrosbuffle
- Fix MST bug for graph with identical edge weights (#2707) @jnke2016
- Missed update accounting for reduction related APIs (#2704) @divyegala
- Adding GH_TOKEN pass-through to summarize job (#2702) @msarahan
- Work around Cython ctypedef bug (#2686) @vyasr
📖 Documentation
- add docs on CI workflow inputs (#2728) @jameslamb
🛠️ Improvements
- An additional small change to remove cuda 11 stuff (#2763) @cjnolet
- Removing CUDA 11 from docs and code (#2757) @cjnolet
- fix(docker): use versioned `-latest` tag for all `rapidsai` images (#2745) @gforsyth
- Update protocol name for UCX-Py tests (#2743) @pentschev
- Remove sphinx upper bound (#2742) @bdice
- remove cuspatial references, avoid triggering tests on clang-format config changes (#2740) @jameslamb
- MST Edge Case (#2736) @tarang-jain
- Add missing `#include <cassert>` in `cpp/include/raft/core/math.hpp` (#2730) @trxcllnt
- Update leftover CUDA 12.8 to 12.9 in docs (#2724) @jakirkham
- Fix docs lanczos solver (#2722) @aamijar
- Use CUDA 12.9 in Conda, Devcontainers, Spark, GHA, etc. (#2721) @jakirkham
- Remove nvidia and dask channels (#2720) @vyasr
- [REVIEW] Fix compile error of `abs_op` when compiling with `clang` (#2718) @legrosbuffle
- Avoid using internal method std::experimental::details::alignTo(). (#2714) @vitor1001
- refactor(shellcheck): fix all remaining warnings/errors (#2703) @gforsyth
- `MatrixLinewiseOp` compile-time-invocation (#2701) @aamijar
- Remove pytest pin (#2699) @vyasr
- Fix several issues that breaks LLVM (#2698) @vitor1001
- Remove CUDA 11 from dependencies.yaml (#2695) @KyleFromNVIDIA
- Remove CUDA 11 devcontainers and update CI scripts (#2690) @bdice
- refactor(rattler): remove cuda11 options and general cleanup (#2689) @gforsyth
- stop uploading packages to downloads.rapids.ai (#2688) @jameslamb
- fix(devcontainers): typo in container name (#2687) @gforsyth
- Reduce instantiations of `Reduction` kernels (#2679) @divyegala
- Forward-merge branch-25.06 into branch-25.08 (#2675) @divyegala
- Add support for F16 in linalg::transpose (#2672) @enp1s0
- Forward-merge branch-25.06 into branch-25.08 (#2664) @gforsyth
- Support `coo_matrix` in `coo_symmetrize` and `coo_remove_scalar` (#2662) @aamijar
- Lanczos Solver `which=SA,SM,LA,LM` argument (#2628) @aamijar
v25.06.00
🚨 Breaking Changes
🐛 Bug Fixes
- NCCL comm resource fix (#2692) @viclafargue
- Fix the launch bounds for nn-descent kernel for 1210 and remove nn-descent tests (#2691) @viclafargue
- Prefer host gather when dataset is available both on host and device (#2671) @tfeher
- Fix warnings treated as errors downstream in cuVS (#2644) @achirkin
- Fix nccl_comm.hpp warning: #83-D: type qualifier specified more than once (#2643) @achirkin
- NVTX: null destination pointer warning-treated-as-error (#2639) @achirkin
- Add UCXX and NCCL to `libraft` conda recipe (#2636) @divyegala
- Fix building cutlass (#2619) @miscco
- Fix COO symmetrization (#2582) @viclafargue
🚀 New Features
- [Feat] add `cudaMemcpy2DAsync` wrapper (#2674) @rhdong
- Python wrapper for `device_resources_snmg` (#2666) @jinsolp
- Laplacian normalization primitives (#2648) @aamijar
- [FEA] Matrix shift rows and columns (#2634) @jinsolp
- Use NCCL wheels from PyPI for CUDA 12 builds (#2629) @divyegala
- Support strided matrix view as an input to matrix::samples_rows (#2626) @enp1s0
- [Feat] add support for bm25 and tfidf (#2567) @jperez999
🛠️ Improvements
- use 'rapids-init-pip' in wheel CI, other CI changes (#2677) @jameslamb
- Dask 2025.4.1 compatibility (#2673) @TomAugspurger
- Finish CUDA 12.9 migration and use branch-25.06 workflows (#2669) @bdice
- Update to clang 20 (#2665) @bdice
- Quote head_rev in conda recipes (#2660) @bdice
- CUDA 12.9 use updated compression flags (#2657) @robertmaynard
- Build and test with CUDA 12.9.0 (#2655) @bdice
- Exclude librmm.so from auditwheel (#2654) @bdice
- Fix cub include in normalize.cuh (#2652) @lowener
- Add support for Python 3.13 (#2649) @gforsyth
- Decoupling multi gpu resources from nccl usage (#2647) @jinsolp
- [BUGFIX] Fixed quoting in wheel paths in pylibraft and raft_dask wheel tests (#2645) @VenkateshJaya
- Download build artifacts from Github for CI (#2640) @VenkateshJaya
- Limit allowed wheel sizes (#2638) @divyegala
- Remove CUDA whole compilation ODR violations (#2633) @divyegala
- refactor(rattler): enable strict channel priority for builds (#2632) @gforsyth
- Vendor RAPIDS.cmake (#2631) @bdice
- Replace `Thrust` iterator facilities and replace them with `libcu++` ones (#2627) @miscco
- Port all conda recipes to `rattler-build` (#2623) @gforsyth
- Add missing thrust include (#2618) @miscco
- Moving wheel builds to specified location and uploading build artifacts to Github (#2617) @VenkateshJaya
- Fixed pytest marker warnings by removing unused pytest.ini (#2591) @TomAugspurger
- Introduction of the `raft::device_resources_snmg` type (#2549) @viclafargue
- Create a NCCL sub-communicator using ncclCommSplit (#2495) @seunghwak
v25.04.00
🚨 Breaking Changes
- Account for cugraph API breakage (#2581) @divyegala
- Use new rapids-logger library (#2566) @vyasr
🐛 Bug Fixes
- Backport build patch fix (#2620) @KyleFromNVIDIA
- Revert "Temporarily increase `max_days_without_success` (#2602)" (#2613) @divyegala
- Relax max duplicates in batched NN Descent (#2610) @jinsolp
- [Fix] Lanczos solver gemv fix (#2607) @aamijar
- [Fix] `select-k-csr` failure on CUDA11.x + H100 (#2604) @rhdong
- Temporarily increase `max_days_without_success` (#2602) @divyegala
- Swap `blocks` and `threads_per_block` in `compute_graph_laplacian` (#2597) @jcrist
- [BUG] Fix illegal memory access in linalg::reduction (#2592) @enp1s0
- Require sphinx<8.2.0 (#2590) @KyleFromNVIDIA
- Account for cugraph API breakage (#2581) @divyegala
- `#include <numeric>` for `std::iota` (#2578) @benfred
- Fix Laplacian calculation in spectral partitioning (#2568) @wphicks
- Take argument by `const&` as the input range is const (#2558) @miscco
- Allow some of the sparse utility functions to handle larger matrices (#2541) @viclafargue
🛠️ Improvements
- ci: pre-filter 11.4 jobs before they are enabled in shared workflows (#2608) @gforsyth
- Use conda-build instead of conda-mambabuild (#2595) @bdice
- Replace `cub::Sum` and `cub::Max` with `cuda::std::plus` and `cuda::maximum` (#2594) @miscco
- Update all `conda_build_config.yaml`s RAPIDS UCX version (#2589) @jakirkham
- Drop `cub::TransformInputIterator` in favor of `thrust::transform_iterator` (#2588) @miscco
- Consolidate more Conda solves in CI (#2587) @KyleFromNVIDIA
- Fix duplicate indices in batch NN Descent (#2586) @jinsolp
- Require CMake 3.30.4 (#2584) @robertmaynard
- Create Conda CI test env in one step (#2580) @KyleFromNVIDIA
- Use shared-workflows branch-25.04 (#2576) @bdice
- Add `shellcheck` to pre-commit and fix warnings (#2575) @gforsyth
- Add build_type input field for `test.yaml` (#2573) @gforsyth
- Use `rapids-pip-retry` in CI jobs that might need retries (#2571) @gforsyth
- Avoid limited memory adaptor issue in balanced KMeans (#2570) @csadorf
- update telemetry and retarget 25.04 (#2569) @msarahan
- Use new rapids-logger library (#2566) @vyasr
- disallow fallback to Make in Python builds (#2563) @jameslamb
- Forward-merge branch-25.02 into branch-25.04 (#2561) @bdice
- Migrate to NVKS for amd64 CI runners (#2559) @bdice
- Add `verify-codeowners` hook (#2557) @KyleFromNVIDIA
v25.02.00
🚨 Breaking Changes
- Update pip devcontainers to UCX 1.18 (#2550) @jameslamb
- Switch over to rapids-logger (#2530) @vyasr
- Adapt to rmm logger changes (#2513) @vyasr
🐛 Bug Fixes
- Rename test to tests. (#2546) @bdice
- Fix bit order of RMAT Rectangular Generator to match expectation (#2542) @mfoerste4
- Fix broken link to python doc (#2537) @lowener
- Fix lanczos solver integer overflow (#2536) @viclafargue
- Fix rnd bit generation in rmat_rectangular_kernel (#2524) @tfeher
📖 Documentation
🚀 New Features
- Add cuda 12.8 support (#2551) @robertmaynard
- Add support for different data type of bitset (#2535) @lowener
- [Feat] Support `bitset_to_csr` (#2523) @rhdong
- Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#2517) @bdice
🛠️ Improvements
- Revert CUDA 12.8 shared workflow branch changes (#2560) @vyasr
- Build and test with CUDA 12.8.0 (#2555) @bdice
- Update pip devcontainers to UCX 1.18 (#2550) @jameslamb
- use dynamic CUDA wheels on CUDA 11 (#2548) @jameslamb
- Normalize whitespace (#2547) @bdice
- Use cuda.bindings layout. (#2545) @bdice
- Revert "Introduction of the `raft::device_resources_snmg` type (#2487)" (#2543) @cjnolet
- Add missing `#include <cstdint>` (#2540) @jakirkham
- Use GCC 13 in CUDA 12 conda builds. (#2539) @bdice
- Use rapids-cmake for the logger (#2534) @vyasr
- Check if nightlies have succeeded recently enough (#2533) @vyasr
- remove unused 'joblib' and 'numba' dependencies, other packaging cleanup (#2532) @jameslamb
- introduce libraft wheels (#2531) @jameslamb
- Switch over to rapids-logger (#2530) @vyasr
- reduce duplication, removed unused things in dependencies.yaml (#2529) @jameslamb
- Update cuda-python lower bounds to 12.6.2 / 11.8.5 (#2522) @bdice
- [Opt] Optimizing the performance of `bitmap_to_csr` (#2516) @rhdong
- prefer system install of UCX in devcontainers, update outdated RAPIDS references (#2514) @jameslamb
- Adapt to rmm logger changes (#2513) @vyasr
- Require approval to run CI on draft PRs (#2512) @bdice
- Shrink wheel size limit following removal of vector search APIs. (#2509) @bdice
- Forward-merge branch-24.12 to branch-25.02 (#2508) @bdice
- Introduction of the `raft::device_resources_snmg` type (#2487) @viclafargue
- Add breaking change workflow trigger (#2482) @AyodeAwe
- Remove 'sample' parameter from stats::mean API (#2389) @mfoerste4
v24.12.00
🚨 Breaking Changes
🐛 Bug Fixes
- Skip gtests for new lanczos solver when CUDA version is 11.4 or below. (#2520) @cjnolet
- Switch `assert` to `static_assert` (#2510) @divyegala
- Revert use of new Lanczos solver in spectral clustering (#2507) @lowener
- Put a ceiling on cuda-python (#2486) @bdice
- Don't presume pointers location infers usability. (#2480) @robertmaynard
- Use Python for sccache hit rate computation. (#2474) @bdice
- Allow compilation with CUDA 12.6.1 (#2469) @robertmaynard
🚀 New Features
🛠️ Improvements
- Skip gtests for Rmat Lanczos tests with cuda <= 11.4 (#2525) @benfred
- Upgrade to latest cutlass version (#2503) @vyasr
- Removing some left over places where implicit instantiations were being ignored in headers (#2501) @cjnolet
- Remove leftover template project code. (#2500) @bdice
- 2412 remove libraft vss instantiations (#2498) @cjnolet
- Remove raft-ann-bench (#2497) @cjnolet
- Pin FAISS Version for raft-ann-bench (#2496) @tarang-jain
- enforce wheel size limits and README formatting in CI, put a ceiling on Cython dependency (#2490) @jameslamb
- Do not initialize the pinned mdarray at construction time (#2478) @achirkin
- Use environment variables in cache hit rate computation. (#2475) @bdice
- devcontainer: replace `VAULT_HOST` with `AWS_ROLE_ARN` (#2472) @jjacobelli
- print sccache stats in builds (#2470) @jameslamb
- make package installations in CI stricter (#2467) @jameslamb
- Prune workflows based on changed files (#2466) @KyleFromNVIDIA
- Merge branch-24.10 into branch-24.12 (#2461) @jameslamb
- Update all rmm imports to use pylibrmm/librmm (#2451) @Matt711
v24.10.00
🚨 Breaking Changes
🐛 Bug Fixes
- Disable NN Descent Batch tests temporarily (#2453) @divyegala
- Fix sed syntax in `update-version.sh` (#2441) @raydouglass
- Use runtime check of cudart version for eig (#2430) @lowener
- [BUG] Fix bitset function visibility (#2429) @lowener
- Exclude any kernel symbol that uses cutlass (#2425) @robertmaynard
🚀 New Features
- [Feat] add `repeat`, `sparsity`, `eval_n_elements` APIs to `bitset` (#2439) @rhdong
- [Opt] Enforce the UT Coverity and add benchmark for `transpose` (#2438) @rhdong
- [FEA] Support for half-float mixed precise in brute-force (#2382) @rhdong
🛠️ Improvements
- bump NCCL floor to 2.19 (#2458) @jameslamb
- Deprecating vector search APIs and updating README accordingly (#2448) @cjnolet
- Update update-version.sh to use packaging lib (#2447) @AyodeAwe
- Switch traceback to `native` (#2446) @galipremsagar
- bump NCCL floor to 2.18.1.1 (#2443) @jameslamb
- Add missing `cuda_suffixed: true` (#2440) @trxcllnt
- Use CI workflow branch 'branch-24.10' again (#2437) @jameslamb
- Update to flake8 7.1.1. (#2435) @bdice
- Update fmt (to 11.0.2) and spdlog (to 1.14.1). (#2433) @jameslamb
- Allow coo_sort to work on int64_t indices (#2432) @benfred
- Adding NCCL clique to the RAFT handle (#2431) @viclafargue
- Add support for Python 3.12 (#2428) @jameslamb
- Update rapidsai/pre-commit-hooks (#2420) @KyleFromNVIDIA
- Drop Python 3.9 support (#2417) @jameslamb
- Use CUDA math wheels (#2415) @KyleFromNVIDIA
- Remove NumPy <2 pin (#2414) @seberg
- Update pre-commit hooks (#2409) @KyleFromNVIDIA
- Improve update-version.sh (#2408) @bdice
- Use tool.scikit-build.cmake.version, set scikit-build-core minimum-version (#2406) @jameslamb
- [FEA] Batching NN Descent (#2403) @jinsolp
- Update pip devcontainers to UCX v1.17.0 (#2401) @jameslamb
- Merge branch-24.08 into branch-24.10 (#2397) @jameslamb