feat(#950): CAGRA GPU index persistence #985
Merged
jamie8johnson merged 4 commits into main on Apr 15, 2026
Pulls in the new cuvsCagraSerialize / cuvsCagraDeserialize Rust wrappers added to the fork so cqs can persist CAGRA indices across daemon restarts (issue #950).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ence

Implements native cuVS CagraSerialize + sidecar to let the daemon reuse a persisted CAGRA graph across restarts. The sidecar is a JSON file next to the binary blob and carries:
- magic + version (for future format changes)
- dim + chunk_count (reject on drift after reindex)
- splade_generation (coarse staleness signal from the v20 delete trigger)
- id_map (cuVS gives us nothing to translate internal indices back)
- blake3 of the blob (catch disk rot before handing it to cuVS)

save / save_with_store write the blob, checksum it, then write the sidecar atomically via temp + rename. If the sidecar write fails, the partial blob is removed so we never leave an orphan. All failures are thiserror variants so callers can log + fall through to rebuild.

load validates the full chain (magic, version, dim, chunk_count, sidecar id_map length, blake3) before handing the blob to cuvsCagraDeserialize. On any mismatch the caller is expected to delete_persisted() and rebuild.

CQS_CAGRA_PERSIST=0 disables both save and load for A/B testing and to reduce on-disk footprint on systems that don't want it. Defaults to enabled; cached in OnceLock.

Round-trip test proves bit-exact (neighbour id + score.to_bits() equal) search results before and after save/load on a 32-vector index. Negative tests cover dim mismatch, chunk_count mismatch, sidecar absence, and corrupted blob detection.

Refs #950
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
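The validation chain described for load can be sketched roughly as below. This is a std-only illustration, not the code in src/cagra.rs: the Sidecar struct, the LoadError variants, the MAGIC and VERSION values, and the id_map element type are all hypothetical, the id_map length is assumed to be checked against chunk_count, and FNV-1a stands in for the blake3 digest so the example needs no external crates.

```rust
use std::fmt;

// Assumed values; the real magic string and format version are not shown in the PR.
pub const MAGIC: &str = "CQSCAGRA";
pub const VERSION: u32 = 1;

/// Hypothetical sidecar layout; fields follow the commit message, not CagraMeta itself.
#[derive(Debug, Clone, PartialEq)]
pub struct Sidecar {
    pub magic: String,
    pub version: u32,
    pub dim: usize,
    pub chunk_count: usize,
    pub id_map: Vec<String>,
    pub checksum: u64,
}

/// Hypothetical error type; the PR uses thiserror variants instead.
#[derive(Debug, PartialEq)]
pub enum LoadError {
    BadMagic,
    UnsupportedVersion(u32),
    DimMismatch { expected: usize, found: usize },
    ChunkCountMismatch { expected: usize, found: usize },
    IdMapLength { expected: usize, found: usize },
    ChecksumMismatch,
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "persisted index rejected: {:?}", self)
    }
}

impl std::error::Error for LoadError {}

/// FNV-1a 64-bit hash: a stand-in for blake3, used only to keep the sketch std-only.
pub fn checksum(blob: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in blob {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Runs the cheap metadata checks first and the blob checksum last.
pub fn validate(
    meta: &Sidecar,
    dim: usize,
    chunk_count: usize,
    blob: &[u8],
) -> Result<(), LoadError> {
    if meta.magic != MAGIC {
        return Err(LoadError::BadMagic);
    }
    if meta.version != VERSION {
        return Err(LoadError::UnsupportedVersion(meta.version));
    }
    if meta.dim != dim {
        return Err(LoadError::DimMismatch { expected: dim, found: meta.dim });
    }
    if meta.chunk_count != chunk_count {
        return Err(LoadError::ChunkCountMismatch {
            expected: chunk_count,
            found: meta.chunk_count,
        });
    }
    if meta.id_map.len() != meta.chunk_count {
        return Err(LoadError::IdMapLength {
            expected: meta.chunk_count,
            found: meta.id_map.len(),
        });
    }
    if checksum(blob) != meta.checksum {
        return Err(LoadError::ChecksumMismatch);
    }
    Ok(())
}
```

Ordering the checks from cheapest to most expensive means a stale sidecar is rejected without hashing the whole blob; only a sidecar that otherwise matches pays for the checksum pass.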
On every daemon startup and every CLI invocation that asks for a CAGRA
index we now:
1. Check for {cqs_dir}/index.cagra + .meta
2. If they validate against the current store dim + chunk_count, hand the
deserialized index back to the caller (tracing backend=cagra
source=persisted) — this skips the ~30s rebuild on a 13k-chunk repo.
3. On any load failure (missing, stale, checksum mismatch) delete both
files and fall through to the existing build_from_store path.
4. After a successful rebuild, save_with_store persists the new graph
(tracing backend=cagra source=rebuilt) so the next restart pays the
fast-path cost. Save failures are warn-logged, never fatal.
Skipped entirely when CQS_CAGRA_PERSIST=0. HNSW path is unchanged; the
<threshold / gpu_unavailable log lines still emit the same structured
event operators grep for.
Refs #950
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
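The four startup steps in the commit message above amount to a load-or-rebuild flow, sketched here under stated assumptions. The IndexStore trait, open_index, Source, and MemoryStore are hypothetical names introduced for illustration; the trait methods only stand in for load, delete_persisted, build_from_store, and save_with_store, whose real signatures are not shown in this PR, and eprintln! stands in for the tracing calls.

```rust
/// Where the returned index came from; mirrors the `source=` tracing field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Persisted,
    Rebuilt,
}

/// Hypothetical hooks standing in for the real persistence and build functions.
pub trait IndexStore {
    type Index;
    fn load(&mut self) -> Result<Self::Index, String>;
    fn delete_persisted(&mut self);
    fn rebuild(&mut self) -> Self::Index;
    fn save(&mut self, index: &Self::Index) -> Result<(), String>;
}

/// Try the persisted index first; on any failure clean up, rebuild, and re-save.
pub fn open_index<S: IndexStore>(store: &mut S, persist: bool) -> (S::Index, Source) {
    if persist {
        match store.load() {
            Ok(index) => return (index, Source::Persisted),
            Err(reason) => {
                // Missing, stale, or corrupt: remove both files and fall through.
                eprintln!("cagra load failed ({}); rebuilding", reason);
                store.delete_persisted();
            }
        }
    }
    let index = store.rebuild();
    if persist {
        // Save failures are logged, never fatal.
        if let Err(reason) = store.save(&index) {
            eprintln!("warn: cagra save failed: {}", reason);
        }
    }
    (index, Source::Rebuilt)
}

/// In-memory stand-in used to exercise the flow without a GPU or a disk.
#[derive(Default)]
pub struct MemoryStore {
    pub persisted: Option<Vec<u32>>,
    pub rebuilds: usize,
}

impl IndexStore for MemoryStore {
    type Index = Vec<u32>;

    fn load(&mut self) -> Result<Self::Index, String> {
        self.persisted.clone().ok_or_else(|| "missing".to_string())
    }

    fn delete_persisted(&mut self) {
        self.persisted = None;
    }

    fn rebuild(&mut self) -> Self::Index {
        self.rebuilds += 1;
        vec![1, 2, 3]
    }

    fn save(&mut self, index: &Self::Index) -> Result<(), String> {
        self.persisted = Some(index.clone());
        Ok(())
    }
}
```

With MemoryStore, a first open_index call rebuilds and saves, and a second call returns Source::Persisted without touching rebuild. Passing persist = false skips both the load and the save, which matches the CQS_CAGRA_PERSIST=0 behaviour described above.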
- README: add CQS_CAGRA_PERSIST to the env var table.
- CONTRIBUTING: note that cagra.rs now handles save/load via cuvsCagraSerialize in the architecture overview.
- CHANGELOG: describe the persistence feature under [Unreleased] with the behaviour of stale detection and the disable flag.

Refs #950
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
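The CQS_CAGRA_PERSIST switch documented in this commit is described elsewhere in the PR as default-on and cached in a OnceLock. A minimal sketch of that pattern follows; parse_persist_flag and persist_enabled are hypothetical names, and treating every value other than 0 as "enabled" is an assumption, since the PR only specifies the 0 case.

```rust
use std::sync::OnceLock;

/// Interprets the raw env value. Only "0" disables persistence; treating an
/// unset variable or any other value as enabled is an assumption of this sketch.
pub fn parse_persist_flag(raw: Option<&str>) -> bool {
    !matches!(raw.map(str::trim), Some("0"))
}

/// Reads CQS_CAGRA_PERSIST once and caches the answer for the process lifetime.
pub fn persist_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        let raw = std::env::var("CQS_CAGRA_PERSIST").ok();
        parse_persist_flag(raw.as_deref())
    })
}
```

Keeping the parsing in a pure function makes the default testable without mutating process environment, while the OnceLock wrapper ensures the save and load paths see the same decision for the whole run.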
Summary
Closes #950. Persists the CAGRA GPU index to disk with a .cagra payload + .cagra.meta sidecar (blake3 checksum, dim, chunk count, model id). Daemon hot-restart now skips the ~30s rebuild when the meta matches the current corpus identity, taking cold start from ~30s to ~5s.

Changes
- src/cagra.rs gains save, load, delete_persisted + CagraMeta sidecar.
- CQS_CAGRA_PERSIST (default on): set to 0 to disable persistence.
- build_vector_index_with_config: on hit, verify checksum → skip rebuild; on miss, rebuild and save.
- cuvs-patched fork pushed to branch add-serialize-deserialize with the upstream serde bindings. Cargo.toml points [patch.crates-io] at that branch.

Test plan
- cargo test --features gpu-index --lib cagra
- .cagra file → verify graceful rebuild

Upstream note
The cuvs patch branch jamie8johnson/cuvs-patched:add-serialize-deserialize tracks rapidsai/cuvs#2019 as the upstream PR for the serde bindings. If that upstream PR lands, we can drop the [patch.crates-io] entry.

🤖 Generated with Claude Code