feat(#950): CAGRA GPU index persistence#985

Merged
jamie8johnson merged 4 commits into main from refactor/950-cagra-persistence
Apr 15, 2026

@jamie8johnson

Summary

Closes #950. Persists the CAGRA GPU index to disk with a .cagra payload + .cagra.meta sidecar (blake3 checksum, dim, chunk count, model id). Daemon hot-restart now skips the ~30s rebuild when the meta matches the current corpus identity, taking cold start from ~30s to ~5s.

Changes

  • src/cagra.rs gains save, load, delete_persisted + CagraMeta sidecar.
  • New env var CQS_CAGRA_PERSIST (default on) — set to 0 to disable persistence.
  • Wired into build_vector_index_with_config: on hit, verify checksum → skip rebuild; on miss, rebuild and save.
  • cuvs-patched fork pushed to branch add-serialize-deserialize with the upstream serde bindings. Cargo.toml points [patch.crates-io] at that branch.
  • 8 new tests including a bit-exact round-trip test.

Test plan

  • cargo test --features gpu-index --lib cagra
  • Manual daemon restart timing: cold (no cache) vs warm (cache hit) on a 13k-chunk index
  • Corruption test: truncate .cagra file → verify graceful rebuild

Upstream note

The cuvs patch branch jamie8johnson/cuvs-patched:add-serialize-deserialize tracks rapidsai/cuvs#2019, the upstream PR for the serde bindings. Once that PR lands upstream, we can drop the [patch.crates-io] entry.

🤖 Generated with Claude Code

jamie8johnson and others added 4 commits April 15, 2026 04:53
Pulls in the new cuvsCagraSerialize / cuvsCagraDeserialize Rust wrappers
added to the fork so cqs can persist CAGRA indices across daemon restarts
(issue #950).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ence

Implements native cuVS CagraSerialize + sidecar to let the daemon reuse a
persisted CAGRA graph across restarts. The sidecar is a JSON file next to
the binary blob and carries:
- magic + version (for future format changes)
- dim + chunk_count (reject on drift after reindex)
- splade_generation (coarse staleness signal from the v20 delete trigger)
- id_map (cuVS gives us no way to translate its internal indices back)
- blake3 of the blob (catch disk rot before handing it to cuVS)

save / save_with_store write the blob, checksum it, then write the sidecar
atomically via temp + rename. If the sidecar write fails, the partial blob
is removed so we never leave an orphan. All failures are thiserror variants
so callers can log + fall through to rebuild.
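
A minimal sketch of the temp + rename pattern described above, using only the standard library. The names write_atomic and save_pair and the ".tmp" suffix are illustrative assumptions, not the functions in src/cagra.rs, and the real code reports thiserror variants rather than io::Error.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write the bytes to a sibling temp file and flush them to disk.
fn write_tmp(tmp: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut f = fs::File::create(tmp)?;
    f.write_all(bytes)?;
    f.sync_all()
}

/// Write `bytes` to `path` via temp file + rename, so a reader never
/// observes a half-written file at `path`.
fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical temp naming: "<path>.tmp" in the same directory.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let result = write_tmp(&tmp, bytes).and_then(|_| fs::rename(&tmp, path));
    if result.is_err() {
        let _ = fs::remove_file(&tmp); // best-effort cleanup of the temp file
    }
    result
}

/// Assumed save order: blob first, sidecar second. If the sidecar cannot be
/// written, remove the blob so no orphan is left behind.
fn save_pair(blob_path: &Path, meta_path: &Path, blob: &[u8], meta_json: &str) -> io::Result<()> {
    fs::write(blob_path, blob)?;
    if let Err(e) = write_atomic(meta_path, meta_json.as_bytes()) {
        let _ = fs::remove_file(blob_path);
        return Err(e);
    }
    Ok(())
}
```

Writing the sidecar last means its presence can act as the commit marker: a blob without a sidecar is never treated as valid.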

load validates the full chain (magic, version, dim, chunk_count, sidecar
id_map length, blake3) before handing the blob to cuvsCagraDeserialize. On
any mismatch the caller is expected to delete_persisted() and rebuild.
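
The validation order above could be sketched roughly as follows. The struct layout, field types, MAGIC and VERSION values, and error names are all assumptions made for illustration. The checksum is passed in as a precomputed hex string so the sketch needs no external crate, whereas the PR hashes the blob with blake3 and defines its errors with thiserror.

```rust
/// Hypothetical sidecar shape: field names follow the commit message,
/// types and values are assumed.
#[derive(Debug, Clone, PartialEq)]
struct CagraMetaSketch {
    magic: String,
    version: u32,
    dim: usize,
    chunk_count: usize,
    id_map: Vec<String>,
    blob_checksum: String, // hex digest of the blob
}

#[derive(Debug, PartialEq)]
enum LoadError {
    BadMagic,
    UnsupportedVersion(u32),
    DimMismatch { expected: usize, found: usize },
    ChunkCountMismatch { expected: usize, found: usize },
    IdMapLength { expected: usize, found: usize },
    ChecksumMismatch,
}

const MAGIC: &str = "CAGRA"; // assumed value
const VERSION: u32 = 1; // assumed value

/// Run the cheap metadata checks first and the checksum comparison last.
/// Only after all of them pass would the blob be handed to the deserializer.
fn validate(
    meta: &CagraMetaSketch,
    store_dim: usize,
    store_chunks: usize,
    actual_checksum: &str,
) -> Result<(), LoadError> {
    if meta.magic != MAGIC {
        return Err(LoadError::BadMagic);
    }
    if meta.version != VERSION {
        return Err(LoadError::UnsupportedVersion(meta.version));
    }
    if meta.dim != store_dim {
        return Err(LoadError::DimMismatch { expected: store_dim, found: meta.dim });
    }
    if meta.chunk_count != store_chunks {
        return Err(LoadError::ChunkCountMismatch { expected: store_chunks, found: meta.chunk_count });
    }
    if meta.id_map.len() != meta.chunk_count {
        return Err(LoadError::IdMapLength { expected: meta.chunk_count, found: meta.id_map.len() });
    }
    if meta.blob_checksum != actual_checksum {
        return Err(LoadError::ChecksumMismatch);
    }
    Ok(())
}
```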

CQS_CAGRA_PERSIST=0 disables both save and load for A/B testing and to
reduce on-disk footprint on systems that don't want it. Defaults to
enabled; cached in OnceLock.
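
One way the cached flag might look, as a sketch only. The function names are hypothetical, and the assumption that only the literal value 0 disables persistence comes from the PR description; the real parser may accept other spellings.

```rust
use std::sync::OnceLock;

/// Interpret the raw env value: unset (or anything other than "0") means
/// enabled. Treating only "0" as off is an assumption.
fn parse_persist_flag(raw: Option<&str>) -> bool {
    !matches!(raw.map(str::trim), Some("0"))
}

/// Read CQS_CAGRA_PERSIST once per process and cache the answer.
fn cagra_persist_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        let raw = std::env::var("CQS_CAGRA_PERSIST").ok();
        parse_persist_flag(raw.as_deref())
    })
}
```

Because the value is cached, changing the variable after the first call has no effect within the same process, which is worth keeping in mind when writing tests around it.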

Round-trip test proves bit-exact (neighbour id + score.to_bits() equal)
search results before and after save/load on a 32-vector index. Negative
tests cover dim mismatch, chunk_count mismatch, sidecar absence, and
corrupted blob detection.
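
For illustration, a bit-exact comparison of two result lists might look like the sketch below. The Hit type and bit_exact helper are hypothetical stand-ins for whatever the actual test compares. Comparing score.to_bits() is stricter than ==: it distinguishes 0.0 from -0.0 and treats identical NaN bit patterns as equal.

```rust
/// One search hit: neighbour id and its score (assumed shape).
#[derive(Debug, Clone, Copy)]
struct Hit {
    id: u32,
    score: f32,
}

/// True when both result lists have the same length, the same ids in the
/// same order, and scores that match bit for bit.
fn bit_exact(a: &[Hit], b: &[Hit]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| x.id == y.id && x.score.to_bits() == y.score.to_bits())
}
```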

Refs #950

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On every daemon startup and every CLI invocation that asks for a CAGRA
index we now:

1. Check for {cqs_dir}/index.cagra + .meta
2. If they validate against the current store dim + chunk_count, hand the
   deserialized index back to the caller (tracing backend=cagra
   source=persisted) — this skips the ~30s rebuild on a 13k-chunk repo.
3. On any load failure (missing, stale, checksum mismatch) delete both
   files and fall through to the existing build_from_store path.
4. After a successful rebuild, save_with_store persists the new graph
   (tracing backend=cagra source=rebuilt) so the next restart pays the
   fast-path cost. Save failures are warn-logged, never fatal.
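
The four steps above can be sketched as a small generic helper. Everything here is illustrative: load_or_rebuild, Source, and the closure parameters are hypothetical, eprintln! stands in for the tracing events, and the real build path can itself fail, which this sketch ignores.

```rust
/// Where the returned index came from (mirrors the source= tracing field).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Source {
    Persisted,
    Rebuilt,
}

/// Try the persisted index first; on any load error delete the files and
/// rebuild, then save the fresh index. Save errors are logged, never fatal.
fn load_or_rebuild<I, E: std::fmt::Debug>(
    persist_enabled: bool,
    load: impl FnOnce() -> Result<I, E>,
    delete_persisted: impl FnOnce(),
    rebuild: impl FnOnce() -> I,
    save: impl FnOnce(&I) -> Result<(), E>,
) -> (I, Source) {
    if persist_enabled {
        match load() {
            Ok(index) => return (index, Source::Persisted),
            Err(e) => {
                eprintln!("cagra load failed, rebuilding: {e:?}");
                delete_persisted();
            }
        }
    }
    let index = rebuild();
    if persist_enabled {
        if let Err(e) = save(&index) {
            eprintln!("cagra save failed (non-fatal): {e:?}");
        }
    }
    (index, Source::Rebuilt)
}
```

Passing the steps in as closures keeps the control flow testable without a GPU; the production code presumably calls the CagraIndex methods directly.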

Skipped entirely when CQS_CAGRA_PERSIST=0. HNSW path is unchanged; the
<threshold / gpu_unavailable log lines still emit the same structured
event operators grep for.

Refs #950

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README: add CQS_CAGRA_PERSIST to the env var table.
- CONTRIBUTING: note that cagra.rs now handles save/load via
  cuvsCagraSerialize in the architecture overview.
- CHANGELOG: describe the persistence feature under [Unreleased]
  with the behaviour of stale detection and the disable flag.

Refs #950

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jamie8johnson jamie8johnson merged commit 4f23563 into main Apr 15, 2026
8 checks passed
@jamie8johnson jamie8johnson deleted the refactor/950-cagra-persistence branch April 15, 2026 16:53
Development

Successfully merging this pull request may close these issues.

refactor: CagraIndex::save/load — persist GPU index across daemon restarts