Description
Currently UDF checkpoints are scoped to a job's run group — _find_udf_checkpoint only searches by rerun_from_job_id. Combined with checkpoint cleanup, this means previously computed results can't be reused after expiry, even if the exact same computation was done before.
With #1641 (removing transient dependencies) and #1642 (storing chain hash in dataset version), chain hashes become self-contained and stable across jobs. This enables global checkpoint reuse.
Two levels:
.save() — use dataset_version.hash (#1642)
No checkpoint needed. .save() searches dataset versions for a matching hash — if found, the dataset already exists, skip the chain. The dataset version itself is the checkpoint and never expires.
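A minimal sketch of this lookup, with hypothetical names (DatasetVersion, find_version_by_hash, and the save signature are illustrative only, not the actual DataChain API), assuming the chain hash is stored on the version per #1642:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DatasetVersion:
    dataset: str
    version: int
    hash: str  # self-contained chain hash (stable across jobs per #1641/#1642)


def find_version_by_hash(
    versions: list[DatasetVersion], chain_hash: str
) -> Optional[DatasetVersion]:
    """Return an existing dataset version whose stored chain hash matches."""
    return next((v for v in versions if v.hash == chain_hash), None)


def save(
    versions: list[DatasetVersion],
    chain_hash: str,
    compute: Callable[[], DatasetVersion],
) -> DatasetVersion:
    """Skip the whole chain if a version with the same hash already exists."""
    existing = find_version_by_hash(versions, chain_hash)
    if existing is not None:
        return existing  # the dataset version IS the checkpoint; nothing expires
    result = compute()  # only runs on a hash miss
    versions.append(result)
    return result
```

Because the version table never expires, this path is immune to checkpoint cleanup: the hash hit either exists or the chain runs.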
UDF steps — global checkpoint search
The UDF skip path (_skip_udf) already copies output tables from previous jobs. The change is to search globally for a matching hash_output across all jobs, not just rerun_from_job_id. If a matching checkpoint + output table exists from any previous job, reuse it.
The UDF continue/resume path (_continue_udf) stays job-scoped — it depends on a specific parent job's partial output table for crash recovery.
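The two scoping rules above can be contrasted in a small sketch. The names here (Checkpoint, find_udf_checkpoint, the rerun_from_job_id parameter) are assumptions standing in for the real internals; the point is only that the skip lookup drops the job filter while the resume lookup keeps it:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    job_id: str
    hash_output: str  # hash of the UDF step's output
    output_table: str  # table holding (possibly partial) results


def find_udf_checkpoint(
    checkpoints: list[Checkpoint],
    hash_output: str,
    rerun_from_job_id: Optional[str] = None,
) -> Optional[Checkpoint]:
    """Skip path: match by hash alone (global). Passing rerun_from_job_id
    reproduces the old job-scoped behavior."""
    for cp in checkpoints:
        if cp.hash_output != hash_output:
            continue
        if rerun_from_job_id is not None and cp.job_id != rerun_from_job_id:
            continue  # old behavior: only the parent job's checkpoints count
        return cp
    return None


def find_resume_checkpoint(
    checkpoints: list[Checkpoint],
    hash_output: str,
    rerun_from_job_id: str,
) -> Optional[Checkpoint]:
    """Continue/resume path stays job-scoped: a partial output table is only
    meaningful relative to the specific crashed parent job."""
    return find_udf_checkpoint(checkpoints, hash_output, rerun_from_job_id)
```

Resume cannot safely go global because a partial table from an unrelated job gives no way to tell which rows were already processed in this lineage.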
This means:
- .save() stops creating checkpoints entirely and uses dataset_version.hash instead
- UDF skip becomes global — any previous job's output can be reused
- UDF continue/resume stays per-job (partial output is job-specific)
- Checkpoint cleanup no longer causes unnecessary re-computation for .save() (dataset versions persist)