Open
Extracted from #1639 (point 1).
read_storage() computes its starting hash via _starting_step_hash before apply_listing_pre_step() runs. The hash is therefore based on the bare listing dataset name (e.g. lst__s3://my-bucket) instead of the actual dataset version UUID. As a result, when new files arrive and update=True is set, the hash does not change, so stale checkpoints are reused and new data is silently ignored.
Fix: call apply_listing_pre_step() in DatasetQuery.hash() so the listing is resolved before the hash is computed. This is safe because apply_listing_pre_step() is idempotent: if the listing is already resolved, it is a no-op. When apply_steps() runs later, it skips the already-resolved listing.
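To make the failure mode and the proposed fix concrete, here is a minimal toy model in Python. It is not the library's actual code: ToyQuery, hash_before_fix, hash_after_fix, and the resolver callables are hypothetical names invented for illustration, and the real hash covers more than the listing. Only the idea is taken from the description above: hashing the bare name hides new listing versions, and an idempotent pre-step makes it safe to resolve the listing inside the hash call.

```python
import hashlib
import uuid


class ToyQuery:
    """Toy stand-in for a query over a listing (hypothetical, for illustration)."""

    def __init__(self, listing_name, resolve_version):
        self.listing_name = listing_name         # e.g. "lst__s3://my-bucket"
        self._resolve_version = resolve_version  # callable returning a version UUID
        self.version_uuid = None                 # set once the listing is resolved
        self.resolve_calls = 0                   # counts real resolutions

    def apply_listing_pre_step(self):
        # Idempotent: resolve at most once, then act as a no-op.
        if self.version_uuid is None:
            self.version_uuid = self._resolve_version()
            self.resolve_calls += 1

    def hash_before_fix(self):
        # Hashes only the bare name, so a new listing version is invisible.
        return hashlib.sha256(self.listing_name.encode()).hexdigest()

    def hash_after_fix(self):
        # Resolve first, then include the version UUID in the hashed key.
        self.apply_listing_pre_step()
        key = f"{self.listing_name}@{self.version_uuid}"
        return hashlib.sha256(key.encode()).hexdigest()


# Two runs over the same bucket; the second sees a newer listing version.
v1, v2 = str(uuid.uuid4()), str(uuid.uuid4())
run1 = ToyQuery("lst__s3://my-bucket", lambda: v1)
run2 = ToyQuery("lst__s3://my-bucket", lambda: v2)

same_before = run1.hash_before_fix() == run2.hash_before_fix()
same_after = run1.hash_after_fix() == run2.hash_after_fix()

# A later pipeline stage calling the pre-step again does not re-resolve.
run1.apply_listing_pre_step()
print(same_before, same_after, run1.resolve_calls)
```

In this sketch the pre-fix hashes collide across the two runs, which is the condition under which a stale checkpoint would be reused, while the post-fix hashes differ because the version UUID is part of the key.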