
Fix read_storage hash computed before listing resolves #1655

@ilongin

Description

Extracted from #1639 (point 1).

read_storage() computes its starting hash via _starting_step_hash before apply_listing_pre_step() runs. The hash is therefore based on the bare listing dataset name (e.g. lst__s3://my-bucket) instead of the resolved dataset version UUID. As a result, running with update=True after new files arrive produces the same hash, so stale checkpoints are reused and the new data is silently ignored.
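A minimal sketch of why this goes wrong. The helper below stands in for _starting_step_hash; the reference strings and UUID suffixes are invented for illustration, but the point carries over: a hash over the bare listing name cannot change when new files arrive, while a hash over the resolved name-plus-version can.

```python
import hashlib


def step_hash(dataset_ref: str) -> str:
    # Hypothetical stand-in for _starting_step_hash: hash whatever
    # reference string the query currently holds.
    return hashlib.sha256(dataset_ref.encode()).hexdigest()[:12]


# Before the listing pre-step runs, the query only knows the bare name.
bare = "lst__s3://my-bucket"

# After resolution, the reference carries a concrete version UUID,
# which changes when the listing is updated (UUIDs here are made up).
v1 = "lst__s3://my-bucket@6f1c-old-version"
v2 = "lst__s3://my-bucket@9a4e-new-version"

# Hashing the bare name yields the same value before and after an
# update, so a checkpoint keyed on it is silently reused.
assert step_hash(bare) == step_hash(bare)

# Hashing the resolved reference distinguishes the two versions.
assert step_hash(v1) != step_hash(v2)
```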

Fix: call apply_listing_pre_step() in DatasetQuery.hash() so the listing is resolved before the hash is computed. This is safe because apply_listing_pre_step() is idempotent: if the listing is already resolved, it is a no-op, and when apply_steps() runs later it skips the already-resolved listing.
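The ordering fix can be sketched as follows. This is a toy model, not DataChain's actual implementation: the registry, class body, and resolution logic are invented, but the method names mirror the issue and the shape of the fix (resolve first, hash second, keep resolution idempotent) is the one described above.

```python
import hashlib

# Hypothetical registry mapping a listing name to its latest version UUID.
LATEST_VERSION = {"lst__s3://my-bucket": "uuid-v1"}


class DatasetQuerySketch:
    """Toy model of DatasetQuery; only the call ordering is the point."""

    def __init__(self, name: str) -> None:
        self.ref = name
        self._listing_resolved = False

    def apply_listing_pre_step(self) -> None:
        # Idempotent: a second call is a no-op, so hash() can resolve
        # eagerly and a later apply_steps() pass safely skips it.
        if self._listing_resolved:
            return
        self.ref = f"{self.ref}@{LATEST_VERSION[self.ref]}"
        self._listing_resolved = True

    def hash(self) -> str:
        # The fix: resolve the listing before computing the hash, so the
        # hash covers the concrete version UUID, not the bare name.
        self.apply_listing_pre_step()
        return hashlib.sha256(self.ref.encode()).hexdigest()[:12]


q = DatasetQuerySketch("lst__s3://my-bucket")
h1 = q.hash()

# Idempotence: hashing again does not re-resolve or change the result.
assert q.hash() == h1

# Simulate new files arriving: the listing gets a new version UUID,
# and a fresh query now hashes differently, invalidating the checkpoint.
LATEST_VERSION["lst__s3://my-bucket"] = "uuid-v2"
assert DatasetQuerySketch("lst__s3://my-bucket").hash() != h1
```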
