
Fix read_storage hash computed before listing resolves #1655

@ilongin

Description

Extracted from #1639 (point 1).

read_storage() computes its starting hash via _starting_step_hash before apply_listing_pre_step() runs. The hash is therefore based on the bare listing dataset name (e.g. lst__s3://my-bucket) instead of the resolved dataset version UUID. As a result, running with update=True after new files arrive produces the same hash, so stale checkpoints are reused and the new data is silently ignored.
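A minimal sketch of why this goes wrong. The helper below stands in for _starting_step_hash; the reference strings and UUID suffixes are invented for illustration, but the point carries over: a hash over the bare listing name cannot change when new files arrive, while a hash over the resolved name-plus-version can.

```python
import hashlib


def step_hash(dataset_ref: str) -> str:
    # Hypothetical stand-in for _starting_step_hash: hash whatever
    # reference string the query currently holds.
    return hashlib.sha256(dataset_ref.encode()).hexdigest()[:12]


# Before the listing pre-step runs, the query only knows the bare name.
bare = "lst__s3://my-bucket"

# After resolution, the reference carries a concrete version UUID,
# which changes when the listing is updated (UUIDs here are made up).
v1 = "lst__s3://my-bucket@6f1c-old-version"
v2 = "lst__s3://my-bucket@9a4e-new-version"

# Hashing the bare name yields the same value before and after an
# update, so a checkpoint keyed on it is silently reused.
assert step_hash(bare) == step_hash(bare)

# Hashing the resolved reference distinguishes the two versions.
assert step_hash(v1) != step_hash(v2)
```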

Fix: call apply_listing_pre_step() in DatasetQuery.hash() so the listing is resolved before the hash is computed. This is safe because apply_listing_pre_step() is idempotent: if the listing is already resolved, it is a no-op, and when apply_steps() runs later it skips the already-resolved listing.
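The ordering fix can be sketched as follows. This is a toy model, not DataChain's actual implementation: the registry, class body, and resolution logic are invented, but the method names mirror the issue and the shape of the fix (resolve first, hash second, keep resolution idempotent) is the one described above.

```python
import hashlib

# Hypothetical registry mapping a listing name to its latest version UUID.
LATEST_VERSION = {"lst__s3://my-bucket": "uuid-v1"}


class DatasetQuerySketch:
    """Toy model of DatasetQuery; only the call ordering is the point."""

    def __init__(self, name: str) -> None:
        self.ref = name
        self._listing_resolved = False

    def apply_listing_pre_step(self) -> None:
        # Idempotent: a second call is a no-op, so hash() can resolve
        # eagerly and a later apply_steps() pass safely skips it.
        if self._listing_resolved:
            return
        self.ref = f"{self.ref}@{LATEST_VERSION[self.ref]}"
        self._listing_resolved = True

    def hash(self) -> str:
        # The fix: resolve the listing before computing the hash, so the
        # hash covers the concrete version UUID, not the bare name.
        self.apply_listing_pre_step()
        return hashlib.sha256(self.ref.encode()).hexdigest()[:12]


q = DatasetQuerySketch("lst__s3://my-bucket")
h1 = q.hash()

# Idempotence: hashing again does not re-resolve or change the result.
assert q.hash() == h1

# Simulate new files arriving: the listing gets a new version UUID,
# and a fresh query now hashes differently, invalidating the checkpoint.
LATEST_VERSION["lst__s3://my-bucket"] = "uuid-v2"
assert DatasetQuerySketch("lst__s3://my-bucket").hash() != h1
```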
