Fix preparing regions for numpy ndarray. #872

Open
shaunrd0 wants to merge 4 commits into main from shaunreed/cloud-2882-vcf-unable-to-run-genomics-workflow-example-notebook-on

shaunrd0 (Member) commented Feb 3, 2026

This fixes task graph nodes hanging in TileDB-Server, caused by regions being cleared when passed as a numpy.ndarray.

With the regions cleared, prepare_regions_v4 would later select all regions for the read, so each task graph node read the entire dataset and eventually timed out after 15 minutes.

The reproduction in CLOUD-2882 completes normally with max_workers=10 when running locally against this branch.
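
The failure mode described above can be illustrated with a minimal sketch. The function name `normalize_regions_buggy` and the exact shape of its fallback branch are assumptions made for illustration, not the library's actual code. The point is only that a numpy.ndarray is not a list, so it falls through to a branch that discards it.

```python
import numpy as np


def normalize_regions_buggy(regions):
    """Hypothetical stand-in for the pre-fix parsing logic (assumed shape)."""
    if isinstance(regions, str):
        regions = [regions]
    if isinstance(regions, list):
        regions = [str(r) for r in regions]
    else:
        # Assumed fallback: anything that is not a list is cleared.
        regions = []
    return regions


regions = np.array(["chr1:1-1000", "chr2:500-900"])
print(isinstance(regions, list))         # False: an ndarray is not a list
print(normalize_regions_buggy(regions))  # []
```

An empty region list is what a later step could interpret as "no filter", which matches the full-dataset reads reported in this PR.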

The regions were cleared here when passed a numpy.ndarray
This reverts commit c127aff.

Revert "Log both region sets."
This reverts commit ce94552.
alancleary (Member) left a comment

This PR currently only updates 1 of 4 places a regions parameter is parsed in this way. All 4 should be updated:

  1. def read_arrow()
  2. def read()
  3. def export()
  4. def read_iter()

  if isinstance(regions, str):
      regions = [regions]
- if isinstance(regions, list):
+ elif isinstance(regions, list):

Switching to elif prevents _prepare_regions(...) from being called when regions is a string, so the region is no longer parsed and validated. There apparently isn't a test for this, so please add one to the PR and revise your approach. For example:

with pytest.raises(Exception, match=format_error):
    test_stats_v3_ingestion.read_arrow(regions="", ...)
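
The control-flow difference behind this comment can be shown in isolation. This is a minimal sketch: `prepare_regions` is a hypothetical validator standing in for `_prepare_regions`, and its rule (reject empty strings) and error message are assumptions, as are the names `parse_with_if` and `parse_with_elif`.

```python
def prepare_regions(regions):
    """Hypothetical validator standing in for _prepare_regions."""
    for region in regions:
        if not region:
            raise ValueError("invalid region format")
        yield region


def parse_with_if(regions):
    # Original control flow: a str is wrapped, then validated as a list.
    if isinstance(regions, str):
        regions = [regions]
    if isinstance(regions, list):
        regions = list(map(str, prepare_regions(regions)))
    return regions


def parse_with_elif(regions):
    # Proposed control flow: the str branch short-circuits the list branch.
    if isinstance(regions, str):
        regions = [regions]
    elif isinstance(regions, list):
        regions = list(map(str, prepare_regions(regions)))
    return regions


print(parse_with_elif(""))  # ['']: the empty region slips through unvalidated
try:
    parse_with_if("")
except ValueError as exc:
    print("rejected:", exc)  # rejected: invalid region format
```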

  elif isinstance(regions, list):
      regions = map(str, self._prepare_regions(regions))
- else:
+ elif regions is None:

The else should be left as the base case so that types that aren't explicitly supported don't make it past this block. In other words, add a new elif for type numpy.ndarray and apply the _prepare_regions(...) generator or something similar if required to preserve the numpy.ndarray type.
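
One possible shape for this suggestion is sketched below. It is not the code in this PR: `normalize_regions` and `prepare_regions` are hypothetical names, the validation rule is invented for illustration, and the base-case behavior (clearing the value) is assumed from the PR description.

```python
import numpy as np


def prepare_regions(regions):
    """Hypothetical validator standing in for _prepare_regions."""
    for region in regions:
        if not region:
            raise ValueError("invalid region format")
        yield region


def normalize_regions(regions):
    if isinstance(regions, str):
        regions = [regions]
    if isinstance(regions, list):
        # Strings wrapped above still reach validation here.
        regions = list(map(str, prepare_regions(regions)))
    elif isinstance(regions, np.ndarray):
        # New branch: validate element-wise and keep the ndarray type.
        regions = np.array([str(r) for r in prepare_regions(regions.tolist())])
    else:
        # Base case (assumed behavior): unsupported types and None are cleared.
        regions = []
    return regions


out = normalize_regions(np.array(["chr1:1-1000", "chr2:500-900"]))
print(type(out).__name__, out.tolist())  # ndarray ['chr1:1-1000', 'chr2:500-900']
```

Keeping the plain `if` for the list check preserves validation of string input, while the explicit ndarray branch keeps array input from reaching the base case.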
