Conversation
The regions were cleared here when passed a numpy.ndarray
alancleary
left a comment
There was a problem hiding this comment.
This PR currently only updates 1 of 4 places a regions parameter is parsed in this way. All 4 should be updated:
def read_arrow()def read()def export()def read_iter()
| if isinstance(regions, str): | ||
| regions = [regions] | ||
| if isinstance(regions, list): | ||
| elif isinstance(regions, list): |
There was a problem hiding this comment.
Switching to elif prevents _prepare_regions(...) from being called in the case that regions is a string, meaning the region is no longer parsed and validated. Apparently there isn't a test for this so I ask that you add this test to the PR and revise your approach, for example:
with pytest.raises(Exception, match=format_error):
test_stats_v3_ingestion.read_arrow(regions="", ...)| elif isinstance(regions, list): | ||
| regions = map(str, self._prepare_regions(regions)) | ||
| else: | ||
| elif regions is None: |
There was a problem hiding this comment.
The else should be left as the base case so that types that aren't explicitly supported don't make it past this block. In other words, add a new elif for type numpy.ndarray and apply the _prepare_regions(...) generator or something similar if required to preserve the numpy.ndarray type.
This fixes hanging task graph nodes in TileDB-Server, due to
regionsbeing cleared when passed as anumpy.ndarray.After clearing the regions, we would later select all regions for the read within prepare_regions_v4, resulting in each task graph node reading the entire dataset and eventually hitting a timeout after 15m.
The reproduction in CLOUD-2882 completes normally with
max_workers=10when running locally against this branch.