Fix S3 path lookup hanging on large prefixes #6849
Conversation
The lookup method paginated through all objects under an S3 prefix (maxKeys=250) to check path existence. On prefixes with millions of objects this caused the main thread to hang for minutes parsing massive XML responses.

Observed in production: nf-schema parameter validation calls Files.exists() on an S3 outdir path, which triggers S3ObjectSummaryLookup.lookup. With a large prefix like s3://bucket/results containing many objects from previous runs, the pagination loop iterated indefinitely.

Fix: use maxKeys=2 and remove pagination. The matchName check only needs to find the exact key or its first child (key + "/"), which are guaranteed to appear in the first results due to S3 lexicographic ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Tests pass; however, in my tests execution fails.

There is something wrong in this approach. I have a folder with similar names, such as the following. Then I made a test to check if folder 'a' exists and it says false. In master it is working, but not in this branch. I see the lexicographical order is guaranteed in general purpose buckets with listObjectsV2, but I do not see this statement for listObjects. I am debugging it to see what is wrong and whether listObjectsV2 fixes it.

I see two solutions, but they imply making two calls in the worst case:

I guess the list (capped to 10) should cover most of the cases, otherwise falling back to the double head. wdyt?

I think we will not need to fall back to the double head. The first call, either the list or the head, will cover the non-directory case. So the fallback only needs to check the directory, and the S3 object for a folder only requires the key ending with '/'.

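The two-step idea above can be sketched as follows. This is a minimal Python simulation, assuming an in-memory sorted key set; the helper names (list_capped, head, path_exists) are hypothetical and do not correspond to the actual Nextflow or AWS SDK API.

```python
def list_capped(keys, prefix, max_keys=10):
    """Simulate ListObjectsV2 with a small MaxKeys over lexicographically sorted keys."""
    return [k for k in sorted(keys) if k.startswith(prefix)][:max_keys]

def head(keys, key):
    """Simulate HeadObject: an exact key lookup only."""
    return key in keys

def path_exists(keys, name, max_keys=10):
    # Step 1: a capped listing covers the plain-file case, and often the
    # directory case too, when 'name/...' lands in the first max_keys keys.
    for k in list_capped(keys, name, max_keys):
        if k == name or k.startswith(name + '/'):
            return True
    # Step 2: the fallback only needs to check the directory, since the
    # S3 object for a folder is the key ending with '/'.
    return head(keys, name + '/')
```

For example, with twenty sibling keys 'name-0'..'name-19' filling the capped listing, step 1 misses, but the HEAD fallback still finds the 'name/' marker object.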
I am not getting which symbols that sort before '/' can be part of a file name. Can a test be made to capture this case?

In lexicographic order, several symbols (such as '-' or '.') sort before '/'. So if you have a set of key names 'name/', 'name-1', 'name-2', 'name.txt', then in lexicographic order 'name-1', 'name-2' and 'name.txt' will appear before 'name/'. This is why the PR does not always work. We could increase maxKeys to 10 to have a better chance of getting the folder on the first try, but we always need the fallback. I added a test in #6851 that reproduces this behaviour.

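The ordering problem described above is easy to verify: Python's default string sort uses the same bytewise comparison as S3 key ordering for ASCII names, so the sibling files land before the folder key.

```python
# '-' (0x2D) and '.' (0x2E) sort before '/' (0x2F), so sibling file keys
# precede the directory key 'name/' in a lexicographic listing.
keys = ['name/', 'name-1', 'name-2', 'name.txt']
print(sorted(keys))  # ['name-1', 'name-2', 'name.txt', 'name/']
```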
I have checked the approach with the

Ok, let's go with that |

Summary
Problem
S3ObjectSummaryLookup.lookup() is used by S3FileSystemProvider.checkAccess() to verify if an S3 path exists. The method paginated through all objects matching the prefix in batches of 250. On prefixes with millions of objects (e.g. s3://bucket/results accumulated from many pipeline runs), this caused the main thread to hang for minutes parsing massive XML responses from S3.
Observed in production: nf-schema FormatDirectoryPathEvaluator calls Files.exists() on an S3 outdir path during parameter validation. With a prefix containing many objects, the main thread hung for 7+ minutes (180s CPU) stuck in XmlDomParser.parseElement, parsing unbounded listObjects responses.
Fix
The matchName() check only needs to find either the exact key or its first child (key/). Since S3 returns objects in lexicographic order, these are guaranteed to appear in the first 1-2 results. Using maxKeys=2 without pagination is sufficient and eliminates the unbounded listing.
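A minimal sketch of this check, as a Python simulation rather than the actual Groovy code: list at most two keys under the prefix in lexicographic order, then look for the exact key or a first child (key + '/'). The function name and shape are illustrative assumptions.

```python
def lookup(keys, name, max_keys=2):
    """Simulate the maxKeys=2 existence check over sorted S3 keys."""
    listing = [k for k in sorted(keys) if k.startswith(name)][:max_keys]
    # Match the exact key, or any key under the 'name/' folder.
    return any(k == name or k.startswith(name + '/') for k in listing)
```

Note the failure mode raised in the review discussion above: with keys 'name-1', 'name-2', 'name.txt', 'name/file', the first two results are the '-' siblings, so the check misses the folder even though it exists.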
Test plan