Fix S3 path lookup hanging on large prefixes#6849

Closed
pditommaso wants to merge 1 commit intomasterfrom
fix/s3-lookup-unbounded-pagination

Conversation

@pditommaso
Member

Summary

  • Fix S3ObjectSummaryLookup.lookup() causing unbounded pagination when checking the existence of S3 paths with a large number of objects under the prefix
  • Replace pagination loop (maxKeys=250 + marker) with a single listObjects call using maxKeys=2

Problem

S3ObjectSummaryLookup.lookup() is used by S3FileSystemProvider.checkAccess() to verify if an S3 path exists. The method paginated through all objects matching the prefix in batches of 250. On prefixes with millions of objects (e.g. s3://bucket/results accumulated from many pipeline runs), this caused the main thread to hang for minutes parsing massive XML responses from S3.

Observed in production: nf-schema FormatDirectoryPathEvaluator calls Files.exists() on an S3 outdir path during parameter validation. With a prefix containing many objects, the main thread hung for 7+ minutes (180s CPU) stuck in XmlDomParser.parseElement, parsing unbounded listObjects responses.
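
For a rough sense of scale, here is a minimal sketch of how many sequential list requests a 250-key page size implies. It assumes the worst case, where no key matches until the listing is exhausted. The class name and the object counts are illustrative assumptions, not measurements from the incident.

```java
class PaginationCost {

    // Number of sequential list requests needed to page through
    // `objectCount` keys at `maxKeys` keys per page (ceiling division).
    static long requests(long objectCount, int maxKeys) {
        return (objectCount + maxKeys - 1) / maxKeys;
    }

    public static void main(String[] args) {
        // Illustrative object counts only.
        System.out.println(requests(1_000_000L, 250));   // 4000
        System.out.println(requests(10_000_000L, 250));  // 40000
    }
}
```

Each of those requests returns an XML page that has to be parsed before the next marker is known, so the calls cannot be issued in parallel.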

Fix

The matchName() check only needs to find either the exact key or its first child (key/). Since S3 returns objects in lexicographic order, these are guaranteed to appear in the first 1-2 results. Using maxKeys=2 without pagination is sufficient and eliminates the unbounded listing.

Test plan

  • Verified matchName logic: exact key match or key/ child always appears first in lexicographic S3 listing
  • Smoke test with large S3 prefix to confirm fast Files.exists() check

The lookup method paginated through all objects under an S3 prefix
(maxKeys=250) to check path existence. On prefixes with millions of
objects this caused the main thread to hang for minutes parsing massive
XML responses.

Observed in production: nf-schema parameter validation calls
Files.exists() on an S3 outdir path, which triggers
S3ObjectSummaryLookup.lookup. With a large prefix like
s3://bucket/results containing many objects from previous runs,
the pagination loop iterated indefinitely.

Fix: use maxKeys=2 and remove pagination. The matchName check only
needs to find the exact key or its first child (key + "/"), which
are guaranteed to appear in the first results due to S3 lexicographic
ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify

netlify bot commented Feb 19, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 92ccdc9
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/699779d3de88f7000823666e

@pditommaso pditommaso requested a review from jorgee February 19, 2026 21:00
@pditommaso
Member Author

Tests pass; however, in my own test runs the execution fails:

Feb-19 21:56:42.056 [TaskFinalizer-8] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (S07604624_Padded_Agilent_SureSelectXT_allexons_V6_UTR); work-dir=s3://nextflow-ci-dev/test-sarek/2a/010b2fe3d028c01abf368bbd3934c9
  error [nextflow.exception.MissingFileException]: Cannot access directory: '/nextflow-ci-dev/test-sarek/2a/010b2fe3d028c01abf368bbd3934c9'

@jorgee
Contributor

jorgee commented Feb 20, 2026

There is something wrong with this approach. I have a folder containing entries with similar names, such as the following:
[image]

I then wrote a test to check whether folder 'a' exists, and it returns false. It works on master, but not in this branch.

    def 'should check s3 folder exists with similar names' () {
        when:
        def result = nextflow.cloud.aws.util.S3PathFactory.create('s3:///jorgee-eu-west1-test1/test_lexicorder/a')
        then:
        result.exists() == true
        result.isDirectory() == true
    }

I see that lexicographical order is guaranteed for general purpose buckets with the listObjectsV2 call:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

Sorting order of returned objects
General purpose bucket - For general purpose buckets, ListObjectsV2 returns objects in lexicographical order based on their key names.

Directory bucket - For directory buckets, ListObjectsV2 does not return objects in lexicographical order.

I do not see this statement for listObjects. The nextMarker description does mention S3 listing objects in alphabetical order, but I am not sure it always applies.

I am debugging it to see what is wrong and whether listObjectsV2 fixes it.

@jorgee
Contributor

jorgee commented Feb 20, 2026

listObjectsV2 has the same problem. Several symbols that sort before '/' can be part of a file name.
I have also tried playing with the delimiter, but with no success.

I see two solutions, but both imply making two calls in the worst case:

  1. Keep the same code; in most cases it will get 'a' or 'a/'. When nothing is found, add a second listObjects call with key+'/' to be sure the folder exists. I have implemented it in Fix S3 lookup unbounded pagination with double call #6851

  2. Another alternative is making two HeadObject requests: one with just the key and, if that fails, another with key+'/'. The `HeadObjectResponse` provides the content length and last modified time, so it is also valid. This requires more modifications to the code, but I think head calls are cheaper.
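
The first option can be sketched against an in-memory stand-in for a bucket. This is only an illustration under stated assumptions, not the code in #6851: the class `TwoStepLookupSketch`, its `list` helper, the `listCalls` counter and the body of `matchName` are all hypothetical. A sorted set imitates the lexicographic listing, which matches S3 byte order for ASCII keys.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TwoStepLookupSketch {

    // Hypothetical in-memory bucket; TreeSet keeps keys in lexicographic order.
    private final TreeSet<String> keys;
    int listCalls = 0;

    TwoStepLookupSketch(List<String> keys) {
        this.keys = new TreeSet<>(keys);
    }

    // Simulated capped listing: keys starting with `prefix`, at most `maxKeys`.
    List<String> list(String prefix, int maxKeys) {
        listCalls++;
        return keys.tailSet(prefix, true).stream()
                .takeWhile(k -> k.startsWith(prefix))
                .limit(maxKeys)
                .collect(Collectors.toList());
    }

    // Assumed shape of the check: exact key, or any key under key + "/".
    static boolean matchName(String key, String found) {
        return found.equals(key) || found.startsWith(key + "/");
    }

    Optional<String> lookup(String key) {
        // First call: capped listing on the bare key.
        for (String found : list(key, 2)) {
            if (matchName(key, found)) return Optional.of(found);
        }
        // Fallback: list under key + "/"; any result means the folder exists.
        return list(key + "/", 1).stream().findFirst();
    }

    public static void main(String[] args) {
        TwoStepLookupSketch s3 =
                new TwoStepLookupSketch(List.of("a-1", "a-2", "a.txt", "a/file"));
        System.out.println(s3.lookup("a"));   // Optional[a/file]
        System.out.println(s3.listCalls);     // 2
    }
}
```

In this sketch a plain file is resolved by the first call, and only a folder hidden behind sibling keys costs a second one, so the number of requests stays bounded at two regardless of prefix size.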

@pditommaso
Member Author

I guess a list (capped to 10) should cover most cases, otherwise falling back to the double head. wdyt?

@jorgee
Contributor

jorgee commented Feb 20, 2026

I think we will not need to fall back to a double head. The first call, either list or head, will cover the non-directory case. So the fallback just needs to check the directory, and the S3Object for a folder only requires the key ending with '/'.

@pditommaso
Member Author

I don't get which symbols can sort before '/' and be part of a file name. Can a test be written to capture this case?

@jorgee
Contributor

jorgee commented Feb 20, 2026

In lexicographic order, several symbols (such as '-' or '.') sort before '/'. So, if you have the key names 'name/', 'name-1', 'name-2' and 'name.txt', then 'name-1', 'name-2' and 'name.txt' will appear before 'name/'. This is why the PR does not always work. We could increase the cap to 10 to have a better chance of getting the folder on the first try, but we always need the fallback. I added a test in #6851 that reproduces this behaviour.
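
The ordering described above can be reproduced with a small in-memory sketch. The class `LexOrderDemo` and its `list` helper are hypothetical stand-ins for a capped listing. Java compares strings by UTF-16 code unit, which agrees with S3's byte order for these ASCII keys ('-' is 0x2D, '.' is 0x2E, '/' is 0x2F).

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class LexOrderDemo {

    // Simulated capped listing: keys starting with `prefix`, in
    // lexicographic order, truncated to `maxKeys` entries.
    static List<String> list(TreeSet<String> bucket, String prefix, int maxKeys) {
        return bucket.tailSet(prefix, true).stream()
                .takeWhile(k -> k.startsWith(prefix))
                .limit(maxKeys)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TreeSet<String> bucket =
                new TreeSet<>(List.of("name/", "name-1", "name-2", "name.txt"));

        // With a cap of 2, the folder key never shows up.
        System.out.println(list(bucket, "name", 2));   // [name-1, name-2]

        // A larger cap finds it here, but only because there are few siblings.
        System.out.println(list(bucket, "name", 10));  // [name-1, name-2, name.txt, name/]
    }
}
```

Any fixed cap can be defeated by adding more sibling keys that sort before '/', which is why a fallback is still needed.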

@jorgee
Contributor

jorgee commented Feb 20, 2026

I have checked the headObject approach for a directory, and it does not work because a directory is not an object. So the option in #6851 is the only way.

@pditommaso
Member Author

Ok, let's go with that

@pditommaso pditommaso closed this Feb 20, 2026