S3 Directory Support by jterapin · Pull Request #3304 · aws/aws-sdk-ruby

jterapin · 2025-10-13T18:50:58Z

Adds directory upload/download to Transfer Manager.

`upload_directory`

- Upload all files from a local directory to S3
- Optional recursive traversal of subdirectories
- Symlink handling with circular reference detection
- S3 key prefix support
- Filter callback to selectively upload files
- Request callback to modify upload parameters per file
- Progress callback for transfer monitoring
- Configurable failure handling (fail-fast or continue on error)
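
The recursive traversal with circular symlink detection listed above could look roughly like the following. This is a minimal sketch using only the standard library; `uploadable_files` and its parameters are hypothetical names chosen for illustration and are not taken from the SDK. It assumes a loop is detected by remembering the real path of every directory already visited.

```ruby
require 'set'

# Hypothetical helper, not the SDK's implementation: returns the regular files
# under `root`, visiting each real directory at most once.
def uploadable_files(root, recursive: true, visited: Set.new)
  real_root = File.realpath(root)
  # A real path seen before means a symlink loop (or a duplicate link),
  # so skip it instead of recursing forever.
  return [] unless visited.add?(real_root)

  Dir.children(root).sort.flat_map do |entry|
    path = File.join(root, entry)
    if File.directory?(path) # true for symlinks that point at directories
      recursive ? uploadable_files(path, recursive: true, visited: visited) : []
    elsif File.file?(path)
      [path]
    else
      [] # broken symlinks, sockets, and so on are ignored in this sketch
    end
  end
end
```

A side effect of this approach is that a directory reachable through two different symlinks is listed only once, which may or may not match the behavior chosen in this PR.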

`download_directory`

- Download all objects from an S3 bucket/prefix to a local directory
- S3 prefix stripping for clean local paths
- Path traversal detection
- Filter callback to selectively download objects
- Request callback to modify download parameters per object
- Progress callback for transfer monitoring
- Configurable failure handling (fail-fast or continue on error)
- Automatic directory creation for nested structures
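
To illustrate prefix stripping and path traversal detection together, here is a small sketch. `local_path_for` and its keyword arguments are hypothetical and not part of the SDK. Rejecting empty and `.` segments in addition to `..` is an assumption made for this example, not a statement about what the PR does.

```ruby
# Hypothetical helper, not the SDK's implementation: maps an S3 key to a local
# path under `destination`, or raises if the key looks unsafe.
def local_path_for(key, destination:, s3_prefix: nil)
  relative = key
  if s3_prefix && !s3_prefix.empty?
    # Strip the prefix and the delimiter after it, so "photos/2025/a.jpg"
    # with prefix "photos" lands at "<destination>/2025/a.jpg".
    relative = key.delete_prefix(s3_prefix).delete_prefix('/')
  end

  segments = relative.split('/')
  # Assumption for this sketch: empty, "." and ".." segments are all rejected.
  unsafe = segments.empty? || segments.any? { |s| s.empty? || s == '.' || s == '..' }
  raise ArgumentError, "unsafe key: #{key.inspect}" if unsafe

  File.join(destination, *segments)
end

local_path_for('photos/2025/a.jpg', destination: '/tmp/out', s3_prefix: 'photos')
# => "/tmp/out/2025/a.jpg"
```

Nested directories would then be created before writing, for example with `FileUtils.mkdir_p(File.dirname(path))`.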

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

To make sure we include your contribution in the release notes, please add a description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (at the corresponding gem). The description entry should fit on one line and start with Feature or Issue in the correct format.
For generated code changes, please check out the instructions below first:
https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md

Thank you for your contribution!

github-actions · 2025-10-13T18:56:23Z

Detected 1 possible performance regressions:

aws-sdk-kinesis.gem_size_kb - z-score regression: 81.5 -> 82.0. Z-score: Infinity

jterapin · 2026-01-14T17:18:24Z

gems/aws-sdk-s3/lib/aws-sdk-s3/transfer_manager.rb

+      #   * `:completed_downloads` - Number of objects successfully downloaded
+      #   * `:failed_downloads` - Number of objects that failed to download
+      #   * `:errors` - Array of errors for failed downloads (only present when failures occur)
+      def download_directory(destination, bucket:, **options)


I need to allow setting a custom thread count for the executor here. The same goes for the uploader.

jterapin · 2026-01-14T17:23:18Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        request_abort unless opts[:ignore_failure]
+      end
+
+      def process_download_queue(producer, downloader, opts)


In a situation where we need to raise, I decided against calling @queue_executor.kill since there might be in-flight work that hangs. It would be best to exit gracefully and let the error bubble up so that it is raised within #build_results.

jterapin · 2026-01-14T17:28:48Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

+        raise ArgumentError, 'Invalid directory' unless Dir.exist?(source_directory)
+
+        uploader = FileUploader.new(
+          multipart_threshold: opts.delete(:multipart_threshold),


Self-reminder to document this at TransferManager#upload_directory.

I need to take a look at the other params available at the FileUploader and FileDownloader level, but I'm concerned about overlapping params if any exist.

alextwoods

Nice - this is looking good! Great test coverage and documentation. Good thread safety.

alextwoods · 2026-02-04T16:46:10Z

gems/aws-sdk-s3/spec/transfer_manger_spec_helper.rb

Nit - "manger" is misspelled (should be "manager").

alextwoods · 2026-02-04T16:54:42Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        end
+
+        def validate_path(path, key)
+          segments = path.split('/')


Should this be File::SEPARATOR rather than / here? I think the path at this point will come from File.join and so would have the OS separator? I might be wrong about that though.
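
For context on the separator question, a quick check with the standard library is sketched below. As far as the Ruby core documentation describes it, `File.join` joins with `File::SEPARATOR`, which is `"/"` on every platform, while the platform-specific alternative lives in `File::ALT_SEPARATOR`. The path components here are made up for illustration.

```ruby
joined = File.join('downloads', 'photos', 'a.jpg')
joined                        # => "downloads/photos/a.jpg"
File::SEPARATOR               # => "/"

# nil on Unix-like systems; "\\" on Windows.
alt = File::ALT_SEPARATOR

# Splitting on the literal "/" and on File::SEPARATOR gives the same segments.
same = joined.split('/') == joined.split(File::SEPARATOR)
```

Under that reading, both spellings behave the same for paths built by `File.join`; backslashes would only show up if the S3 key or a user-supplied destination already contained them.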

alextwoods · 2026-02-04T16:55:12Z

gems/aws-sdk-s3/lib/aws-sdk-s3/customizations.rb

+    autoload :LegacySigner, 'aws-sdk-s3/legacy_signer'
+
+    # transfer manager + multipart upload/download utilities
+    autoload :DefaultExecutor, 'aws-sdk-s3/default_executor'


Nit: it looks like default_executor is listed twice here.

alextwoods · 2026-02-04T16:56:40Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+
+      attr_reader :client, :executor
+
+      def abort_requested


This might be clearer with a ? suffix (i.e. abort_requested?) to make it clear that it's a status check rather than an action.

alextwoods · 2026-02-04T16:58:47Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        @mutex.synchronize { @abort_requested }
+      end
+
+      def request_abort


"request_abort" feels more verbose than just "abort". I think the internal @abort_requested makes sense, but the method's name could, I think, just be abort. I assume request_abort is meant to make it clear that aborting is async, but I think that's already implied by abort. What do you think?

alextwoods · 2026-02-04T17:03:53Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        downloads, errors = process_download_queue(producer, downloader, download_opts)
+        build_result(downloads, errors)
+      ensure
+        @abort_requested = false


Why are we setting @abort_requested to false here?
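
The naming suggestions in this thread (a `?` predicate for the status check, a short verb for the action) and the flag reset questioned here can be sketched together. `AbortFlag` is a hypothetical stand-in written for illustration, not the class from this PR. It assumes every read and write of the flag goes through the same mutex.

```ruby
# Hypothetical stand-in for the abort handling in a directory transfer.
class AbortFlag
  def initialize
    @mutex = Mutex.new
    @abort_requested = false
  end

  # Status check: the `?` suffix signals that this only reads state.
  def abort_requested?
    @mutex.synchronize { @abort_requested }
  end

  # Action: asks workers to stop; they notice on their next status check.
  def abort
    @mutex.synchronize { @abort_requested = true }
  end

  # Clears the flag under the same lock so a reused instance starts clean.
  def reset
    @mutex.synchronize { @abort_requested = false }
  end
end

flag = AbortFlag.new
flag.abort
flag.abort_requested? # => true
flag.reset
flag.abort_requested? # => false
```

One thing to weigh with the shorter name: an instance method called `abort` shadows `Kernel#abort` inside that class, so a bare `abort` there would call this method rather than exit the process.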

alextwoods · 2026-02-04T17:12:51Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        def stream_objects(continuation_token: nil)
+          resp = @client.list_objects_v2(bucket: @bucket, prefix: @s3_prefix, continuation_token: continuation_token)
+          resp.contents&.each do |o|
+            break if @directory_downloader.abort_requested


This will block on the mutex - I'm not sure we always need to do that here. It is definitely safe to do so, but it likely has a performance hit. Ditto, I think, for the check on line 91:

begin producer.each do |object| break if abort_requested

Since we're using a SizedQueue for communicating between threads, maybe we could use clear/close on it rather than constantly blocking on the mutex? If we used close, it would cause waiting threads to raise ClosedQueueError, which we could catch and handle to detect aborts. I haven't fully thought that out, but I think it might simplify the code and avoid locking as much.
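
A small standalone experiment with the close-based idea is sketched below. It is not code from this PR. Per the Ruby documentation for `Thread::SizedQueue#close`, a thread blocked in `push` is interrupted with `ClosedQueueError`, whereas `pop` on a closed queue drains the remaining items and then returns `nil` rather than raising.

```ruby
queue = SizedQueue.new(1)
queue << :first # the queue is now full

producer = Thread.new do
  begin
    queue << :second # blocks because the queue is full
    :pushed
  rescue ClosedQueueError
    :aborted # the close below interrupts the blocked push
  end
end

sleep 0.01 until producer.status == 'sleep' # wait until the producer is blocked
queue.close

producer_result = producer.value # => :aborted
remaining = queue.pop            # => :first (queued items can still be drained)
after_drain = queue.pop          # => nil (closed and empty, does not block)
```

So a producer could treat `ClosedQueueError` as the abort signal, while consumers would need to treat a `nil` from `pop` as the end of work.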

alextwoods · 2026-02-04T17:19:58Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_progress.rb

+          @transferred_bytes += bytes_transferred
+          @transferred_files += 1
+
+          @progress_callback.call(@transferred_bytes, @transferred_files)


I see why we're calling the progress_callback inside the synchronize block - it does ensure progress is linear for users. However, depending on what they're doing in the callback (I/O such as printing or writing to a file, etc.), this could end up being a small bottleneck. I'm not sure how much that matters vs. fully ordered progress callbacks.
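
The alternative being weighed here, updating the counters under the lock but invoking the callback outside it, might look like the sketch below. `ProgressTracker` is a hypothetical stand-in, not the `directory_progress.rb` class from this PR.

```ruby
# Hypothetical stand-in for a progress tracker; names are illustrative only.
class ProgressTracker
  def initialize(&progress_callback)
    @mutex = Mutex.new
    @transferred_bytes = 0
    @transferred_files = 0
    @progress_callback = progress_callback
  end

  def call(bytes_transferred)
    # Update the counters and take a snapshot while holding the lock...
    bytes, files = @mutex.synchronize do
      @transferred_bytes += bytes_transferred
      @transferred_files += 1
      [@transferred_bytes, @transferred_files]
    end
    # ...then run user code outside it, so slow I/O in the callback does not
    # hold up other workers. The cost: callbacks may arrive out of order.
    @progress_callback&.call(bytes, files)
  end
end
```

Each callback still receives a consistent pair of totals; only the ordering guarantee between callbacks is given up.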

alextwoods · 2026-02-04T17:25:00Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

+          @mutex.synchronize { errors << e }
+          request_abort
+        end
+        upload_attempts.times { completion_queue.pop }


Is there a chance that upload_attempts will ever be larger than the number of entries in the completion queue?
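
One way to reason about this question is sketched below, with plain threads standing in for the executor. It is illustrative only and not the PR's code. The assumption is that each counted attempt pushes exactly one completion signal from an `ensure` block.

```ruby
# Illustrative only: plain threads stand in for the executor.
completion_queue = Queue.new
errors = []
mutex = Mutex.new
attempts = 0

%w[a.txt b.txt c.txt].each do |name|
  attempts += 1 # counted once per task actually submitted
  Thread.new do
    begin
      raise "simulated failure for #{name}" if name == 'b.txt'
    rescue StandardError => e
      mutex.synchronize { errors << e }
    ensure
      completion_queue << :done # exactly one signal per task, success or not
    end
  end
end

# This cannot hang as long as every counted attempt pushes one signal. A task
# that was counted but never ran would leave one pop waiting forever.
attempts.times { completion_queue.pop }
```

The risk case is therefore a task that is counted but never executed, for example one rejected by an executor that is already shutting down.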

richardwang1124

Nice! Looks pretty good overall. Left a few comments.

richardwang1124 · 2026-02-02T22:32:48Z

gems/aws-sdk-s3/lib/aws-sdk-s3/transfer_manager.rb

      #   you are responsible for shutting it down when finished.
      def initialize(options = {})
        @client = options[:client] || Client.new
        @executor = options[:executor]


What if you added DefaultExecutor here? Like

@executor = options[:executor] || DefaultExecutor.new
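
A sketch of what this fallback implies for ownership follows. `SketchExecutor` and `SketchTransferManager` are hypothetical stand-ins, not the SDK's `DefaultExecutor` or `TransferManager`, and the `close` method is invented for illustration. The assumption is that the manager shuts down only an executor it created itself, which matches the documented rule that callers are responsible for shutting down an executor they pass in.

```ruby
# Hypothetical stand-ins; not the SDK's classes.
class SketchExecutor
  attr_reader :shutdown_called

  def initialize
    @shutdown_called = false
  end

  def shutdown
    @shutdown_called = true
  end
end

class SketchTransferManager
  attr_reader :executor

  def initialize(options = {})
    # Remember whether the caller supplied the executor before defaulting.
    @owns_executor = options[:executor].nil?
    @executor = options[:executor] || SketchExecutor.new
  end

  # Invented for this sketch: shuts down only an executor created here.
  def close
    @executor.shutdown if @owns_executor
  end
end
```

Creating the default eagerly in the constructor ties its lifetime to the manager, so something then has to shut it down. Creating one per operation avoids that at the cost of repeated setup.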

richardwang1124 · 2026-02-04T17:17:27Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+        downloads, errors = process_download_queue(producer, downloader, download_opts)
+        build_result(downloads, errors)
+      ensure
+        @abort_requested = false


Does this need @mutex.synchronize?

richardwang1124 · 2026-02-04T17:30:28Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_uploader.rb

+        uploads, errors = process_upload_queue(producer, uploader, upload_opts)
+        build_result(uploads, errors)
+      ensure
+        @abort_requested = false


Same as my previous comment

richardwang1124 · 2026-02-04T17:39:42Z

gems/aws-sdk-s3/lib/aws-sdk-s3/directory_downloader.rb

+          @mutex.synchronize { errors << e }
+          request_abort
+        end
+        download_attempts.times { completion_queue.pop }


Is it guaranteed that download_attempts will be less than or equal to completion_queue size?

jterapin added 29 commits October 7, 2025 11:35

Add executor support

8c7ca45

Add changelog entry

c21969a

Update TM with executor changes

39ecf0a

Remove thread count support from MPU

a3f2b9f

Update Object usage of executor

3156f7c

Add documentation/remove unused methods from DefaultExecutor

84c9966

Add Default Executor specs

8e16a3b

Update TM docs and impl

db1cb62

Update streaming MPU to use executor

f907c3b

More MP Stream updates

7cb940a

Update specs

4003536

Update interfaces

7dddda9

Update specs

481f198

Update changelog

88bf44a

Minor updates

c1a25cd

Fix failing specs

7522a16

Merge branch 'version-3' into s3-executor-support

89cffe7

Feedback - address sleep in specs

9eea233

Feedback - update method name for cleanup_team_file

75b0d96

Feedback - wrap checksum callback

ad943ee

Feedback - update method name in MPU

f1fc86a

Feedback - streamline handling of progress callbacks

09eae68

Feedback - streamline docs

e824de0

Merge branch 'version-3' into s3-executor-support

c073349

Feedback - streamline opts

cd91eb7

Feedback - remove sleep from specs when possible

abf78d6

Feedback - update to use 10 threads only

04a287f

Add directory features

54b9add

Add temp changelog entry

ca6c2ae

jterapin added 23 commits January 12, 2026 10:56

Add rubocop fix

22df10c

Add directory uploader to tm

3dae2ff

Refactor Directory Downloader

58fcfe5

Add fixes

f83d174

Add specs

569f1f7

Fixes

a173b2d

Adjustments

9c331eb

Mini refactors

ce9b65f

Refactors

78fcf15

Update changelog

31867db

Clean up

853be51

Fixes

95ec3db

Populate errors differently

12f0338

Fix

7a0f135

Shutdown executor gracefully

e041a1e

More streamlining

a76bd21

Merge branch 'version-3' into s3-directory-support

a9f89c2

Update documentation

ea4f261

Clean specs

9d9f6c6

Block path traversal keys

c5238ca

Remove comments

735c2b3

Refactor

9341672

Refactors

bc13fae

jterapin marked this pull request as ready for review January 14, 2026 17:30

jterapin commented Jan 14, 2026

jterapin added 3 commits January 14, 2026 12:41

Mini refactors

78200c9

Merge version-3 into branch

36042f2

Scope queue executor

7812d29

alextwoods reviewed Feb 4, 2026

richardwang1124 reviewed Feb 4, 2026

3 participants
