Skip to content

Support setting Content-Type when uploading a Zarr entry#2171

Closed
jwodder wants to merge 3 commits intomasterfrom
gh-1870
Closed

Support setting Content-Type when uploading a Zarr entry#2171
jwodder wants to merge 3 commits intomasterfrom
gh-1870

Conversation

@jwodder
Copy link
Copy Markdown
Contributor

@jwodder jwodder commented Feb 5, 2025

Closes #1870.

@dandi/dandiarchive Should I add tests for this? If so, how?

@jwodder jwodder added the zarr Issues with Zarr hosting/processing/etc. label Feb 5, 2025
self.storage.generate_presigned_put_object_url(self.s3_path(o['path']), o['base64md5'])
self.storage.generate_presigned_put_object_url(
self.s3_path(o['path']), o['base64md5'], content_type=o['content_type']
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while at it, should may be path_md5s to be renamed (it is internal, should be ok AFAIK) into e.g. path_objs or path_recs to signal that it is not just a list (or mapping) of paths to md5s?

jwodder and others added 2 commits February 5, 2025 11:50
@yarikoptic
Copy link
Copy Markdown
Member

@dandi/dandiarchive Should I add tests for this? If so, how?

now that you added to minio interface, I think test should be adjusted and/or added to verify that functioning to https://github.com/dandi/dandi-archive/blob/HEAD/dandiapi/zarr/tests/test_zarr_upload.py

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Feb 7, 2025

@yarikoptic None of the tests in that file both upload to a Zarr and request the uploaded entry from minio, so I'm not sure exactly how to proceed.

@dandi/dandiarchive Please advise.

Copy link
Copy Markdown
Member

@waxlamp waxlamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good to me. Jake will advise on how to go about testing.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Feb 12, 2025

I tested this in the client using this code, with Docker Compose adjusted to point to a local build of this PR. I determined that the client does need to set the "Content-Type" header when uploading to the signed URL, but I also found that sending the "Content-Type" to the archive for inclusion in the signed URL is unnecessary (for minio, at least). If we can confirm that the same thing happens on S3, that would make this PR superfluous.

@jjnesbitt
Copy link
Copy Markdown
Member

I tested this in the client using this code, with Docker Compose adjusted to point to a local build of this PR. I determined that the client does need to set the "Content-Type" header when uploading to the signed URL, but I also found that sending the "Content-Type" to the archive for inclusion in the signed URL is unnecessary (for minio, at least). If we can confirm that the same thing happens on S3, that would make this PR superfluous.

Regarding testing, Minio and S3 differ in some ways, which is part of the reason there can't really be good tests for these features. This is one of the friction points between our local and prod deployments.

Regarding the feature itself, according to my research, if Content-Type is provided as a param to the pre-signed URL generation, then the client using the pre-signed URL for upload must include that header, to "match" the pre-signed URL (as is the case with other params like ContentMD5). However, if that param is not included in the pre-signed URL params, the client may supply that header themselves, and it will be honored.

I'm unable to test this in S3 myself at the moment, but what I've found would seem to match the results of your test. So if that truly is the case, this is sort of a policy question, more than anything else. Do we want to require Content-Type to be included alongside other params like ContentMD5 at URL generation time (and therefore enforced once the URL is "wielded")? Or should we leave that control up to the client?

To me, I agree it seems entirely superfluous to include that behavior here, since in either case, the responsibility of providing the Content-Type header lies with the client. The only difference is that with this code change, the client would need to additionally supply it at pre-signed URL generation time as well.

@jwodder
Copy link
Copy Markdown
Contributor Author

jwodder commented Feb 13, 2025

I was able to confirm (via a test upload to staging) that setting the Content-Type in the signed URL is unnecessary on S3. Closing this PR as superfluous.

@jwodder jwodder closed this Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

zarr Issues with Zarr hosting/processing/etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend Zarr entry upload procedure to support client-assigned Content-Types

4 participants