Fix URL joining for non-HTTP schemes (s3://, file://, etc.) #867
Open
Python's urllib.parse.urljoin doesn't handle non-HTTP URL schemes properly. When joining an s3:// URL with "." or "", it returns just "." instead of the original URL with a trailing slash. This caused package URLs to be malformed (e.g., ".zlib-1.2.11-h7b6447c_3.tar.bz2" instead of the full S3 URL).

Add _safe_urljoin_with_slash() helper that:
- Uses standard urljoin for HTTP/HTTPS (works correctly)
- Handles s3://, file://, ftp://, and other schemes manually
- Always returns URLs with trailing slash for proper filename concatenation

Fixes #866
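The behavior described above can be reproduced with a few lines. This is a minimal sketch, not code from the PR: the bucket name, channel path, and `example.com` host are placeholders chosen for illustration. It also shows that an empty relative URL makes `urljoin` return the base unchanged, so that case never gains a trailing slash.

```python
from urllib.parse import urljoin

# Placeholder URLs, assumed for illustration only.
S3_BASE = "s3://bucket/channel/linux-64/"
HTTPS_BASE = "https://example.com/channel/linux-64/"

# "s3" is not a scheme urljoin resolves relative references for,
# so the relative part comes back as-is and the base is dropped.
s3_joined = urljoin(S3_BASE, ".")

# The same call with an https base resolves against the base.
https_joined = urljoin(HTTPS_BASE, ".")

# An empty relative URL short-circuits: the base is returned unchanged,
# which means no trailing slash is added if the base lacks one.
empty_joined = urljoin("s3://bucket/channel/linux-64", "")

# Appending a filename to the broken result gives the malformed package URL.
broken_package_url = s3_joined + "zlib-1.2.11-h7b6447c_3.tar.bz2"

print(s3_joined)
print(https_joined)
print(empty_joined)
print(broken_package_url)
```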
aaa2e96 to ea95be2
This was referenced Feb 5, 2026
dholth reviewed Feb 5, 2026
```python
ZSTD_MAX_SHARD_SIZE = 2**20 * 16  # maximum size necessary when compressed data has no size header

# URL schemes that urljoin handles correctly
_URLJOIN_SAFE_SCHEMES = frozenset(("http", "https", ""))
```
This isn't urljoin's longer list?
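For context on the question above: the "longer list" is presumably `urllib.parse.uses_relative`, the module-level list that `urljoin` consults before resolving a relative reference. A quick check, offered as a sketch rather than as part of the PR, suggests that `file` and `ftp` are already on that list while `s3` is not:

```python
import urllib.parse

# Schemes for which urljoin resolves relative references.
relative_schemes = set(urllib.parse.uses_relative)

# Compare the schemes named in the PR description against that list.
for scheme in ("http", "https", "file", "ftp", "s3"):
    print(f"{scheme}: {scheme in relative_schemes}")
```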
jaimergp reviewed Feb 5, 2026
```python
_URLJOIN_SAFE_SCHEMES = frozenset(("http", "https", ""))


def _safe_urljoin_with_slash(base_url: str, relative_url: str = "") -> str:
```
Wouldn't it be simpler to replace the scheme with http if it's something else, and then undo?
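The scheme-swap idea suggested above could look roughly like the following. This is only a sketch under stated assumptions: `urljoin_via_http_sketch` is a hypothetical name, not the helper from the PR, and it does not force a trailing slash the way `_safe_urljoin_with_slash()` is described as doing.

```python
from urllib.parse import urljoin, urlsplit


def urljoin_via_http_sketch(base_url: str, relative_url: str = ".") -> str:
    """Hypothetical helper: join under a temporary http scheme, then restore."""
    scheme = urlsplit(base_url).scheme
    if scheme in ("http", "https", ""):
        # urljoin already resolves these correctly.
        return urljoin(base_url, relative_url)
    # Swap the scheme prefix for "http" so urljoin resolves the relative part.
    placeholder = "http" + base_url[len(scheme):]
    joined = urljoin(placeholder, relative_url)
    # Restore the original scheme if the result still uses the placeholder.
    # Caveat: an absolute http:// relative_url would also be rewritten here.
    if joined.startswith("http:"):
        return scheme + joined[len("http"):]
    return joined


# Placeholder bucket and paths, assumed for illustration only.
print(urljoin_via_http_sketch("s3://bucket/channel/linux-64/repodata.json", "."))
print(urljoin_via_http_sketch("s3://bucket/channel/linux-64/", "pkg.tar.bz2"))
```

The swap keeps all path resolution inside `urljoin`, at the cost of the caveat noted in the comment.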
Summary

Python's `urllib.parse.urljoin` doesn't handle non-HTTP URL schemes properly. When joining an `s3://` URL with `"."` or `""`, it returns just `"."` instead of the original URL with a trailing slash. This caused package URLs to be malformed (e.g., `.zlib-1.2.11-h7b6447c_3.tar.bz2` instead of the full S3 URL).

This PR adds a `_safe_urljoin_with_slash()` helper function that:
- Uses standard `urljoin` for HTTP/HTTPS (works correctly)
- Handles `s3://`, `file://`, `ftp://`, and other non-HTTP schemes manually
- Always returns URLs with a trailing slash for proper filename concatenation

Changes

- Add `_safe_urljoin_with_slash()` helper in `shards.py`
- Update `ShardBase.base_url` property to use the new helper
- Update `_shards_base_url()` function to use the new helper

Alternative considered

An alternative fix would be to register the `s3` scheme with `urllib.parse` globally. This makes `urljoin` work correctly for S3 URLs. However, this approach was rejected because:
- It changes `urllib.parse` behavior for all Python code in the process, not just conda/cls
- It changes `urljoin` results significantly (e.g., `urljoin("s3://bucket/path", "file.txt")` changes from `"file.txt"` to `"s3://bucket/file.txt"`)

The local `_safe_urljoin_with_slash()` helper is safer as it only affects the specific code paths that need fixing.

Fixes #866
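To make the rejected alternative concrete, here is a sketch of what global registration might involve. The exact snippet from the PR description is not reproduced here; this version assumes registration means appending to `urllib.parse.uses_relative` and `urllib.parse.uses_netloc`. The `registered_scheme` context manager is a hypothetical wrapper that undoes the change afterwards, so the process-wide effect is visible without leaking.

```python
import urllib.parse
from contextlib import contextmanager


@contextmanager
def registered_scheme(scheme: str):
    """Hypothetical helper: temporarily register `scheme` with urllib.parse."""
    touched = []
    for registry in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
        if scheme not in registry:
            registry.append(scheme)  # module-level list, shared process-wide
            touched.append(registry)
    try:
        yield
    finally:
        # Undo only the entries this helper added.
        for registry in touched:
            registry.remove(scheme)


# Placeholder bucket and path, taken from the example in the description.
before = urllib.parse.urljoin("s3://bucket/path", "file.txt")
with registered_scheme("s3"):
    during = urllib.parse.urljoin("s3://bucket/path", "file.txt")
after = urllib.parse.urljoin("s3://bucket/path", "file.txt")

print(before)
print(during)
print(after)
```

While the scheme is registered, every `urljoin` caller in the process sees the new result, which is the side effect the description cites as the reason for preferring a local helper.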