Fix URL joining for non-HTTP schemes (s3://, file://, etc.) by jezdez · Pull Request #867 · conda/conda-libmamba-solver

jezdez · 2026-02-05T12:14:17Z

Summary

Python's urllib.parse.urljoin only resolves relative references for schemes listed in urllib.parse.uses_relative, and s3 is not among them. When joining an s3:// URL with ".", it returns just "." instead of the original URL with a trailing slash; joining with "" returns the base URL unchanged, without a trailing slash. This caused package URLs to be malformed (e.g., .zlib-1.2.11-h7b6447c_3.tar.bz2 instead of the full S3 URL).
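A minimal reproduction of the failure, assuming (as the malformed example suggests) that package URLs are built by concatenating the joined base with a filename. The bucket, host, and index paths are placeholders:

```python
from urllib.parse import urljoin

# Placeholder URLs; only the filename comes from the example above.
s3_index = "s3://my-bucket/channel/linux-64/repodata_shards.msgpack.zst"
https_index = "https://example.com/channel/linux-64/repodata_shards.msgpack.zst"
filename = "zlib-1.2.11-h7b6447c_3.tar.bz2"

# "s3" is not in urllib.parse.uses_relative, so urljoin returns "." unchanged.
s3_base = urljoin(s3_index, ".")
# For https, "." resolves to the directory containing the index file.
https_base = urljoin(https_index, ".")

# Assumption: package URLs are formed by plain string concatenation.
print(s3_base + filename)     # .zlib-1.2.11-h7b6447c_3.tar.bz2
print(https_base + filename)  # https://example.com/channel/linux-64/zlib-1.2.11-h7b6447c_3.tar.bz2
```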

This PR adds a _safe_urljoin_with_slash() helper function that:

Uses standard urljoin for HTTP/HTTPS (works correctly)
Handles s3://, file://, ftp://, and other non-HTTP schemes manually
Always returns URLs with trailing slash for proper filename concatenation

Changes

Added _safe_urljoin_with_slash() helper in shards.py
Updated ShardBase.base_url property to use the new helper
Updated _shards_base_url() function to use the new helper
Added parametrized tests covering HTTP, S3, file, and FTP URL schemes

Alternative considered

An alternative fix would be to register the s3 scheme with urllib.parse globally:

from urllib.parse import uses_netloc, uses_relative
uses_netloc.append("s3")
uses_relative.append("s3")

This makes urljoin work correctly for S3 URLs. However, this approach was rejected because:

Global side effects: Modifies urllib.parse behavior for all Python code in the process, not just conda and conda-libmamba-solver
Behavioral changes: Changes urljoin results significantly (e.g., urljoin("s3://bucket/path", "file.txt") changes from "file.txt" to "s3://bucket/file.txt")
Potential breakage: Other libraries might (incorrectly) depend on the current behavior
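The behavioral change in the second point can be checked with the standard library alone. This sketch applies the rejected registration temporarily and removes it in a finally block, so the module-level lists end up as they started; the URL is the one from the example above:

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

base = "s3://bucket/path"

# Default behavior: "s3" is not registered, so the reference comes back as-is.
before = urljoin(base, "file.txt")

# The rejected alternative mutates module-level lists shared by every caller
# in the process; this demo undoes the change afterwards.
uses_netloc.append("s3")
uses_relative.append("s3")
try:
    after = urljoin(base, "file.txt")
finally:
    uses_netloc.remove("s3")
    uses_relative.remove("s3")

print(before)  # file.txt
print(after)   # s3://bucket/file.txt
```

Even with the cleanup, other threads could observe the modified lists while "s3" is registered, which is another reason a local helper is the less invasive fix.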

The local _safe_urljoin_with_slash() helper is safer as it only affects the specific code paths that need fixing.

Fixes #866


dholth · 2026-02-05T15:40:06Z

conda_libmamba_solver/shards.py

 ZSTD_MAX_SHARD_SIZE = 2**20 * 16  # maximum size necessary when compressed data has no size header
+
+# URL schemes that urljoin handles correctly
+_URLJOIN_SAFE_SCHEMES = frozenset(("http", "https", ""))


This isn't urljoin's longer list?
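The "longer list" here is presumably urllib.parse.uses_relative, the scheme list urljoin itself consults. A quick check with placeholder URLs shows that file and ftp, two of the schemes the helper handles manually, are already on it, so urljoin resolves them; s3 is not:

```python
from urllib.parse import urljoin, uses_relative

# Schemes exercised by the PR's parametrized tests.
for scheme in ("http", "https", "file", "ftp", "s3"):
    print(scheme, scheme in uses_relative)

# ftp:// is on the list, so "." resolves to the base URL's directory...
print(urljoin("ftp://example.com/channel/noarch/repodata.json", "."))
# ...while s3:// is not, so urljoin hands back the reference unchanged.
print(urljoin("s3://bucket/channel/noarch/repodata.json", "."))
```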

jaimergp · 2026-02-05T23:04:07Z

conda_libmamba_solver/shards.py

+_URLJOIN_SAFE_SCHEMES = frozenset(("http", "https", ""))
+
+
+def _safe_urljoin_with_slash(base_url: str, relative_url: str = "") -> str:


Wouldn't it be simpler to replace the scheme with http if it's something else, and then undo?
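The scheme-swap suggestion can be sketched as below. urljoin_any_scheme is a hypothetical name, and the sketch assumes that resolving against an http:// twin of the base URL and then restoring the original scheme gives the intended result for path-style URLs such as s3://. Unlike _safe_urljoin_with_slash, it does not add a trailing slash.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Schemes urljoin already resolves (same set as the PR's constant).
_URLJOIN_SAFE_SCHEMES = frozenset(("http", "https", ""))


def urljoin_any_scheme(base_url: str, relative_url: str) -> str:
    """Hypothetical helper: join via a temporary http scheme, then restore it."""
    base = urlsplit(base_url)
    if base.scheme in _URLJOIN_SAFE_SCHEMES:
        return urljoin(base_url, relative_url)
    if urlsplit(relative_url).scheme:
        # Absolute references (any scheme) replace the base and must not
        # have their scheme rewritten on the way back.
        return relative_url
    # Resolve against an http:// twin of the base URL...
    http_base = urlunsplit(("http",) + base[1:])
    joined = urlsplit(urljoin(http_base, relative_url))
    # ...then swap the original scheme back in.
    return urlunsplit((base.scheme,) + joined[1:])
```

Compared with registering s3 in uses_relative, this leaves the global lists untouched. The trade-off is that every non-HTTP scheme inherits http's resolution rules, which fits s3:// paths but is an assumption for more exotic schemes. The early return for absolute references avoids the main pitfall of "undoing" the swap: turning a genuine https:// reference into an s3:// one.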

jezdez requested a review from a team as a code owner February 5, 2026 12:14

conda-bot added this to 🔎 Review Feb 5, 2026

github-project-automation bot moved this to 🆕 New in 🔎 Review Feb 5, 2026

conda-bot added the cla-signed label Feb 5, 2026

style: apply ruff formatting (ea95be2)

jezdez force-pushed the fix/s3-url-join branch from aaa2e96 to ea95be2 Compare February 5, 2026 12:23

This was referenced Feb 5, 2026

Add S3 download optimization to avoid extra file copy conda/conda#15636 (Merged)

S3 channel URLs are corrupted due to urljoin not handling s3:// scheme #866 (Open)

dholth reviewed Feb 5, 2026

jaimergp reviewed Feb 5, 2026

jezdez wants to merge 2 commits into main from fix/s3-url-join
