Description
Problem Statement
There is no method in the Databricks Python SDK to download the artifact files of a Unity Catalog registered model version. The ModelVersionsAPI (w.model_versions) provides no download capability, and the ExperimentsAPI (w.experiments) only exposes list_artifacts() for enumeration — not download.
The only working approach today requires calling an undocumented internal endpoint directly to obtain presigned cloud URLs (Azure SAS / AWS presigned / GCP signed), and then downloading from those URLs manually:
    # Currently required workaround (undocumented endpoint):
    raw = client.api_client.do(
        "POST",
        "/api/2.0/mlflow/artifacts/credentials-for-read",
        body={"run_id": run_id, "path": ["model/model.pkl"]},
    )
    signed_uri = raw["credential_infos"][0]["signed_uri"]
    # then manually download from the presigned URL

This endpoint (POST /api/2.0/mlflow/artifacts/credentials-for-read) is the same one the mlflow Python package uses internally via DatabricksMlflowArtifactsService.GetCredentialsForRead (defined in databricks_artifacts.proto), but it is not surfaced anywhere in the Databricks SDK.
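For completeness, the manual download step of the workaround might look roughly like the sketch below. It uses only the standard library; the function name download_signed_uri and its parameters are hypothetical, and the optional headers argument is an assumption for storage backends that might expect extra request headers, not something taken from the endpoint's documentation.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Mapping, Optional


def download_signed_uri(
    signed_uri: str,
    dst_file: str,
    headers: Optional[Mapping[str, str]] = None,
    timeout: float = 60.0,
) -> Path:
    """Stream the object behind a presigned URL to dst_file and return its path.

    Hypothetical helper for illustration; not part of the Databricks SDK.
    """
    dst = Path(dst_file)
    dst.parent.mkdir(parents=True, exist_ok=True)

    # No Databricks credentials are attached here: the signature embedded
    # in the URL is what authorizes the read against cloud storage.
    request = urllib.request.Request(signed_uri, headers=dict(headers or {}))

    with urllib.request.urlopen(request, timeout=timeout) as response, open(dst, "wb") as out:
        # copyfileobj copies in chunks, so a large model file is not
        # loaded into memory all at once.
        shutil.copyfileobj(response, out)
    return dst
```

With the signed_uri from the snippet above, a call such as download_signed_uri(signed_uri, "/local/dir/model.pkl") would complete the workaround. Presigned URLs are short-lived, so the download should follow the credential request promptly.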
Proposed Solution
Add a download_artifacts() method (or equivalent) to ExperimentsAPI or ModelVersionsAPI that wraps the presigned URL credential fetch and file download:
    # Desired SDK interface, option A (on ExperimentsAPI):
    w.experiments.download_artifacts(run_id=run_id, path="model/", dst_path="/local/dir")
    # Desired SDK interface, option B (on ModelVersionsAPI):
    w.model_versions.download_artifacts(
        full_name="catalog.schema.model", version=1, dst_path="/local/dir"
    )
Additional Context
- SDK version: 0.97.0
- The mlflow Python package implements this via MlflowClient.download_artifacts(), but using it in a multi-tenant application requires mutating process-global environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN) protected by a threading lock, which blocks the entire process under async concurrency. A native SDK implementation would be safe for concurrent multi-tenant use since each WorkspaceClient instance is independent.
- The underlying mechanism fetches a temporary presigned cloud URL per artifact path, then downloads directly from cloud storage (no Databricks credentials needed for the download itself). This is already how mlflow works under the hood.
- Related undocumented endpoint: POST /api/2.0/mlflow/artifacts/credentials-for-read (protobuf: DatabricksMlflowArtifactsService.GetCredentialsForRead in databricks_artifacts.proto).
- Related issue: [FEATURE] Native SDK support for reading and writing MLflow tags on Unity Catalog model versions #1313 (UC model version tags)
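To make the mechanism described above concrete (one presigned URL per artifact path, then a direct download from cloud storage), here is a minimal sketch of what such a wrapper could look like. It is not an existing SDK method: download_run_artifacts and CREDENTIALS_ENDPOINT are hypothetical names, the request and response shapes are assumed from the workaround in this issue, and api_client is assumed to expose the same do() call used there.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Any, Iterable, List

# Undocumented endpoint named in this issue.
CREDENTIALS_ENDPOINT = "/api/2.0/mlflow/artifacts/credentials-for-read"


def download_run_artifacts(
    api_client: Any, run_id: str, paths: Iterable[str], dst_path: str
) -> List[Path]:
    """Hypothetical helper: download each artifact path of a run into dst_path.

    All state comes from the arguments, so nothing process-global
    (such as environment variables) is read or mutated.
    """
    written: List[Path] = []
    for rel_path in paths:
        # One credential request per artifact path, using the same body
        # shape as the workaround (assumed, since the endpoint is undocumented).
        raw = api_client.do(
            "POST",
            CREDENTIALS_ENDPOINT,
            body={"run_id": run_id, "path": [rel_path]},
        )
        infos = raw.get("credential_infos") or []
        if not infos:
            raise RuntimeError(f"no read credential returned for {rel_path!r}")
        signed_uri = infos[0]["signed_uri"]

        target = Path(dst_path) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)

        # Direct download from cloud storage; no Databricks token is sent.
        with urllib.request.urlopen(signed_uri) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        written.append(target)
    return written
```

With a WorkspaceClient w, this would be called as download_run_artifacts(w.api_client, run_id, ["model/model.pkl"], "/local/dir"). Because each call depends only on the client passed in, separate tenants could run it from separate worker threads (for example via asyncio.to_thread) without a shared lock. The sketch handles individual files only; downloading a directory such as "model/" would additionally need list_artifacts() to enumerate its contents.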