[FEATURE] Native SDK support for downloading MLflow artifacts of Unity Catalog model versions #1314

@asanglard

Problem Statement

The Databricks Python SDK has no method for downloading the artifact files of a Unity Catalog registered model version. The ModelVersionsAPI (w.model_versions) provides no download capability, and the ExperimentsAPI (w.experiments) exposes only list_artifacts(), which enumerates artifacts but cannot download them.

The only working approach today requires calling an undocumented internal endpoint directly to obtain presigned cloud URLs (Azure SAS / AWS presigned / GCP signed), and then downloading from those URLs manually:

# Currently required workaround (undocumented endpoint).
# Assumes `client` is a WorkspaceClient and `run_id` is already known.
from databricks.sdk import WorkspaceClient

client = WorkspaceClient()
raw = client.api_client.do(
    "POST",
    "/api/2.0/mlflow/artifacts/credentials-for-read",
    body={"run_id": run_id, "path": ["model/model.pkl"]},
)
signed_uri = raw["credential_infos"][0]["signed_uri"]
# then manually download from the presigned URL
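
For completeness, the manual download step of the workaround could look roughly like the sketch below. It uses only the standard library. The helper name download_run_artifact and the optional "headers" list in the response are assumptions (the endpoint is undocumented, so the response shape may differ or change); api_client stands for any object with a do(method, path, body=...) method, such as client.api_client.

```python
import shutil
import urllib.request
from pathlib import Path

CREDENTIALS_ENDPOINT = "/api/2.0/mlflow/artifacts/credentials-for-read"


def download_run_artifact(api_client, run_id, artifact_path, dst_dir):
    """Fetch a presigned URL for one artifact and stream it into dst_dir.

    Hypothetical helper: the response fields used here (`credential_infos`,
    `signed_uri`, optional `headers`) are assumptions about an undocumented
    endpoint.
    """
    raw = api_client.do(
        "POST",
        CREDENTIALS_ENDPOINT,
        body={"run_id": run_id, "path": [artifact_path]},
    )
    info = raw["credential_infos"][0]
    # Forward any extra headers the service returns with the credential.
    headers = {h["name"]: h["value"] for h in info.get("headers", [])}

    target = Path(dst_dir) / artifact_path
    target.parent.mkdir(parents=True, exist_ok=True)

    # No Databricks credentials are sent here; the signed URL carries the grant.
    request = urllib.request.Request(info["signed_uri"], headers=headers)
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Called as download_run_artifact(client.api_client, run_id, "model/model.pkl", "/local/dir"), this would write /local/dir/model/model.pkl. Retries, chunked downloads of large files, and URL expiry are not handled.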

This endpoint (POST /api/2.0/mlflow/artifacts/credentials-for-read) is the same one the mlflow Python package uses internally via DatabricksMlflowArtifactsService.GetCredentialsForRead (defined in databricks_artifacts.proto), but it is not surfaced anywhere in the Databricks SDK.

Proposed Solution

Add a download_artifacts() method (or equivalent) to ExperimentsAPI or ModelVersionsAPI that wraps the presigned URL credential fetch and file download:

# Desired SDK interface — option A (on ExperimentsAPI):
w.experiments.download_artifacts(run_id=run_id, path="model/", dst_path="/local/dir")

# Desired SDK interface — option B (on ModelVersionsAPI):
w.model_versions.download_artifacts(
    full_name="catalog.schema.model", version=1, dst_path="/local/dir"
)
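
To illustrate what option A would need to do for a directory path such as "model/", here is a minimal sketch of the enumeration half. It walks the artifact tree using a list_artifacts-style callable and yields file paths, each of which would then go through the credential fetch and download. The function name iter_artifact_files is hypothetical, and the sketch assumes each entry has path and is_dir attributes, as the entries returned by w.experiments.list_artifacts() do.

```python
def iter_artifact_files(list_artifacts, run_id, path=None):
    """Yield the paths of all files under `path`, recursing into directories.

    `list_artifacts` is any callable accepting `run_id=` and `path=` keyword
    arguments and returning an iterable of entries with `.path` and `.is_dir`
    (for example `w.experiments.list_artifacts`). Hypothetical helper.
    """
    for entry in list_artifacts(run_id=run_id, path=path):
        if entry.is_dir:
            # Descend into sub-directories, using the entry's own path.
            yield from iter_artifact_files(list_artifacts, run_id, entry.path)
        else:
            yield entry.path
```

With a real client this might be used as `for p in iter_artifact_files(w.experiments.list_artifacts, run_id, "model"): ...`. Option B would additionally have to resolve a model version to the location of its artifacts, which this sketch does not attempt.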

Additional Context

  • SDK version: 0.97.0
  • The mlflow Python package implements this via MlflowClient.download_artifacts(), but using it in a multi-tenant application requires mutating process-global environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN) under a threading lock, which serializes downloads and blocks the entire process under async concurrency. A native SDK implementation would be safe for concurrent multi-tenant use since each WorkspaceClient instance is independent.
  • The underlying mechanism fetches a temporary presigned cloud URL per artifact path, then downloads directly from cloud storage (no Databricks credentials needed for the download itself). This is already how mlflow works under the hood.
  • Related undocumented endpoint: POST /api/2.0/mlflow/artifacts/credentials-for-read (protobuf: DatabricksMlflowArtifactsService.GetCredentialsForRead in databricks_artifacts.proto).
  • Related issue: [FEATURE] Native SDK support for reading and writing MLflow tags on Unity Catalog model versions #1313 (UC model version tags)
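
The multi-tenant point above can be sketched as follows: when each tenant has its own client object, credential fetches can run concurrently with no shared environment variables and no global lock. The function name fetch_signed_uris and its argument names are hypothetical; each value in api_clients is assumed to be an independent object with a do(method, path, body=...) method (for example the api_client of a per-tenant WorkspaceClient) that is safe to call from a worker thread.

```python
from concurrent.futures import ThreadPoolExecutor

CREDENTIALS_ENDPOINT = "/api/2.0/mlflow/artifacts/credentials-for-read"


def fetch_signed_uris(api_clients, run_ids, artifact_path, max_workers=4):
    """Fetch one signed URI per tenant concurrently.

    `api_clients` maps tenant name -> per-tenant API client, and `run_ids`
    maps tenant name -> run ID. No process-global state is touched, so no
    lock is needed. Hypothetical helper for illustration only.
    """

    def fetch(tenant):
        raw = api_clients[tenant].do(
            "POST",
            CREDENTIALS_ENDPOINT,
            body={"run_id": run_ids[tenant], "path": [artifact_path]},
        )
        return tenant, raw["credential_infos"][0]["signed_uri"]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker uses only its own tenant's client.
        return dict(pool.map(fetch, api_clients))
```

In an asyncio application, the same per-tenant calls could be dispatched with asyncio.to_thread instead of a thread pool, so the event loop is not blocked.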
