Spark Connector doesn't take Proxy configuration for OIDC authentication #936

@trivedirishabh

Description

We are testing open sharing where the provider uses Databricks and, as the consumer, we have set up open source Spark on a GCP Compute Engine instance. The requirement is to reach the Delta Sharing server through a proxy for authentication, while accessing the data files privately through a no_proxy configuration.

This works with the Delta Sharing Python client but not with the Spark connector.

Command:

spark-submit \
  --packages io.delta:delta-sharing-spark_2.13:4.2.0 \
  --conf "spark.driver.extraJavaOptions=-Dhttps.proxyHost= -Dhttps.proxyPort=8080 -Dhttp.proxyHost= -Dhttp.proxyPort=8080 -Djava.net.useSystemProxies=true" \
  --conf "spark.executor.extraJavaOptions=-Dhttps.proxyHost= -Dhttps.proxyPort=8080 -Dhttp.proxyHost= -Dhttp.proxyPort=8080 -Djava.net.useSystemProxies=true" \
  --conf "spark.delta.sharing.network.proxyHost=" \
  --conf "spark.delta.sharing.network.proxyPort=8080" \
  delta_test.py
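For anyone reproducing this, the JVM flags in the command can be assembled with a small helper, which also makes it easier to add the bypass list that the no_proxy requirement implies. This is only a sketch: `proxy.example.internal` and the bypass hosts are hypothetical placeholders, and `http.nonProxyHosts` is the standard JVM system property (pipe-separated, applied to both HTTP and HTTPS). Whether the connector's OIDC token request honours any of these properties is exactly what this issue is asking.

```python
def jvm_proxy_options(proxy_host, proxy_port, no_proxy_hosts=()):
    """Build the -D flags passed via spark.*.extraJavaOptions.

    proxy_host / no_proxy_hosts are placeholders supplied by the caller;
    nothing here is specific to the Delta Sharing connector.
    """
    opts = []
    for scheme in ("https", "http"):
        opts.append(f"-D{scheme}.proxyHost={proxy_host}")
        opts.append(f"-D{scheme}.proxyPort={proxy_port}")
    if no_proxy_hosts:
        # The JVM uses http.nonProxyHosts for both schemes, separated by '|'.
        opts.append("-Dhttp.nonProxyHosts=" + "|".join(no_proxy_hosts))
    return " ".join(opts)


def spark_submit_confs(proxy_host, proxy_port, no_proxy_hosts=()):
    """Return the --conf arguments for driver and executor as a flat list."""
    java_opts = jvm_proxy_options(proxy_host, proxy_port, no_proxy_hosts)
    return [
        "--conf", f"spark.driver.extraJavaOptions={java_opts}",
        "--conf", f"spark.executor.extraJavaOptions={java_opts}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for arg in spark_submit_confs(
        "proxy.example.internal", 8080, ["*.storage.example.com", "localhost"]
    ):
        print(arg)
```

Because the bypass list contains `|`, each `--conf` value needs to stay quoted when it is pasted into a shell.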

delta_test.py
import delta_sharing
from pyspark.sql import SparkSession

# Parentheses let the builder chain span several lines.
spark = (
    SparkSession.builder
    .appName("DeltaSharing")
    .config("spark.delta.sharing.network.proxyHost", "")
    .config("spark.delta.sharing.network.proxyPort", "8080")
    .config("spark.delta.sharing.network.sslTrustAll", "true")
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.13:4.2.0")
    .config("spark.jars.repositories", "https://repo1.maven.org/maven2")
    .getOrCreate()
)

spark.sparkContext.setLogLevel("DEBUG")

profile_file = "/root/oauth_config_cs.share"

# Create a SharingClient.

client = delta_sharing.SharingClient(profile_file)

# List all shared tables.

tables = client.list_all_tables() ###### THIS WORKS ######

print(tables)

table_url = profile_file + "#<share.catalog.schema.table>"

df = spark.read.format("deltaSharing").load(table_url)

df.show(10)

spark.stop()

Logs:

26/04/23 12:25:35 INFO DeltaSharingRestClient: DeltaSharingRestClient with endStreamActionEnabled: false, enableAsyncQuery:false, skipFileIdHashVerification:false
26/04/23 12:25:35 DEBUG FsUrlStreamHandlerFactory: Creating handler for protocol http
26/04/23 12:25:35 DEBUG FsUrlStreamHandlerFactory: Unknown protocol http, delegating to default implementation
26/04/23 12:25:35 DEBUG FsUrlStreamHandlerFactory: Creating handler for protocol https
26/04/23 12:25:35 DEBUG FsUrlStreamHandlerFactory: Unknown protocol https, delegating to default implementation
26/04/23 12:25:35 DEBUG RequestAddCookies: CookieSpec selected: default
26/04/23 12:25:35 DEBUG RequestAuthCache: Auth cache not set in the context
26/04/23 12:25:35 DEBUG PoolingHttpClientConnectionManager: Connection request: [route: {s}->https://login.microsoftonline.com:443][total available: 0; route allocated: 0 of 2; total allocated: 0 of 20]
26/04/23 12:25:35 DEBUG PoolingHttpClientConnectionManager: Connection leased: [id: 0][route: {s}->https://login.microsoftonline.com:443][total available: 0; route allocated: 1 of 2; total allocated: 1 of 20]
26/04/23 12:25:35 DEBUG MainClientExec: Opening connection {s}->https://login.microsoftonline.com:443
26/04/23 12:25:35 DEBUG DefaultHttpClientConnectionOperator: Connecting to login.microsoftonline.com/20.190.159.130:443
26/04/23 12:25:35 DEBUG SSLConnectionSocketFactory: Connecting socket to login.microsoftonline.com/20.190.159.130:443 with timeout 320000
26/04/23 12:27:46 DEBUG DefaultHttpClientConnectionOperator: Connect to login.microsoftonline.com/20.190.159.130:443 timed out. Connection will be retried using another IP address
26/04/23 12:27:46 DEBUG DefaultHttpClientConnectionOperator: Connecting to login.microsoftonline.com/20.190.159.2:443
26/04/23 12:27:46 DEBUG SSLConnectionSocketFactory: Connecting socket to login.microsoftonline.com/20.190.159.2:443 with timeout 320000
26/04/23 12:29:57 DEBUG DefaultHttpClientConnectionOperator: Connect to login.microsoftonline.com/20.190.159.2:443 timed out. Connection will be retried using another IP address
26/04/23 12:29:57 DEBUG DefaultHttpClientConnectionOperator: Connecting to login.microsoftonline.com/40.126.31.2:443
26/04/23 12:29:57 DEBUG SSLConnectionSocketFactory: Connecting socket to login.microsoftonline.com/40.126.31.2:443 with timeout 320000
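
Two things in these logs can be checked mechanically. The sketch below is a rough helper, not part of the connector: it pairs each "Connecting to" line with its "timed out" line to measure how long every attempt hangs, and it reads the Apache HttpClient route string to see whether a proxy hop is present. It assumes the route format printed by HttpClient 4.x (`{flags}->[proxy->]target`); the proxied route in the example is hypothetical.

```python
import re
from datetime import datetime

LOG_TS_FORMAT = "%y/%m/%d %H:%M:%S"
PREFIX = r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) DEBUG DefaultHttpClientConnectionOperator: "
CONNECT_RE = re.compile(PREFIX + r"Connecting to (\S+)$")
TIMEOUT_RE = re.compile(PREFIX + r"Connect to (\S+) timed out")


def connect_timeouts(lines):
    """Return (endpoint, seconds) for every connect attempt that timed out."""
    started, results = {}, []
    for raw in lines:
        line = raw.strip()
        m = CONNECT_RE.match(line)
        if m:
            started[m.group(2)] = datetime.strptime(m.group(1), LOG_TS_FORMAT)
            continue
        m = TIMEOUT_RE.match(line)
        if m and m.group(2) in started:
            ended = datetime.strptime(m.group(1), LOG_TS_FORMAT)
            begun = started.pop(m.group(2))
            results.append((m.group(2), (ended - begun).total_seconds()))
    return results


def proxy_hops(route):
    """List the proxy hops in an HttpClient route string (empty if direct)."""
    _, _, hops = route.partition("}->")
    return hops.split("->")[:-1]  # the last element is the target host


# Lines copied from the logs above.
SAMPLE_LOG = [
    "26/04/23 12:25:35 DEBUG DefaultHttpClientConnectionOperator: Connecting to login.microsoftonline.com/20.190.159.130:443",
    "26/04/23 12:27:46 DEBUG DefaultHttpClientConnectionOperator: Connect to login.microsoftonline.com/20.190.159.130:443 timed out. Connection will be retried using another IP address",
    "26/04/23 12:27:46 DEBUG DefaultHttpClientConnectionOperator: Connecting to login.microsoftonline.com/20.190.159.2:443",
    "26/04/23 12:29:57 DEBUG DefaultHttpClientConnectionOperator: Connect to login.microsoftonline.com/20.190.159.2:443 timed out. Connection will be retried using another IP address",
]

if __name__ == "__main__":
    for endpoint, seconds in connect_timeouts(SAMPLE_LOG):
        print(f"{endpoint}: timed out after {seconds:.0f}s")
    print(proxy_hops("{s}->https://login.microsoftonline.com:443"))
```

On the lines above, each attempt hangs for 131 seconds, and the logged route `{s}->https://login.microsoftonline.com:443` contains no proxy hop. That suggests the token request to the OIDC endpoint is being made directly rather than through the configured proxy.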
