[SPARK-33425] Credentials are not passed in the `doFetchFile` when running spark-submit with https url - ASF JIRA

Details

Type: Bug
Status: In Progress
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.0.1
Fix Version/s: None
Component/s: Input/Output
Labels: None

Description

I'm running spark-submit with an https URL that contains a username and password. The documentation at https://spark.apache.org/docs/latest/submitting-applications.html says:

(Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.)

However, when using that, I receive the following error:


INFO - 20/11/11 12:59:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO - Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:*****@host.com/my_app/pipeline.jar
INFO - at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1924)
INFO - at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
INFO - at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
INFO - at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:729)
INFO - at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:138)
INFO - at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
INFO - at scala.Option.map(Option.scala:230)
INFO - at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
INFO - at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
INFO - at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
INFO - at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
INFO - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
INFO - at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

When downloading my file manually using wget, at first I receive a 401 error but then there's a retry with credentials:


HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="Restricted"
Reusing existing connection to host.com:443.
HTTP request sent, awaiting response... 200 OK

When I use `--auth-no-challenge` with wget, the credentials are sent in the first request and I receive 200 OK. Without that option, wget first requests the file without credentials and sends them only after the server answers with a 401 challenge, so the download takes two steps. That is similar to my issue, where the credentials are not sent in the first request.
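To illustrate the difference, here is a minimal Java sketch of sending Basic credentials preemptively with `java.net.HttpURLConnection`, which is roughly what `--auth-no-challenge` does in wget. It is not Spark's code and not necessarily the change proposed in the linked pull request. The class name `PreemptiveBasicAuth`, its helper methods, and the host `host.example` are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Spark.
public class PreemptiveBasicAuth {

    // Builds the value of an HTTP Basic "Authorization" header
    // from a "user:password" string.
    static String basicAuthHeader(String userInfo) {
        String encoded = Base64.getEncoder()
            .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    // Prepares a connection and, if the URI embeds credentials, attaches
    // them before the first request instead of waiting for a 401 challenge.
    // No network traffic happens until the caller reads from the connection.
    static HttpURLConnection open(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        String userInfo = uri.getUserInfo(); // null when the URI has no credentials
        if (userInfo != null) {
            conn.setRequestProperty("Authorization", basicAuthHeader(userInfo));
        }
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // Assumed example URL; replace with a real repository URL to try it.
        URI uri = new URI("https://user:password@host.example/my_app/pipeline.jar");
        System.out.println(basicAuthHeader(uri.getUserInfo()));
        // prints "Basic dXNlcjpwYXNzd29yZA=="
    }
}
```

With the header set up front, a server that protects the file with Basic authentication can answer the first request with 200 OK, so the client does not depend on handling a 401 challenge. Sending credentials unconditionally also means they go to the host in the URL whether or not it asks for them, so this is best limited to https URLs.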

Issue Links

links to

[Github] Pull Request #30337 (pprzetacznik)

People

Assignee: Unassigned

Reporter: Piotr Przetacznik

Votes: 0

Watchers: 2

Dates

Created: 11/Nov/20 15:28

Updated: 11/Nov/20 15:38