Details
- Type: Bug
- Status: In Progress
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.0.1
- Fix Version/s: None
- Component/s: None
Description
I'm running spark-submit with an https URL containing a username and password. The documentation says so explicitly - https://spark.apache.org/docs/latest/submitting-applications.html
(Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.)
However, when using that, I receive the following error:
INFO - 20/11/11 12:59:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO - Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:*****@host.com/my_app/pipeline.jar
INFO - at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1924)
INFO - at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
INFO - at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
INFO - at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:729)
INFO - at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:138)
INFO - at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
INFO - at scala.Option.map(Option.scala:230)
INFO - at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
INFO - at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
INFO - at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
INFO - at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
INFO - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
INFO - at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
When I download the file manually with wget, the first request gets a 401 error, but wget then retries with credentials:
HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="Restricted"
Reusing existing connection to host.com:443.
HTTP request sent, awaiting response... 200 OK
When I pass `--auth-no-challenge` to wget, the credentials are sent directly in the first request and I receive 200 OK right away. The problem with plain wget is that it first tries to download the file without credentials, and only after the 401 challenge does it send them, so the download takes two steps. That resembles my issue: Spark's download does not pass the credentials in the first request either, and judging by the exception it never answers the 401 challenge at all.
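For context, the fetch fails inside Spark's Utils.doFetchFile, which reads the jar through java.net.HttpURLConnection, and that client does not turn the user:password part of a URL into credentials by itself. A minimal sketch of what a preemptive request (wget's `--auth-no-challenge` behaviour) would look like - the host, credentials, and the PreemptiveAuth helper are illustrative placeholders, not Spark code:

```java
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveAuth {

    // Build the Authorization header that wget --auth-no-challenge sends on
    // the very first request, from the user:password part of the URL.
    static String basicAuthHeader(String spec) throws Exception {
        String userInfo = new URL(spec).getUserInfo(); // e.g. "username:password"
        if (userInfo == null) {
            return null; // no credentials embedded in the URL
        }
        String encoded = Base64.getEncoder()
                .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) throws Exception {
        String jar = "https://username:password@host.com/my_app/pipeline.jar";
        String header = basicAuthHeader(jar);
        System.out.println(header);

        // On a real connection the header would be attached before the first
        // request, so no 401 challenge round-trip is needed:
        // HttpURLConnection conn = (HttpURLConnection) new URL(jar).openConnection();
        // conn.setRequestProperty("Authorization", header);
    }
}
```

Plain wget's two-step behaviour would correspond to Spark answering the 401 challenge instead (e.g. via a java.net.Authenticator); either approach, if implemented in the download path, would presumably avoid the error above.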