  Spark / SPARK-46860

Credentials in HTTPS URLs not working for the --jars, --files, --archives and --py-files options of spark-submit


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.3, 3.5.0, 3.3.4
    • Fix Version/s: None
    • k8s
    • None
    • Environment: Spark 3.3.3 deployed on K8s

    Description

      We are trying to run a Spark application whose dependent files, as well as the main PySpark script, are hosted on a secure web server.

      We are looking for a way to pass both the dependencies and the PySpark script from the web server.

      We tried deploying the Spark application from the web server to the K8s cluster without a username and password, and it worked. With a username and password in the URLs, however, it fails with: "Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:password@domain.com/application/pysparkjob.py"

      Working spark-submit options:

      spark-submit ...... \
        --repositories https://username:password@domain.com/repo1/repo \
        --jars https://domain.com/jars/runtime.jar \
        --files https://domain.com/files/query.sql \
        --py-files https://domain.com/pythonlib/pythonlib.zip \
        https://domain.com/app1/pysparkapp.py

      Note: only the --repositories option works with a username and password embedded in the URL.

      spark-submit with a username and password in the HTTPS URLs (not working):

      spark-submit ...... \
        --jars https://username:password@domain.com/jars/runtime.jar \
        --files https://username:password@domain.com/files/query.sql \
        --py-files https://username:password@domain.com/pythonlib/pythonlib.zip \
        https://username:password@domain.com/app1/pysparkapp.py

       

      Error:

      25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:password@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz
              at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000)
              at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
              at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
              at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809)
              at org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
              at org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233)
              at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
              at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
              at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
              at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
              at scala.collection.TraversableLike.map(TraversableLike.scala:286)
              at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
              at scala.collection.AbstractTraversable.map(Traversable.scala:108)
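      The trace shows the download going through Utils.doFetchFile into the JDK's HttpURLConnection. A plausible explanation, not confirmed in this issue, is that HttpURLConnection does not turn the "username:password@" part of a URL into an Authorization header, so the server receives an unauthenticated request and answers 401. The Python sketch below illustrates what a client has to do instead: strip the user info from the URL and send it as an HTTP Basic Authorization header. The helpers split_basic_auth and fetch_with_basic_auth are hypothetical names for illustration, not Spark or PySpark APIs; the same idea could be used to pre-download the dependencies to local paths before calling spark-submit.

```python
import base64
import shutil
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def split_basic_auth(url):
    """Hypothetical helper: return (url_without_userinfo, auth_header_or_None)."""
    parts = urlsplit(url)
    if parts.username is None:
        # No "user:password@" in the URL, nothing to convert.
        return url, None
    # Percent-decode so that e.g. "p%40ss" is sent as "p@ss".
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Keep only "host[:port]", i.e. everything after the last "@".
    netloc = parts.netloc.rpartition("@")[2]
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, f"Basic {token}"


def fetch_with_basic_auth(url, dest_path):
    """Hypothetical helper: download url to dest_path, sending userinfo as Basic auth."""
    clean_url, auth = split_basic_auth(url)
    request = urllib.request.Request(clean_url)
    if auth is not None:
        request.add_header("Authorization", auth)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Example: the credentials move from the URL into a header.
clean, header = split_basic_auth("https://username:password@domain.com/app1/pysparkapp.py")
# clean  == "https://domain.com/app1/pysparkapp.py"
# header == "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
```

      Embedding credentials in URLs also exposes them in logs and error messages, as the 401 message above shows; a header keeps them out of the URL that gets printed.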

       

       


          People

            Assignee: Unassigned
            Reporter: Vikram Janarthanan (jvikram253)
            Votes: 0
            Watchers: 1
