Spark / SPARK-38652

uploadFileUri should preserve file scheme


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.1.3, 3.0.4, 3.3.0, 3.2.2
    • Component/s: Kubernetes
    • Labels: None

    Description

      DepsTestsSuite in the K8s integration tests is blocked by a PathIOException with hadoop-aws 3.3.2. The exception message is as follows:

      Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar failed...        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)        
      at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)        
      at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)        
      at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)        
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)        
      at scala.collection.TraversableLike.map(TraversableLike.scala:286)        
      at scala.collection.TraversableLike.map$(TraversableLike.scala:279)        
      at scala.collection.AbstractTraversable.map(Traversable.scala:108)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)        
      at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)       
      at scala.collection.immutable.List.foreach(List.scala:431)        
      at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)        
      at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
      at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
      at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)        
      at scala.collection.immutable.List.foldLeft(List.scala:91)        
      at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)        
      at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
      at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)        
      at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)        
      at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)        
      at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)        
      at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)        
      at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)        
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)        
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: org.apache.spark.SparkException: Error uploading file spark-examples_2.12-3.4.0-SNAPSHOT.jar
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)        
      ... 30 more
      Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path for URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar': Input/output error
      at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)        
      at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation.java:226)        
      at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.execute(CopyFromLocalOperation.java:170)        
      at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$copyFromLocalFile$25(S3AFileSystem.java:3920)        
      at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
      at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
      at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337)        
      at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356)        
      at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFromLocalFile(S3AFileSystem.java:3913)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:352)        
      ... 31 more  

      For more information, please refer to HADOOP-18173.

       

      However, DepsTestsSuite works normally with hadoop-aws 3.3.1:

      hengzhen.sq@b-q922md6r-0237 ~/Desktop$ /Users/hengzhen.sq/IdeaProjects/spark/bin/spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkRemoteFileTest --master k8s://https://192.168.64.86:8443/ --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem --conf spark.testing=false  --conf spark.hadoop.fs.s3a.access.key=minio --conf spark.kubernetes.driver.label.spark-app-locator=a8937b5fdf6a444a806ee1c3ecac37fc --conf spark.kubernetes.file.upload.path=s3a://spark --conf spark.authenticate=true --conf spark.executor.instances=1 --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.executor.label.spark-app-locator=a8937b5fdf6a444a806ee1c3ecac37fc --conf spark.kubernetes.namespace=spark-job --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.hadoop.fs.s3a.secret.key=miniostorage --conf spark.executor.extraJavaOptions=-Dlog4j2.debug --conf spark.hadoop.fs.s3a.endpoint=192.168.64.86:32681 --conf spark.app.name=spark-test-app --conf spark.files=/tmp/tmp7013228683780235449.txt --conf spark.ui.enabled=true --conf spark.driver.extraJavaOptions=-Dlog4j2.debug --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/smart-spark/spark:test --conf spark.executor.cores=1 --conf spark.jars.packages=org.apache.hadoop:hadoop-aws:3.3.1 --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar tmp7013228683780235449.txt
      22/03/25 10:24:10 WARN Utils: Your hostname, B-Q922MD6R-0237.local resolves to a loopback address: 127.0.0.1; using 30.25.86.17 instead (on interface en0)
      22/03/25 10:24:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
      :: loading settings :: url = jar:file:/Users/hengzhen.sq/IdeaProjects/spark/assembly/target/scala-2.12/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
      Ivy Default Cache set to: /Users/hengzhen.sq/.ivy2/cache
      The jars for the packages stored in: /Users/hengzhen.sq/.ivy2/jars
      org.apache.hadoop#hadoop-aws added as a dependency
      :: resolving dependencies :: org.apache.spark#spark-submit-parent-8220baa6-0490-4484-9779-945d4cf69df4;1.0
      	confs: [default]
      	found org.apache.hadoop#hadoop-aws;3.3.1 in central
      	found com.amazonaws#aws-java-sdk-bundle;1.11.901 in central
      	found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
      :: resolution report :: resolve 224ms :: artifacts dl 6ms
      	:: modules in use:
      	com.amazonaws#aws-java-sdk-bundle;1.11.901 from central in [default]
      	org.apache.hadoop#hadoop-aws;3.3.1 from central in [default]
      	org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
      	---------------------------------------------------------------------
      	|                  |            modules            ||   artifacts   |
      	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
      	---------------------------------------------------------------------
      	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
      	---------------------------------------------------------------------
      :: retrieving :: org.apache.spark#spark-submit-parent-8220baa6-0490-4484-9779-945d4cf69df4
      	confs: [default]
      	0 artifacts copied, 3 already retrieved (0kB/6ms)
      22/03/25 10:24:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      22/03/25 10:24:11 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
      22/03/25 10:24:12 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
      22/03/25 10:24:12 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
      22/03/25 10:24:12 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
      22/03/25 10:24:12 INFO MetricsSystemImpl: s3a-file-system metrics system started
      22/03/25 10:24:13 INFO KubernetesUtils: Uploading file: /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar to dest: s3a://spark/spark-upload-f49ee7fc-182d-499a-b073-40b298c55e8b/spark-examples_2.12-3.4.0-SNAPSHOT.jar...
      22/03/25 10:24:13 INFO KubernetesUtils: Uploading file: /private/tmp/tmp7013228683780235449.txt to dest: s3a://spark/spark-upload-906c6d35-6aa7-4ee8-8c37-434f547c6087/tmp7013228683780235449.txt...
      22/03/25 10:24:14 INFO ShutdownHookManager: Shutdown hook called
      22/03/25 10:24:14 INFO ShutdownHookManager: Deleting directory /private/var/folders/3t/v_td68551s78mq4c1cpk86gc0000gn/T/spark-e89a395e-c3b4-4619-8c9b-e60310af6503
      22/03/25 10:24:14 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
      22/03/25 10:24:14 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
      22/03/25 10:24:14 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.

      The copyFromLocalFile behavior in hadoop-aws differs between 3.3.1 and 3.3.2, because 3.3.2 introduces CopyFromLocalOperation.
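      The PathIOException above comes down to the source URI reaching CopyFromLocalOperation.getFinalPath without its file scheme, so the operation cannot relativize it against the file:// source root. A minimal, self-contained sketch of that distinction, using plain java.net.URI rather than Hadoop's Path (the path below is hypothetical; Hadoop's Path is assumed to behave analogously when built from a bare path string versus a full URI):

      ```java
      import java.io.File;
      import java.net.URI;

      public class SchemeDemo {
          public static void main(String[] args) {
              // A bare path string carries no scheme, so a URI (or Hadoop Path)
              // built from it is scheme-less.
              URI bare = URI.create("/tmp/spark-examples.jar");
              System.out.println(bare.getScheme());        // prints "null"

              // Going through java.io.File preserves the "file" scheme, which is
              // what "uploadFileUri should preserve file scheme" asks for.
              URI withScheme = new File("/tmp/spark-examples.jar").toURI();
              System.out.println(withScheme.getScheme());  // prints "file"
          }
      }
      ```

      The implied fix direction (an assumption based on the issue title, not a quote of the patch) is that KubernetesUtils.uploadFileUri should build the source Path from the full file:// URI instead of from its bare path component.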

      Attachments

        Issue Links

        Activity


          People

            Assignee: Dongjoon Hyun (dongjoon)
            Reporter: Qian Sun (dcoliversun)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
