  1. Hadoop Common
  2. HADOOP-18173

S3a copyFromLocalOperation doesn't support single file

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.2
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None
    • Environment: Hadoop version 3.3.2

      Spark version 3.4.0-SNAPSHOT

      minio:latest is used to mock the S3 filesystem

    Description

      A Spark job uses AWS S3 (through the s3a connector) as its file system and calls

      fs.copyFromLocalFile(delSrc, overwrite, src, dest) 
      
      delSrc = false
      overwrite = true
      src = "/Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar"
      dest = "s3a://spark/spark-upload-a703d8e7-8dd2-4e29-beca-b4df2fedefbd/spark-examples_2.12-3.4.0-SNAPSHOT.jar"

      The call then throws a PathIOException. The full stack trace is as follows:

      Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar failed...        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)        
      at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)        
      at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)        
      at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)        
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)        
      at scala.collection.TraversableLike.map(TraversableLike.scala:286)        
      at scala.collection.TraversableLike.map$(TraversableLike.scala:279)        
      at scala.collection.AbstractTraversable.map(Traversable.scala:108)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)        
      at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)       
      at scala.collection.immutable.List.foreach(List.scala:431)        
      at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
      at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
      at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
      at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)        
      at scala.collection.immutable.List.foldLeft(List.scala:91)        
      at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)        
      at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242) 
      at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)        
      at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)        
      at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)        
      at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)        
      at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)        
      at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)        
      at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)        
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)        
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: org.apache.spark.SparkException: Error uploading file spark-examples_2.12-3.4.0-SNAPSHOT.jar
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)        
      ... 30 more
      Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path for URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar': Input/output error 
      at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)
      at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation.java:226)        
      at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.execute(CopyFromLocalOperation.java:170)        
      at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$copyFromLocalFile$25(S3AFileSystem.java:3920)        
      at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
      at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
      at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337)        
      at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356)        
      at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFromLocalFile(S3AFileSystem.java:3913)        
      at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:352)        
      ... 31 more 

      I added some logging, which produced the following output:

      22/03/25 09:33:24 INFO KubernetesUtils: Uploading file: /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar to dest: s3a://spark/spark-upload-a703d8e7-8dd2-4e29-beca-b4df2fedefbd/spark-examples_2.12-3.4.0-SNAPSHOT.jar...
      22/03/25 09:33:24 INFO S3AFileSystem: Copying local file from /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar to s3a://spark/spark-upload-a703d8e7-8dd2-4e29-beca-b4df2fedefbd/spark-examples_2.12-3.4.0-SNAPSHOT.jar
      22/03/25 09:33:24 INFO CopyFromLocalOperation: Copying local file from /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar to s3a://spark/spark-upload-a703d8e7-8dd2-4e29-beca-b4df2fedefbd/spark-examples_2.12-3.4.0-SNAPSHOT.jar
      22/03/25 09:33:24 INFO CopyFromLocalOperation: execute#CopyFromLocalOperation, sourceFile is /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
      22/03/25 09:33:24 INFO CopyFromLocalOperation: uploadSourceFromFS#CopyFromLocalOperation, localFile 1: path is LocatedFileStatus{path=file:/Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar; isDirectory=false; length=1567474; replication=1; blocksize=33554432; modification_time=1647874074000; access_time=1647874074000; owner=hengzhen.sq; group=staff; permission=rw-r--r--; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false}
      22/03/25 09:33:24 INFO CopyFromLocalOperation: getFinalPath#CopyFromLocalOperation, src is file:/Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar, source is /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar 

      It looks like CopyFromLocalOperation does not support copying a single file.
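
      The last log line shows src with a scheme (file:/Users/...) while source has none (/Users/...). The sketch below uses only java.net.URI to show how such a mismatch makes relativization return the child URI unchanged. It is an assumption, suggested by the exception message but not confirmed in this report, that getFinalPath relies on URI.relativize in this way. The class name, helper name, and paths are hypothetical.

```java
import java.net.URI;

// Hypothetical, stdlib-only illustration; this is not Hadoop code.
final class RelativizeDemo {

    /** Returns true when child can be expressed relative to base. */
    static boolean canRelativize(URI base, URI child) {
        // URI.relativize returns the child unchanged when scheme or authority
        // differ, or when base's path is not a prefix of child's path.
        URI relative = base.relativize(child);
        return !relative.equals(child);
    }

    public static void main(String[] args) {
        String jar = "/Users/example/jars/app.jar";        // hypothetical path
        URI qualifiedFile = URI.create("file:" + jar);     // like "src" in the log
        URI unqualifiedFile = URI.create(jar);             // like "source" in the log
        URI qualifiedDir = URI.create("file:/Users/example/jars");

        // Scheme mismatch (none vs. file): relativize gives back the child as-is.
        System.out.println(canRelativize(unqualifiedFile, qualifiedFile)); // false

        // Same scheme, identical path: relativize yields an empty relative URI.
        System.out.println(canRelativize(qualifiedFile, qualifiedFile));   // true

        // Same scheme, directory base: the usual relative path comes back.
        System.out.println(qualifiedDir.relativize(qualifiedFile));        // app.jar
    }
}
```

      If this reading is right, the failure may have more to do with the source path being unqualified than with the source being a single file. That has not been verified against the Hadoop source here.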

      Attachments

        1. minio.yaml
          1 kB
          Qian Sun

            People

              Assignee: Unassigned
              Reporter: Qian Sun (dcoliversun)