Apache Sedona / SEDONA-611

Cannot write rasters to S3 on EMR

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.1
    • None

    Description

      This bug was reported by a user on Discord. Writing raster data back to S3 on EMR raises the following error.

      Error:

      Caused by: java.lang.IllegalArgumentException: Pathname s3/...../9db15e93-3831-4066-ba1b-1f3bc364dc98.tiff is not a valid DFS filename.
          at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName
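
One detail worth noting in the message above: the quoted pathname begins with `s3/` rather than `s3://`, and the failing frame is in `DistributedFileSystem`, the HDFS client. Hadoop resolves a path that carries no scheme against the default file system, so this is at least consistent with the `s3://` scheme having been lost before the file was created. That is only a reading of the message, not a confirmed root cause, and it assumes the pathname was quoted as printed rather than edited by hand. The small sketch below uses Python's standard `urllib.parse` (not Hadoop's own `Path` class) to illustrate the difference; `describe_path` is a hypothetical helper introduced here for illustration.

```python
from urllib.parse import urlsplit


def describe_path(path: str) -> str:
    """Return the URI scheme of a path, or 'no scheme' if it has none.

    Hypothetical helper for illustration only; it mimics the idea that a
    scheme-less path gives a file system layer nothing to dispatch on.
    """
    scheme = urlsplit(path).scheme
    return scheme if scheme else "no scheme"


# The destination passed to save() in the report carries an explicit scheme.
print(describe_path("s3://varun-poc-emr-bootstrap/raster/"))  # prints "s3"

# The pathname quoted in the error starts with "s3/" (the rest is elided
# in the report), which has no scheme at all.
print(describe_path("s3/"))  # prints "no scheme"
```

Running the same check on the exact string passed to `save()` and on any intermediate path logged by the job may help confirm or rule out this reading.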
      

      Code snippet:

      (ndvi_geotiff.write.format("raster")
       .option("rasterField", "geotiff").option("fileExtension", ".tiff")
       .mode("overwrite")
       .save("s3://varun-poc-emr-bootstrap/raster/"))
      

      I tried to reproduce this problem on emr-7.1.0. The write failed with the following exception thrown on the executor:

      org.apache.hadoop.fs.staging.StagingDirectoryNotFoundException: Staging directory not found under path s3://bucket-name/tmp/write_geotiff with stage name "0_attempt_202406201300423084972650467585554_0009_m_000001_13"
      	at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.newDirectoryNotFoundException(InMemoryStagingMetadataStore.java:225)
      	at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.getDirectoryOrFail(InMemoryStagingMetadataStore.java:184)
      	at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.createFile(InMemoryStagingMetadataStore.java:114)
      	at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.StagingUploadPlanner.plan(StagingUploadPlanner.java:61)
      	at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.UploadPlannerChain.plan(UploadPlannerChain.java:37)
      	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:351)
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1240)
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1217)
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
      	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1085)
      	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:202)
      	at org.apache.spark.sql.sedona_sql.io.raster.RasterFileWriter.write(RasterFileFormat.scala:112)
      	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
      	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
      	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
      	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
      	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
      	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
      	at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
      	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
      	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
      	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
      	at org.apache.spark.scheduler.Task.run(Task.scala:143)
      	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
      	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
      	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at java.base/java.lang.Thread.run(Thread.java:840)
      

          People

            Assignee: Unassigned
            Reporter: Kristin Cowalcijk (kontinuation)