Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: None
Description
This is a bug reported by a user on Discord. Writing the data back to S3 as raster on EMR raises the following error.
Error:
Caused by: java.lang.IllegalArgumentException: Pathname s3/...../9db15e93-3831-4066-ba1b-1f3bc364dc98.tiff is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName
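For reference, a pathname like "s3/..." (scheme separator lost) is the kind of string a generic file-path API produces when applied to an S3 URI. A minimal, hypothetical Python sketch of that class of bug, not Sedona's actual code:

import posixpath

# Plain string joining keeps the URI scheme intact:
uri = "s3://bucket/raster" + "/" + "9db15e93.tiff"
print(uri)                      # s3://bucket/raster/9db15e93.tiff

# ...but normalizing the string as a POSIX path collapses the "//"
# after the scheme, yielding a path Hadoop rejects as a DFS filename:
print(posixpath.normpath(uri))  # s3:/bucket/raster/9db15e93.tiff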
Code snippet:
(ndvi_geotiff.write
    .format("raster")
    .option("rasterField", "geotiff")
    .option("fileExtension", ".tiff")
    .mode("overwrite")
    .save("s3://varun-poc-emr-bootstrap/raster/"))
I tried to reproduce this problem on emr-7.1.0; the write failed with the following exception thrown on the executor:
org.apache.hadoop.fs.staging.StagingDirectoryNotFoundException: Staging directory not found under path s3://bucket-name/tmp/write_geotiff with stage name "0_attempt_202406201300423084972650467585554_0009_m_000001_13"
    at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.newDirectoryNotFoundException(InMemoryStagingMetadataStore.java:225)
    at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.getDirectoryOrFail(InMemoryStagingMetadataStore.java:184)
    at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.createFile(InMemoryStagingMetadataStore.java:114)
    at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.StagingUploadPlanner.plan(StagingUploadPlanner.java:61)
    at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.UploadPlannerChain.plan(UploadPlannerChain.java:37)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:351)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1240)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1217)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1085)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:202)
    at org.apache.spark.sql.sedona_sql.io.raster.RasterFileWriter.write(RasterFileFormat.scala:112)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:143)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
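The trace suggests RasterFileWriter.write (RasterFileFormat.scala:112) opens the output file directly via FileSystem.create on the S3 path rather than going through the committer-managed task-attempt path, so EMRFS's staging upload planner finds no registered staging directory for the task. Until that is fixed, a possible workaround sketch is to write to HDFS first and copy the finished files to S3 in a second step (the hdfs:///tmp/raster_out/ staging location here is hypothetical):

# Write the rasters to HDFS, where no EMRFS staging metadata is involved:
(ndvi_geotiff.write
    .format("raster")
    .option("rasterField", "geotiff")
    .option("fileExtension", ".tiff")
    .mode("overwrite")
    .save("hdfs:///tmp/raster_out/"))

# Then copy the results to S3, e.g.:
#   hadoop distcp hdfs:///tmp/raster_out/ s3://bucket-name/raster/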