[SEDONA-611] Cannot write rasters to S3 on EMR - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: 1.6.1
Labels: None

Description

This bug was reported by a user on Discord. Writing raster data back to S3 on EMR raises the following error.

Error:

Caused by: java.lang.IllegalArgumentException: Pathname s3/...../9db15e93-3831-4066-ba1b-1f3bc364dc98.tiff is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName
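
One possible reading of the message above, not confirmed in this issue: the pathname shown begins with `s3/` rather than `s3://`, so it carries no URI scheme. Hadoop resolves a scheme-less path against the default filesystem, which is HDFS on a default EMR cluster, and HDFS rejects path names that do not start with `/`. The minimal sketch below only illustrates the scheme difference, using Python's standard library. The helper name `uri_scheme` and the path `s3/example/file.tiff` are hypothetical stand-ins, since the real path is elided in the report.

```python
from urllib.parse import urlparse


def uri_scheme(path: str) -> str:
    """Return the URI scheme of a path string, or "" if it has none."""
    return urlparse(path).scheme


# Destination from the reported snippet: fully qualified, scheme is "s3".
qualified = "s3://varun-poc-emr-bootstrap/raster/"

# Hypothetical stand-in for the elided pathname in the error message:
# no "://", so no scheme is parsed from it.
unqualified = "s3/example/file.tiff"

for p in (qualified, unqualified):
    scheme = uri_scheme(p)
    print(f"{p!r}: scheme={scheme!r}, has_scheme={bool(scheme)}")
```

If this reading is right, the destination was reduced to a relative-looking path somewhere between `save(...)` and the file creation call. The sketch does not show where that happens.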

Code snippet:

(ndvi_geotiff.write.format("raster").option("rasterField", "geotiff").option("fileExtension", ".tiff").mode("overwrite")
 .save("s3://varun-poc-emr-bootstrap/raster/"))

I tried to reproduce this problem on emr-7.1.0. The write failed, with the following exception thrown on the executor:

org.apache.hadoop.fs.staging.StagingDirectoryNotFoundException: Staging directory not found under path s3://bucket-name/tmp/write_geotiff with stage name "0_attempt_202406201300423084972650467585554_0009_m_000001_13"
	at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.newDirectoryNotFoundException(InMemoryStagingMetadataStore.java:225)
	at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.getDirectoryOrFail(InMemoryStagingMetadataStore.java:184)
	at com.amazon.ws.emr.hadoop.fs.staging.metadata.inmemory.InMemoryStagingMetadataStore.createFile(InMemoryStagingMetadataStore.java:114)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.StagingUploadPlanner.plan(StagingUploadPlanner.java:61)
	at com.amazon.ws.emr.hadoop.fs.s3.upload.plan.UploadPlannerChain.plan(UploadPlannerChain.java:37)
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:351)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1240)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1217)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1085)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:202)
	at org.apache.spark.sql.sedona_sql.io.raster.RasterFileWriter.write(RasterFileFormat.scala:112)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
	at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:143)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

People

Assignee: Unassigned

Reporter: Kristin Cowalcijk

Votes: 0

Watchers: 2

Dates

Created: 20/Jun/24 14:02

Updated: 19/Aug/24 00:19

Resolved: 19/Aug/24 00:19