Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Writing large numbers of raster files to distributed file systems or object stores is very slow, because the output committer has to move every file from its temporary location to its target location. Users see that all tasks have completed, but the driver is stuck in the commit phase.
We'll add a useDirectCommitter option to the raster format. It defaults to true, in which case the raster format uses a direct committer that writes raster files straight to their target locations. Users can set it to false to get the original behavior:
df.write.format("raster").option("useDirectCommitter", "false").save("/target/location")
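To illustrate why the commit phase is slow, here is a minimal toy model, assuming an object store that has no native rename and emulates it as copy plus delete (as object stores typically do). ToyObjectStore, staged_commit and direct_commit are hypothetical names for illustration only; this is not the raster format's actual committer code, just a sketch comparing a rename-based commit with a direct write.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory object store with no native rename."""
    objects: dict = field(default_factory=dict)
    bytes_written: int = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.bytes_written += len(data)

    def rename(self, src: str, dst: str) -> None:
        # Assumption: rename is emulated as copy + delete,
        # so every byte is written a second time.
        self.put(dst, self.objects[src])
        del self.objects[src]


def staged_commit(store: ToyObjectStore, files: dict, target: str) -> None:
    """Rename-based commit: tasks write to a temporary prefix, then the committer moves each file."""
    for name, data in files.items():  # task phase
        store.put(f"{target}/_temporary/{name}", data)
    for name in files:  # commit phase: one copy + delete per file
        store.rename(f"{target}/_temporary/{name}", f"{target}/{name}")


def direct_commit(store: ToyObjectStore, files: dict, target: str) -> None:
    """Direct commit: tasks write straight to the final location; nothing is moved afterwards."""
    for name, data in files.items():
        store.put(f"{target}/{name}", data)


if __name__ == "__main__":
    files = {f"part-{i:05d}.tiff": b"x" * 1024 for i in range(3)}
    staged, direct = ToyObjectStore(), ToyObjectStore()
    staged_commit(staged, files, "/target/location")
    direct_commit(direct, files, "/target/location")
    print(staged.bytes_written, direct.bytes_written)  # 6144 3072
    print(staged.objects == direct.objects)  # True
```

Both approaches end with the same files at the target, but the rename-based commit writes every byte twice, and the extra work happens after all tasks have finished. The trade-off of writing directly, in general, is that output from failed or speculatively executed task attempts can be left behind at the target location.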
Issue Links
- links to