  Apache Sedona / SEDONA-632

Don't use a conventional output committer when writing raster files using df.write.format("raster")


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1

    Description

      Writing a large number of raster files to a distributed file system or an object store is very slow, because the output committer has to move every file from its temporary location to its target location. Users see that all tasks have completed, yet the driver remains stuck in the commit phase.

      We'll add a useDirectCommitter option to the raster format. useDirectCommitter defaults to true, in which case the raster format uses a direct committer that writes raster files straight to their target locations. Users can set it to false to restore the original behavior:

      df.write.format("raster").option("useDirectCommitter", "false").save("/target/location")
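      The cost difference between the two commit strategies can be sketched with plain files on a local disk. This is a simplified model, not Sedona's or Hadoop's committer code: the function write_rasters, the single _temporary staging directory, and the in-memory (file_name, payload) pairs are hypothetical, and the use_direct_committer flag only mirrors the intent of the useDirectCommitter option. The return value counts the move operations left for the commit phase.

```python
import os
import shutil
import tempfile


def write_rasters(target_dir, rasters, use_direct_committer=True):
    """Hypothetical model of committing raster files.

    rasters is a list of (file_name, payload_bytes) pairs. Returns the number
    of move operations performed during the commit phase.
    """
    os.makedirs(target_dir, exist_ok=True)

    if use_direct_committer:
        # Direct commit: each file is written straight to its final location,
        # so the commit phase has nothing left to move.
        for name, payload in rasters:
            with open(os.path.join(target_dir, name), "wb") as f:
                f.write(payload)
        return 0

    # Conventional commit (simplified): files are first written to a
    # hypothetical staging directory under the target location.
    staging = os.path.join(target_dir, "_temporary")
    os.makedirs(staging, exist_ok=True)
    for name, payload in rasters:
        with open(os.path.join(staging, name), "wb") as f:
            f.write(payload)

    # Commit phase: every staged file is moved into place one at a time.
    # On object stores without a native rename (e.g. S3), each move is
    # typically a copy followed by a delete, so this loop dominates.
    moves = 0
    for name in os.listdir(staging):
        shutil.move(os.path.join(staging, name), os.path.join(target_dir, name))
        moves += 1
    os.rmdir(staging)  # staging is empty once every file has been moved
    return moves


# Usage: the same three tiles, written both ways.
tiles = [(f"tile_{i}.tif", b"\x00" * 16) for i in range(3)]
with tempfile.TemporaryDirectory() as root:
    print(write_rasters(os.path.join(root, "direct"), tiles))  # prints 0
    print(write_rasters(os.path.join(root, "staged"), tiles, use_direct_committer=False))  # prints 3
```

      A general trade-off of direct committers, not specific to Sedona, is that files from failed or speculative task attempts land directly in the target location instead of being discarded along with a staging directory; the staged path keeps that isolation at the cost of the extra move per file.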
      

            People

              Assignee: Unassigned
              Reporter: Kristin Cowalcijk (kontinuation)
              Votes: 0
              Watchers: 1

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m