HIVE-14271 (sub-task of HIVE-14269, Performance optimizations for data on S3)

FileSinkOperator should not rename files to final paths when S3 is the default destination


Details

    • Type: Sub-task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved

    Description

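      FileSinkOperator renames outPaths to finalPaths once it has finished writing all rows to a temporary path. The problem is that S3 has no native rename operation: a rename on S3 is carried out as a copy of each object followed by a delete, which is slow for large outputs.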

      Two options can be considered:

      a. Use a copy operation instead. Once FileSinkOperator has written all rows to outPaths, the commit method calls copy() instead of move().

      b. Write row by row directly to the final S3 path (see HIVE-1620). This may perform better because it skips the separate commit step, but partial output must be cleaned up if a write fails.
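
      The two options can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hive code: java.nio.file on the local disk stands in for Hadoop's FileSystem API, and CommitSketch, Strategy, isS3, commit, and writeDirect are hypothetical names. The set of URI schemes treated as S3 is also an assumption.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;
import java.util.Set;

// Illustrative only: local files stand in for Hadoop's FileSystem API,
// and none of these names exist in Hive.
final class CommitSketch {
    enum Strategy { COPY, MOVE }

    // Assumed set of URI schemes that identify an S3-backed destination.
    private static final Set<String> S3_SCHEMES = Set.of("s3", "s3a", "s3n");

    static boolean isS3(URI destination) {
        String scheme = destination.getScheme();
        return scheme != null && S3_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Option (a): copy the finished temporary file to its final path when the
    // destination is S3; otherwise keep the rename. Removing the temporary
    // file after a copy is left to the caller.
    static Strategy commit(Path outPath, Path finalPath, URI destination) throws IOException {
        if (isS3(destination)) {
            Files.copy(outPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
            return Strategy.COPY;
        }
        Files.move(outPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
        return Strategy.MOVE;
    }

    // Option (b): write rows straight to the final path and delete the
    // partial file if any write fails.
    static void writeDirect(Path finalPath, Iterable<String> rows) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(finalPath, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                writer.write(row);
                writer.newLine();
            }
        } catch (IOException | RuntimeException e) {
            // The writer is already closed here, so the partial file can be removed.
            Files.deleteIfExists(finalPath);
            throw e;
        }
    }
}
```

      In this sketch option (a) only changes the commit step, while option (b) removes the commit step entirely and moves the failure handling into the write path.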

          People

            Sergio Peña (spena)
            Votes: 0
            Watchers: 11
