Hive / HIVE-14269 Performance optimizations for data on S3 / HIVE-14271

FileSinkOperator should not rename files to final paths when S3 is the default destination

    Details

    • Type: Sub-task
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      FileSinkOperator renames outPaths -> finalPaths once it has finished writing all rows to a temporary path. The problem is that S3 does not support renaming: it has no native rename operation, so a rename has to be emulated as a copy followed by a delete.

      Two options can be considered:

      a. Use a copy operation instead. After FileSinkOperator has written all rows to outPaths, the commit method calls copy() instead of move().

      b. Write row by row directly to the S3 path (see HIVE-1620). This may perform better, but we need to take care of cleanup when a write fails.
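
      The two options can be sketched roughly as follows. This is a minimal illustration only, not Hive's actual code: it uses java.nio.file on a local directory as a stand-in for the S3 file system, and the class and method names (CommitSketch, commitByCopy, writeDirect) are hypothetical. Removing the temporary file after the copy in option (a) is assumed to happen elsewhere.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not part of Hive.
public class CommitSketch {

    // Option (a): rows are already in outPath; commit by copying to
    // finalPath rather than moving. The temporary file is left in place
    // (its cleanup is assumed to be handled by the caller).
    static void commitByCopy(Path outPath, Path finalPath) throws IOException {
        Files.copy(outPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }

    // Option (b): write rows straight to finalPath. If any write fails,
    // delete the partial file so no incomplete output is left behind.
    static void writeDirect(Path finalPath, Iterable<String> rows) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(finalPath, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                w.write(row);
                w.newLine();
            }
        } catch (IOException | RuntimeException e) {
            // The writer is already closed here (try-with-resources).
            Files.deleteIfExists(finalPath);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("filesink");
        Path outPath = dir.resolve("_tmp.000000_0");
        Path finalPath = dir.resolve("000000_0");

        // Option (a): write to the temporary path, then commit by copy.
        Files.write(outPath, Arrays.asList("row1", "row2"), StandardCharsets.UTF_8);
        commitByCopy(outPath, finalPath);
        System.out.println("copied: " + Files.readAllLines(finalPath));

        // Option (b): write directly to the final path.
        Path direct = dir.resolve("000001_0");
        writeDirect(direct, Arrays.asList("row3", "row4"));
        System.out.println("direct: " + Files.readAllLines(direct));
    }
}
```

      In this sketch option (a) still pays for writing the data twice, while option (b) writes it once but has to guarantee that a failed write leaves nothing at the final path.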

            People

            • Assignee:
              spena Sergio Peña
              Reporter:
              spena Sergio Peña
            • Votes:
              0
              Watchers:
              11