Details
Type: Sub-task
Status: Reopened
Priority: Major
Resolution: Unresolved
Description
FileSinkOperator renames outPaths to finalPaths once it finishes writing all rows to a temporary path. The problem is that S3 does not support renaming.
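The rename-based commit can be sketched on a local file system, where a rename is a cheap metadata operation. This is a minimal illustration, not Hive's FileSinkOperator code: the class RenameCommit and its methods are hypothetical, and java.nio.file stands in for the Hadoop FileSystem API. On S3 there is no equivalent of the final rename step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of a write-then-rename commit (not Hive's API).
public class RenameCommit {

    // Step 1: write all rows to the temporary outPath.
    static void writeRows(Path outPath, List<String> rows) throws IOException {
        Files.write(outPath, rows, StandardCharsets.UTF_8);
    }

    // Step 2: commit by renaming outPath to finalPath. On a local file
    // system this only updates metadata; S3 offers no rename, so this
    // step has no direct counterpart there.
    static void commit(Path outPath, Path finalPath) throws IOException {
        Files.move(outPath, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }
}
```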
Two options can be considered:
a. Use a copy operation instead: after FileSinkOperator writes all rows to outPaths, the commit method calls copy() instead of move().
b. Write rows directly to the S3 path (see HIVE-1620). This may perform better, but without a temporary path, partial output left by a write error must be cleaned up explicitly.
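Both options can be sketched against a local file system standing in for S3. This is a hedged illustration only: ObjectStoreCommit, commitByCopy and writeDirect are hypothetical names, and a real implementation would go through the Hadoop FileSystem API rather than java.nio.file. Option a trades the metadata-only rename for a data copy plus delete; option b removes the temporary path and instead deletes partial output when writing fails.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the two commit strategies (not Hive's API).
public class ObjectStoreCommit {

    // Option a: copy outPath to finalPath, then delete the temporary file.
    static void commitByCopy(Path outPath, Path finalPath) throws IOException {
        // Unlike a metadata-only rename, a copy duplicates the data,
        // so its cost grows with the size of the output.
        Files.copy(outPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
        // Delete the temporary file only after the copy has succeeded.
        Files.delete(outPath);
    }

    // Option b: write rows straight to finalPath; clean up on failure.
    static void writeDirect(Path finalPath, Iterable<String> rows) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(finalPath, StandardCharsets.UTF_8)) {
            for (String row : rows) {
                w.write(row);
                w.newLine();
            }
        } catch (IOException | RuntimeException e) {
            // The writer is already closed here. Without a temporary path,
            // a failed write leaves partial output at the final location,
            // so remove it before rethrowing.
            Files.deleteIfExists(finalPath);
            throw e;
        }
    }
}
```

Option b makes the cleanup path part of the write itself, which is the extra care the description calls for; option a keeps the existing write path unchanged and only swaps the commit step.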