Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
The taskId and taskAttemptId are not extracted correctly for copy files (e.g. 00001_02_copy_3), and when an incompatible copy file is moved, the rename utility generates wrong file names. For example, 00001_02_copy_3 is renamed to 00001_02_copy_3_1 if 00001_02_copy_3 already exists; ideally it should be renamed to 00001_02_copy_N.
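The expected behaviour can be sketched roughly as follows. This is an illustration only, not the actual Hive utility: the class `CopyFileNames`, its methods, and the assumption that file names carry no extension are all hypothetical. It parses the task ID, attempt ID, and copy index from a name, and picks the next free `_copy_N` name instead of appending `_1` to an existing copy suffix.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Hive's actual rename utility.
final class CopyFileNames {
    // <taskId>_<attemptId> with an optional _copy_<N> suffix, e.g. 00001_02_copy_3.
    // Assumes no file extension or other suffix.
    private static final Pattern NAME =
        Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    /** Returns {taskId, attemptId, copyIndex}; copyIndex is 0 when there is no copy suffix. */
    static int[] parse(String fileName) {
        Matcher m = match(fileName);
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), copy
        };
    }

    /** Returns fileName if it is free, otherwise the first unused <task>_<attempt>_copy_N. */
    static String nextCopyName(String fileName, Set<String> existing) {
        if (!existing.contains(fileName)) {
            return fileName;
        }
        Matcher m = match(fileName);
        // Strip any existing copy suffix so the result never becomes _copy_3_1.
        String base = m.group(1) + "_" + m.group(2);
        int n = m.group(3) == null ? 1 : Integer.parseInt(m.group(3)) + 1;
        while (existing.contains(base + "_copy_" + n)) {
            n++;
        }
        return base + "_copy_" + n;
    }

    private static Matcher match(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized file name: " + fileName);
        }
        return m;
    }
}
```

With this sketch, a conflict on 00001_02_copy_3 yields another name of the form 00001_02_copy_N rather than 00001_02_copy_3_1.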
Incompatible files should always be renamed using the current task; otherwise they can be deleted if the file name conflicts with another task's output file. For example, if the input file for a task is named 00005_01 and is incompatible, moving it under the same name makes it look like the output file of task ID 5, attempt 1. If that task exists, it will try to generate the same file and fail, and another attempt will be made. There will then be two files, 00005_01 and 00005_02, and the deduplication code will remove 00005_01, resulting in data loss. There are other scenarios where the same can happen.
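The data-loss scenario can be modelled with a small sketch. It assumes that deduplication keeps, for each task ID, only the file with the highest attempt ID, which is consistent with the example above but is not taken from the Hive source. The class `DedupSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-task deduplication; not Hive's actual code.
final class DedupSketch {
    /** Returns the files that would be removed when only the highest attempt per task is kept. */
    static List<String> removedByDedup(List<String> fileNames) {
        Map<Integer, String> winner = new HashMap<>();
        List<String> removed = new ArrayList<>();
        for (String name : fileNames) {
            int task = taskId(name);
            String current = winner.get(task);
            if (current == null) {
                winner.put(task, name);
            } else if (attemptId(name) > attemptId(current)) {
                // A newer attempt replaces the older file for the same task.
                removed.add(current);
                winner.put(task, name);
            } else {
                removed.add(name);
            }
        }
        return removed;
    }

    // Assumes names of the form <taskId>_<attemptId>, as in the examples above.
    static int taskId(String name) {
        return Integer.parseInt(name.split("_")[0]);
    }

    static int attemptId(String name) {
        return Integer.parseInt(name.split("_")[1]);
    }
}
```

If 00005_01 is the moved input file and 00005_02 is the output of the retried attempt, `removedByDedup` reports 00005_01 as removed, even though it held distinct data.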
Issue Links
- fixes: HIVE-25130 alter table concat gives NullPointerException, when data is inserted from Spark (Resolved)
- links to