  1. Hive
  2. HIVE-24936

Fix file name parsing and copy file move.


Details

    Description

      The taskId and taskAttemptId are not extracted correctly for copy files (e.g. 00001_02_copy_3), and when an incompatible copy file is moved, the rename utility generates wrong file names. Ex: 00001_02_copy_3 is renamed to 00001_02_copy_3_1 if 00001_02_copy_3 already exists; ideally it should be 00001_02_copy_N.
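      As a rough illustration of the naming behaviour described above, the sketch below splits a file name into task id, attempt id, and copy index, then picks the next free copy name. The pattern and the names `parse_file_name` and `next_free_name` are hypothetical and are not Hive's actual utility; the sketch also assumes names of the form taskId_attemptId[_copy_N] with no extension or other suffix.

```python
import re

# Assumed layout: <taskId>_<attemptId>, optionally followed by _copy_<N>.
FILE_NAME_RE = re.compile(r"(?P<task>\d+)_(?P<attempt>\d+)(?:_copy_(?P<copy>\d+))?")


def parse_file_name(name):
    """Return (task_id, attempt_id, copy_index); copy_index is 0 for non-copy files."""
    m = FILE_NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised file name: {name}")
    copy = m.group("copy")
    return m.group("task"), m.group("attempt"), int(copy) if copy is not None else 0


def next_free_name(name, existing):
    """Return name if it is unused, else the next unused <task>_<attempt>_copy_<N>."""
    task, attempt, copy = parse_file_name(name)
    candidate = name
    while candidate in existing:
        # Bump the copy index instead of appending a new suffix to the old name.
        copy += 1
        candidate = f"{task}_{attempt}_copy_{copy}"
    return candidate
```

      With this approach a conflicting 00001_02_copy_3 becomes 00001_02_copy_4 rather than 00001_02_copy_3_1.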

       

      Incompatible files should always be renamed using the current task; otherwise they can be deleted if the file name conflicts with another task's output file. Ex: if the input file for a task is named 00005_01 and is incompatible, then moving it under the same name causes it to be treated as an output file of task id 5, attempt 1. If that task attempt exists, it will try to generate the same file and fail, and another attempt will be made. There will then be 2 files, 00005_01 and 00005_02, and the deduping code will remove 00005_01, resulting in data loss. There are other scenarios where the same can happen.
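      The data-loss scenario can be reproduced with a toy model. The sketch below is a simplified stand-in for the deduping step, not Hive's actual code: the hypothetical `dedupe` function assumes that, for each task id, only files from the highest attempt are kept.

```python
def dedupe(file_names):
    """Toy dedup: for each task id, keep only files from the highest attempt."""
    max_attempt = {}
    for name in file_names:
        task, attempt = name.split("_")[:2]
        max_attempt[task] = max(max_attempt.get(task, -1), int(attempt))
    return sorted(
        name
        for name in file_names
        if int(name.split("_")[1]) == max_attempt[name.split("_")[0]]
    )


# Incompatible input 00005_01 moved under its own name; task 5 then retries
# and writes 00005_02. The moved file looks like a stale attempt and is dropped.
moved_as_is = dedupe(["00005_01", "00005_02"])

# Same input renamed under the current task (assumed here to be 00002, attempt 00).
# It no longer collides with task 5, so it survives deduping.
renamed = dedupe(["00002_00", "00002_00_copy_1", "00005_02"])
```

      In the first case only 00005_02 remains; in the second, all three files are retained.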


            People

              harishjp Harish JP


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m