  Hive / HIVE-21214

MoveTask: Use attemptId instead of file size for deduplication of files in compareTempOrDuplicateFiles()


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      For a given task, if there is more than one attempt, the deduplication logic in Utilities.compareTempOrDuplicateFiles() kicks in.

      That logic compares file sizes and keeps the file with the largest size. This logic is very fragile.

      Ideally, it should pick the file written by the successful attempt.

      However, a simpler solution is to pick the newest attempt and also check that the newest attempt's file is the largest.

      If not, throw an exception.
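      The proposal above can be sketched in Java as follows. This is a minimal, hypothetical sketch, not the code in Hive's Utilities.compareTempOrDuplicateFiles() or in the attached patches. It assumes each attempt's output file is named "<taskId>_<attemptId>" (for example "000000_0" and "000000_1"), and the names AttemptDedup, TaskFile, attemptId and pickNewestAttempt are illustrative only. It keeps the file from the highest attempt id, then throws if any older attempt produced a larger file.

```java
import java.util.List;

// Hypothetical sketch of attempt-based deduplication; not Hive's actual implementation.
// Assumes output files are named "<taskId>_<attemptId>", e.g. "000000_1".
final class AttemptDedup {

    /** Minimal stand-in for a file status: a file name plus its length in bytes. */
    static final class TaskFile {
        final String name;
        final long length;

        TaskFile(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /** Extracts the attempt id from the suffix after the last '_' (assumed naming). */
    static int attemptId(String name) {
        int idx = name.lastIndexOf('_');
        if (idx < 0 || idx == name.length() - 1) {
            throw new IllegalArgumentException("No attempt id in file name: " + name);
        }
        return Integer.parseInt(name.substring(idx + 1));
    }

    /**
     * Keeps the file written by the newest (highest-numbered) attempt, then checks
     * that no older attempt produced a larger file; if one did, throws instead of
     * silently choosing a file.
     */
    static TaskFile pickNewestAttempt(List<TaskFile> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("No attempt files for this task");
        }
        TaskFile newest = files.get(0);
        long maxLength = newest.length;
        for (TaskFile f : files) {
            if (attemptId(f.name) > attemptId(newest.name)) {
                newest = f;
            }
            maxLength = Math.max(maxLength, f.length);
        }
        if (newest.length < maxLength) {
            throw new IllegalStateException("Newest attempt " + newest.name + " (" + newest.length
                    + " bytes) is smaller than an older attempt (" + maxLength + " bytes)");
        }
        return newest;
    }

    public static void main(String[] args) {
        List<TaskFile> files = List.of(new TaskFile("000000_0", 100), new TaskFile("000000_1", 120));
        System.out.println(pickNewestAttempt(files).name); // prints 000000_1
    }
}
```

      The size check is kept only as a safety net: the attempt id decides which file wins, and a newest file that is smaller than an older one is treated as an error rather than resolved by size alone.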

       

      cc gopalv thejas jdere ekoifman

      Attachments

        1. HIVE-21214.1.patch
          5 kB
          Deepak Jaiswal
        2. HIVE-21214.2.patch
          5 kB
          Deepak Jaiswal
        3. HIVE-21214.3.patch
          5 kB
          Deepak Jaiswal

            People

              Assignee: Deepak Jaiswal (djaiswal)
              Reporter: Deepak Jaiswal (djaiswal)
              Votes: 0
              Watchers: 2
