Hive / HIVE-17113

Duplicate bucket files can get written to table by runaway task

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Query Processor
    • Labels: None

    Description

      Saw a table get a duplicate bucket file from a Hive query. It looks like the following happened:

      1. Task attempt A_0 starts, but then stops making progress.
      2. The job was running with speculative execution on, so task attempt A_1 is started.
      3. Task attempt A_1 finishes execution and saves its output to the temp directory.
      4. A task kill is sent to A_0, though this does not appear to actually kill A_0.
      5. The job for the query finishes and Utilities.mvFileToFinalPath() calls Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files.
      6. A_0 (still running) finally finishes and saves its file to the temp directory. At this point we now have duplicate bucket files - oops!
      7. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the final location, where it is later moved to the partition directory.
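
      The race in the sequence above can be illustrated with a small standalone sketch. This is not Hive's code: the class DuplicateBucketCheck and its methods are hypothetical, and the file naming "<bucketId>_<attemptId>" (for example "000000_1") is an assumption made for illustration. It only shows why a duplicate check that runs once, before the runaway attempt writes its file, cannot catch the duplicate.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Map;
      import java.util.TreeMap;

      // Hypothetical illustration only; not part of Hive.
      final class DuplicateBucketCheck {

          // Assumption: file names look like "<bucketId>_<attemptId>", e.g. "000000_1".
          static String bucketIdOf(String fileName) {
              int idx = fileName.lastIndexOf('_');
              return idx < 0 ? fileName : fileName.substring(0, idx);
          }

          // Groups file names by bucket id and keeps only buckets with more than one file.
          static Map<String, List<String>> findDuplicates(List<String> fileNames) {
              Map<String, List<String>> byBucket = new TreeMap<>();
              for (String name : fileNames) {
                  byBucket.computeIfAbsent(bucketIdOf(name), k -> new ArrayList<>()).add(name);
              }
              byBucket.values().removeIf(files -> files.size() < 2);
              return byBucket;
          }

          public static void main(String[] args) {
              // Temp directory contents when the duplicate check runs:
              // attempt A_1 has written bucket 000000, A_0 is still running.
              List<String> tempDir = new ArrayList<>();
              tempDir.add("000000_1");
              tempDir.add("000001_0");
              System.out.println("at check time: " + findDuplicates(tempDir));
              // prints: at check time: {}

              // The runaway attempt A_0 writes its file after the check has passed.
              tempDir.add("000000_0");
              System.out.println("after late write: " + findDuplicates(tempDir));
              // prints: after late write: {000000=[000000_1, 000000_0]}
          }
      }
      ```

      In this sketch the check itself is correct; the problem is purely one of timing, since any file that lands in the temp directory after the check and before the move is carried along to the final location.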

      Attachments

        1. HIVE-17113.3.patch
          9 kB
          Jason Dere
        2. HIVE-17113.2.patch
          8 kB
          Jason Dere
        3. HIVE-17113.1.patch
          2 kB
          Jason Dere

          People

            Assignee: Jason Dere (jdere)
            Reporter: Jason Dere (jdere)
            Votes: 1
            Watchers: 8
