Hive / HIVE-20725

Simultaneous dynamic inserts can result in lost partition files


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      If two users attempt a dynamic insert into the same new partition at the same time, a race condition can leave the partition in an inconsistent state: the partition info has been inserted into the metastore, but the data files have been removed.

      The current logic of the function "add_partition_core" in class HiveMetaStore.HMSHandler is as follows:

      1. check whether the partition already exists
      2. create the partition files directory if it does not exist
      3. try to add the partition
      4. if adding the partition failed and the directory was created in step 2, delete that directory
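
      The four steps can be sketched as a single method. This is only an illustration under stated assumptions, not the actual HMSHandler code: the class AddPartitionSketch, its two in-memory sets standing in for the metastore and the file system, and the partition name used in main are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-ins for the metastore and the file system.
class AddPartitionSketch {
    final Set<String> metastore = new HashSet<>();   // registered partition names
    final Set<String> directories = new HashSet<>(); // existing partition directories

    /** Returns true if the partition was added, false if it was already registered. */
    boolean addPartition(String name) {
        // Step 1: check whether the partition already exists.
        if (metastore.contains(name)) {
            return false;
        }
        // Step 2: create the directory if it does not exist.
        // Set.add returns true only when the element was absent.
        boolean madeDir = directories.add(name);
        boolean success = false;
        try {
            // Step 3: try to add the partition.
            success = metastore.add(name);
            return success;
        } finally {
            // Step 4: on failure, delete the directory only if this call created it.
            if (!success && madeDir) {
                directories.remove(name);
            }
        }
    }

    public static void main(String[] args) {
        AddPartitionSketch s = new AddPartitionSketch();
        String partition = "dt=2018-10-10"; // hypothetical partition name
        System.out.println(s.addPartition(partition)); // first call registers the partition
        System.out.println(s.addPartition(partition)); // second call stops at step 1
    }
}
```

      Run from a single thread, step 3 cannot fail once step 1 has passed, so step 4 never deletes anything. The cleanup only becomes dangerous when another request runs between steps 2 and 3.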

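      Assume that two users are inserting into the same partition at the same time and that their requests are handled by two threads, say thread A and thread B. Suppose steps 1 to 4 of thread B all complete between step 2 and step 3 of thread A, so the sequence is: A1 A2 B1 B2 B3 B4 A3 A4. In B2 the directory already exists, so B does not create it, and B3 succeeds. A3 then fails because the partition has already been added, and since A created the directory in A2, A4 deletes it. The partition files written by B are removed by A.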
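
      The interleaving A1 A2 B1 B2 B3 B4 A3 A4 can be replayed deterministically. The sketch below is a simplified model, not Hive code: RaceReplay, Request, the step methods, and the partition name "p" are hypothetical, and two in-memory sets stand in for the metastore and the file system. Each step is a separate method so the order can be fixed by hand instead of relying on thread timing.

```java
import java.util.HashSet;
import java.util.Set;

class RaceReplay {
    final Set<String> metastore = new HashSet<>();   // registered partition names
    final Set<String> directories = new HashSet<>(); // existing partition directories

    /** Per-request state that a handler would keep in local variables. */
    class Request {
        final String partition;
        boolean existed; // result of step 1
        boolean madeDir; // true if step 2 created the directory
        boolean added;   // true if step 3 succeeded

        Request(String partition) { this.partition = partition; }

        void step1() { existed = metastore.contains(partition); }
        void step2() { if (!existed) madeDir = directories.add(partition); }
        void step3() { if (!existed) added = metastore.add(partition); }
        void step4() { if (!added && madeDir) directories.remove(partition); }
    }

    /** Replays A1 A2 B1 B2 B3 B4 A3 A4 and returns the resulting state. */
    static RaceReplay replay() {
        RaceReplay state = new RaceReplay();
        Request a = state.new Request("p");
        Request b = state.new Request("p");
        a.step1(); a.step2();                       // A checks, then creates the directory
        b.step1(); b.step2(); b.step3(); b.step4(); // B runs to completion and adds the partition
        a.step3(); a.step4();                       // A fails to add, then deletes "its" directory
        return state;
    }

    public static void main(String[] args) {
        RaceReplay state = replay();
        System.out.println("partition in metastore: " + state.metastore.contains("p")); // true
        System.out.println("directory exists: " + state.directories.contains("p"));     // false
    }
}
```

      The final state matches the reported error: the partition is registered in the metastore while its directory is gone.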

       

      Attachments

        1. HIVE-20725.1.patch
          1 kB
          zhuwei


          People

            Assignee: qunyan zhuwei
            Reporter: qunyan zhuwei
            Votes: 0
            Watchers: 2
