Hive / HIVE-26472

Concurrent UPDATEs can cause duplicate rows


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • 4.0.0-alpha-1
    • None
    • Component/s: HiveServer2

    Description

      Concurrent UPDATEs to the same table can produce duplicate rows when two UPDATEs are assigned txnIds and writeIds in opposite orders, for example:
      UPDATE #1 = txnId: 100 writeId: 50 <--- commits first
      UPDATE #2 = txnId: 101 writeId: 49

      To reproduce the issue:
      I applied the attached debug.diff patch, which adds two settings: hive.lock.sleep.writeid (how long to sleep before acquiring a writeId) and hive.lock.sleep.post.writeid (how long to sleep after acquiring a writeId).

      CREATE TABLE test_update(i int) STORED AS ORC TBLPROPERTIES('transactional'="true");
      INSERT INTO test_update VALUES (1);
      
      Start two beeline connections.
      In connection #1 - run:
      set hive.driver.parallel.compilation = true;
      set hive.lock.sleep.writeid=5s;
      update test_update set i = 1 where i = 1;
      
      Wait one second and in connection #2 - run:
      set hive.driver.parallel.compilation = true;
      set hive.lock.sleep.post.writeid=10s;
      update test_update set i = 1 where i = 1;
      
      After both updates complete, test_update will likely contain two rows instead of one.
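      The duplicate can be traced with a toy model of an ACID table, in which an UPDATE writes a delete event for each matched row plus a replacement row tagged with its own writeId. This is a simplified sketch, not Hive's reader or writer code: the `Row` type, the `visible`/`update` helpers, and the initial INSERT's writeId of 48 are hypothetical. It assumes, as in the scenario above, that both UPDATEs read a snapshot in which only the original INSERT is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    write_id: int  # writeId of the transaction that inserted the row
    row_id: int    # row position within that write
    value: int


def visible(rows, deletes, valid_write_ids):
    """Rows a reader sees: inserted by a valid writeId and not deleted by one."""
    deleted = {key for key, del_wid in deletes if del_wid in valid_write_ids}
    return [r for r in rows
            if r.write_id in valid_write_ids
            and (r.write_id, r.row_id) not in deleted]


def update(rows, deletes, snapshot, my_write_id, old, new):
    """UPDATE ... SET i = new WHERE i = old, modeled as delete + insert."""
    targets = [r for r in visible(rows, deletes, snapshot) if r.value == old]
    for n, r in enumerate(targets):
        deletes.append(((r.write_id, r.row_id), my_write_id))
        rows.append(Row(my_write_id, n, new))


rows, deletes = [], []
INSERT_WID = 48                      # hypothetical writeId of the initial INSERT
rows.append(Row(INSERT_WID, 0, 1))   # INSERT INTO test_update VALUES (1)

# Both UPDATEs read a snapshot in which only the INSERT is committed:
# UPDATE #2's snapshot was taken while txn 100 (UPDATE #1) was still open.
stale_snapshot = {INSERT_WID}
update(rows, deletes, stale_snapshot, my_write_id=50, old=1, new=1)  # UPDATE #1, commits first
update(rows, deletes, stale_snapshot, my_write_id=49, old=1, new=1)  # UPDATE #2

# A reader after both commits treats every writeId as valid.
final = visible(rows, deletes, {INSERT_WID, 49, 50})
print(len(final))  # 2
```

      Because both UPDATEs delete the same original row and each inserts its own replacement, a reader that sees both writeIds returns two rows where there should be one.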
      

      HIVE-24211 seems to address the case where the commit order is reversed:
      UPDATE #1 = txnId: 100 writeId: 50
      UPDATE #2 = txnId: 101 writeId: 49 <--- commits first (I think this causes UPDATE #1 to detect that its snapshot is out of date, because the committed txnId is greater than UPDATE #1's txnId)
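      The staleness check described in the parenthetical above compares txnIds. A minimal sketch of that check, following the reporter's reading of HIVE-24211 (the function name is hypothetical and this is not Hive's actual code), shows why it fires in the HIVE-24211 ordering but not in this issue's ordering:

```python
def snapshot_is_stale(my_txn_id, committed_since_snapshot):
    """Hypothetical check, per the reporter's reading of HIVE-24211: the
    snapshot is treated as stale only if a transaction with a higher txnId
    committed after the snapshot was taken."""
    return any(txn_id > my_txn_id for txn_id in committed_since_snapshot)


# HIVE-24211 ordering: UPDATE #2 (txn 101) commits first,
# then UPDATE #1 (txn 100) checks its snapshot.
detected_24211 = snapshot_is_stale(100, {101})

# This issue's ordering: UPDATE #1 (txn 100) commits first, then UPDATE #2
# (txn 101) checks its snapshot. Txn 100 was still open when that snapshot was
# taken, so its changes were invisible, yet 100 < 101 and nothing is reported.
detected_26472 = snapshot_is_stale(101, {100})

print(detected_24211, detected_26472)  # True False
```

      Under this reading, the txnId comparison skips exactly the case where a lower-txnId transaction that was still open in the snapshot commits first.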

      A possible workaround is to set hive.driver.parallel.compilation = false, but this only helps when there is a single HS2 instance.

      Attachments

        1. debug.diff (3 kB, John Sherman)


          People

            Assignee: John Sherman (jfs)
            Reporter: John Sherman (jfs)
            Votes: 0
            Watchers: 1


              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 2h
