Hive / HIVE-26472

Concurrent UPDATEs can cause duplicate rows


Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 4.0.0-alpha-1
    • None
    • HiveServer2

    Description

      Concurrent UPDATEs to the same table can produce duplicate rows when the following occurs:
      Two UPDATEs are assigned txnIds and writeIds in opposite orders, and the one with the lower txnId commits first:
      UPDATE #1 = txnId: 100 writeId: 50 <--- commits first
      UPDATE #2 = txnId: 101 writeId: 49

      To replicate the issue:
      I applied the attached debug.diff patch, which adds two settings: hive.lock.sleep.writeid (how long to sleep before acquiring a writeId) and hive.lock.sleep.post.writeid (how long to sleep after acquiring a writeId).

      CREATE TABLE test_update(i int) STORED AS ORC TBLPROPERTIES('transactional'="true");
      INSERT INTO test_update VALUES (1);
      
      Start two beeline connections.
      In connection #1 - run:
      set hive.driver.parallel.compilation = true;
      set hive.lock.sleep.writeid=5s;
      update test_update set i = 1 where i = 1;
      
      Wait one second and in connection #2 - run:
      set hive.driver.parallel.compilation = true;
      set hive.lock.sleep.post.writeid=10s;
      update test_update set i = 1 where i = 1;
      
      After both updates complete, test_update is likely to contain two rows.
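The timing of the steps above can be sketched as a small simulation. This is not Hive code: it assumes a txnId is handed out when each statement starts, that writeIds come from a separate sequential counter, and that the two sleep settings delay only the steps named in the description. The counters are seeded (hypothetically) so the ids match the example, and `schedule` and `simulate` are illustrative names.

```python
from itertools import count


def schedule(name, start, sleep_before_write_id, sleep_after_write_id):
    """Build (time, step, action, name) events for one UPDATE statement."""
    t_write_id = start + sleep_before_write_id
    t_commit = t_write_id + sleep_after_write_id
    return [
        (start, 0, "open_txn", name),
        (t_write_id, 1, "alloc_write_id", name),
        (t_commit, 2, "commit", name),
    ]


def simulate():
    # Hypothetical counters, seeded to reproduce the ids in the description.
    txn_ids = count(100)
    write_ids = count(49)

    # Connection #1 starts at t=0 and sleeps 5s before acquiring a writeId.
    # Connection #2 starts at t=1 and sleeps 10s after acquiring a writeId.
    events = schedule("UPDATE #1", 0, 5, 0) + schedule("UPDATE #2", 1, 0, 10)

    ids = {}
    commit_order = []
    for _, _, action, name in sorted(events):
        if action == "open_txn":
            ids[name] = {"txnId": next(txn_ids)}
        elif action == "alloc_write_id":
            ids[name]["writeId"] = next(write_ids)
        else:
            commit_order.append(name)
    return ids, commit_order


ids, commit_order = simulate()
print(ids)
print(commit_order)
```

Under these assumptions, UPDATE #1 ends up with the lower txnId but the higher writeId and commits first, which is the ordering the description says triggers the duplicate.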
      

      HIVE-24211 seems to address the case when:
      UPDATE #1 = txnId: 100 writeId: 50
      UPDATE #2 = txnId: 101 writeId: 49 <--- commits first (I think this causes UPDATE #1 to detect that its snapshot is out of date, because the committed txnId is greater than UPDATE #1's txnId)
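The difference between the two cases can be illustrated with a toy check. It models only the reporter's guess above (a snapshot is treated as stale when a committed txnId is greater than the running statement's own txnId); it has not been verified against Hive's actual implementation, and `snapshot_looks_stale` is a hypothetical name.

```python
def snapshot_looks_stale(my_txn_id: int, committed_txn_id: int) -> bool:
    """Toy model of the reporter's hypothesis, not Hive's real check."""
    return committed_txn_id > my_txn_id


# HIVE-24211 case: UPDATE #2 (txnId 101) commits while UPDATE #1 (txnId 100) is running.
hive_24211_detected = snapshot_looks_stale(my_txn_id=100, committed_txn_id=101)

# This issue: UPDATE #1 (txnId 100) commits while UPDATE #2 (txnId 101) is running.
this_issue_detected = snapshot_looks_stale(my_txn_id=101, committed_txn_id=100)

print(hive_24211_detected, this_issue_detected)
```

If the check does work this way, a comparison on txnIds alone would miss the case in this report, because the statement that commits first has the lower txnId.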

      A possible workaround is to set hive.driver.parallel.compilation = false, but this only helps when there is a single HS2 instance.

      Attachments

        1. debug.diff
          3 kB
          John Sherman

            People

              jfs John Sherman
              jfs John Sherman

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h