Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-20414

TestLockProcedure#testMultipleLocks may fail on slow machine

    XMLWordPrintableJSON

Details

    • Test
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.0.0-alpha-1, 2.1.0, 2.0.1
    • None
    • None
    • Reviewed

    Description

      Here was recent failure : https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/172/testReport/junit/org.apache.hadoop.hbase.master.locking/TestLockProcedure/health_checks___yetus_jdk8_hadoop2_checks___testMultipleLocks/

      java.lang.AssertionError: expected:<LOCKED> but was:<UNLOCKED>
      	at org.apache.hadoop.hbase.master.locking.TestLockProcedure.sendHeartbeatAndCheckLocked(TestLockProcedure.java:221)
      	at org.apache.hadoop.hbase.master.locking.TestLockProcedure.testMultipleLocks(TestLockProcedure.java:311)
      

      In the test output, we can see this:

      2018-04-13 20:19:54,230 DEBUG [Time-limited test] locking.TestLockProcedure(225): Proc id 22 : LOCKED.
      ...
      2018-04-13 20:19:55,529 DEBUG [Time-limited test] procedure2.ProcedureExecutor(865): Stored pid=26, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure regions=a7f9adfd047350eabb199a39564ba4db,c1e297609590b707233a2f9c8bb51fa1, type=EXCLUSIVE
      
      2018-04-13 20:19:56,230 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure ProcedureEvent for pid=22, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=22, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
      

      After the pid=26 log, the code does this (1 second wait):

          // Assert tables & region locks are waiting because of namespace lock.
          Thread.sleep(HEARTBEAT_TIMEOUT / 2);
      

      On a slow machine (in the case above), there was only 730 msec between the queueing of regionsLock2 and the WAITING_TIMEOUT state of the nsLock. The 1 second wait was too long, leading to assertion failure.

      Attachments

        1. 20414.v1.txt
          1 kB
          Ted Yu
        2. 20414.v2.txt
          1 kB
          Ted Yu

        Activity

          People

            yuzhihong@gmail.com Ted Yu
            yuzhihong@gmail.com Ted Yu
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: