Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-26012

Improve logging and dequeue logic in DelayQueue

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.3.5, 2.4.4, 3.0.0-alpha-2
    • 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.5
    • None
    • None
    • Reviewed

    Description

      In Remote Procedure dispatcher, before submitting (sub)Procedure to Thread pool, we enqueue it as DelayedWithTimeout object on DelayQueue. 

      TimeoutExecutorThread keeps dequeuing elements from this DelayQueue and submit the Procedure to the threadpool. The expiration of DelayedWithTimeout is determined by getDelay(TimeUnit):

      @Override
      public long getDelay(final TimeUnit unit) {
        return DelayedUtil.getRemainingTime(unit, getTimeout());
      }
      
        /**
         * @return Time remaining as milliseconds.
         */
        public static long getRemainingTime(final TimeUnit resultUnit, final long timeout) {
          final long currentTime = EnvironmentEdgeManager.currentTime();
          if (currentTime >= timeout) {
            return 0;
          }
          return resultUnit.convert(timeout - currentTime, TimeUnit.MILLISECONDS);
        }
      

      Hence, in order for the elements to get dequeued on time, it is necessary that EnvironmentEdgeManager.currentTime() returns the current time in millis.

      As part of unit test, if we use our own custom EnvironmentEdge and inject it using EnvironmentEdgeManager.injectEdge before creating any tables, it is possible that we continue returning same value (based on custom impl) with EnvironmentEdgeManager.currentTime(). If that is the case, getRemainingTime as mentioned above, will never return 0 and hence, the procedure wrapped in DelayedWithTimeout might never be dequeued from DelayQueue because it's delay will not expire.

      As of today, our system goes in hanging state while waiting for table regions to be available (as mentioned above, DelayedWithTimeout object never gets dequeued from DelayQueue).

      Thread dump might show something like this consistently:

      "ProcedureDispatcherTimeoutThread" #319 daemon prio=5 os_prio=31 tid=0x00007fcaf0cae800 nid=0x21d03 waiting on condition [0x0000700019293000]
         java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007225a0090> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:223)
        at org.apache.hadoop.hbase.procedure2.util.DelayedUtil.takeWithoutInterrupt(DelayedUtil.java:82)
        at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$TimeoutExecutorThread.run(RemoteProcedureDispatcher.java:314)
         Locked ownable synchronizers:
        - None
      

      Although running into situation like this is not likely possible unless custom EnvironmentEdge is used as mentioned above, we should improve our dequeue logic as well as log important message to show where we are stuck.

      Attachments

        Activity

          People

            vjasani Viraj Jasani
            vjasani Viraj Jasani
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: