ZOOKEEPER-2347: Deadlock shutting down zookeeper

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.7
    • Fix Version/s: 3.4.8
    • Component/s: None
    • Labels: None

    Description

      HBase recently upgraded to ZooKeeper 3.4.7.

      In one of the tests, TestSplitLogManager, there is a reproducible hang at the end of the test.
      Below is a snippet from the stack trace related to ZooKeeper:

      "main-EventThread" daemon prio=5 tid=0x00007fd27488a800 nid=0x6f1f waiting on condition [0x000000011834b000]
         java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007c5b8d3a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
      
      "main-SendThread(localhost:59510)" daemon prio=5 tid=0x00007fd274eb4000 nid=0x9513 waiting on condition [0x0000000118042000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
      
      "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting for monitor entry [0x00000001170ac000]
         java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512)
        - waiting to lock <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
      
      "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800 nid=0x711b waiting on condition [0x0000000117a30000]
         java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007c9b106b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
      
      "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait() [0x0000000108aa1000]
         java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007c5b66400> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1281)
        - locked <0x00000007c5b66400> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1355)
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213)
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770)
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478)
        - locked <0x00000007c5b62128> (a org.apache.zookeeper.server.ZooKeeperServer)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266)
        at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301)
      

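      Note the address (0x00000007c5b66400) in the last hunk: "main" is inside ZooKeeperServer.shutdown(), holding the ZooKeeperServer monitor (0x00000007c5b62128), and is waiting to join the SyncRequestProcessor thread. "SyncThread:0" is in turn blocked in decInProcess() waiting for that same ZooKeeperServer monitor, so neither thread can proceed. This seems to indicate a deadlock.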

      According to Camille Fournier:

      We made shutdown synchronized. But decrementing the requests is
      also synchronized and called from a different thread. So yeah, deadlock.
      This came in with ZOOKEEPER-1907
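
The cycle described above can be reproduced in isolation. The following is a minimal sketch, not ZooKeeper code: the class ShutdownDeadlockSketch, its nested Server class, and every method name in it are hypothetical stand-ins. It assumes only what the thread dump shows, namely a synchronized shutdown that joins a worker thread while that worker calls a synchronized method on the same object. The join uses a timeout so the demonstration terminates; the real shutdown path joins without one. Switching the counter to an AtomicInteger is shown as one possible way to break the cycle, and is not necessarily what the attached patches do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownDeadlockSketch {

    /** Hypothetical stand-in for the server; NOT the real ZooKeeperServer. */
    static class Server {
        private int lockedCounter = 1;                        // guarded by "this"
        private final AtomicInteger atomicCounter = new AtomicInteger(1);
        private final CountDownLatch shutdownHasLock = new CountDownLatch(1);
        private final Thread syncThread;

        Server(final boolean useAtomicCounter) {
            syncThread = new Thread(() -> {
                try {
                    // Wait until shutdown() owns the server monitor, to force
                    // the interleaving seen in the thread dump.
                    shutdownHasLock.await();
                } catch (InterruptedException e) {
                    return;
                }
                if (useAtomicCounter) {
                    decInProcessAtomic();
                } else {
                    decInProcessSynchronized();
                }
            }, "SyncThread-sketch");
            syncThread.setDaemon(true);
        }

        void start() {
            syncThread.start();
        }

        // Needs the server monitor, like the decInProcess() frame in the dump.
        synchronized void decInProcessSynchronized() {
            lockedCounter--;
        }

        // Lock-free alternative: no monitor is needed to decrement.
        void decInProcessAtomic() {
            atomicCounter.decrementAndGet();
        }

        /** Holds the server monitor while joining; true if the worker exited in time. */
        synchronized boolean shutdown(long joinTimeoutMillis) throws InterruptedException {
            shutdownHasLock.countDown();
            syncThread.join(joinTimeoutMillis);   // the real code joins with no timeout
            return !syncThread.isAlive();
        }
    }

    /** Runs one shutdown and reports whether the worker thread finished. */
    static boolean shutdownCompletes(boolean useAtomicCounter, long joinTimeoutMillis) {
        Server server = new Server(useAtomicCounter);
        server.start();
        try {
            return server.shutdown(joinTimeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronized counter: completed = " + shutdownCompletes(false, 300));
        System.out.println("atomic counter:       completed = " + shutdownCompletes(true, 5000));
    }
}
```

With the synchronized counter the worker cannot enter decInProcessSynchronized() until shutdown() returns, so the join always times out. With the atomic counter the worker never needs the server monitor and the join returns as soon as the worker exits.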

      Attachments

        1. ZOOKEEPER-2347-br-3.4.patch (12 kB, Rakesh Radhakrishnan)
        2. ZOOKEEPER-2347-br-3.4.patch (12 kB, Rakesh Radhakrishnan)
        3. ZOOKEEPER-2347-br-3.4.patch (11 kB, Rakesh Radhakrishnan)
        4. ZOOKEEPER-2347-br-3.4.patch (5 kB, Rakesh Radhakrishnan)
        5. testSplitLogManager.stack (16 kB, Ted Yu)

          People

            Assignee: Rakesh Radhakrishnan (rakeshr)
            Reporter: Ted Yu (yuzhihong@gmail.com)
            Votes: 0
            Watchers: 9