Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-8532

controller-event-thread deadlock with zk-session-expiry-handler0

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Blocker
    • Resolution: Unresolved
    • 2.1.1
    • None
    • core
    • None

    Description

      We have observed a serious deadlock between controller-event-thead and zk-session-expirey-handle thread. When this issue occurred, it's only one way to recovery the kafka cluster is restart kafka server. The  follows is the jstack log of controller-event-thead and zk-session-expiry-handle thread.

      "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 tid=0x00007fcc9c010000 nid=0xfb22 waiting on condition [0x00007fcbb01f8000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x00000005ee3f7000> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 等待controller-event-thread线程处理expireEvent
        at kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
        at kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
        at kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
        at kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
        at kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
        at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown Source)
        at scala.collection.Iterator.foreach(Iterator.scala:937)
        at scala.collection.Iterator.foreach$(Iterator.scala:937)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
        at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
        at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
        at kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
        at kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown Source)
        at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
        at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown Source)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

      Locked ownable synchronizers:

      • <0x0000000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

      "controller-event-thread" #51 prio=5 os_prio=0 tid=0x00007fceaeec4000 nid=0x310 waiting on condition [0x00007fccb55c8000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x00000005d1be5a00> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at kafka.zookeeper.ZooKeeperClient.handleRequests(ZooKeeperClient.scala:157)
        at kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1596)
        at kafka.zk.KafkaZkClient.kafka$zk$KafkaZkClient$$retryRequestUntilConnected(KafkaZkClient.scala:1589)
        at kafka.zk.KafkaZkClient.deletePreferredReplicaElection(KafkaZkClient.scala:989)
        at kafka.controller.KafkaController.removePartitionsFromPreferredReplicaElection(KafkaController.scala:873)
        at kafka.controller.KafkaController.kafka$controller$KafkaController$$onPreferredReplicaElection(KafkaController.scala:631)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:266)
        at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1221)
        at kafka.controller.KafkaController$Reelect$.process(KafkaController.scala:1508)
        at kafka.controller.KafkaController$RegisterBrokerAndReelect$.process(KafkaController.scala:1517)
        at kafka.controller.ControllerEventManager$ControllerEventThread.$anonfun$doWork$1(ControllerEventManager.scala:89)
        at kafka.controller.ControllerEventManager$ControllerEventThread$$Lambda$362/918424570.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
        at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:89)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)

      Locked ownable synchronizers:

       

      this issue is same to https://issues.apache.org/jira/browse/KAFKA-6879 , I doubt that  KAFKA-6879 was not been fixed complete , please be attention to this issue

      • None

      Attachments

        1. js.log
          68 kB
          leibo
        2. js0.log
          62 kB
          leibo
        3. js1.log
          63 kB
          leibo
        4. js2.log
          63 kB
          leibo
        5. controllerDeadLockAnalysis-2020-01-20.png
          123 kB
          leibo

        Activity

          People

            Unassigned Unassigned
            lbdai3190 leibo
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: