  1. Kafka
  2. KAFKA-1317

KafkaServer 0.8.1 not responding to .shutdown() cleanly, possibly related to TopicDeletionManager or MetricsMeter state


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.8.1.1
    • None

    Description

      When I run an in-process instance of KafkaServer, send a message through it, and then call shutdown(), some threads never exit and the process hangs until it is killed manually. The same scenario does not hang on 0.8.0. The hang occurs both when calling shutdown() by itself and when calling shutdown() and awaitShutdown() together. I have seen similar behavior when shutting down a deployed Kafka server, but have not had time to determine whether it is the same problem.
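
      One way to narrow down which threads keep the process alive after shutdown() returns is to list the live non-daemon threads from inside the same JVM. The helper below is a minimal sketch using only the standard Thread API; the class and method names are hypothetical and are not part of Kafka.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Kafka.
class NonDaemonThreads {
    /** Returns the names of live non-daemon threads, excluding the calling thread. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                names.add(t.getName());
            }
        }
        return names;
    }
}
```

      Calling NonDaemonThreads.liveNonDaemonThreadNames() right after shutdown() returns would show any remaining threads that can block JVM exit. The list may also contain JVM or test-harness threads, so it is a starting point rather than a verdict.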

      I suspect that either metrics-meter-tick-thread-1 and -2 or delete-topics-thread (waiting in kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$awaitTopicDeletionNotification, TopicDeletionManager.scala:178) is to blame. Since TopicDeletionManager is new, it seems the more likely culprit to me. A complete thread dump is attached; the suspect threads are below.

      "delete-topics-thread" prio=5 tid=0x00007fb3e31d2800 nid=0x6b03 waiting on condition [0x000000013c3b3000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x000000012e6e6920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
              at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$awaitTopicDeletionNotification(TopicDeletionManager.scala:178)
              at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:334)
              at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:333)
              at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:333)
              at kafka.utils.Utils$.inLock(Utils.scala:538)
              at kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:333)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

         Locked ownable synchronizers:
              - None

      "metrics-meter-tick-thread-2" daemon prio=5 tid=0x00007fb3e31c1000 nid=0x5f03 runnable [0x000000013ab8f000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x000000012e7d05d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:724)

         Locked ownable synchronizers:
              - None

      "metrics-meter-tick-thread-1" daemon prio=5 tid=0x00007fb3e31ef800 nid=0x5e03 waiting on condition [0x000000013a98c000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x000000012e7d05d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1085)
              at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
              at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:724)

         Locked ownable synchronizers:
              - None
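
      A note on reading the dump: both metrics-meter-tick threads carry the daemon flag, and daemon threads do not keep a JVM alive, whereas delete-topics-thread is not marked daemon and is parked in Condition.await(). The sketch below illustrates that general failure mode only: a non-daemon worker parked on a condition stays parked if shutdown flips a flag without signalling the condition. It is not Kafka's code and is not presented as the root cause or the fix for this issue; every name in it (ParkedWorkerDemo, shutdownWithoutSignal, and so on) is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a worker that waits on a condition; not Kafka code.
class ParkedWorkerDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition workAvailable = lock.newCondition();
    private volatile boolean running = true;
    private boolean hasWork = false; // guarded by lock

    // Non-daemon by default, so it blocks JVM exit while alive.
    private final Thread worker = new Thread(this::loop, "demo-worker");

    private void loop() {
        lock.lock();
        try {
            while (running) {
                // Park until there is work or shutdown has been requested.
                while (running && !hasWork) {
                    workAvailable.awaitUninterruptibly();
                }
                hasWork = false;
            }
        } finally {
            lock.unlock();
        }
    }

    void start() {
        worker.start();
    }

    /** Spins until the worker is parked in await(), or the timeout elapses. */
    boolean awaitParked(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (worker.getState() != Thread.State.WAITING) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.yield();
        }
        return true;
    }

    /** Flips the flag only; a parked worker is never woken to observe it. */
    void shutdownWithoutSignal() {
        running = false;
    }

    /** Flips the flag and wakes the worker so it can observe the flag and exit. */
    void shutdownWithSignal() {
        lock.lock();
        try {
            running = false;
            workAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits up to timeoutMillis for the worker to die; true if it has exited. */
    boolean awaitExit(long timeoutMillis) {
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

      In this sketch, calling shutdownWithoutSignal() on a parked worker leaves it alive (barring a spurious wakeup), while shutdownWithSignal() lets it leave the loop. Whether the TopicDeletionManager hang follows the same pattern would have to be confirmed against the attached patches.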

      Attachments

        1. KAFKA-1317_2014-03-28_09:34:02.patch
          10 kB
          Timothy Chen
        2. KAFKA-1317_2014-03-27_15:15:05.patch
          80 kB
          Timothy Chen
        3. KAFKA-1317_2014-03-26_15:18:52.patch
          7 kB
          Timothy Chen
        4. KAFKA-1317_2014-03-26_15:09:48.patch
          7 kB
          Timothy Chen
        5. KAFKA-1317.patch
          6 kB
          Timothy Chen
        6. KAFKA-1317_2014-03-26_11:30:57.patch
          7 kB
          Timothy Chen
        7. KAFKA-1317_2014-03-26_09:48:03.patch
          6 kB
          Timothy Chen
        8. KAFKA-1317_2014-03-25_15:20:14.patch
          6 kB
          Timothy Chen
        9. KAFKA-1317_2014-03-24_11:06:15.patch
          4 kB
          Timothy Chen
        10. KAFKA-1317_2014-03-23_23:48:28.patch
          4 kB
          Timothy Chen
        11. KAFKA-1317.patch
          5 kB
          Timothy Chen
        12. threaddump.txt
          12 kB
          Brent Bradbury

            People

              Assignee: Timothy Chen (tnachen)
              Reporter: Brent Bradbury (brentbradbury)