Kafka / KAFKA-7136

PushHttpMetricsReporter may deadlock when processing metrics changes


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 2.0.0
    • Fix Version/s: 1.1.1, 2.0.0
    • Component/s: metrics
    • Labels:
      None

      Description

      We noticed a deadlock in PushHttpMetricsReporter. Locking for metrics was changed under KAFKA-6765 to avoid a NullPointerException in metrics reporters caused by concurrent reads and updates. PushHttpMetricsReporter acquires its own lock to process metric registration (metricChange), and registration is invoked while the caller already holds the Sensor lock. Its periodic HttpReporter task, in turn, reads metric values, which requires the Sensor lock, while holding the reporter lock: the same two locks acquired in the inverse order. This resulted in the deadlock below.
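The inversion described above can be reproduced outside Kafka with two plain monitors. The sketch below is a hypothetical, self-contained demo (the class, method and thread names are invented for illustration, not taken from Kafka): one thread takes a stand-in "sensor" lock and then a stand-in "reporter" lock, as StreamThread-7 does via Sensor.add and metricChange, while a second thread takes them in the opposite order, as the HttpReporter task does. A CountDownLatch forces both threads to hold their first lock before requesting the second, so the cycle always forms, and the JDK's ThreadMXBean.findMonitorDeadlockedThreads() detects the same kind of monitor cycle that jstack reports in the dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    // Starts two daemon threads that take two monitors in opposite orders and returns
    // how many threads the JVM reports as monitor-deadlocked (0 if none within ~5 s).
    public static int reproduce() throws InterruptedException {
        Object sensorLock = new Object();   // stand-in for the Sensor monitor
        Object reporterLock = new Object(); // stand-in for the reporter's private lock Object
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors StreamThread-7: Sensor.add holds the Sensor lock, then metricChange wants the reporter lock.
        Thread registrar = new Thread(() -> {
            synchronized (sensorLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (reporterLock) {
                    // register the metric (never reached)
                }
            }
        }, "registrar");

        // Mirrors pool-17-thread-1: HttpReporter.run holds the reporter lock, then KafkaMetric.value wants the Sensor lock.
        Thread pusher = new Thread(() -> {
            synchronized (reporterLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (sensorLock) {
                    // read the metric value (never reached)
                }
            }
        }, "pusher");

        // Daemon threads, so the deadlocked pair does not keep the JVM alive.
        registrar.setDaemon(true);
        pusher.setDaemon(true);
        registrar.start();
        pusher.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] deadlocked = threads.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (deadlocked != null) {
                return deadlocked.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

Running main should report 2 deadlocked threads, matching the two threads in the dump. Running jstack against the process while it sleeps in the polling loop would show a report of the same shape, with "registrar" and "pusher" in place of the Kafka threads.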

      Found one Java-level deadlock:
      Java stack information for the threads listed above:
      ===================================================
      "StreamThread-7":
        at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
        - waiting to lock <0x0000000655a54310> (a java.lang.Object)
        at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
        - locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
        at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
        - locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
        at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
        at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
        at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
        at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
        at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
        at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
        at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
        at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
        at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
        at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
        at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
        at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)

      "pool-17-thread-1":
        at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
        - waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
        at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
        at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
        - locked <0x0000000655a54310> (a java.lang.Object)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

      Found 1 deadlock.
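One general way to break a cycle like this is to make sure a reporter never waits for a Sensor lock while holding its own lock. The sketch below is hypothetical: SnapshotReporter, its methods and the Supplier-based metric values are assumptions made for illustration, not Kafka's MetricsReporter API, and this is not necessarily the fix applied for this issue. metricChange only updates a map under the reporter lock, while report() copies that map under the lock and evaluates the values after releasing it, so the reporting thread holds at most one of the two locks at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical reporter illustrating one way to avoid the inversion: never read a
// metric value (which may need a Sensor lock) while holding the reporter's own lock.
public class SnapshotReporter {
    private final Object lock = new Object();
    private final Map<String, Supplier<Double>> metrics = new LinkedHashMap<>();

    // May be called while the caller holds a Sensor lock; only touches the map, never reads values.
    public void metricChange(String name, Supplier<Double> value) {
        synchronized (lock) {
            metrics.put(name, value);
        }
    }

    // Periodic push: snapshot under the lock, then evaluate with the lock released.
    public Map<String, Double> report() {
        Map<String, Supplier<Double>> snapshot;
        synchronized (lock) {
            snapshot = new LinkedHashMap<>(metrics);
        }
        Map<String, Double> values = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Double>> e : snapshot.entrySet()) {
            values.put(e.getKey(), e.getValue().get()); // may block on a Sensor lock; reporter lock is free
        }
        return values;
    }

    // Exercises both lock paths concurrently; completing is evidence, not proof, that no cycle exists.
    public static boolean runsWithoutDeadlock() throws InterruptedException {
        SnapshotReporter reporter = new SnapshotReporter();
        Object sensor = new Object(); // stand-in for a Sensor monitor

        Thread registrar = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                synchronized (sensor) { // like Sensor.add: notify the reporter while holding the sensor lock
                    reporter.metricChange("m" + (i % 10), () -> {
                        synchronized (sensor) { // like KafkaMetric.value: reading needs the sensor lock
                            return 1.0;
                        }
                    });
                }
            }
        });
        Thread pusher = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                reporter.report();
            }
        });
        registrar.setDaemon(true);
        pusher.setDaemon(true);
        registrar.start();
        pusher.start();
        registrar.join(10_000);
        pusher.join(10_000);
        return !registrar.isAlive() && !pusher.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed without deadlock: " + runsWithoutDeadlock());
    }
}
```

The trade-off of this design is that a single report may still include a metric removed, or miss one added, concurrently with the snapshot; for periodic push reporting that staleness is usually acceptable in exchange for a lock order that cannot form a cycle.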


    People

    • Assignee: Rajini Sivaram (rsivaram)
    • Reporter: Rajini Sivaram (rsivaram)
    • Votes: 0
    • Watchers: 7
