KAFKA-7088: Kafka streams thread waits infinitely on transaction init

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.1
    • Fix Version/s: None
    • Component/s: clients

    Description

      A Kafka Streams application thread stopped processing without any feedback. The topic has 24 partitions, and processing stopped for only some of them. I will describe what happened to partition 10. The application is still running (for about 8 hours now), the thread is still hanging, and no rebalance has taken place.

      No error was raised (we have a custom `Thread.UncaughtExceptionHandler`, which was not called). After a couple of minutes the stream stopped processing at offset 32606948, while the log-end offset is 33472402 (a gap of 865,454 offsets).

      The broker itself does not report any active consumer in that consumer group, and the only information I was able to gather came from a thread dump:

      "mp_ads_publisher_pro_madstorage-web-corotos-prod-9db804ae-2a7a-431f-be09-392ab38cd8a2-StreamThread-33" #113 prio=5 os_prio=0 tid=0x00007fe07c6349b0 nid=0xf7a waiting on condition [0x00007fe0215d4000]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for <0x00000000fec6a2f8> (a java.util.concurrent.CountDownLatch$Sync)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
      at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
      at org.apache.kafka.clients.producer.internals.TransactionalRequestResult.await(TransactionalRequestResult.java:50)
      at org.apache.kafka.clients.producer.KafkaProducer.initTransactions(KafkaProducer.java:554)
      at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:151)
      at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:404)
      at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:365)
      at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:350)
      at org.apache.kafka.streams.processor.internals.TaskManager.addStreamTasks(TaskManager.java:137)
      at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:88)
      at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:259)
      at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:264)
      at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:367)
      at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
      at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
      at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1146)
      at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1111)
      at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:851)
      at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:808)
      at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
      at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
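
The frames above show the thread parked in `CountDownLatch.await()` via `LockSupport.park`, i.e. a wait with no timeout, which would explain why no exception ever reaches the `UncaughtExceptionHandler`. The following is a minimal stdlib-only sketch of that behavior, not Kafka's code: the class `LatchWaitSketch`, its methods, and the thread name are hypothetical, and the latch merely stands in for the pending transactional request result. It contrasts an untimed wait, which never returns while the latch stays unreleased, with a bounded wait, which gives up and reports the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustration only: plain JDK code, not Kafka's implementation.
class LatchWaitSketch {

    // Bounded wait: returns true if the latch was released in time, false on timeout.
    static boolean awaitBounded(CountDownLatch latch, long timeoutMs) throws InterruptedException {
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Starts a daemon thread that waits on the latch with no timeout.
    // It only finishes when the latch is counted down or the thread is interrupted.
    static Thread startUnboundedWaiter(CountDownLatch latch) {
        Thread t = new Thread(() -> {
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hypothetical-stream-thread");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a response that never arrives.
        CountDownLatch neverReleased = new CountDownLatch(1);

        Thread waiter = startUnboundedWaiter(neverReleased);
        waiter.join(500); // give the waiter time to park
        System.out.println("unbounded waiter still alive: " + waiter.isAlive());

        // The bounded variant returns false instead of hanging.
        System.out.println("bounded wait completed: " + awaitBounded(neverReleased, 100));
    }
}
```

A thread dump taken while `main` is inside `waiter.join(500)` should show the waiter thread parked on a `CountDownLatch$Sync`, much like the dump in this report.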

       

      I tried restarting the application once, but the situation repeated: the thread read some data, committed an offset, and stopped processing, leaving the thread in a waiting state.

      FYI: we have EOS (exactly-once semantics) enabled.
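
For readers reproducing this, exactly-once processing in Kafka Streams is switched on with the `processing.guarantee` setting, and the `StreamTask.<init>` frame in the dump shows that task creation then calls `KafkaProducer.initTransactions()`. The report does not include the actual configuration, so the sketch below is an assumption: the class `EosConfigSketch`, the application id, and the broker address are hypothetical placeholders, and plain string keys are used so the snippet compiles without the Kafka client on the classpath.

```java
import java.util.Properties;

// Illustration only: a guess at the relevant part of the configuration.
class EosConfigSketch {

    // Builds Streams properties with exactly-once processing enabled.
    static Properties eosProperties() {
        Properties props = new Properties();
        props.setProperty("application.id", "example-app");        // hypothetical placeholder
        props.setProperty("bootstrap.servers", "localhost:9092");  // hypothetical placeholder
        props.setProperty("processing.guarantee", "exactly_once"); // enables EOS
        return props;
    }

    public static void main(String[] args) {
        System.out.println(eosProperties().getProperty("processing.guarantee"));
    }
}
```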

          People

            Assignee: Unassigned
            Reporter: Lukasz Gluchowski (lgluchowski)
            Votes: 1
            Watchers: 5
