Kafka / KAFKA-6064

Cluster hung when the controller tried to delete a bunch of topics


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 0.8.2.1
    • Fix Version/s: None
    • Component/s: controller
    • Environment: rhel 6, 12 core, 48GB

    Description

      Hi,

      We have been running 0.8.2.1 in our Kafka cluster and suffered a full cluster outage when we programmatically tried to delete 220 topics: the controller hung and ran out of memory. This somehow led to a whole-cluster outage, and clients were unable to push data at the expected rate. AFAIK, the controller shouldn't affect the write rate to the other brokers, but in this case it did. Below is the client error.

      [WARN] Failed to send kafka.producer.async request with correlation id 1613935688 to broker 44 with data for partitions [topic_2,65],[topic_2,167],[topic_3,2],[topic_4,0],[topic_5,30],[topic_2,48],[topic_2,150]
      java.io.IOException: Broken pipe
      at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_60]
      at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_60]
      at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_60]
      at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_60]
      at java.nio.channels.SocketChannel.write(SocketChannel.java:502) ~[?:1.8.0_60]
      at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) ~[stormjar.jar:?]
      at kafka.network.Send$class.writeCompletely(Transmission.scala:75) ~[stormjar.jar:?]
      at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) ~[stormjar.jar:?]
      at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:103) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
      at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:102) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
      at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
      at kafka.producer.SyncProducer.send(SyncProducer.scala:101) ~[stormjar.jar:?]
      at kafka.producer.async.YamasKafkaEventHandler.kafka$producer$async$YamasKafkaEventHandler$$send(YamasKafkaEventHandler.scala:481) [stormjar.jar:?]
      at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:144) [stormjar.jar:?]
      at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
      at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) [stormjar.jar:?]
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) [stormjar.jar:?]
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) [stormjar.jar:?]
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) [stormjar.jar:?]
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) [stormjar.jar:?]
      at kafka.producer.async.YamasKafkaEventHandler.dispatchSerializedData(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
      at kafka.producer.async.YamasKafkaEventHandler.handle(YamasKafkaEventHandler.scala:79) [stormjar.jar:?]
      at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105) [stormjar.jar:?]
      at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88) [stormjar.jar:?]
      at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68) [stormjar.jar:?]
      at scala.collection.immutable.Stream.foreach(Stream.scala:547) [stormjar.jar:?]
      at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67) [stormjar.jar:?]
      at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45) [stormjar.jar:?]

      We tried shifting the controller to a different broker, but that didn't help. We ultimately had to clean up the Kafka cluster to stabilize it.

      Wondering if this is a known issue; if not, we would appreciate any insights from the community into why a hung controller would bring down the cluster and why deleting topics would cause the controller to hang.
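
      For reference, the kind of batch deletion described above is typically driven with `kafka-topics.sh --delete` (per topic) against ZooKeeper in 0.8.2.x, where deletion only *marks* topics for deletion and the controller processes them asynchronously — which is why a large batch lands entirely on the controller. A minimal sketch; the `gen_deletes` helper, topic names, and the `zk1:2181` connect string are hypothetical, and the commands are printed rather than executed:

      ```shell
      # gen_deletes: print one kafka-topics.sh delete command per topic
      # argument (dry run). Pipe the output to `sh` to actually run it
      # against a live cluster with delete.topic.enable=true.
      gen_deletes() {
        local zk="zk1:2181"   # hypothetical ZooKeeper connect string
        for topic in "$@"; do
          printf 'kafka-topics.sh --zookeeper %s --delete --topic %s\n' "$zk" "$topic"
        done
      }

      # Example: generate (but do not run) delete commands for two topics.
      gen_deletes topic_2 topic_3
      ```

      Reviewing the generated commands before piping them to a shell gives a cheap safeguard when deleting topics in bulk.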

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: chaitanyagsk (Chaitanya GSK)
            Votes: 0
            Watchers: 2

