Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Auto Closed
Affects Version/s: 0.8.2.1
Fix Version/s: None
Environment: rhel 6, 12 core, 48GB
Description
Hi,
We have been running 0.8.2.1 in our Kafka cluster and had a full cluster outage when we programmatically tried to delete 220 topics (a sketch of the deletion code is at the end of this description): the controller hung and ran out of memory. This somehow led to the whole cluster going down, and clients were unable to push data at the expected rate. AFAIK the controller shouldn't impact the write rate to the other brokers, but in this case it did. Below is the client error.
[WARN] Failed to send kafka.producer.async request with correlation id 1613935688 to broker 44 with data for partitions [topic_2,65],[topic_2,167],[topic_3,2],[topic_4,0],[topic_5,30],[topic_2,48],[topic_2,150]
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:1.8.0_60]
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:1.8.0_60]
at sun.nio.ch.IOUtil.write(IOUtil.java:148) ~[?:1.8.0_60]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504) ~[?:1.8.0_60]
at java.nio.channels.SocketChannel.write(SocketChannel.java:502) ~[?:1.8.0_60]
at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56) ~[stormjar.jar:?]
at kafka.network.Send$class.writeCompletely(Transmission.scala:75) ~[stormjar.jar:?]
at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26) ~[stormjar.jar:?]
at kafka.network.BlockingChannel.send(BlockingChannel.scala:103) ~[stormjar.jar:?]
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73) ~[stormjar.jar:?]
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72) ~[stormjar.jar:?]
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:103) ~[stormjar.jar:?]
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:103) ~[stormjar.jar:?]
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:102) ~[stormjar.jar:?]
at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:102) ~[stormjar.jar:?]
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) ~[stormjar.jar:?]
at kafka.producer.SyncProducer.send(SyncProducer.scala:101) ~[stormjar.jar:?]
at kafka.producer.async.YamasKafkaEventHandler.kafka$producer$async$YamasKafkaEventHandler$$send(YamasKafkaEventHandler.scala:481) [stormjar.jar:?]
at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:144) [stormjar.jar:?]
at kafka.producer.async.YamasKafkaEventHandler$$anonfun$dispatchSerializedData$2.apply(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) [stormjar.jar:?]
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) [stormjar.jar:?]
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) [stormjar.jar:?]
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) [stormjar.jar:?]
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) [stormjar.jar:?]
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) [stormjar.jar:?]
at kafka.producer.async.YamasKafkaEventHandler.dispatchSerializedData(YamasKafkaEventHandler.scala:138) [stormjar.jar:?]
at kafka.producer.async.YamasKafkaEventHandler.handle(YamasKafkaEventHandler.scala:79) [stormjar.jar:?]
at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105) [stormjar.jar:?]
at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88) [stormjar.jar:?]
at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68) [stormjar.jar:?]
at scala.collection.immutable.Stream.foreach(Stream.scala:547) [stormjar.jar:?]
at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67) [stormjar.jar:?]
at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45) [stormjar.jar:?]
We tried moving the controller to a different broker, but that didn't help. We ultimately had to clean up the Kafka cluster to stabilize it.
We are wondering if this is a known issue. If not, we would appreciate any insights from the community into why a hung controller would bring down the whole cluster and why deleting the topics would cause the controller to hang.
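For reference, the bulk deletion was done along the lines of the sketch below. This is a minimal sketch rather than the exact code we ran; it assumes the ZkClient-based kafka.admin.AdminUtils API that ships with 0.8.2.1, and the ZooKeeper connect string, timeouts, and topic names are placeholders.

import kafka.admin.AdminUtils
import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

// Placeholder ZooKeeper connect string and timeouts; the real values are site-specific.
val zkClient = new ZkClient("zk1:2181,zk2:2181,zk3:2181", 30000, 30000, ZKStringSerializer)
try {
  // Roughly 220 topic names were deleted in a loop like this. Each call only queues the
  // topic under /admin/delete_topics in ZooKeeper; the controller then carries out the deletion.
  val topicsToDelete: Seq[String] = Seq("topic_2", "topic_3", "topic_4", "topic_5" /* ... */)
  topicsToDelete.foreach { topic =>
    AdminUtils.deleteTopic(zkClient, topic)
  }
} finally {
  zkClient.close()
}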