-
Type:
Bug
-
Status: Resolved
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: 0.9.0.1
-
Fix Version/s: 0.10.1.0
-
Component/s: controller
-
Labels:None
We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per day, the controllers in our clusters have their connection to all brokers disconnected, and then successfully reconnected a few hundred ms later. Each time this occurs we see a brief spike in our 99th percentile produce and consume times, reaching several hundred ms.
Here is an example of what we're seeing in the controller.log:
[2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker. (kafka.controller.RequestSendThread) java.io.IOException: Connection to 160 was disconnected before the response was read at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87) at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84) at scala.Option.foreach(Option.scala:236) at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84) at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80) at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129) at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139) at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80) at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180) at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) ... one each for all brokers (including the controller) ... [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread) … one each for all brokers (including the controller) ...
- links to