Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-3916

Connection from controller to broker disconnects

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.9.0.1
    • 0.10.1.0
    • controller
    • None

    Description

      We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per day, the controllers in our clusters have their connection to all brokers disconnected, and then successfully reconnected a few hundred ms later. Each time this occurs we see a brief spike in our 99th percentile produce and consume times, reaching several hundred ms.

      Here is an example of what we're seeing in the controller.log:

      [2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
      java.io.IOException: Connection to 160 was disconnected before the response was read
              at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
              at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
              at scala.Option.foreach(Option.scala:236)
              at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
              at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
              at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
              at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
              at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
              at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
              at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
      
      ... one each for all brokers (including the controller) ...
      
       [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread)
      
      … one each for all brokers (including the controller) ...
      

      Attachments

        Issue Links

          Activity

            People

              hachikuji Jason Gustafson
              poweld Dave Powell
              Votes:
              11 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: