Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7876

Broker suddenly got disconnected

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate
    • 2.1.0
    • None
    • controller, network
    • None

    Description

       

      We have 3 node cluster setup. There are scenarios that one of the broker suddenly got disconnected from the cluster but no underlying system issue is found. The node that got dc'ed wasn't able to release the partition it holds as the leader, hence clients (spring-boot) was unable to send/receive data from the issued broker.

      We noticed that it always happen to the active controller count.

      Environment details:

      Provider: AWS
      Kernel: 3.10.0-693.21.1.el7.x86_64
      OS: CentOS Linux release 7.5.1804 (Core)
      Scala version: 2.11
      Kafka version: 2.1.0

       Kafka config:

      ############################# Socket Server Settings #############################
      
      num.network.threads=3
      num.io.threads=8
      socket.send.buffer.bytes=102400
      socket.receive.buffer.bytes=102400
      socket.request.max.bytes=104857600
      
      ############################# Log Basics #############################
      
      num.partitions=1
      num.recovery.threads.per.data.dir=1
      
      ############################# Internal Topic Settings  #############################
      
      offsets.topic.replication.factor=3
      transaction.state.log.replication.factor=3
      transaction.state.log.min.isr=2
      
      ############################# Log Retention Policy #############################
      log.retention.hours=168
      log.segment.bytes=1073741824
      log.retention.check.interval.ms=300000
      
      
      ############################# Group Coordinator Settings #############################
      group.initial.rebalance.delay.ms=0
      
      ############################# Zookeeper #############################
      
      zookeeper.connection.timeout.ms=6000
      
      broker.id=1
      zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
      log.dirs=/data/kafka-node
      advertised.listeners=PLAINTEXT://node1:9092
      

      Broker disconnected controller log:

      [2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by broker Map(TOPICS->MAP) (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 2 Map() (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 2 is 0.0 (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 1 Map() (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 1 is 0.0 (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred replica for broker 3 Map() (kafka.controller.KafkaController)
      [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for broker 3 is 0.0 (kafka.controller.KafkaController)
      [2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
      
      

      Broker working server.log:

      [2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=21379644) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
      [2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356, maxBytes=1048576, currentLeaderEpoch=Optional[2])}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessio
      nId=1637095899, epoch=21379644)) (kafka.server.ReplicaFetcherThread)
      java.io.IOException: Connection to 2 was disconnected before the response was read
              at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
              at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
              at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
              at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
              at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
              at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
              at scala.Option.foreach(Option.scala:257)
              at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
              at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
              at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
      [2019-01-26 05:02:35,723] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:59988-1550 (kafka.network.Processor)
      [2019-01-26 05:02:35,731] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:59986-1550 (kafka.network.Processor)
      [2019-01-26 05:02:35,797] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:59494-1549 (kafka.network.Processor)
      [2019-01-26 05:02:35,816] WARN Attempting to send response via channel for which there is no open connection, connection id node3:9092-node2:53268-1530 (kafka.network.Processor)
      [2019-01-26 05:02:37,603] INFO [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error sending fetch request (sessionId=1637095899, epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)
      
      

       

      The request handler idle metrics dropped during this issue:

         

       

      Attachments

        1. kafka-issue.png
          12 kB
          Anthony Lazam

        Issue Links

          Activity

            People

              Unassigned Unassigned
              lazam Anthony Lazam
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: