Kafka / KAFKA-6582

Partitions get under-replicated, with a single ISR, and don't recover. Other brokers do not take over and we need to manually restart the broker.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 2.1.1
    • Component/s: network
    • Labels:
      None
    • Environment:

      Description

      Partitions become under-replicated, with a single ISR, and do not recover. Other brokers do not take over, and we need to manually restart the 'single ISR' broker (describing the partitions of a replicated topic makes it clear that some partitions are in sync only on this broker).
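
The check described above (describe the topic, then look for partitions whose ISR holds a single broker) can be sketched as follows. This is a minimal illustration, not part of the original report: it assumes the usual per-partition line layout printed by `kafka-topics.sh --describe` (`Topic: … Partition: … Leader: … Replicas: … Isr: …`), and the topic name `events` and the sample output are hypothetical.

```python
import re

# One per-partition line of `kafka-topics.sh --describe` output.
# Assumption: fields appear in this order, separated by tabs or spaces.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def single_isr_partitions(describe_output):
    """Return (topic, partition, isr_broker) for replicated partitions
    whose in-sync replica set has shrunk to exactly one broker."""
    found = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # topic summary lines and blank lines
        replicas = match.group("replicas").split(",")
        isr = [broker for broker in match.group("isr").split(",") if broker]
        if len(replicas) > 1 and len(isr) == 1:
            found.append(
                (match.group("topic"), int(match.group("partition")), int(isr[0]))
            )
    return found


# Hypothetical describe output: partitions 0 and 2 are in sync only on broker 3.
SAMPLE = (
    "Topic:events\tPartitionCount:3\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: events\tPartition: 0\tLeader: 3\tReplicas: 3,0,1\tIsr: 3\n"
    "\tTopic: events\tPartition: 1\tLeader: 0\tReplicas: 0,1,3\tIsr: 0,1,3\n"
    "\tTopic: events\tPartition: 2\tLeader: 3\tReplicas: 1,3,0\tIsr: 3\n"
)

if __name__ == "__main__":
    for topic, partition, broker in single_isr_partitions(SAMPLE):
        print(f"{topic}-{partition} is in sync only on broker {broker}")
```

If every affected partition reports the same broker, that broker is the 'single ISR' broker that needs the restart.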

      This bug closely resembles KAFKA-4477, but since that issue is marked as resolved, this is probably a different bug with similar symptoms.

      We have the same issue (or at least it looks pretty similar) on Kafka 1.0. 

      We have had these issues since upgrading from Kafka 0.10.2.1 to Kafka 1.0 in November 2017.

      This happens roughly every 24-48 hours on a random broker, which is why we currently run a cron job that restarts every broker every 24 hours.

      During this issue, the server log of the 'single ISR' broker shows the following:

      [2018-02-20 12:02:08,342] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)
      [2018-02-20 12:02:08,364] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)
      [2018-02-20 12:02:08,349] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.149.18:35182-96705 (kafka.network.Processor)
      [2018-02-20 12:02:08,379] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)
      [2018-02-20 12:02:08,448] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.159.20:36388-96720 (kafka.network.Processor)
      [2018-02-20 12:02:08,683] WARN Attempting to send response via channel for which there is no open connection, connection id 10.132.0.32:9092-10.14.157.110:41922-96740 (kafka.network.Processor)
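
To see which peers the dropped responses were meant for, the warnings above can be tallied by remote host. The sketch below is illustrative only; it assumes the connection id is laid out as `<local host>:<local port>-<remote host>:<remote port>-<index>`, which matches the lines above but is not stated in the report. The function name `remote_hosts` is made up for this example.

```python
import re
from collections import Counter

# Assumed connection id layout:
#   <local host>:<local port>-<remote host>:<remote port>-<index>
CONNECTION_ID = re.compile(
    r"connection id (?P<local>[\d.]+:\d+)-"
    r"(?P<remote_host>[\d.]+):(?P<remote_port>\d+)-(?P<index>\d+)"
)


def remote_hosts(log_text):
    """Count 'no open connection' warnings per remote host."""
    counts = Counter()
    for match in CONNECTION_ID.finditer(log_text):
        counts[match.group("remote_host")] += 1
    return counts


# Three of the warnings quoted in this report.
LOG = (
    "[2018-02-20 12:02:08,342] WARN Attempting to send response via channel for which "
    "there is no open connection, connection id "
    "10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)\n"
    "[2018-02-20 12:02:08,364] WARN Attempting to send response via channel for which "
    "there is no open connection, connection id "
    "10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)\n"
    "[2018-02-20 12:02:08,379] WARN Attempting to send response via channel for which "
    "there is no open connection, connection id "
    "10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)\n"
)

if __name__ == "__main__":
    for host, count in remote_hosts(LOG).most_common():
        print(host, count)
```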
      

      Also on the ISR broker, the controller log shows this:

      [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-3-send-thread]: Controller 3 connected to 10.132.0.32:9092 (id: 3 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
      [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-0-send-thread]: Controller 3 connected to 10.132.0.10:9092 (id: 0 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
      [2018-02-20 12:02:14,928] INFO [Controller-3-to-broker-1-send-thread]: Controller 3 connected to 10.132.0.12:9092 (id: 1 rack: null) for sending state change requests (kafka.controller.RequestSendThread)

      The non-ISR brokers show these kinds of errors:

       

      [2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={......................}, isolationLevel=READ_UNCOMMITTED) (kafka.server.ReplicaFetcherThread)
      java.io.IOException: Connection to 3 was disconnected before the response was read
       at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
       at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
       at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
       at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
       at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
       at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
       at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
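
To confirm that all followers are failing against the same leader, the `ReplicaFetcher` warnings collected from each non-ISR broker can be grouped by follower and leader id. This is a rough sketch under the assumption that every warning carries the `[ReplicaFetcher replicaId=…, leaderId=…, fetcherId=…]` prefix shown above; the second sample line (replicaId=0) is hypothetical, and both sample lines are shortened after the broker id.

```python
import re
from collections import Counter

# Matches the thread prefix and the start of the message of a fetch-error warning.
FETCH_ERROR = re.compile(
    r"\[ReplicaFetcher replicaId=(?P<replica>\d+), leaderId=(?P<leader>\d+), "
    r"fetcherId=\d+\] Error in fetch to broker (?P<target>\d+)"
)


def failed_fetches(log_text):
    """Count fetch errors per (follower broker id, leader broker id) pair."""
    return Counter(
        (int(match.group("replica")), int(match.group("leader")))
        for match in FETCH_ERROR.finditer(log_text)
    )


# First line: shortened from the warning in this report.
# Second line: hypothetical warning from another follower.
SAMPLE = (
    "[2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3, "
    "fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, replicaId=1)\n"
    "[2018-02-20 12:02:29,310] WARN [ReplicaFetcher replicaId=0, leaderId=3, "
    "fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, replicaId=0)\n"
)

if __name__ == "__main__":
    for (follower, leader), count in sorted(failed_fetches(SAMPLE).items()):
        print(f"follower {follower} -> leader {leader}: {count} failed fetch(es)")
```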
      

       

        Attachments

        1. Screenshot 2019-01-18 at 13.16.59.png
          642 kB
          Juris Pavlyuchenkov
        2. Screenshot 2019-01-18 at 13.08.17.png
          496 kB
          Juris Pavlyuchenkov


            People

            • Assignee: Unassigned
            • Reporter: Jurriaan Pruis (jurriaanpruis)
            • Votes: 11
            • Watchers: 24
