Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-4464

Clean shutdown of broker fails due to controller error

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Cannot Reproduce
    • 0.10.1.0
    • 0.10.1.0
    • controller

    Description

      My cluster is unable to communicate to one of my brokers (Broker 1 in this case) and is spinning on logs:

      [2016-11-29 19:05:08,659] WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@27aeb5f4 (kafka.server.ReplicaFetcherThread)
      java.io.IOException: Connection to 10.60.68.122:9092 (id: 1 rack: null) failed
      	at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:83)
      	at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:93)
      	at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:248)
      	at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
      	at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
      	at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
      	at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
      	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
      
      

      and on Broker 1 I tried issuing a shutdown and I get this:

      [2016-11-29 18:33:16,152] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer)
      [2016-11-29 18:33:16,163] INFO [Kafka Server 1], Starting controlled shutdown (kafka.server.KafkaServer)
      [2016-11-29 18:33:38,347] INFO [Kafka Server 1], Remaining partitions to move: LONG_LIST_OF_TOPICS
      [2016-11-29 18:33:38,350] INFO [Kafka Server 1], Error code from controller: 0 (kafka.server.KafkaServer)
      [2016-11-29 18:33:43,356] WARN [Kafka Server 1], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
      [2016-11-29 18:34:04,053] INFO [Kafka Server 1], Remaining partitions to move: SAME_LONG_LIST_OF_TOPICS_AGAIN
      [2016-11-29 18:34:04,053] INFO [Kafka Server 1], Error code from controller: 0 (kafka.server.KafkaServer)
      [2016-11-29 18:34:09,054] WARN [Kafka Server 1], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
      [2016-11-29 18:34:32,577] INFO [Kafka Server 1], Remaining partitions to move: SAM_LONG_LIST_OF_TOPICS_AGAIN_AGAIN
      [2016-11-29 18:34:32,578] INFO [Kafka Server 1], Error code from controller: 0 (kafka.server.KafkaServer)
      [2016-11-29 18:34:37,579] WARN [Kafka Server 1], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
      [2016-11-29 18:34:37,586] WARN [Kafka Server 1], Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed (kafka.server.KafkaServer)
      [2016-11-29 18:34:37,612] INFO [Socket Server on Broker 1], Shutting down (kafka.network.SocketServer)
      [2016-11-29 18:42:36,940] INFO Rolled new log segment for '__consumer_offsets-30' in 6 ms. (kafka.log.Log)
      [2016-11-29 18:43:52,440] INFO Deleting segment 71712593 from log __consumer_offsets-30. (kafka.log.Log)
      [2016-11-29 18:43:52,440] INFO Deleting segment 0 from log __consumer_offsets-30. (kafka.log.Log)
      [2016-11-29 18:43:52,492] INFO Deleting index /data/__consumer_offsets-30/00000000000071712593.index.deleted (kafka.log.OffsetIndex)
      [2016-11-29 18:43:52,532] INFO Deleting index /data/__consumer_offsets-30/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
      [2016-11-29 18:43:52,532] INFO Deleting index /data/__consumer_offsets-30/00000000000000000000.timeindex.deleted (kafka.log.TimeIndex)
      [2016-11-29 18:43:52,549] INFO Deleting index /data/__consumer_offsets-30/00000000000071712593.timeindex.deleted (kafka.log.TimeIndex)
      [2016-11-29 18:43:53,370] INFO Deleting segment 72483593 from log __consumer_offsets-30. (kafka.log.Log)
      [2016-11-29 18:43:53,478] INFO Deleting index /data/__consumer_offsets-30/00000000000072483593.index.deleted (kafka.log.OffsetIndex)
      [2016-11-29 18:43:53,479] INFO Deleting index /data/__consumer_offsets-30/00000000000072483593.timeindex.deleted (kafka.log.TimeIndex)
      

      so it says it's doing an unclean shutdown but then it refuses to stop the process. now I have this sort of zombie process and the other brokers are spinning even faster on trying to connect to it.

      What other logs can I provide to help debug this broker's failure?

      Attachments

        Activity

          People

            Unassigned Unassigned
            xrl Xavier Lange
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: