KAFKA-13944

Shutting down broker can be elected as partition leader in KRaft

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.3.0
    • None

    Description

      When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN state in the controller. The broker can remain unfenced in this state until the controlled shutdown completes. During a leader election, the controller generally checks only that a candidate broker is unfenced, so it can elect a broker that is in controlled shutdown.
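
      The gap can be illustrated with a minimal sketch. This is not the controller's code: the class, the BrokerState enum, and every method name here are hypothetical, chosen only to contrast an "unfenced only" eligibility check with one that also excludes brokers in controlled shutdown.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model for illustration; none of these names come from Kafka.
class LeaderEligibilitySketch {
    enum BrokerState { FENCED, UNFENCED, CONTROLLED_SHUTDOWN }

    // The check as described in the report: any broker that is not fenced is
    // electable, which includes one that is in controlled shutdown.
    static boolean eligibleUnfencedOnly(BrokerState state) {
        return state != BrokerState.FENCED;
    }

    // A stricter check (an assumption about what a fix could look like):
    // the broker must be unfenced and must not be shutting down.
    static boolean eligibleActiveOnly(BrokerState state) {
        return state == BrokerState.UNFENCED;
    }

    // Returns the first eligible replica, or -1 when there is none,
    // mirroring the "leader: 2 -> -1" convention used in the controller logs.
    static int electLeader(List<Integer> replicas,
                           Map<Integer, BrokerState> states,
                           Predicate<BrokerState> eligible) {
        for (int replica : replicas) {
            // Brokers with no known state are treated as fenced.
            BrokerState state = states.getOrDefault(replica, BrokerState.FENCED);
            if (eligible.test(state)) {
                return replica;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Single-replica partition whose only replica is shutting down.
        List<Integer> replicas = List.of(2);
        Map<Integer, BrokerState> states = Map.of(2, BrokerState.CONTROLLED_SHUTDOWN);

        System.out.println(electLeader(replicas, states,
            LeaderEligibilitySketch::eligibleUnfencedOnly)); // prints 2
        System.out.println(electLeader(replicas, states,
            LeaderEligibilitySketch::eligibleActiveOnly));   // prints -1
    }
}
```

      With the looser predicate, the shutting-down broker is handed leadership again; with the stricter one, the partition stays leaderless until another replica becomes available.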

      Here are a few snippets from a recent system test in which this occurred:

      // broker 2 starts controlled shutdown
      [2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has requested and been granted a controlled shutdown. (org.apache.kafka.controller.BrokerHeartbeatManager)
       
      // there is only one replica, so we set leader to -1
      [2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)
      
      // controlled shutdown cannot complete immediately
      [2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to shut down can not yet be granted because the lowest active offset 177 is not greater than the broker's shutdown offset 244. (org.apache.kafka.controller.BrokerHeartbeatManager)
      [2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled shutdown offset for broker 2 to 244. (org.apache.kafka.controller.BrokerHeartbeatManager)
      
      // later on we elect leader 2 again
      [2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)
      
      // now controlled shutdown is stuck because of the newly elected leader
      [2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled shutdown state, but can not shut down because more leaders still need to be moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
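
      The two conditions that hold up the shutdown in the log excerpt can be sketched as follows. This is inferred only from the wording of the log messages, not from the BrokerHeartbeatManager source: the class, enum, and method names are hypothetical, and the strict "greater than" comparison is an assumption taken from the message text.

```java
// Hypothetical model of the gating seen in the logs; not Kafka's implementation.
class ControlledShutdownSketch {
    enum Decision { WAIT_FOR_LEADER_MOVES, WAIT_FOR_OFFSET, GRANTED }

    // leaderCount: partitions the shutting-down broker still leads.
    // lowestActiveOffset / shutdownOffset: the two offsets named in the log message.
    static Decision decide(int leaderCount, long lowestActiveOffset, long shutdownOffset) {
        if (leaderCount > 0) {
            // "can not shut down because more leaders still need to be moved"
            return Decision.WAIT_FOR_LEADER_MOVES;
        }
        if (lowestActiveOffset <= shutdownOffset) {
            // "lowest active offset ... is not greater than the broker's shutdown offset"
            return Decision.WAIT_FOR_OFFSET;
        }
        return Decision.GRANTED;
    }

    public static void main(String[] args) {
        // 21:17:26,529: no leaders remain, but offset 177 has not passed 244.
        System.out.println(decide(0, 177L, 244L)); // prints WAIT_FOR_OFFSET
        // 21:17:28,531: broker 2 was re-elected leader of _foo-1, so it leads again.
        // The offset value 300 here is made up for the example.
        System.out.println(decide(1, 300L, 244L)); // prints WAIT_FOR_LEADER_MOVES
    }
}
```

      Under this model, re-electing the shutting-down broker as leader puts it back in the first branch, which matches the stuck state reported in the final log line.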
      

            People

              Assignee: David Jacot (dajac)
              Reporter: Jason Gustafson (hachikuji)