Description
When a broker requests shutdown, the controller transitions it to the CONTROLLED_SHUTDOWN state. The broker can remain unfenced in this state until the controlled shutdown completes. During leader election, the only thing we generally check is that the broker is unfenced, which means we can elect a broker that is in controlled shutdown.
Here are a few snippets from a recent system test in which this occurred:
// broker 2 starts controlled shutdown
[2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has requested and been granted a controlled shutdown. (org.apache.kafka.controller.BrokerHeartbeatManager)

// there is only one replica, so we set leader to -1
[2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)

// controlled shutdown cannot complete immediately
[2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to shut down can not yet be granted because the lowest active offset 177 is not greater than the broker's shutdown offset 244. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled shutdown offset for broker 2 to 244. (org.apache.kafka.controller.BrokerHeartbeatManager)

// later on we elect leader 2 again
[2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)

// now controlled shutdown is stuck because of the newly elected leader
[2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled shutdown state, but can not shut down because more leaders still need to be moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
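To make the election condition concrete, here is a minimal sketch of the behavior described in this report. It is not the controller's actual code: `BrokerState`, `ElectionSketch`, and every method name below are hypothetical, and the broker lifecycle is simplified to three states. It contrasts an election that only checks whether a broker is unfenced with one possible stricter check that also skips brokers in controlled shutdown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified broker lifecycle; not the real controller types.
enum BrokerState { FENCED, UNFENCED, CONTROLLED_SHUTDOWN }

final class ElectionSketch {
    static final int NO_LEADER = -1;

    private final Map<Integer, BrokerState> states = new HashMap<>();

    void setState(int brokerId, BrokerState state) {
        states.put(brokerId, state);
    }

    // The check described in the report: a broker in controlled shutdown
    // is still unfenced, so it passes.
    boolean isUnfenced(int brokerId) {
        BrokerState state = states.get(brokerId);
        return state != null && state != BrokerState.FENCED;
    }

    // One possible stricter check (an assumption, not the actual fix):
    // the broker must be unfenced and not shutting down.
    boolean isActive(int brokerId) {
        return states.get(brokerId) == BrokerState.UNFENCED;
    }

    // Picks the first ISR member that is merely unfenced.
    int electUnfencedOnly(List<Integer> isr) {
        for (int brokerId : isr) {
            if (isUnfenced(brokerId)) {
                return brokerId;
            }
        }
        return NO_LEADER;
    }

    // Picks the first ISR member that is unfenced and not in controlled shutdown.
    int electActiveOnly(List<Integer> isr) {
        for (int brokerId : isr) {
            if (isActive(brokerId)) {
                return brokerId;
            }
        }
        return NO_LEADER;
    }
}
```

With broker 2 as the only replica and in CONTROLLED_SHUTDOWN, `electUnfencedOnly(List.of(2))` returns 2, which matches the `leader: -1 -> 2` transition in the log, while `electActiveOnly(List.of(2))` returns -1 and leaves the partition leaderless so that the shutdown can proceed.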
Issue Links
- is fixed by KAFKA-13916 Fenced replicas should not be allowed to join the ISR in KRaft (KIP-841) (Resolved)