Details
Bug
Status: Open
Major
Resolution: Unresolved
0.9.0.0
None
None
Description
Running 0.9.0.0, the controller can get into a state where it is no longer able to elect a leader for an Offline partition. It's unclear how this state is first reached, but in the steady state this happens:
- There are partitions with a leader of -1
- The Controller repeatedly attempts a preferred leader election for these partitions
- The preferred leader election fails because the only replica in the ISR is not the preferred leader
The log cycle looks like this:
[2017-04-12 18:00:18,891] INFO [Controller 8]: Starting preferred replica leader election for partitions topic,1
[2017-04-12 18:00:18,891] INFO [Partition state machine on Controller 8]: Invoking state change to OnlinePartition for partitions topic,1
[2017-04-12 18:00:18,892] INFO [PreferredReplicaPartitionLeaderSelector]: Current leader -1 for partition [topic,1] is not the preferred replica. Trigerring preferred replica leader election (kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2017-04-12 18:00:18,893] WARN [Controller 8]: Partition [topic,1] failed to complete preferred replica leader election. Leader is -1 (kafka.controller.KafkaController)
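To make the steady state easier to reason about, here is a minimal Python model of the cycle as I understand it from the symptoms above. It is not the Kafka controller code: the function name `elect_preferred_leader`, its arguments, and the broker IDs are hypothetical, and it assumes the preferred replica is the first entry in the replica assignment and can only be elected if it is alive and in the ISR.

```python
NO_LEADER = -1  # leader value reported for the Offline partition


def elect_preferred_leader(assigned_replicas, isr, live_brokers, current_leader):
    """Simplified model of a preferred replica election (hypothetical helper).

    Returns the leader after the attempt, or None if the election cannot
    complete and the current leader is left unchanged.
    """
    # Assumption: the preferred replica is the first assigned replica.
    preferred = assigned_replicas[0]
    if current_leader == preferred:
        return current_leader  # already led by the preferred replica
    if preferred in live_brokers and preferred in isr:
        return preferred  # preferred replica is eligible, election succeeds
    return None  # preferred replica is not in the ISR (or not alive)


if __name__ == "__main__":
    # Invented example: replicas [1, 2], but only broker 2 is in the ISR.
    leader = NO_LEADER
    for attempt in range(3):
        result = elect_preferred_leader([1, 2], [2], {1, 2}, leader)
        if result is None:
            print(f"attempt {attempt}: election failed, leader is {leader}")
        else:
            leader = result
            print(f"attempt {attempt}: leader is now {leader}")
```

Under these assumptions every attempt fails the same way, because nothing in the loop ever changes the ISR or considers broker 2 as a leader candidate. That would match the repeating log cycle, though the real controller may differ in details.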
It's not clear whether this would happen on versions later than 0.9.0.0.