Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Auto Closed
Affects Version/s: 0.8.2.1
Fix Version/s: None
Component/s: None
Description
Two of our Kafka brokers (1 and 5) were rebooted when their hypervisor went down, and I think we hit an issue similar to the one discussed in the thread "Problem with node after restart no partitions?". The resulting JIRA was closed without a conclusion or recovery steps.
Brokers 5 and 1 were also running ZooKeeper for our cluster (along with broker 2). We are running Kafka 0.8.2.1.
After doing controlled restarts of all brokers a few times, the cluster now seems OK. However, some topics have replicas that are out of sync with their leaders.
Partition 2 below has leader 5, so its replica order should be 5,1; instead it shows Replicas: 1,5, and the ISR contains only broker 5.
Topic:2015-01-12	PartitionCount:3	ReplicationFactor:2	Configs:
	Topic: 2015-01-12	Partition: 0	Leader: 4	Replicas: 4,3	Isr: 3,4
	Topic: 2015-01-12	Partition: 1	Leader: 0	Replicas: 0,4	Isr: 0,4
	Topic: 2015-01-12	Partition: 2	Leader: 5	Replicas: 1,5	Isr: 5
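To spot partitions like this across many topics, the describe output can be scanned for two symptoms: replicas missing from the ISR, and a leader that is not the preferred replica (Kafka treats the first broker in the Replicas list as the preferred leader). The sketch below assumes the text format shown above; the function names are hypothetical and it only parses text, it does not talk to the cluster.

```python
import re

# Sample copied from the describe output above (tabs replaced by spaces).
DESCRIBE_OUTPUT = """\
Topic:2015-01-12 PartitionCount:3 ReplicationFactor:2 Configs:
  Topic: 2015-01-12 Partition: 0 Leader: 4 Replicas: 4,3 Isr: 3,4
  Topic: 2015-01-12 Partition: 1 Leader: 0 Replicas: 0,4 Isr: 0,4
  Topic: 2015-01-12 Partition: 2 Leader: 5 Replicas: 1,5 Isr: 5
"""

# Matches per-partition lines only; the header uses "PartitionCount:", not "Partition:".
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_partitions(text):
    """Yield one dict per partition line of the describe output."""
    for m in PARTITION_RE.finditer(text):
        yield {
            "topic": m["topic"],
            "partition": int(m["partition"]),
            "leader": int(m["leader"]),
            "replicas": [int(r) for r in m["replicas"].split(",")],
            "isr": [int(r) for r in m["isr"].split(",") if r],
        }


def find_problems(text):
    """Return (topic, partition, reasons) for partitions that look unhealthy."""
    problems = []
    for p in parse_partitions(text):
        reasons = []
        missing = sorted(set(p["replicas"]) - set(p["isr"]))
        if missing:
            reasons.append(f"replicas {missing} not in ISR")
        # The first entry in the replica list is the preferred leader.
        preferred = p["replicas"][0]
        if p["leader"] != preferred:
            reasons.append(f"leader {p['leader']} is not preferred replica {preferred}")
        if reasons:
            problems.append((p["topic"], p["partition"], reasons))
    return problems


if __name__ == "__main__":
    for topic, partition, reasons in find_problems(DESCRIBE_OUTPUT):
        print(f"[{topic},{partition}]: " + "; ".join(reasons))
```

On the sample, only partition 2 is reported: broker 1 is missing from the ISR, and leader 5 is not the preferred replica 1.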
I tried reassigning the replicas of partition 2 to broker 5 (the leader) and broker 0.
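The contents of 2015-01-12_2.json are not shown here. Assuming it requested replicas [5, 0] for partition 2, a file in the format kafka-reassign-partitions.sh accepts for --reassignment-json-file could be generated as below. Listing broker 5 first makes it the preferred leader; the helper name is hypothetical.

```python
import json


def reassignment_plan(topic, partition, replicas):
    """Build a reassignment document with a single partition entry."""
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
        ],
    }


if __name__ == "__main__":
    # Assumed target: broker 5 (current leader) first, then broker 0.
    plan = reassignment_plan("2015-01-12", 2, [5, 0])
    # Redirect stdout to a file, e.g. `python make_plan.py > 2015-01-12_2.json`.
    print(json.dumps(plan))
```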
The reassignment has now been stuck for more than a day:
%) /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper kafka-trgt05:2182 --reassignment-json-file 2015-01-12_2.json --verify
Status of partition reassignment:
Reassignment of partition [2015-01-12,2] is still in progress
Yet in ZooKeeper, /admin/reassign_partitions appears to be empty:
[zk: kafka-trgt05:2182(CONNECTED) 2] ls /admin/reassign_partitions
[]
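One caveat about this check: the zkCli `ls` command lists a znode's children, while the pending reassignment is, as I understand the 0.8.2 tooling, stored as JSON data on /admin/reassign_partitions itself, which `get /admin/reassign_partitions` would show. The sketch below is a simplified model of how --verify appears to classify each partition in the plan, based on my reading of the 0.8.2 ReassignPartitionsCommand; the function and its arguments are hypothetical and it does not connect to ZooKeeper.

```python
import json


def verify_status(znode_data, plan, current_assignment):
    """Simplified model of the --verify decision (hypothetical helper).

    znode_data: JSON data stored on /admin/reassign_partitions, or None
                if the znode does not exist.
    plan: {(topic, partition): requested replica list}.
    current_assignment: {(topic, partition): replica list currently
                        recorded for the topic}.
    """
    in_flight = set()
    if znode_data:
        for p in json.loads(znode_data).get("partitions", []):
            in_flight.add((p["topic"], p["partition"]))

    status = {}
    for tp, replicas in plan.items():
        if tp in in_flight:
            # Still listed in the znode's data: reported as in progress.
            status[tp] = "still in progress"
        elif current_assignment.get(tp) == replicas:
            status[tp] = "completed successfully"
        else:
            status[tp] = "failed"
    return status
```

Under this model, "still in progress" means the partition is still present in the znode's data, even though `ls` on the znode returns [].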
This looks like a bug that leaves the cluster in an unhealthy state.