Changes in this patch include:
1. Allowing the isr list to be empty. The isr becomes empty when every broker in it falls out, which happens when all the replicas that host the partition are down. In other words, the partition is offline. Previously, the controller relied on a non-empty isr, which was a problem when the controller moved while the isr was empty or when the entire cluster was restarted.
2. Marking the partition offline by setting the leader to -1 in zookeeper. Today there is no way for an external tool to figure out the list of all offline partitions. If we were to build a Kafka cluster dashboard that lists partitions and their metadata, we would want to know the leader for each partition. Until a new leader is elected, we continue to store the previous leader in zookeeper. If the partition goes offline and no new leader will ever come up, we still store the previous leader. This is not ideal, and it might be worth storing some value like -1 to denote an offline partition.
3. Cleaning up logging for a partition. There were several places in the code that used a custom string like "[%s, %d]" or "(%s, %d)" to print a partition. The inconsistent formats make it very hard to trace the state changes on a partition while troubleshooting. I changed everything in kafka.controller to standardize on the toString() API of TopicAndPartition. I'm assuming the rest of the code will get cleaned up as part of
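
To illustrate the three changes together, here is a minimal Java sketch, not the code in this patch. The class names, the NO_LEADER constant, and the exact toString() format are assumptions made for the example. They only mirror the ideas above: an empty isr is a valid state, a leader of -1 marks the partition offline, and every log line prints the partition through one toString().

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: these types are illustrative stand-ins, not Kafka's classes.
final class PartitionSketch {
    // Assumed sentinel for "no leader", following the -1 suggested in item 2.
    static final int NO_LEADER = -1;

    // Stand-in for TopicAndPartition: a single toString() shared by all log lines.
    static final class TopicAndPartition {
        final String topic;
        final int partition;

        TopicAndPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            // The format is an assumption; the point is that it is defined once.
            return "[" + topic + "," + partition + "]";
        }
    }

    // Stand-in for the leader and isr state stored for a partition.
    static final class LeaderAndIsr {
        final int leader;
        final List<Integer> isr;

        LeaderAndIsr(int leader, List<Integer> isr) {
            this.leader = leader;
            this.isr = isr;
        }

        // An external tool only needs the leader field to spot an offline partition.
        boolean isOffline() {
            return leader == NO_LEADER;
        }
    }

    // State to store once every replica has dropped out: no leader and an empty isr.
    static LeaderAndIsr offline() {
        return new LeaderAndIsr(NO_LEADER, Collections.<Integer>emptyList());
    }

    // Builds a log line that relies on TopicAndPartition.toString() for the partition.
    static String describe(TopicAndPartition tp, LeaderAndIsr state) {
        if (state.isOffline()) {
            return "Partition " + tp + " is offline (leader " + state.leader + ", isr " + state.isr + ")";
        }
        return "Partition " + tp + " has leader " + state.leader + ", isr " + state.isr;
    }
}
```

A dashboard built on this sketch could list offline partitions by filtering on isOffline(), without having to guess whether a stored leader id is stale.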