Kafka
  1. Kafka
  2. KAFKA-708

ISR becomes empty while marking a partition offline

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.8.1
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:

      Description

      Attached state change log shows that ISR becomes empty when a partition is being marked as offline.

      1. kafka-request.log.2013-01-16-15
        43 kB
        Swapnil Ghike
      2. kafka-708-v1.patch
        33 kB
        Neha Narkhede
      3. kafka-708-v2.patch
        36 kB
        Neha Narkhede

        Issue Links

          Activity

          Hide
          Neha Narkhede added a comment -

          Fixed the typo, checked in patch v2. Thanks for the review!

          Show
          Neha Narkhede added a comment - Fixed the typo, checked in patch v2. Thanks for the review!
          Hide
          Jun Rao added a comment -

          There is a typo logIndent in Logging that causes a compilation error. Other than that, +1 for patch v2. Once the typo is fixed, the patch can be checked in.

          Show
          Jun Rao added a comment - There is a typo logIndent in Logging that causes a compilation error. Other than that, +1 for patch v2. Once the typo is fixed, the patch can be checked in.
          Hide
          Neha Narkhede added a comment -

          1. Good point. I moved setting the leader to -1 to the removeReplicaFromIsr API. Basically, if you are removing a leader from the isr, then just set the leader field to -1 and remove it from the isr

          2. Included the fix

          Show
          Neha Narkhede added a comment - 1. Good point. I moved setting the leader to -1 to the removeReplicaFromIsr API. Basically, if you are removing a leader from the isr, then just set the leader field to -1 and remove it from the isr 2. Included the fix
          Hide
          Jun Rao added a comment -

          Thanks for the patch. Overall, looks good. A couple of minor comments:

          1. PartitionStateMachine: The patch sets the ISR in ZK to empty every time the partition goes offline. This means that, in the most common case when we can elect another broker as the leader, we will need to update ISR in ZK twice when the leader gone, the first time to set it to empty and the second time to set it to the new leader. I am wondering if we should set the ISR in ZK to empty in OfflinePartitionLeaderSelector.selectLeader() just before we throw a PartitionOfflineException. This way, in the common case, we avoid an extra ZK write.

          2. UtilTest,testCsvList(): assertTrue(emptyStringList!=null) should probably be assertTrue(emptyList!=null)

          Show
          Jun Rao added a comment - Thanks for the patch. Overall, looks good. A couple of minor comments: 1. PartitionStateMachine: The patch sets the ISR in ZK to empty every time the partition goes offline. This means that, in the most common case when we can elect another broker as the leader, we will need to update ISR in ZK twice when the leader gone, the first time to set it to empty and the second time to set it to the new leader. I am wondering if we should set the ISR in ZK to empty in OfflinePartitionLeaderSelector.selectLeader() just before we throw a PartitionOfflineException. This way, in the common case, we avoid an extra ZK write. 2. UtilTest,testCsvList(): assertTrue(emptyStringList!=null) should probably be assertTrue(emptyList!=null)
          Hide
          Neha Narkhede added a comment -

          Changes in this patch include -

          1. Allowing the isr list to be empty. This can happen if all the brokers in the isr fall out which can happen if all the replicas that host the partition are down. In other words, the partition is offline. The problem was that if the controller moves when the isr is empty or you restart the entire cluster, it relied on a non-empty isr.

          2. Marking the partition offline by setting the leader to -1 in zookeeper. This is because, today there is no way for an external tool to figure out the list of all offline partitions. If we were to build a Kafka cluster dashboard and list partitions and their metadata, we would want to know the leader for each partition. Until a new leader is elected, we continue to store the previous leader in zookeeper. If the partition goes offline and no new leader will ever come up, we still store the previous leader. This is not ideal and it might be worth to store some value like -1 to denote an offline partition

          3. Cleaned up logging for a partition. There were several places in the code that used a custom string like "[%s, %d]" or "(%s, %d)" to print a partition. This makes it very hard to trace the state changes on a partition while troubleshooting. I changed everything in kafka.controller to standardize on the toString() API of TopicAndPartition. I'm assuming the rest of the code will get cleaned up as part of KAFKA-649

          Show
          Neha Narkhede added a comment - Changes in this patch include - 1. Allowing the isr list to be empty. This can happen if all the brokers in the isr fall out which can happen if all the replicas that host the partition are down. In other words, the partition is offline. The problem was that if the controller moves when the isr is empty or you restart the entire cluster, it relied on a non-empty isr. 2. Marking the partition offline by setting the leader to -1 in zookeeper. This is because, today there is no way for an external tool to figure out the list of all offline partitions. If we were to build a Kafka cluster dashboard and list partitions and their metadata, we would want to know the leader for each partition. Until a new leader is elected, we continue to store the previous leader in zookeeper. If the partition goes offline and no new leader will ever come up, we still store the previous leader. This is not ideal and it might be worth to store some value like -1 to denote an offline partition 3. Cleaned up logging for a partition. There were several places in the code that used a custom string like " [%s, %d] " or "(%s, %d)" to print a partition. This makes it very hard to trace the state changes on a partition while troubleshooting. I changed everything in kafka.controller to standardize on the toString() API of TopicAndPartition. I'm assuming the rest of the code will get cleaned up as part of KAFKA-649

            People

            • Assignee:
              Neha Narkhede
              Reporter:
              Swapnil Ghike
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development