  Kafka / KAFKA-1767

/admin/reassign_partitions deleted before reassignment completes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.8.1.1
    • Fix Version/s: None
    • Component/s: controller
    • Labels: None

    Description

      https://github.com/apache/kafka/blob/0.8.1.1/core/src/main/scala/kafka/controller/KafkaController.scala#L477-L517 describes the process of reassigning partitions. Specifically, by the time /admin/reassign_partitions is updated, the newly assigned replicas (RAR) should be in sync, and the assigned replicas (AR) in ZooKeeper should be updated:

      4. Wait until all replicas in RAR are in sync with the leader.
      ...
      10. Update AR in ZK with RAR.
      11. Update the /admin/reassign_partitions path in ZK to remove this partition.
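
      The ordering of the quoted steps implies a condition that should hold before a partition is removed from /admin/reassign_partitions. A minimal sketch of that condition follows; the function name and argument shapes are hypothetical and are not taken from the controller code.

```python
def safe_to_remove_entry(ar, isr, rar):
    """Return True only when one partition's reassignment has finished.

    ar  -- assigned replicas currently recorded in ZooKeeper
    isr -- in-sync replicas currently recorded in ZooKeeper
    rar -- target replicas taken from /admin/reassign_partitions
    (Hypothetical helper for illustration; not part of Kafka.)
    """
    # Step 4: every replica in RAR has caught up with the leader.
    all_new_replicas_in_sync = set(rar) <= set(isr)
    # Step 10: AR has been overwritten with RAR (order is compared too).
    ar_replaced_by_rar = list(ar) == list(rar)
    # Step 11 is only expected once both conditions hold.
    return all_new_replicas_in_sync and ar_replaced_by_rar
```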
      

      This worked in 0.8.1, but in 0.8.1.1 we observe /admin/reassign_partitions being removed before step 4 has completed.

      For example, if we have AR [1,2] and then put [3,4] in /admin/reassign_partitions, the cluster will end up with AR [1,2,3,4] and ISR [1,2] when the key is removed. Eventually, the AR will be updated to [3,4].
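
      To make the intermediate state in this example concrete, the sketch below reads AR and ISR from payloads shaped like the topic and partition-state znodes and reports what is still outstanding. The topic name "events", the function name, and the epoch values are made up, and the JSON layout is an assumption about the 0.8.x znode format that should be checked against a real cluster.

```python
import json

# Assumed payload of /brokers/topics/events (AR per partition).
topic_znode = '{"version":1,"partitions":{"0":[1,2,3,4]}}'
# Assumed payload of /brokers/topics/events/partitions/0/state (ISR).
state_znode = ('{"controller_epoch":1,"leader":1,"version":1,'
               '"leader_epoch":0,"isr":[1,2]}')
# Target replicas (RAR) that were written to /admin/reassign_partitions.
target_replicas = [3, 4]


def describe_progress(topic_json, state_json, partition, rar):
    """Summarise how far one partition is from its target assignment."""
    ar = json.loads(topic_json)["partitions"][str(partition)]
    isr = json.loads(state_json)["isr"]
    return {
        "ar": ar,
        "isr": isr,
        # New replicas that have not caught up with the leader yet.
        "not_yet_in_sync": sorted(set(rar) - set(isr)),
        # Old replicas that still have to be dropped from AR.
        "still_to_drop": sorted(set(ar) - set(rar)),
    }


status = describe_progress(topic_znode, state_znode, 0, target_replicas)
print(status["not_yet_in_sync"])  # [3, 4]
print(status["still_to_drop"])    # [1, 2]
```

      Both lists are non-empty here, which is the state described above: the reassignment entry is already gone although nothing has actually moved yet.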

      This means that the kafka-reassign-partitions.sh tool will accept a new batch of reassignments before the current reassignments have finished, and our own tool that feeds in reassignments in small batches (see KAFKA-1677) can't rely on this key to detect active reassignments.
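
      One possible workaround for a batching tool is to stop relying on the znode and instead poll AR and ISR for the partitions in the current batch until they match their targets. The sketch below is only an illustration of that idea: `fetch_state` is a hypothetical callback that would have to be implemented against ZooKeeper or topic metadata, and the defaults are arbitrary.

```python
import time


def wait_until_applied(targets, fetch_state, poll_seconds=5.0,
                       timeout_seconds=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until every partition in `targets` has reached its RAR.

    targets     -- dict mapping (topic, partition) to the target replica list
    fetch_state -- hypothetical callback: (topic, partition) -> (ar, isr)
    Returns True when all targets are applied, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        pending = []
        for tp, rar in targets.items():
            ar, isr = fetch_state(tp)
            # Finished only if AR equals RAR and all of RAR is in sync.
            if list(ar) != list(rar) or not set(rar) <= set(isr):
                pending.append(tp)
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

      A caller would submit one batch, call `wait_until_applied` with that batch's targets, and submit the next batch only after it returns True.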

      Although we haven't observed this, it seems likely that if a controller resignation happens, the new controller won't know that a reassignment is in progress, and the AR will never be updated to the RAR.

          People

            Assignee: Neha Narkhede (nehanarkhede)
            Reporter: Ryan Berdeen (rberdeen)
            Votes: 0
            Watchers: 4