  1. Kafka
  2. KAFKA-1097

Race condition while reassigning low throughput partition leads to incorrect ISR information in zookeeper

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.1
    • Component/s: controller
    • Labels: None

    Description

      While moving partitions, the controller moves the old replicas through the following state changes:

      ONLINE -> OFFLINE -> NON_EXISTENT

      During the offline state change, the controller removes the old replica from the ISR, writes the updated ISR to ZooKeeper, and notifies the leader. Note that it does not notify the old replicas to stop fetching from the leader (to be fixed in KAFKA-1032). During the non-existent state change, the controller does not write the updated ISR or replica list to ZooKeeper. Right after the non-existent state change, the controller writes the new replica list to ZooKeeper but does not update the ISR.

      As a result, an old replica can send a fetch request after the offline state change, which lets the leader add it back to the ISR. The problem is that if no new data is coming in for the partition and the old replica is fully caught up, the leader cannot remove it from the ISR. A non-existent replica therefore stays in the ISR at least until new data comes in to the partition.
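
      The sequence described above can be walked through with a small toy model. This is only a sketch and not Kafka's controller or broker code: the `Leader` class, its methods, the `max_lag_messages` threshold, and the `zk` dictionary are hypothetical stand-ins, and the model assumes the leader re-admits any follower whose fetch offset has reached the log end offset and evicts only followers that lag behind it. It also assumes the old replica stops fetching once it has been deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Toy partition leader (hypothetical, not Kafka's implementation)."""
    log_end_offset: int
    isr: set = field(default_factory=set)
    follower_offsets: dict = field(default_factory=dict)
    max_lag_messages: int = 0  # assumed lag threshold for eviction

    def handle_fetch(self, replica_id, fetch_offset):
        # Record the follower's position; a caught-up follower joins the ISR.
        self.follower_offsets[replica_id] = fetch_offset
        if fetch_offset >= self.log_end_offset:
            self.isr.add(replica_id)

    def append(self, count):
        # New data arrives for the partition.
        self.log_end_offset += count

    def shrink_isr(self):
        # Evict only followers that are behind by more than the threshold.
        lagging = {
            r for r in self.isr
            if r in self.follower_offsets
            and self.log_end_offset - self.follower_offsets[r] > self.max_lag_messages
        }
        self.isr -= lagging
        return lagging


def simulate():
    # Broker 1 leads; broker 3 is the old replica being moved away.
    zk = {"replicas": [1, 2, 3], "isr": [1, 2, 3]}
    leader = Leader(log_end_offset=100, isr={1, 2, 3},
                    follower_offsets={2: 100, 3: 100})

    # OFFLINE: controller drops replica 3 from the ISR and notifies the leader.
    leader.isr.discard(3)
    zk["isr"] = sorted(leader.isr)

    # Replica 3 was never told to stop fetching and is fully caught up,
    # so its next fetch puts it back into the ISR.
    leader.handle_fetch(3, 100)
    zk["isr"] = sorted(leader.isr)

    # NON_EXISTENT: nothing is written. Afterwards only the replica list changes.
    zk["replicas"] = [1, 2]

    # With no new data, nobody lags, so nothing can be evicted.
    removed_idle = leader.shrink_isr()
    stale_isr = list(zk["isr"])

    # Once data arrives and only replica 2 keeps up, replica 3 can be evicted.
    leader.append(10)
    leader.handle_fetch(2, 110)
    removed_after_data = leader.shrink_isr()
    zk["isr"] = sorted(leader.isr)
    return zk, stale_isr, removed_idle, removed_after_data
```

      In this model, `stale_isr` still contains broker 3 even though the replica list in `zk` no longer does, which mirrors the inconsistency reported here. The eviction only becomes possible after `append` is called, matching the "at least until new data comes in" behaviour.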

      Attachments

        1. KAFKA-1097_2013-10-29_10:49:45.patch
          28 kB
          Neha Narkhede
        2. KAFKA-1097_2013-10-30_21:46:00.patch
          40 kB
          Neha Narkhede
        3. KAFKA-1097_2013-10-31_10:37:29.patch
          41 kB
          Neha Narkhede
        4. KAFKA-1097_2013-11-01_09:55:33.patch
          41 kB
          Neha Narkhede
        5. KAFKA-1097.patch
          11 kB
          Neha Narkhede

          People

            Assignee: Neha Narkhede
            Reporter: Neha Narkhede