  1. Kafka
  2. KAFKA-7837

maybeShrinkIsr may not reflect OfflinePartitions immediately

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0, 2.1.1
    • Component/s: None
    • Labels: None

    Description

      When a partition is marked offline because of a failed disk, the leader should stop shrinking its ISR. ReplicaManager.maybeShrinkIsr() iterates over all non-offline partitions and shrinks the ISR of each one where needed. Shrinking an ISR requires writing the new ISR to ZooKeeper, which can take some time. If other partitions are marked offline during that window, the iterator may not pick up the change, because it only reflects the partition state at the time it was created. As a result, all in-sync followers can be dropped from the ISR unnecessarily, which prevents a clean leader election.
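
      The race described above can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not Kafka's ReplicaManager code and not necessarily the fix that was merged: the class IsrShrinkSketch, the Partition type, the slowZkWrite callback, and the premise that every follower looks lagging are all hypothetical. It contrasts a pass that works from a snapshot of non-offline partitions with a pass that re-checks the offline flag just before each shrink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the race; none of these names come from Kafka itself.
public class IsrShrinkSketch {

    // Stand-in for a partition led by this broker.
    static final class Partition {
        final Set<Integer> isr;
        boolean offline = false;

        Partition(Integer... replicas) {
            this.isr = new TreeSet<>(Arrays.asList(replicas));
        }
    }

    static final int LEADER_ID = 1;
    final Map<String, Partition> partitions = new LinkedHashMap<>();

    // Assumption: every follower looks lagging, so the ISR shrinks to the leader.
    // The callback stands in for the slow write of the new ISR to ZooKeeper.
    void shrinkToLeader(Partition p, Runnable slowZkWrite) {
        slowZkWrite.run();
        p.isr.retainAll(Collections.singleton(LEADER_ID));
    }

    // Problematic shape: the set of non-offline partitions is fixed up front,
    // so a partition that goes offline during the pass is still shrunk.
    void shrinkFromSnapshot(Runnable slowZkWrite) {
        List<Partition> snapshot = new ArrayList<>();
        for (Partition p : partitions.values()) {
            if (!p.offline) {
                snapshot.add(p);
            }
        }
        for (Partition p : snapshot) {
            shrinkToLeader(p, slowZkWrite);
        }
    }

    // One possible mitigation: re-check the offline flag right before each shrink.
    void shrinkWithRecheck(Runnable slowZkWrite) {
        for (Partition p : new ArrayList<>(partitions.values())) {
            if (p.offline) {
                continue;
            }
            shrinkToLeader(p, slowZkWrite);
        }
    }

    // Runs one pass and returns the final ISR of "t-1", which goes offline
    // while the ISR of "t-0" is being written.
    static Set<Integer> run(boolean recheck) {
        IsrShrinkSketch rm = new IsrShrinkSketch();
        rm.partitions.put("t-0", new Partition(1, 2, 3));
        rm.partitions.put("t-1", new Partition(1, 2, 3));
        // Simulated disk failure that lands during the first slow write.
        Runnable slowZkWrite = () -> {
            rm.partitions.get("t-1").offline = true;
        };
        if (recheck) {
            rm.shrinkWithRecheck(slowZkWrite);
        } else {
            rm.shrinkFromSnapshot(slowZkWrite);
        }
        return rm.partitions.get("t-1").isr;
    }

    public static void main(String[] args) {
        System.out.println("snapshot pass, t-1 ISR: " + run(false)); // [1]
        System.out.println("recheck pass,  t-1 ISR: " + run(true));  // [1, 2, 3]
    }
}
```

      In the snapshot pass, t-1 loses its followers even though it was marked offline before its turn came; in the recheck pass its ISR is left intact, so an in-sync follower remains available for a clean leader election.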

            People

              Assignee: Dhruvil Shah (dhruvilshah)
              Reporter: Jun Rao (junrao)