  1. Kafka
  2. KAFKA-7837

maybeShrinkIsr may not reflect OfflinePartitions immediately

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0, 2.1.1
    • Component/s: None
    • Labels:
      None

      Description

      When a partition is marked offline because of a failed disk, the leader should stop shrinking its ISR. ReplicaManager.maybeShrinkIsr() iterates over all non-offline partitions and shrinks the ISR of each one where needed. Shrinking an ISR requires writing the new ISR to ZooKeeper, which can take some time. During that window, other partitions may be marked offline, but the iterator does not pick up the change because it only reflects the partition state at the time it was created. As a result, all in-sync followers can be dropped from the ISR unnecessarily, which prevents a clean leader election.
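
      The race described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Kafka's code and not necessarily the fix that was merged for this issue: the names Partition, shrink_snapshot, shrink_recheck and run are hypothetical, the ZooKeeper write is replaced by a callback, and the disk failure is modelled by marking the second partition offline while the first ISR write is in progress.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical stand-in for a broker-side partition; not Kafka's class.
    name: str
    isr: set            # replica ids currently in sync (leader is 1)
    lagging: set        # followers that currently look out of sync
    offline: bool = False

    def maybe_shrink_isr(self, on_zk_write):
        new_isr = self.isr - self.lagging
        if new_isr != self.isr:
            on_zk_write(self)   # stands in for the slow ZooKeeper write
            self.isr = new_isr


def shrink_snapshot(partitions, on_zk_write):
    """Pattern described in the issue: filter offline partitions once, up front."""
    online = [p for p in partitions if not p.offline]   # snapshot of the state
    for p in online:
        p.maybe_shrink_isr(on_zk_write)


def shrink_recheck(partitions, on_zk_write):
    """One possible mitigation: re-read the offline flag right before each shrink."""
    for p in partitions:
        if not p.offline:
            p.maybe_shrink_isr(on_zk_write)


def run(shrink):
    parts = [
        Partition("p0", isr={1, 2, 3}, lagging={3}),
        # After the disk fails, every follower of p1 appears to lag.
        Partition("p1", isr={1, 2, 3}, lagging={2, 3}),
    ]

    def disk_fails_during_write(_partition):
        # Assumption for the sketch: p1 goes offline while an ISR write is in flight.
        parts[1].offline = True

    shrink(parts, disk_fails_during_write)
    return {p.name: sorted(p.isr) for p in parts}


if __name__ == "__main__":
    print("snapshot:", run(shrink_snapshot))
    print("recheck: ", run(shrink_recheck))
```

      In this sketch, the snapshot variant still shrinks p1 after it has gone offline, leaving only the leader in its ISR, while the re-check variant leaves p1's ISR untouched so that an in-sync follower remains available for a clean election.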

              People

              • Assignee: Dhruvil Shah (dhruvilshah)
              • Reporter: Jun Rao (junrao)
              • Votes: 1
              • Watchers: 7
