KAFKA-5633: Clarify another scenario of unclean leader election


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When unclean leader election is disabled, you don't need to lose all replicas of a partition for it to become unavailable; losing just one can be enough. The leader replica can get into a state where it kicks everything else out of the ISR because it has a network issue, and then it can simply die, leaving the partition leaderless.

      This is what we saw:

      Jul 24 18:05:53 broker-10029 kafka[4104]: INFO Partition [requests,9] on broker 10029: Shrinking ISR for partition [requests,9] from 10029,10016,10072 to 10029 (kafka.cluster.Partition)
      
              Topic: requests Partition: 9    Leader: -1      Replicas: 10029,10072,10016     Isr: 10029
      

      This is the default behavior in 0.11.0.0+, but I don't think the docs are completely clear about the implications. Before the change you could silently lose data if the scenario described above happened; now you can grind your whole pipeline to a halt when just one node has issues. My understanding is that to avoid this you'd want min.insync.replicas > 1 and producer acks > 1 (probably acks=all); see the sketch below.
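
      A minimal sketch of those settings, using the requests topic from this report; the ZooKeeper and broker addresses are placeholders, and exact flags may vary by Kafka version:

      # Require at least 2 in-sync replicas before a write is acknowledged
      # (topic-level override; zk.example:2181 is a placeholder address)
      kafka-configs.sh --zookeeper zk.example:2181 --alter \
          --entity-type topics --entity-name requests \
          --add-config min.insync.replicas=2

      # Make the producer wait for acknowledgement from all in-sync replicas
      # (broker.example:9092 is a placeholder address)
      kafka-console-producer.sh --broker-list broker.example:9092 --topic requests \
          --producer-property acks=all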

      It's also worth documenting how to force leader election when unclean leader election is disabled. I assume it can be accomplished by switching unclean.leader.election.enable on and off again for the problematic topic, but being crystal clear about this in the docs would be tremendously helpful; a sketch of that toggle follows.
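
      A sketch of that toggle, assuming the per-topic unclean.leader.election.enable override takes effect as described above (same placeholder ZooKeeper address and the requests topic from this report):

      # Temporarily allow an out-of-sync replica to become leader
      kafka-configs.sh --zookeeper zk.example:2181 --alter \
          --entity-type topics --entity-name requests \
          --add-config unclean.leader.election.enable=true

      # Once a new leader has been elected, remove the override again
      kafka-configs.sh --zookeeper zk.example:2181 --alter \
          --entity-type topics --entity-name requests \
          --delete-config unclean.leader.election.enable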


          People

            Assignee: Unassigned
            Reporter: Ivan Babrou (bobrik)
            Votes: 0
            Watchers: 1
