KAFKA-5633: Clarify another scenario of unclean leader election


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      When unclean leader election is disabled, you don't need to lose all replicas of a partition for it to become unavailable; losing just one can be enough. The leader replica can get into a state where it kicks everything else out of the ISR because it has a network issue, and then it can simply die, leaving the partition leaderless.

      This is what we saw:

      Jul 24 18:05:53 broker-10029 kafka[4104]: INFO Partition [requests,9] on broker 10029: Shrinking ISR for partition [requests,9] from 10029,10016,10072 to 10029 (kafka.cluster.Partition)
      
              Topic: requests Partition: 9    Leader: -1      Replicas: 10029,10072,10016     Isr: 10029
      

      This is the default behavior in 0.11.0.0+, but I don't think the docs are completely clear about the implications. Before the change you could silently lose data if the scenario described above happened; now you can grind your whole pipeline to a halt when just one node has issues. My understanding is that to avoid this you'd want min.insync.replicas > 1 and producer acks > 1 (probably acks=all); see the sketch below.
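
      A minimal sketch of those settings, using the requests topic from this report; the ZooKeeper and broker addresses are placeholders, and exact flags may vary by Kafka version:

      # Require at least 2 in-sync replicas before a write is acknowledged
      # (topic-level override; zk.example:2181 is a placeholder address)
      kafka-configs.sh --zookeeper zk.example:2181 --alter \
          --entity-type topics --entity-name requests \
          --add-config min.insync.replicas=2

      # Make the producer wait for acknowledgement from all in-sync replicas
      # (broker.example:9092 is a placeholder address)
      kafka-console-producer.sh --broker-list broker.example:9092 --topic requests \
          --producer-property acks=all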

      It's also worth documenting how to force leader election when unclean leader election is disabled. I assume it can be accomplished by switching unclean.leader.election.enable on and off again for the problematic topic, but being crystal clear about this in the docs would be tremendously helpful; a sketch of that toggle follows.
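
      A sketch of that toggle, assuming the per-topic unclean.leader.election.enable override takes effect as described above (same placeholder ZooKeeper address and the requests topic from this report):

      # Temporarily allow an out-of-sync replica to become leader
      kafka-configs.sh --zookeeper zk.example:2181 --alter \
          --entity-type topics --entity-name requests \
          --add-config unclean.leader.election.enable=true

      # Once a new leader has been elected, remove the override again
      kafka-configs.sh --zookeeper zk.example:2181 --alter \
          --entity-type topics --entity-name requests \
          --delete-config unclean.leader.election.enable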


          People

            Assignee: Unassigned
            Reporter: Ivan Babrou (bobrik)
            Votes: 0
            Watchers: 1
