Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Motivation
If the current leader A at epoch X gets partitioned from the rest of the quorum, voter A will stay leader at epoch X. This happens because voter A will never receive a request from the other voters that increases the epoch. The requests that typically increase the epoch of past leaders are BeginQuorumEpoch and Vote.
In addition, if voter A (the leader at epoch X) is not partitioned from the rest of the brokers (observers in the KRaft protocol), those brokers will never learn about the new quorum leader. This happens because (1) observers learn about the leader from the Fetch response and (2) an observer sends a Fetch request to a random voter only if its Fetch request times out.
Neither of these two mechanisms will cause the broker to send a request to a different voter: the leader at epoch X will never report a different leader in its Fetch response, and the broker's Fetch request will never time out.
Proposed Changes
In this scenario voter A, the leader at epoch X, will stop receiving Fetch requests from the majority of the voters. Voter A should resign as leader if the most recent Fetch requests from a majority of the voters are old enough. A reasonable value for "old enough" is the Fetch timeout value.
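The check described above could be sketched roughly as follows. This is an illustration only, not Kafka's implementation: the class LeaderFetchTracker and its methods are hypothetical names, and the sketch assumes that the leader is itself a voter, that it counts itself toward the majority, and that every follower is treated as freshly heard from at election time.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Kafka): tracks when each voter last
 * sent a Fetch request to the local leader and decides whether the
 * leader should resign.
 */
final class LeaderFetchTracker {
    private final int localId;
    private final Set<Integer> voterIds;
    private final long fetchTimeoutMs;
    // Timestamp of the last Fetch request received from each remote voter.
    private final Map<Integer, Long> lastFetchMs = new HashMap<>();

    LeaderFetchTracker(int localId, Set<Integer> voterIds,
                       long fetchTimeoutMs, long electedAtMs) {
        this.localId = localId;
        this.voterIds = new HashSet<>(voterIds);
        this.fetchTimeoutMs = fetchTimeoutMs;
        // Assumption: give every follower a full timeout window after election.
        for (int id : this.voterIds) {
            if (id != localId) {
                lastFetchMs.put(id, electedAtMs);
            }
        }
    }

    /** Record a Fetch request from a remote voter; observers are ignored. */
    void onFetch(int voterId, long nowMs) {
        if (voterId != localId && voterIds.contains(voterId)) {
            lastFetchMs.put(voterId, nowMs);
        }
    }

    /**
     * Returns true when fewer than a majority of voters (leader included)
     * have fetched within the last fetchTimeoutMs milliseconds.
     */
    boolean shouldResign(long nowMs) {
        int reachable = 1; // the leader counts itself
        for (long last : lastFetchMs.values()) {
            if (nowMs - last < fetchTimeoutMs) {
                reachable++;
            }
        }
        int majority = voterIds.size() / 2 + 1;
        return reachable < majority;
    }
}
```

In this sketch the leader would call onFetch whenever it handles a Fetch request from a voter and poll shouldResign on a timer. Observer fetches are deliberately ignored, since in the scenario above the brokers keep fetching from the stale leader even after it has lost the quorum.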
Issue Links
- duplicates: KAFKA-15489 split brain in KRaft cluster (Resolved)
- relates to: KAFKA-15911 KRaft quorum leader should make sure the follower fetch is making progress (Open)