Details
- Type: Task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
There are cases when the current leader cannot perform a rebalance on the specified set of nodes, for example, when some node from the raft group permanently fails with RaftError#ECATCHUP. A retry mechanism for this scenario is implemented in IGNITE-16801, but we cannot retry a rebalance intent indefinitely, so a mechanism for canceling a rebalance should be implemented.
Naive canceling could be implemented by removing the pending key and replacing it with the planned key. But this approach has several crucial limitations and may cause inconsistency in the current rebalance protocol, for example, when there is a race between the cancel and the new leader applying the new assignment to the stable key. If the cancel removes the pending key right before the new assignment is applied to the stable key, we can no longer resolve peers to ClusterIds, because that resolution is done on the union of the pending and stable keys.
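The resolution problem can be illustrated with a toy model. This is only a sketch under assumptions: the class `NaiveCancelSketch`, the in-memory map standing in for the meta storage, and the key names `"stable"`, `"pending"`, `"planned"` are all hypothetical and are not the Ignite API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the assignment keys; every name here is hypothetical. */
final class NaiveCancelSketch {
    /** Stand-in for the meta storage: key name -> set of node names. */
    final Map<String, Set<String>> keys = new HashMap<>();

    /** Naive cancel: replace pending with planned, or drop pending if planned is absent. */
    void naiveCancel() {
        Set<String> planned = keys.remove("planned");
        if (planned != null) {
            keys.put("pending", planned);
        } else {
            keys.remove("pending");
        }
    }

    /** Peers that can be resolved: the union of the stable and pending keys. */
    Set<String> resolvablePeers() {
        Set<String> union = new TreeSet<>(keys.getOrDefault("stable", Set.of()));
        union.addAll(keys.getOrDefault("pending", Set.of()));
        return union;
    }
}
```

With stable = {A, B, C} and pending = {A, B, D}, calling `naiveCancel()` just before the new leader writes {A, B, D} to the stable key leaves D outside the union, so D cannot be resolved at that moment.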
There is also a case in which we can lose a planned rebalance:
- The current leader retries the failed rebalance.
- The current leader stops being the leader for some reason and sleeps.
- The new leader performs the rebalance and calls RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied.
- At this moment the old leader wakes up and cancels the current rebalance, so it removes the pending key and writes the planned key's value to it.
- At this moment we receive RebalanceRaftGroupEventsListener#onNewPeersConfigurationApplied from the new leader, see that the planned key is empty, and simply delete the pending key. Deleting it is incorrect, because the rebalance associated with the removed key has not been performed yet.
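The interleaving above can be replayed on a toy model. This is a sketch under assumptions: `LostPlannedRebalanceSketch`, its in-memory key map, and the node names are hypothetical, and the method named `onNewPeersConfigurationApplied` here is a simplified stand-in for the real listener callback. In particular, the branch that moves a non-empty planned key into pending is an assumption that is not stated in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Replays the race between a stale leader's cancel and the new leader's callback. */
final class LostPlannedRebalanceSketch {
    /** Stand-in for the meta storage: key name -> set of node names. */
    final Map<String, Set<String>> keys = new HashMap<>();

    /** Naive cancel issued by the old leader: pending takes the planned value. */
    void cancelByStaleLeader() {
        Set<String> planned = keys.remove("planned");
        if (planned != null) {
            keys.put("pending", planned);
        } else {
            keys.remove("pending");
        }
    }

    /** Simplified stand-in for the callback fired after the new configuration is applied. */
    void onNewPeersConfigurationApplied(Set<String> appliedPeers) {
        keys.put("stable", appliedPeers);
        Set<String> planned = keys.remove("planned");
        if (planned == null) {
            keys.remove("pending"); // planned is empty, so pending is simply deleted
        } else {
            keys.put("pending", planned); // assumption: planned becomes the next pending
        }
    }

    /** True if some key still records the given target assignment. */
    boolean isTracked(Set<String> assignment) {
        return keys.containsValue(assignment);
    }

    /** Runs the steps of the scenario in order and returns the final state. */
    static LostPlannedRebalanceSketch replayRace() {
        LostPlannedRebalanceSketch s = new LostPlannedRebalanceSketch();
        s.keys.put("stable", Set.of("A", "B", "C"));
        s.keys.put("pending", Set.of("A", "B", "D"));
        s.keys.put("planned", Set.of("A", "B", "E"));
        // The new leader has already rebalanced to {A, B, D}; its callback is in flight.
        s.cancelByStaleLeader();
        s.onNewPeersConfigurationApplied(Set.of("A", "B", "D"));
        return s;
    }
}
```

After `replayRace()`, only the stable key remains and it holds {A, B, D}. The planned assignment {A, B, E} is no longer recorded under any key, which is the lost rebalance described above.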
We should also consider separating the scenarios for recoverable and unrecoverable errors, because retrying a rebalance may be useless if a participating node fails with an unrecoverable error.
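One possible shape for such a separation is a small decision function with a bounded retry budget. This is a minimal sketch, not a proposal for the final design: `RebalanceRetryPolicySketch`, `ErrorKind`, `Action`, and `maxAttempts` are hypothetical names, and this issue does not say which RaftError codes fall into which class, so the classification is left to the caller.

```java
/** Decides whether a failed rebalance attempt should be retried or canceled. */
final class RebalanceRetryPolicySketch {
    /** Hypothetical error classes; mapping real error codes to them is left open. */
    enum ErrorKind { RECOVERABLE, UNRECOVERABLE }

    enum Action { RETRY, CANCEL }

    private final int maxAttempts;

    RebalanceRetryPolicySketch(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * @param kind class of the error reported for the failed attempt
     * @param failedAttempts number of attempts that have failed so far, including this one
     */
    Action onFailure(ErrorKind kind, int failedAttempts) {
        if (kind == ErrorKind.UNRECOVERABLE) {
            return Action.CANCEL; // retrying cannot help
        }
        return failedAttempts < maxAttempts ? Action.RETRY : Action.CANCEL;
    }
}
```

For example, `new RebalanceRetryPolicySketch(3).onFailure(ErrorKind.RECOVERABLE, 1)` yields `RETRY`, while the third recoverable failure or any unrecoverable failure yields `CANCEL`. How `CANCEL` is carried out safely is the open question of this issue.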
It seems we should think carefully about introducing failure handling for such exceptional scenarios.
The new node role from https://issues.apache.org/jira/browse/IGNITE-17252, the primary replica, can help us resolve this issue in a simpler way by canceling the rebalance from the primary replica.
As a result of this issue, we must design a correct algorithm for canceling a hung rebalance.
Issue Links
- is blocked by
  - IGNITE-17252 Introduce Replica, ReplicaServer(?), ReplicaService and ReplicaListener interfaces (Resolved)
- is caused by
  - IGNITE-16801 Implement error handling for rebalance onReconfigurationError callback (Resolved)
- links to