Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Motivation
IEP-131
In HA mode, consider the following scale-up scenario:
- Suppose the partition assignments are [A, B, C], and nodes B and C leave.
- The Raft group is forcibly narrowed to [A]. After that, node B returns, so the stable assignments must be extended to [A, B].
- From the distribution zone (DZ) perspective, nothing changed within the DZ scale-up time window, so the data nodes stay the same. This suggests that no new rebalance is needed to extend the stable assignments to [A, B], but one actually is. (Note that the DZ scale-down timer is quite long and had not even expired, so B and C were never removed from the data nodes.)
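A minimal Java sketch of why a change-based trigger misses this case. The class and method names are hypothetical, and the "rebalance only when data nodes change" check is a simplified assumption for illustration, not Apache Ignite's actual logic.

```java
import java.util.Set;

/**
 * Hypothetical model of the scenario above; not Apache Ignite code.
 * Assumption: the trigger fires only when the data nodes set changes.
 */
final class ScaleUpScenario {
    /** Naive trigger: schedule a rebalance only if the data nodes changed. */
    static boolean naiveNeedsRebalance(Set<String> oldDataNodes, Set<String> newDataNodes) {
        return !oldDataNodes.equals(newDataNodes);
    }

    public static void main(String[] args) {
        // B and C left, but the DZ scale-down timer has not fired,
        // so they are still part of the data nodes.
        Set<String> dataNodesBefore = Set.of("A", "B", "C");
        // B returns; within the scale-up window the data nodes look unchanged.
        Set<String> dataNodesAfter = Set.of("A", "B", "C");

        // Stable assignments after the forced narrowing, and the currently alive nodes.
        Set<String> stable = Set.of("A");
        Set<String> alive = Set.of("A", "B");

        // Prints false: the naive trigger does not fire.
        System.out.println("naive trigger fires: "
                + naiveNeedsRebalance(dataNodesBefore, dataNodesAfter));
        // Prints true: an alive node (B) is still missing from stable.
        System.out.println("alive node missing from stable: "
                + !stable.containsAll(alive));
    }
}
```

The two printed values show the gap: the data nodes are identical, yet the stable assignments no longer reflect the alive nodes.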
Proposed enhancements
- Data nodes are rewritten on scale-up even if they have not changed.
- When deciding whether to trigger a rebalance after a data nodes change, we calculate the assignments and filter them by node aliveness. If the current value under stablePartAssignmentsKey differs from the filtered assignments, a rebalance is scheduled.
- In the example above, the calculated assignments are [A, B, C]. Filtering them by aliveness yields [A, B], which differs from the stable [A], so a new rebalance is scheduled to extend stablePartAssignmentsKey to [A, B].
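The proposed trigger can be sketched in Java as follows. This is a minimal, hypothetical model rather than Ignite's implementation: assignments are plain sets of node names, the "calculated assignments" are assumed to equal the data nodes (real code would run a distribution function per partition and replica count), and aliveness is a given set of node names. Rewriting the data nodes on scale-up, even when unchanged, is what gives this check a chance to run.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the aliveness-based rebalance trigger; not Apache Ignite code.
 * Assumptions: assignments are sets of node names, and calculated assignments
 * are taken to be the data nodes themselves.
 */
final class AlivenessRebalanceCheck {
    /** Keeps only those calculated assignments whose nodes are currently alive. */
    static Set<String> filterAlive(Set<String> calculated, Set<String> aliveNodes) {
        return calculated.stream()
                .filter(aliveNodes::contains)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Returns true if the stable assignments differ from the alive-filtered ones. */
    static boolean needsRebalance(Set<String> calculated, Set<String> aliveNodes, Set<String> stable) {
        return !filterAlive(calculated, aliveNodes).equals(stable);
    }

    public static void main(String[] args) {
        Set<String> calculated = Set.of("A", "B", "C"); // from the unchanged data nodes
        Set<String> alive = Set.of("A", "B");           // B has returned, C is still gone
        Set<String> stable = Set.of("A");               // after the forced narrowing

        System.out.println("filtered: " + filterAlive(calculated, alive));            // [A, B]
        System.out.println("rebalance: " + needsRebalance(calculated, alive, stable)); // true
    }
}
```

Once the rebalance completes and the stable assignments become [A, B], the same check returns false, so the trigger does not keep firing.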
Definition of done
- The approach above must be implemented, so that nodes that come back after a majority loss are returned to the stable assignments.