[IGNITE-23572] Change rebalance scheduling when data nodes are changed - ASF JIRA

IEP-131

In HA mode for scale up situations:

Suppose the partition assignments are [A, B, C], and nodes B and C leave.
The Raft group is forcibly narrowed to [A]. When node B later returns, we must extend the stable assignments to [A, B].
From the DZ.scale up perspective, nothing changed within the DZ.scale up time window, so the data nodes stay the same. That might suggest no new rebalance needs to be scheduled to extend the stable assignments to [A, B], but one is actually required. (Note that the DZ.scale down timer is quite long and has not even elapsed.)

Proposed enhancements

Data nodes are rewritten on scale up even if they are unchanged.
When deciding whether to trigger a rebalance after a data nodes change, we calculate the assignments and apply a node aliveness filter to them. If the actual stablePartAssignmentsKey value differs from the filtered assignments, we schedule a rebalance.
In our example, the calculated assignments are [A, B, C]; we filter them to [A, B] and schedule a new rebalance to extend stablePartAssignmentsKey.
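
The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (RebalanceDecisionSketch, filterAlive, shouldScheduleRebalance) are hypothetical and are not taken from the Ignite code base, and nodes are modelled as plain strings rather than real assignment objects.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed check; not Ignite's actual API. */
final class RebalanceDecisionSketch {
    /** Keeps only those calculated assignments whose nodes are currently alive. */
    static Set<String> filterAlive(List<String> calculated, Set<String> aliveNodes) {
        Set<String> filtered = new LinkedHashSet<>(calculated);
        filtered.retainAll(aliveNodes);
        return filtered;
    }

    /**
     * A rebalance is scheduled when the current stable assignments differ
     * from the calculated assignments after the aliveness filter is applied.
     */
    static boolean shouldScheduleRebalance(
            Set<String> stable, List<String> calculated, Set<String> aliveNodes) {
        return !filterAlive(calculated, aliveNodes).equals(stable);
    }
}
```

With stable = [A], calculated = [A, B, C] and alive nodes {A, B}, filterAlive yields [A, B], which differs from the stable set, so shouldScheduleRebalance returns true. If the stable set were already [A, B], no rebalance would be scheduled.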

A corresponding approach must be implemented so that nodes that come back after a majority loss can be returned to the stable assignments.

links to

GitHub Pull Request #4905

Estimated:

Not Specified

Remaining:

Logged:

50m