Details
-
Sub-task
-
Status: Closed
-
Blocker
-
Resolution: Done
-
None
-
None
Description
This task aims to verify FLINK-29663 which improves the adaptive batch scheduler.
Before the change of FLINK-29663, adaptive batch scheduler will distribute subpartitoins according to the number of subpartitions, make different downstream subtasks consume roughly the same number of subpartitions. This will lead to imbalance loads of different downstream tasks when the subpartitions contain different amounts of data.
To solve this problem, in FLINK-29663, we let the adaptive batch scheduler distribute subpartitoins according to the amount of data, so that different downstream subtasks consume roughly the same amount of data. Note that currently it only takes effect for All-To-All edges.
The documentation of adaptive scheduler can be found here
One can verify it by creating intended data skew on All-To-All edges.