Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Description
RegionPartitionReleaseStrategy is responsible for releasing result partitions when all the downstream tasks finish.
The current implementation is:
for each consumed SchedulingResultPartition of current finished SchedulingPipelinedRegion:
    for each consumer SchedulingPipelinedRegion of the SchedulingResultPartition:
        if all the regions are finished:
            release the partitions
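The loop above can be sketched in Java as follows. This is a minimal illustration, not Flink's actual code: the class `NaiveReleaseSketch`, its methods, and the use of plain string ids for partitions and regions are all hypothetical stand-ins for SchedulingResultPartition and SchedulingPipelinedRegion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the per-partition check; not Flink's API. */
final class NaiveReleaseSketch {
    // partition id -> ids of the regions that consume it
    private final Map<String, Set<String>> consumersOfPartition = new HashMap<>();
    // region id -> ids of the partitions it consumes
    private final Map<String, Set<String>> consumedByRegion = new HashMap<>();
    private final Set<String> finishedRegions = new HashSet<>();

    /** Records that consumerRegion reads from partition. */
    void addEdge(String partition, String consumerRegion) {
        consumersOfPartition
                .computeIfAbsent(partition, k -> new LinkedHashSet<>())
                .add(consumerRegion);
        consumedByRegion
                .computeIfAbsent(consumerRegion, k -> new LinkedHashSet<>())
                .add(partition);
    }

    /** Marks a region finished and returns the partitions that can now be released. */
    List<String> regionFinished(String region) {
        finishedRegions.add(region);
        List<String> releasable = new ArrayList<>();
        // Outer loop: every partition consumed by the finished region.
        for (String partition : consumedByRegion.getOrDefault(region, Collections.emptySet())) {
            // Inner loop (inside containsAll): every consumer region of that partition.
            if (finishedRegions.containsAll(consumersOfPartition.get(partition))) {
                releasable.add(partition);
            }
        }
        return releasable;
    }
}
```

With all-to-all edges, each call walks every consumed partition and, for each one, every consumer region, which is where the nested cost comes from.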
The time complexity of releasing a single result partition is O(N^2). However, since all the result partitions need to be released over the course of the entire stage, the overall time complexity is actually O(N^3).
After the optimization of DefaultSchedulingTopology, the consumed result partitions are grouped. Since the result partitions in one group are isomorphic, we can just cache the finished status of result partition groups and the corresponding pipelined regions.
The optimized implementation is:
for each ConsumedPartitionGroup of current finished SchedulingPipelinedRegion:
    if all consumer SchedulingPipelinedRegion of the ConsumedPartitionGroup are finished:
        set the ConsumedPartitionGroup to be fully consumed
        for result partition in the ConsumedPartitionGroup:
            if all the ConsumedPartitionGroups it belongs to are fully consumed:
                release the result partition
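A rough Java sketch of the grouped variant follows. It is an illustration under assumptions rather than the actual RegionPartitionReleaseStrategy: `GroupedReleaseSketch`, the nested `Group` class, the `fullyConsumed` flag, and the string ids are hypothetical names introduced here to mirror ConsumedPartitionGroup and its cached status.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of group-level release tracking; not Flink's API. */
final class GroupedReleaseSketch {
    /** Stand-in for a ConsumedPartitionGroup plus its cached "fully consumed" status. */
    static final class Group {
        final Set<String> partitions;
        final Set<String> consumerRegions;
        boolean fullyConsumed;

        Group(List<String> partitions, List<String> consumerRegions) {
            this.partitions = new LinkedHashSet<>(partitions);
            this.consumerRegions = new LinkedHashSet<>(consumerRegions);
        }
    }

    private final Map<String, List<Group>> groupsConsumedByRegion = new HashMap<>();
    private final Map<String, List<Group>> groupsOfPartition = new HashMap<>();
    private final Set<String> finishedRegions = new HashSet<>();

    /** Registers one group of partitions that share the same consumer regions. */
    void addGroup(List<String> partitions, List<String> consumerRegions) {
        Group group = new Group(partitions, consumerRegions);
        for (String region : group.consumerRegions) {
            groupsConsumedByRegion.computeIfAbsent(region, k -> new ArrayList<>()).add(group);
        }
        for (String partition : group.partitions) {
            groupsOfPartition.computeIfAbsent(partition, k -> new ArrayList<>()).add(group);
        }
    }

    /** Marks a region finished and returns the partitions that can now be released. */
    List<String> regionFinished(String region) {
        List<String> releasable = new ArrayList<>();
        if (!finishedRegions.add(region)) {
            return releasable; // region was already reported as finished
        }
        for (Group group : groupsConsumedByRegion.getOrDefault(region, Collections.emptyList())) {
            if (!finishedRegions.containsAll(group.consumerRegions)) {
                continue; // some consumer of this group is still running
            }
            group.fullyConsumed = true; // checked once per group, not once per partition
            for (String partition : group.partitions) {
                if (allGroupsFullyConsumed(partition)) {
                    releasable.add(partition);
                }
            }
        }
        return releasable;
    }

    private boolean allGroupsFullyConsumed(String partition) {
        for (Group group : groupsOfPartition.get(partition)) {
            if (!group.fullyConsumed) {
                return false;
            }
        }
        return true;
    }
}
```

The consumer-region check runs once per group instead of once per partition, so partitions in the same group share its result; a partition that belongs to several groups is released only when the last of those groups becomes fully consumed.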
Issue Links
- is related to FLINK-22007 PartitionReleaseInBatchJobBenchmarkExecutor seems to be failing (Closed)
- links to