  1. Flink
  2. FLINK-21110 Optimize scheduler performance for large-scale jobs
  3. FLINK-21330

Optimize the performance of PipelinedRegionSchedulingStrategy


Details

    Description

      PipelinedRegionSchedulingStrategy is used for task scheduling. Its initialization, implemented in PipelinedRegionSchedulingStrategy#init, consists of two parts:

      1. Calculating the consumed result partitions of each SchedulingPipelinedRegion
      2. Calculating the consumer pipelined regions of each SchedulingResultPartition

      Based on FLINK-21328, the consumedResults of DefaultSchedulingPipelinedRegion can be replaced with ConsumedPartitionGroup.

      With that change, both initialization steps listed above can be optimized, reducing their time complexity from O(N^2) to O(N).
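
      To illustrate where the saving comes from, here is a minimal sketch under stated assumptions. The classes PartitionGroup and Region and the methods indexConsumers and allToAll are hypothetical stand-ins, not Flink's actual ConsumedPartitionGroup or DefaultSchedulingPipelinedRegion. It assumes an all-to-all topology in which N consumer regions share a single group of N partitions, so indexing consumers per group touches N (region, group) pairs, whereas indexing per partition would touch N * N pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Flink's real classes.
final class RegionGroupingSketch {

    /** A set of result partitions that are always consumed together. */
    static final class PartitionGroup {
        final List<Integer> partitionIds;

        PartitionGroup(List<Integer> partitionIds) {
            this.partitionIds = partitionIds;
        }
    }

    /** A pipelined region that references whole groups instead of single partitions. */
    static final class Region {
        final int id;
        final List<PartitionGroup> consumedGroups;

        Region(int id, List<PartitionGroup> consumedGroups) {
            this.id = id;
            this.consumedGroups = consumedGroups;
        }
    }

    /**
     * Maps each group to the regions that consume it. The inner loop runs once per
     * (region, group) pair, not once per (region, partition) pair.
     */
    static Map<PartitionGroup, List<Region>> indexConsumers(List<Region> regions) {
        // Groups are shared objects, so identity comparison is sufficient here.
        Map<PartitionGroup, List<Region>> consumers = new IdentityHashMap<>();
        for (Region region : regions) {
            for (PartitionGroup group : region.consumedGroups) {
                consumers.computeIfAbsent(group, g -> new ArrayList<>()).add(region);
            }
        }
        return consumers;
    }

    /** Builds n consumer regions that all share one group of n partitions. */
    static List<Region> allToAll(int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        PartitionGroup shared = new PartitionGroup(ids);
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            regions.add(new Region(i, Collections.singletonList(shared)));
        }
        return regions;
    }

    public static void main(String[] args) {
        Map<PartitionGroup, List<Region>> consumers = indexConsumers(allToAll(1000));
        List<Region> sharedConsumers = consumers.values().iterator().next();
        // One shared group, visited once by each of the 1000 regions.
        System.out.println("groups=" + consumers.size() + ", consumers=" + sharedConsumers.size());
    }
}
```

      With 1000 regions the sketch performs 1000 map updates; enumerating every consumed partition per region would instead perform 1,000,000.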

      Existing usages of getConsumedResults should be replaced accordingly. PipelinedRegionSchedulingStrategy#maybeScheduleRegion can be optimized at the same time.

      The detailed design doc is located at: https://docs.google.com/document/d/1OjGAyJ9Z6KsxcMtBHr6vbbrwP9xye7CdCtrLvf8dFYw/edit#heading=h.a1mz4yjpry6m
       

            People

              Thesharing Zhilong Hong