Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-21110 Optimize scheduler performance for large-scale jobs
  3. FLINK-21920

Optimize ExecutionGraphToInputsLocationsRetrieverAdapter

    XMLWordPrintableJSON

Details

    Description

      Based on the scheduler benchmark introduced in FLINK-21731, we find that there's a procedure related to DefaultScheduler#allocateSlots that has O(N^2) complexity, which is: ExecutionGraphToInputsLocationsRetrieverAdapter#getConsumedResultPartitionsProducers.

      The original implementation is:

      for all SchedulingExecutionVertex in DefaultScheduler:
        for all ConsumedPartitionGroup of the SchedulingExecutionVertex:
          for all IntermediateResultPartition in the ConsumedPartitionGroup:
            get producer of the IntermediateResultPartition 

      This procedure has O(N^2) complexity.

      We can see that for each SchedulingExecutionVertex, the producers of its ConsumedPartitionGroup is calculated separately. For the SchedulingExecutionVertices in the same ConsumerVertexGroup, they have the same ConsumedPartitionGroup. Therefore, we don't need to calculate the producers over and over again. We can use a local cache to cache the producers. This will decrease the complexity from O(N^2) to O(N).

       

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              Thesharing Zhilong Hong
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: