Flink / FLINK-32484

AdaptiveScheduler combined restart during scaling out


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.17.0
    • Fix Version/s: None
    • Component/s: API / Core
    • Labels: None

    Description

      During a scale-out in which nodes are added at different times, AdaptiveScheduler restarts the job several times within a short period. On one of our Flink jobs, AdaptiveScheduler restarted the ExecutionGraph every time it was notified of new resources: five restarts in a little over three minutes (10:46:58 to 10:50:15 in the log below).

      AdaptiveScheduler could offer a user-configurable restart window during which it accumulates the notified resources and restarts only once, either when the available resources are sufficient for the desired parallelism or when the window times out. The window opens on the first notification of new resources. This applies only while the execution graph is in the executing state, not in the waiting-for-resources state.
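
      The proposed behaviour can be sketched roughly as follows. This is only an illustration of the idea, not Flink code: the class RestartWindow and its methods onNewResources and onTimerTick are hypothetical names, time is passed in explicitly to keep the example deterministic, and the sketch models only the executing state. The main method replays the five notification times from the log below, assuming each notification adds one slot and the job wants five.

```java
import java.time.Duration;

/** Hypothetical sketch of a combined-restart window; not part of Flink's API. */
final class RestartWindow {
    private final long windowMillis;      // assumed user-configurable window length
    private final int desiredSlots;       // slots needed for the desired parallelism
    private int availableSlots;           // slots known to be available so far
    private long windowStartMillis = -1;  // -1 means no window is currently open

    RestartWindow(Duration window, int desiredSlots, int initialSlots) {
        this.windowMillis = window.toMillis();
        this.desiredSlots = desiredSlots;
        this.availableSlots = initialSlots;
    }

    /** Records newly notified slots; returns true if the job should restart now. */
    boolean onNewResources(int newSlots, long nowMillis) {
        if (windowStartMillis < 0) {
            windowStartMillis = nowMillis; // the first notification opens the window
        }
        availableSlots += newSlots;
        return shouldRestart(nowMillis);
    }

    /** Periodic check, so a timeout is noticed even without new notifications. */
    boolean onTimerTick(long nowMillis) {
        return shouldRestart(nowMillis);
    }

    private boolean shouldRestart(long nowMillis) {
        if (windowStartMillis < 0) {
            return false; // nothing pending
        }
        boolean enough = availableSlots >= desiredSlots;
        boolean timedOut = nowMillis - windowStartMillis >= windowMillis;
        if (enough || timedOut) {
            windowStartMillis = -1; // close the window; the caller restarts once
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Seconds since the first notification in the log (10:46:58).
        long[] offsetsSeconds = {0, 59, 115, 149, 197};
        RestartWindow window = new RestartWindow(Duration.ofMinutes(5), 5, 0);
        int restarts = 0;
        for (long s : offsetsSeconds) {
            if (window.onNewResources(1, s * 1000)) {
                restarts++;
            }
        }
        System.out.println("restarts = " + restarts);
    }
}
```

      Under these assumptions the five notifications lead to a single restart instead of five. With a shorter window, the timeout branch would trigger a restart with whatever resources had arrived by then.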

       

      [root@ip-1-2-3-4 container_1688034805200_0002_01_000001]# grep -i scale *
      jobmanager.log:2023-06-29 10:46:58,061 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
      jobmanager.log:2023-06-29 10:47:57,317 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
      jobmanager.log:2023-06-29 10:48:53,314 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
      jobmanager.log:2023-06-29 10:49:27,821 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
      jobmanager.log:2023-06-29 10:50:15,672 INFO  org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler [] - New resources are available. Restarting job to scale up.
      [root@ip-1-2-3-4 container_1688034805200_0002_01_000001]# 

       

            People

              Assignee: Unassigned
              Reporter: Prabhu Joseph (prabhujoseph)