FLINK-10429

Redesign Flink Scheduling, introducing dedicated Scheduler component

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

      Description

      This epic tracks the redesign of scheduling in Flink. Scheduling is currently a concern scattered across several components, mainly the ExecutionGraph/Execution and the SlotPool. It also happens only at the granularity of individual tasks, which makes holistic scheduling strategies hard to implement. In this epic we aim to introduce a dedicated Scheduler component that can support use cases such as auto-scaling, local recovery, and resource-optimized batch.

      The design for this feature is developed here: https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit?usp=sharing
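
To illustrate why task-level granularity can get in the way of holistic strategies, here is a minimal toy sketch contrasting per-task slot allocation with group-aware allocation. It is not taken from the design document or from Flink's code base: `BulkSchedulingSketch`, `SlotPool` (a stand-in that only counts free slots), `schedulePerTask`, and `scheduleBulk` are hypothetical names invented for this example, and the all-or-nothing rule is an assumption about what a group-aware strategy might want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Flink classes. */
public class BulkSchedulingSketch {

    /** Hypothetical stand-in for a slot pool; it only tracks a count of free slots. */
    public static final class SlotPool {
        private int freeSlots;

        public SlotPool(int freeSlots) {
            this.freeSlots = freeSlots;
        }

        public int freeSlots() {
            return freeSlots;
        }

        /** Per-task allocation: takes one slot if any is free. */
        public boolean allocateOne() {
            if (freeSlots == 0) {
                return false;
            }
            freeSlots--;
            return true;
        }

        /** Group allocation: takes {@code count} slots, or none at all. */
        public boolean allocateAll(int count) {
            if (count > freeSlots) {
                return false;
            }
            freeSlots -= count;
            return true;
        }
    }

    /** Task-by-task scheduling: each task is decided in isolation, so a group may be only partly deployed. */
    public static List<String> schedulePerTask(List<String> tasks, SlotPool pool) {
        List<String> deployed = new ArrayList<>();
        for (String task : tasks) {
            if (pool.allocateOne()) {
                deployed.add(task);
            }
        }
        return deployed;
    }

    /** Group-aware scheduling: the group is deployed only if every task in it gets a slot. */
    public static List<String> scheduleBulk(List<String> tasks, SlotPool pool) {
        if (pool.allocateAll(tasks.size())) {
            return new ArrayList<>(tasks);
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("source", "map", "sink");

        SlotPool perTaskPool = new SlotPool(2);
        System.out.println(schedulePerTask(group, perTaskPool)); // [source, map]
        System.out.println(perTaskPool.freeSlots());             // 0

        SlotPool bulkPool = new SlotPool(2);
        System.out.println(scheduleBulk(group, bulkPool));       // []
        System.out.println(bulkPool.freeSlots());                // 2
    }
}
```

With two free slots and a three-task group, the per-task loop deploys two tasks and holds both slots while the third task cannot run, whereas the group-aware variant leaves the slots free for other work. A real scheduler would of course deal with asynchronous slot requests, failures, and timeouts, none of which are modelled here.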

          Issue Links

          #. Summary | Type | Status | Assignee
          1. Extract scheduling-related code from Executions | Sub-task | Open | Stefan Richter
          2. Introduce bulk/group-aware scheduling | Sub-task | Open | Stefan Richter
          3. Stepwise creation of the ExecutionGraph sub-structures | Sub-task | Open | Stefan Richter
          4. Add CLI command for rescaling | Sub-task | Open | Unassigned
          5. Enable MiniCluster tests based on schedulerNG in Flink cron build | Sub-task | In Progress | Zhu Zhu
          6. Support configurable failover strategy for scheduler NG | Sub-task | Open | Unassigned
          7. Unify SchedulerOperations#allocateSlotsAndDeploy implementation for all scheduling strategies | Sub-task | Open | Unassigned
          8. Support global failure handling for DefaultScheduler (SchedulerNG) | Sub-task | In Progress | Zhu Zhu
          9. All task state changes should be notified to SchedulingStrategy (SchedulerNG) | Sub-task | Open | Unassigned
          10. All partition consumable events should be notified to SchedulingStrategy (SchedulerNG) | Sub-task | Open | Unassigned
          11. Make LazyFromSourcesSchedulingStrategy do lazy scheduling based on partition state only | Sub-task | Open | Unassigned
          12. Change DefaultSchedulingResultPartition to return correct partition state | Sub-task | Open | Unassigned
          13. Prevent vertex from being affected by outdated deployment (SchedulerNG) | Sub-task | In Progress | Zhu Zhu
          14. Enable ClassLoaderITCase and EventTimeWindowCheckpointingITCase to pass with scheduler NG | Sub-task | Open | Unassigned
          15. Enable KeyedStateCheckpointingITCase to pass with scheduler NG | Sub-task | Open | Unassigned
          16. Enable ZooKeeperHighAvailabilityITCase to pass with scheduler NG | Sub-task | Open | Unassigned
          17. Enable RegionFailoverITCase to pass with scheduler NG | Sub-task | Open | Unassigned
          18. Avoid to trigger failover on a non-effective task failure notification | Sub-task | Open | Zhu Zhu
          19. Restore task state in new DefaultScheduler | Sub-task | In Progress | Zhu Zhu
          20. RestartPipelinedRegionStrategy leverage tracked partition availability for better failover experience in DefaultScheduler | Sub-task | Open | Unassigned
          21. Enable BatchFineGrainedRecoveryITCase to pass with scheduler NG | Sub-task | Open | Unassigned
          22. Refactor SchedulingTopology to extend base topology | Sub-task | In Progress | Zhu Zhu
          23. Refactor FailoverTopology to extend base topology | Sub-task | In Progress | Zhu Zhu
          24. Keep only one execution topology in scheduler | Sub-task | In Progress | Zhu Zhu
          25. Support building pipelined regions from base topology | Sub-task | In Progress | Zhu Zhu
          26. Add a metric to show failover count regarding fine grained recovery | Sub-task | Open | Unassigned

              People

              • Assignee: Gary Yao (gjy)
              • Reporter: Stefan Richter (srichter)
              • Votes: 3
              • Watchers: 44

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 13h