  1. Flink
  2. FLINK-25318 Improvement of scheduler and execution for Flink OLAP
  3. FLINK-15959

Add min number of slots configuration to limit total number of slots


Details

      Flink now supports defining the minimum resource requirements that the Flink cluster allocates using the configuration options 'slotmanager.min-total-resource.cpu', 'slotmanager.min-total-resource.memory', and 'slotmanager.number-of-slots.min'. These options are intended to ensure that a certain minimum level of resources is allocated to initialize specific workers during startup, thereby speeding up the job startup process. Please note that these configuration options do not have any effect on standalone clusters, as resource allocation in such clusters is not controlled by Flink.

    Description

      Flink removed the `-n` option after FLIP-6; the ResourceManager now starts a new worker only when one is required. But I think maintaining a certain number of slots is still necessary. The workers backing these slots would start immediately when the ResourceManager starts and would not be released even if all of their slots are free.
      Here are some reasons:

      1. Users actually know how many resources are needed when running a single job, so initializing all workers when the cluster starts can speed up the startup process.
      2. Jobs are scheduled in topological order: the next operator is not scheduled until the slot for the prior execution has been allocated. In some cases the TaskExecutors therefore start in several batches, which can slow down startup.
      3. Flink supports FLINK-12122 [Spread out tasks evenly across all available registered TaskManagers], but it only takes effect if all TMs are registered. Starting all TMs at the beginning solves this problem.

      Suggestion:

      • Add the config options "taskmanager.minimum.numberOfTotalSlots" and "taskmanager.maximum.numberOfTotalSlots"; the default behavior stays the same as before.
      • Start enough workers to satisfy the minimum number of slots when the ResourceManager accepts leadership (subtracting recovered workers).
      • Don't complete slot requests until the minimum number of slots is registered, and throw an exception when the maximum is exceeded.
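
      The worker-count arithmetic in the second bullet can be sketched as follows. This is only an illustration, not the code that was merged: the class `MinSlotsSketch`, the method `workersToStart`, and its parameters are hypothetical names, and the sketch assumes every worker offers the same number of slots.

```java
/**
 * Hypothetical sketch (not Flink's actual implementation): how many extra
 * workers a ResourceManager would need to start to reach a minimum slot count.
 * Assumes all workers offer the same number of slots.
 */
public final class MinSlotsSketch {

    private MinSlotsSketch() {}

    public static int workersToStart(int minTotalSlots, int slotsPerWorker, int recoveredWorkers) {
        if (slotsPerWorker <= 0) {
            throw new IllegalArgumentException("slotsPerWorker must be positive");
        }
        if (minTotalSlots < 0 || recoveredWorkers < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        // Slots already provided by workers recovered from a previous attempt.
        int recoveredSlots = recoveredWorkers * slotsPerWorker;
        // Slots still missing; never negative if recovery already covers the minimum.
        int missingSlots = Math.max(0, minTotalSlots - recoveredSlots);
        // Ceiling division: a partially needed worker still has to be started.
        return (missingSlots + slotsPerWorker - 1) / slotsPerWorker;
    }
}
```

      For example, with a minimum of 10 slots, 4 slots per worker, and 1 recovered worker, 6 slots are missing, so `workersToStart(10, 4, 1)` returns 2.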

      Update

      Finally, we'd like to introduce three config options related to the minimum resources restriction:

      • slotmanager.min-total-resource.cpu
      • slotmanager.min-total-resource.memory
      • slotmanager.number-of-slots.min

      Note that these configuration options do not take effect for standalone clusters, where the number of allocated slots is not controlled by Flink. The options are best effort: Flink will not block job progress even if the minimum resources cannot be guaranteed.
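
      As a rough illustration, the three options could be set together in the Flink configuration file like this. The option names come from this issue; the values are made-up examples rather than recommendations, and the file name `flink-conf.yaml` is an assumption about the deployment.

```yaml
# flink-conf.yaml (illustrative values only)
slotmanager.min-total-resource.cpu: 8.0
slotmanager.min-total-resource.memory: 32g
slotmanager.number-of-slots.min: 8
```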

            People

              xiangyu0xf xiangyu feng
              liuyufei YufeiLiu
