  Flink / FLINK-7870

SlotPool should cancel slot requests to the RM when they are no longer needed.


Details

    Description

      1. When SlotPool does not have enough slots, it requests more from the ResourceManager (RM).
      2. If a slot request is not fulfilled within a certain time, SlotPool treats it as timed out. This triggers a failover in the JobMaster, which sends a new slot request. The previous request is no longer needed, but the RM is never told.
      3. As a result, the RM may request far more resources than the job actually needs.
      For example:
      1. A job needs 100 slots, so the RM requests 100 containers from YARN.
      2. YARN is busy and has no resources available for the job.
      3. The job fails over because the resource request is not fulfilled in time.
      4. The job asks for 100 slots again, and the RM now has 200 container requests outstanding with YARN.
      5. If the job fails over several times, the number of container requests keeps growing.
      6. When YARN finally has resources, it sees a job that appears to need thousands of containers. This wastes resources.
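      The accumulation in the example can be reproduced with a small simulation. This is a minimal sketch, not Flink code: `FakeResourceManager`, `requestSlots`, `cancelSlotRequest` and `simulate` are hypothetical names that model only the bookkeeping of outstanding container requests, assuming YARN never grants anything during the failovers. It compares the current behaviour (stale requests are left in place) with the proposed one (the timed-out request is cancelled before a new one is sent).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class SlotRequestLeakDemo {

    /** Hypothetical stand-in for the RM: it only tracks outstanding container requests. */
    static final class FakeResourceManager {
        private final Map<UUID, Integer> pending = new HashMap<>();

        /** Registers a request for `count` containers and returns its id. */
        UUID requestSlots(int count) {
            UUID id = UUID.randomUUID();
            pending.put(id, count);
            return id;
        }

        /** Hypothetical cancellation: forgets a request that is no longer needed. */
        void cancelSlotRequest(UUID id) {
            pending.remove(id);
        }

        /** Total containers the RM would still ask YARN for. */
        int pendingContainers() {
            int total = 0;
            for (int c : pending.values()) {
                total += c;
            }
            return total;
        }
    }

    /**
     * Simulates a job needing `slotsPerAttempt` slots that fails over `failovers` times
     * because YARN has no capacity. If `cancelOnTimeout` is true, the timed-out request
     * is cancelled before the new one is issued.
     */
    static int simulate(int slotsPerAttempt, int failovers, boolean cancelOnTimeout) {
        FakeResourceManager rm = new FakeResourceManager();
        UUID current = rm.requestSlots(slotsPerAttempt); // initial attempt
        for (int i = 0; i < failovers; i++) {
            // The request timed out; the job fails over and asks again.
            if (cancelOnTimeout) {
                rm.cancelSlotRequest(current);
            }
            current = rm.requestSlots(slotsPerAttempt);
        }
        return rm.pendingContainers();
    }

    public static void main(String[] args) {
        System.out.println("without cancel: " + simulate(100, 9, false)); // prints 1000
        System.out.println("with cancel:    " + simulate(100, 9, true));  // prints 100
    }
}
```

      With one failover the uncancelled case yields 200 outstanding containers, matching step 4; after nine failovers it reaches 1000, while cancelling keeps the demand at the 100 containers the job really needs.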

            People

              Assignee: tiemsn shuai.xu
              Reporter: tiemsn shuai.xu
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: