Flink / FLINK-10868

Flink's JobCluster ResourceManager doesn't use maximum-failed-containers as a limit on resource acquisition


Details

    Description

      Currently, YarnResourceManager does not use yarn.maximum-failed-containers as a limit on resource acquisition. In the worst case, when newly started containers consistently fail, YarnResourceManager goes into an infinite resource acquisition loop without failing the job. Together with https://issues.apache.org/jira/browse/FLINK-10848, this will quickly occupy all resources of the YARN queue.
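
      A minimal sketch of the expected behavior, for illustration only. FailedContainerLimiter and its methods are hypothetical names, not existing Flink classes, and the assumption is that the limit passed in would come from yarn.maximum-failed-containers. The idea is that each failed container start is counted, and once the count exceeds the limit the resource manager stops requesting replacements and fails the job instead.

```java
/**
 * Hypothetical helper (not part of Flink): counts failed container starts
 * and reports whether a replacement container may still be requested.
 */
final class FailedContainerLimiter {

    // Assumed to be read from yarn.maximum-failed-containers.
    private final int maxFailedContainers;

    // Number of container failures recorded so far.
    private int failedContainers;

    FailedContainerLimiter(int maxFailedContainers) {
        this.maxFailedContainers = maxFailedContainers;
    }

    /**
     * Records one container failure.
     *
     * @return true if the failure count is still within the limit, so a
     *         replacement may be requested; false if the limit is exceeded
     *         and the job should be failed instead.
     */
    boolean recordFailure() {
        failedContainers++;
        return failedContainers <= maxFailedContainers;
    }

    int getFailedContainers() {
        return failedContainers;
    }
}
```

      In a container-completed callback, the resource manager could call recordFailure() for every failed container and only request a new one while it returns true. With a limit of 2, the first two failures allow a retry and the third does not.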

            People

              hpeter Zhenqiu Huang
              ZhenqiuHuang Zhenqiu Huang
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h