FLINK-10848

Flink's Yarn ResourceManager can allocate too many excess containers

Details

    Description

      Currently, neither YarnFlinkResourceManager nor YarnResourceManager calls removeContainerRequest() when a container is successfully allocated. Because the YARN AM-RM protocol is not a delta protocol (see YARN-1902), AMRMClient keeps every ContainerRequest that has been added and not removed, and sends them all to the RM, so requests that were already satisfied are effectively requested again.

      In production, we observe the following behavior, which is consistent with this theory: 16 containers are allocated and used at cluster startup; when a TM is killed, 17 containers are allocated, 1 is used, and 16 excess containers are returned; when another TM is killed, 18 containers are allocated, 1 is used, and 17 excess containers are returned.
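
      The arithmetic behind these numbers can be reproduced with a small toy model. This is only a sketch, not Flink or Hadoop code: ToyAMRMClient, simulate, and the remove_on_allocation flag are hypothetical names, and the model assumes the RM grants one container for every request still in the client's table each time a new request is added.

```python
class ToyAMRMClient:
    """Toy stand-in for a non-delta request table (NOT the real Hadoop AMRMClient)."""

    def __init__(self):
        self.pending = 0  # requests added and not yet removed

    def add_container_request(self, count=1):
        self.pending += count

    def remove_container_request(self, count=1):
        self.pending -= count

    def allocate(self):
        # Assumption: the whole table is sent, and the RM grants all of it.
        return self.pending


def simulate(remove_on_allocation, initial_tms=16, tm_failures=2):
    """Return (allocated, used, excess) for startup and for each TM failure."""
    client = ToyAMRMClient()
    rounds = []
    # Startup needs `initial_tms` containers; each killed TM needs 1 replacement.
    for needed in [initial_tms] + [1] * tm_failures:
        client.add_container_request(needed)
        allocated = client.allocate()
        used = needed
        excess = allocated - used  # containers handed straight back to YARN
        if remove_on_allocation:
            # The proposed behavior: drop requests once they are satisfied.
            client.remove_container_request(used)
        rounds.append((allocated, used, excess))
    return rounds


if __name__ == "__main__":
    print("without removal:", simulate(remove_on_allocation=False))
    print("with removal:   ", simulate(remove_on_allocation=True))
```

      Under these assumptions, the run without removal yields the 16/17/18 allocation pattern described above, while the run with removal allocates exactly as many containers as are needed in each round.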

            People

              trohrmann Till Rohrmann
              suez1224 Shuyi Chen
              Votes: 4
              Watchers: 15

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h