  1. Flink
  2. FLINK-10848

Flink's Yarn ResourceManager can allocate too many excess containers

    Details

      Description

      Currently, neither the YarnFlinkResourceManager nor the YarnResourceManager calls removeContainerRequest() when a container is successfully allocated. Because the YARN AM-RM protocol is not a delta protocol (see YARN-1902), AMRMClient keeps every ContainerRequest that has been added and continues to send all of them to the RM, including requests that have already been satisfied.

      In production, we observe behavior consistent with this explanation: 16 containers are allocated and used at cluster startup; when a TM is killed, 17 containers are allocated, 1 is used, and 16 excess containers are returned; when another TM is killed, 18 containers are allocated, 1 is used, and 17 excess containers are returned.
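The arithmetic above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `ToyAmRmClient` and `simulate` are hypothetical names, not Flink or Hadoop classes, and the model assumes the RM grants one container for every request the client still holds. It illustrates why removing a request once it has been satisfied would stop the excess from growing; it is not the actual Flink fix.

```java
import java.util.ArrayList;
import java.util.List;

final class ExcessContainerSimulation {

    /** Hypothetical stand-in for AMRMClient: keeps every request until it is explicitly removed. */
    static final class ToyAmRmClient {
        private final List<String> outstandingRequests = new ArrayList<>();

        void addContainerRequest(String request) {
            outstandingRequests.add(request);
        }

        void removeContainerRequest(String request) {
            outstandingRequests.remove(request);
        }

        /** Assumption: the full list is sent each time, so one container is granted per held request. */
        int allocate() {
            return outstandingRequests.size();
        }
    }

    /**
     * Simulates cluster startup followed by tmFailures single TaskManager failures.
     * Returns one {allocated, used, excess} triple per round.
     */
    static int[][] simulate(int initialTaskManagers, int tmFailures, boolean removeOnAllocation) {
        ToyAmRmClient client = new ToyAmRmClient();
        int[][] rounds = new int[tmFailures + 1][];
        int nextId = 0;
        for (int round = 0; round <= tmFailures; round++) {
            // Round 0 is startup; every later round replaces exactly one killed TM.
            int needed = (round == 0) ? initialTaskManagers : 1;
            List<String> newRequests = new ArrayList<>();
            for (int i = 0; i < needed; i++) {
                String request = "request-" + nextId++;
                client.addContainerRequest(request);
                newRequests.add(request);
            }
            int allocated = client.allocate();
            if (removeOnAllocation) {
                // The step the description says is missing: drop satisfied requests.
                for (String request : newRequests) {
                    client.removeContainerRequest(request);
                }
            }
            rounds[round] = new int[] {allocated, needed, allocated - needed};
        }
        return rounds;
    }

    public static void main(String[] args) {
        for (boolean remove : new boolean[] {false, true}) {
            System.out.println("removeOnAllocation=" + remove);
            for (int[] r : simulate(16, 2, remove)) {
                System.out.println("  allocated=" + r[0] + " used=" + r[1] + " excess=" + r[2]);
            }
        }
    }
}
```

Without removal, the model yields 16, 17, and 18 allocated containers over the three rounds, matching the reported numbers. With removal, each TM failure results in a single allocation and no excess.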

              People

              • Assignee: Till Rohrmann (till.rohrmann)
              • Reporter: Shuyi Chen (suez1224)
              • Votes: 4
              • Watchers: 14

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h