Hadoop YARN / YARN-5597 YARN Federation improvements / YARN-8933

[AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.3.0
    • Component/s: amrmproxy, federation
    • Labels:
      None

      Description

      After YARN-8696, the allocate response returned by FederationInterceptor is merged from the responses of whichever sub-clusters happen to have replied, which depends on the async heartbeat timing and is effectively a random subset of all sub-clusters. As a result, cluster-wide information fields in the response, e.g. AvailableResources and NumClusterNodes, are inconsistent from one response to the next. They can even be null or zero when a response is merged from an empty set of sub-cluster responses.

      This patch makes FederationInterceptor remember the last allocate response from each known sub-cluster and always construct the cluster-wide info fields from all of them. It also moves the sub-cluster timeout from LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that expired sub-clusters (those that have not had a successful allocate response for a while) are excluded from the computation.
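
      The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the attached patch: the class ClusterInfoAggregator, its Snapshot and ClusterInfo types, and the use of a single memory figure in place of a full Resource object are all hypothetical names introduced here. It keeps the last cluster-wide fields seen from each sub-cluster and sums only those that have not timed out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch only; none of these names come from the Hadoop code base.
 * Remembers the last cluster-wide fields per sub-cluster and aggregates them,
 * skipping sub-clusters whose last successful response is older than a timeout.
 */
class ClusterInfoAggregator {

  /** Cluster-wide fields taken from one sub-cluster's last allocate response. */
  static final class Snapshot {
    final long availableMemoryMb;
    final int numClusterNodes;
    final long receivedAtMs;

    Snapshot(long availableMemoryMb, int numClusterNodes, long receivedAtMs) {
      this.availableMemoryMb = availableMemoryMb;
      this.numClusterNodes = numClusterNodes;
      this.receivedAtMs = receivedAtMs;
    }
  }

  /** Aggregated view that would be placed in the merged response. */
  static final class ClusterInfo {
    final long availableMemoryMb;
    final int numClusterNodes;

    ClusterInfo(long availableMemoryMb, int numClusterNodes) {
      this.availableMemoryMb = availableMemoryMb;
      this.numClusterNodes = numClusterNodes;
    }
  }

  private final Map<String, Snapshot> lastResponse = new ConcurrentHashMap<>();
  private final long subClusterTimeoutMs;

  ClusterInfoAggregator(long subClusterTimeoutMs) {
    this.subClusterTimeoutMs = subClusterTimeoutMs;
  }

  /** Called whenever a sub-cluster returns a successful allocate response. */
  void record(String subClusterId, long availableMemoryMb,
      int numClusterNodes, long nowMs) {
    lastResponse.put(subClusterId,
        new Snapshot(availableMemoryMb, numClusterNodes, nowMs));
  }

  /** Sums the remembered fields of every sub-cluster that has not expired. */
  ClusterInfo aggregate(long nowMs) {
    long memory = 0;
    int nodes = 0;
    for (Snapshot s : lastResponse.values()) {
      if (nowMs - s.receivedAtMs > subClusterTimeoutMs) {
        continue; // expired: no successful response within the timeout
      }
      memory += s.availableMemoryMb;
      nodes += s.numClusterNodes;
    }
    return new ClusterInfo(memory, nodes);
  }
}
```

      Because aggregate() reads the remembered snapshots rather than only the responses that arrived in the current heartbeat, the totals in this sketch stay stable even when a given merged response contains few or no fresh sub-cluster responses.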

        Attachments

        1. YARN-8933.v3.patch
          41 kB
          Botong Huang
        2. YARN-8933.v2.patch
          39 kB
          Botong Huang
        3. YARN-8933.v1.patch
          39 kB
          Botong Huang

            People

            • Assignee: Botong Huang (botong)
            • Reporter: Botong Huang (botong)
            • Votes: 0
            • Watchers: 3
