  1. Hadoop YARN
  2. YARN-5597 YARN Federation improvements
  3. YARN-8933

[AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.3.0
    • Component/s: amrmproxy, federation
    • Labels: None

    Description

      After YARN-8696, the allocate response returned by FederationInterceptor is merged from the responses of whichever sub-clusters happen to have replied, which depends on the async heartbeat timing and is effectively a random subset of all sub-clusters. As a result, cluster-wide information fields in the response, e.g. AvailableResources and NumClusterNodes, are inconsistent from one response to the next. They can even be null or zero when a response is merged from an empty set of sub-cluster responses.

      In this patch, FederationInterceptor remembers the last allocate response from every known sub-cluster and always constructs the cluster-wide info fields from all of them. The patch also moves the sub-cluster timeout from LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that expired sub-clusters (those that haven't had a successful allocate response for a while) are excluded from the computation.
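
      The approach described above can be illustrated with a minimal, self-contained sketch. It is not the code in the attached patch: the class ClusterInfoAggregator, its Snapshot type, the method names, and the use of plain int/long fields in place of YARN's Resource and AllocateResponse types are all assumptions made for illustration. It only shows the idea of caching the last response per sub-cluster and summing cluster-wide fields over the sub-clusters that have not timed out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not a class from Hadoop or from the patch. */
class ClusterInfoAggregator {

  /** Last known cluster-wide fields reported by one sub-cluster. */
  static final class Snapshot {
    final int numNodes;
    final long availableMemoryMb;
    final long receivedAtMs;

    Snapshot(int numNodes, long availableMemoryMb, long receivedAtMs) {
      this.numNodes = numNodes;
      this.availableMemoryMb = availableMemoryMb;
      this.receivedAtMs = receivedAtMs;
    }
  }

  // Keyed by sub-cluster id; each new response overwrites the previous one.
  private final Map<String, Snapshot> lastResponse = new ConcurrentHashMap<>();
  private final long timeoutMs;

  ClusterInfoAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Remember the latest successful allocate response from a sub-cluster. */
  void record(String subClusterId, int numNodes, long availableMemoryMb, long nowMs) {
    lastResponse.put(subClusterId, new Snapshot(numNodes, availableMemoryMb, nowMs));
  }

  /** A sub-cluster counts only if its last response is within the timeout. */
  private boolean isLive(Snapshot s, long nowMs) {
    return nowMs - s.receivedAtMs <= timeoutMs;
  }

  /** Sum of node counts over all sub-clusters that have not expired. */
  int totalNodes(long nowMs) {
    int total = 0;
    for (Snapshot s : lastResponse.values()) {
      if (isLive(s, nowMs)) {
        total += s.numNodes;
      }
    }
    return total;
  }

  /** Sum of available memory over all sub-clusters that have not expired. */
  long totalAvailableMemoryMb(long nowMs) {
    long total = 0L;
    for (Snapshot s : lastResponse.values()) {
      if (isLive(s, nowMs)) {
        total += s.availableMemoryMb;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    ClusterInfoAggregator agg = new ClusterInfoAggregator(60_000L);
    agg.record("sc-1", 100, 512_000L, 0L);
    agg.record("sc-2", 50, 256_000L, 30_000L);
    // At t = 70 s, sc-1 is older than the 60 s timeout, so only sc-2 is counted.
    System.out.println(agg.totalNodes(70_000L));
    System.out.println(agg.totalAvailableMemoryMb(70_000L));
  }
}
```

      Because the totals are always computed from the cached per-sub-cluster state rather than from whichever responses arrived in the current heartbeat, they stay stable between heartbeats and never collapse to zero merely because no sub-cluster happened to reply in time.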

      Attachments

        1. YARN-8933.v3.patch
          41 kB
          Botong Huang
        2. YARN-8933.v2.patch
          39 kB
          Botong Huang
        3. YARN-8933.v1.patch
          39 kB
          Botong Huang

          People

            Assignee: Botong Huang
            Reporter: Botong Huang
            Votes: 0
            Watchers: 3
