  1. Hadoop YARN
  2. YARN-5597 YARN Federation improvements
  3. YARN-8581

[AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.2.0
    • Component/s: amrmproxy, federation
    • Labels:
      None

      Description

      In Federation, every time an AM heartbeat comes in, LocalityMulticastAMRMProxyPolicy in AMRMProxy splits the asks according to the list of active and enabled sub-clusters. However, if AMRMProxy has not been able to heartbeat to a sub-cluster for some time (because of network issues, repeated exceptions from YarnRM, a YarnRM master-slave switch that is taking a long time, etc.), the policy should consider that sub-cluster unhealthy and stop routing asks there until the heartbeat channel becomes healthy again.
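
      The behaviour described above can be illustrated with a minimal Java sketch. It is not the code from the attached patches: the class name SubClusterTimeoutTracker, its methods, and the 60-second timeout used in the usage note are all hypothetical, and treating a sub-cluster that has never been contacted as healthy is an assumption made only for this illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Hadoop): remembers the last successful
 * heartbeat per sub-cluster and filters out sub-clusters that timed out.
 */
class SubClusterTimeoutTracker {
  private final long timeoutMs;
  // Sub-cluster id -> timestamp (ms) of the last successful heartbeat.
  private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

  SubClusterTimeoutTracker(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Call whenever a heartbeat to the given sub-cluster succeeds. */
  void recordHeartbeatSuccess(String subClusterId, long nowMs) {
    lastHeartbeatMs.put(subClusterId, nowMs);
  }

  /**
   * A sub-cluster is timed out if its last successful heartbeat is older
   * than the timeout. Assumption: a sub-cluster we have never heard from
   * is treated as healthy, so that it can receive its first asks.
   */
  boolean isTimedOut(String subClusterId, long nowMs) {
    Long last = lastHeartbeatMs.get(subClusterId);
    return last != null && nowMs - last > timeoutMs;
  }

  /** Keeps only the active and enabled sub-clusters that are not timed out. */
  Set<String> routable(Set<String> activeAndEnabled, long nowMs) {
    Set<String> result = new LinkedHashSet<>();
    for (String id : activeAndEnabled) {
      if (!isTimedOut(id, nowMs)) {
        result.add(id);
      }
    }
    return result;
  }
}
```

      In this sketch, with a 60-second timeout, a sub-cluster whose last successful heartbeat was 70 seconds ago is skipped when asks are split, and it becomes routable again as soon as recordHeartbeatSuccess is called for it.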

        Attachments

        1. YARN-8581.v1.patch
          19 kB
          Botong Huang
        2. YARN-8581.v2.patch
          19 kB
          Botong Huang
        3. YARN-8581-branch-2.v2.patch
          19 kB
          Botong Huang

          Activity

            People

            • Assignee:
              botong Botong Huang
              Reporter:
              botong Botong Huang
            • Votes:
              0
              Watchers:
              6
