  Project: Hadoop YARN
  Parent: YARN-5597 YARN Federation improvements
  Issue: YARN-8581

[AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.2.0
    • Component/s: amrmproxy, federation
    • Labels: None

    Description

      In Federation, on every AM heartbeat, LocalityMulticastAMRMProxyPolicy in AMRMProxy splits the asks across the list of active and enabled sub-clusters. However, if we have not been able to heartbeat to a sub-cluster for some time (because of network issues, repeated exceptions from the YarnRM, a slow YarnRM master-slave switch, etc.), we should treat that sub-cluster as unhealthy and stop routing asks to it until the heartbeat channel becomes healthy again.
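
      The timeout idea described above can be illustrated with a minimal sketch. This is not the code from the attached patches: the class SubClusterTimeoutTracker, its method names, and the injected clock are all hypothetical, and sub-clusters are identified by plain strings for simplicity. It also assumes that a sub-cluster with no recorded heartbeat yet is still routable, so that a newly added sub-cluster can receive its first asks.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Hadoop): remembers the last successful
 * heartbeat per sub-cluster and filters out those that exceeded a timeout.
 */
final class SubClusterTimeoutTracker {
  private final long timeoutMs;
  private final LongSupplier clock; // injected so the logic can be tested
  private final Map<String, Long> lastSuccessMs = new ConcurrentHashMap<>();

  SubClusterTimeoutTracker(long timeoutMs, LongSupplier clock) {
    if (timeoutMs <= 0) {
      throw new IllegalArgumentException("timeoutMs must be positive");
    }
    this.timeoutMs = timeoutMs;
    this.clock = clock;
  }

  /** Call whenever a heartbeat to the given sub-cluster succeeds. */
  void recordHeartbeatSuccess(String subClusterId) {
    lastSuccessMs.put(subClusterId, clock.getAsLong());
  }

  /** True if the last successful heartbeat is older than the timeout. */
  boolean isTimedOut(String subClusterId) {
    Long last = lastSuccessMs.get(subClusterId);
    // Assumption: a sub-cluster we have never heard from is not timed out.
    return last != null && clock.getAsLong() - last > timeoutMs;
  }

  /** Keeps only the active and enabled sub-clusters that are not timed out. */
  List<String> routable(Collection<String> activeAndEnabled) {
    List<String> result = new ArrayList<>();
    for (String id : activeAndEnabled) {
      if (!isTimedOut(id)) {
        result.add(id);
      }
    }
    return result;
  }
}
```

      In this sketch, a policy would call routable(...) on the active and enabled sub-clusters before splitting the asks, and recordHeartbeatSuccess(...) from the heartbeat response path. A timed-out sub-cluster becomes routable again as soon as one heartbeat succeeds.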

      Attachments

        1. YARN-8581.v1.patch
          19 kB
          Botong Huang
        2. YARN-8581.v2.patch
          19 kB
          Botong Huang
        3. YARN-8581-branch-2.v2.patch
          19 kB
          Botong Huang

          People

            Assignee: Botong Huang
            Reporter: Botong Huang
            Votes: 0
            Watchers: 6
