Hadoop YARN / YARN-9506

Node Managers fail to update cached IP entries of Resource Managers

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:
      None

      Description

      Hi,

      We run a YARN cluster (for Samza jobs) on AWS in HA mode, with yarn.resourcemanager.ha.automatic-failover.enabled=true.

      To reproduce the issue:

      1. Have a running cluster with 2 NodeManagers and 2 ResourceManagers in HA mode, with automatic failover enabled.
        • The ResourceManagers need DNS entries, and those host names must be set in the config:
          • e.g. yarnrm1.me.local and yarnrm2.me.local
      2. Stop the active ResourceManager (yarnrm1.me.local) and retire its instance. (The NodeManagers will fall back to the standby, yarnrm2.me.local.)
      3. Provision a new ResourceManager with a new IP, and make sure the DNS entry yarnrm1.me.local points to it.
      4. Stop the now-active ResourceManager (yarnrm2.me.local).
      5. Check the NodeManager logs: the NodeManagers fail to reach the newly provisioned ResourceManager and keep trying to reach it through the old IP.
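
      The report does not say where the stale IP is held. As a rough sketch of how the symptom in step 5 could be checked from Java, assume (this is not confirmed by the issue) that a client keeps a resolved java.net.InetSocketAddress around. Such an object resolves its host name once, at construction, and never looks it up again. The class name StaleAddressCheck, the port 8031, and the IPs 10.0.0.1 and 10.0.0.2 are hypothetical and only illustrate an old and a new ResourceManager address.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: compares a cached socket address
// with a freshly resolved IP for the same host name.
final class StaleAddressCheck {

    // True when the cached address has no resolved IP, or when its IP differs
    // from the one DNS currently returns.
    public static boolean isStale(InetSocketAddress cached, InetAddress fresh) {
        if (cached.isUnresolved()) {
            return true; // no usable IP was ever captured
        }
        return !cached.getAddress().equals(fresh);
    }

    // Builds a new socket address from the host name, which triggers a new
    // lookup (still subject to the JVM-level cache, networkaddress.cache.ttl).
    public static InetSocketAddress refresh(InetSocketAddress cached) {
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress(host, bytes) attaches a host name without any DNS query,
        // so this demo runs offline. Both IPs are made-up examples.
        InetAddress oldIp = InetAddress.getByAddress("yarnrm1.me.local", new byte[] {10, 0, 0, 1});
        InetAddress newIp = InetAddress.getByAddress("yarnrm1.me.local", new byte[] {10, 0, 0, 2});

        // Address captured before the ResourceManager was re-provisioned.
        InetSocketAddress cached = new InetSocketAddress(oldIp, 8031);

        System.out.println("cached address stale: " + isStale(cached, newIp));
    }
}
```

      In a real check, the fresh address would come from InetAddress.getByName on the ResourceManager host name, and refresh would be called only when isStale reports a mismatch.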

      I can provide the config files (yarn-site and core-site) if needed.

        Attachments

        1. NM_logs.txt
          8 kB
          Marouane RAJI

            People

            • Assignee:
              Unassigned
            • Reporter:
              rmarou Marouane RAJI
            • Votes:
              0
            • Watchers:
              1