Hadoop YARN / YARN-9506

Node Managers fail to update cached IP entries of Resource Managers

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

    Description

      Hi,

      We are running a YARN cluster (for Samza jobs) on AWS in HA mode, with yarn.resourcemanager.ha.automatic-failover.enabled=true.

      To reproduce the issue:

      1. Start a cluster with 2 NodeManagers and 2 ResourceManagers in HA mode, with automatic failover enabled.
        • The ResourceManagers must have DNS entries, and those hostnames must be set in the config:
          • e.g. yarnrm1.me.local and yarnrm2.me.local
      2. Stop the active ResourceManager (yarnrm1.me.local) and retire its instance. (The NodeManagers will fall back to the standby, yarnrm2.me.local.)
      3. Provision a new ResourceManager with a new IP, and make sure the DNS entry yarnrm1.me.local points to it.
      4. Stop the now-active ResourceManager (yarnrm2.me.local).
      5. Check the NodeManager logs: the NodeManagers fail to reach the newly provisioned ResourceManager because they keep trying to connect to it through the old IP.
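
      For illustration only, here is a minimal Java sketch of one way a stale IP can persist after the steps above. It is not the NodeManager code, and it does not claim to be the confirmed cause. It relies on one known JDK behavior: a java.net.InetSocketAddress holds the InetAddress it was built with and never looks the hostname up again. The class StaleAddressSketch, the FAKE_DNS map (a stand-in for real DNS so the example runs offline), the helper refreshIfChanged, and the 10.0.0.x addresses are all hypothetical; port 8031 is assumed to be the resource tracker port.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

class StaleAddressSketch {
    // Hypothetical stand-in for DNS: hostname -> IPv4 bytes.
    static final Map<String, byte[]> FAKE_DNS = new HashMap<>();

    // Build a socket address from the current FAKE_DNS entry.
    // InetAddress.getByAddress(host, ip) performs no real name lookup.
    static InetSocketAddress resolve(String host, int port) throws UnknownHostException {
        byte[] ip = FAKE_DNS.get(host);
        if (ip == null) {
            throw new UnknownHostException(host);
        }
        return new InetSocketAddress(InetAddress.getByAddress(host, ip), port);
    }

    // Look the hostname up again and return a new address object only if
    // the IP behind it changed; otherwise keep the cached object.
    static InetSocketAddress refreshIfChanged(InetSocketAddress cached) throws UnknownHostException {
        InetSocketAddress fresh = resolve(cached.getHostString(), cached.getPort());
        return fresh.getAddress().equals(cached.getAddress()) ? cached : fresh;
    }

    public static void main(String[] args) throws Exception {
        // The original ResourceManager instance (hypothetical IP).
        FAKE_DNS.put("yarnrm1.me.local", new byte[] {10, 0, 0, 1});
        InetSocketAddress cached = resolve("yarnrm1.me.local", 8031);

        // Step 3: the DNS entry now points at a new instance.
        FAKE_DNS.put("yarnrm1.me.local", new byte[] {10, 0, 0, 2});

        // The cached object still carries the IP it was built with.
        System.out.println("cached:    " + cached.getAddress().getHostAddress());

        // Only an explicit re-resolution picks up the new IP.
        InetSocketAddress refreshed = refreshIfChanged(cached);
        System.out.println("refreshed: " + refreshed.getAddress().getHostAddress());
    }
}
```

      If the client side keeps an already-resolved address object across failovers, it would behave like `cached` here. A refresh like `refreshIfChanged`, run before each reconnect attempt, is one possible mitigation under that assumption.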

      I can provide the config files (yarn-site.xml and core-site.xml) if needed.

      Attachments

        1. NM_logs.txt (8 kB, uploaded by Marouane RAJI)

          People

            Assignee: Unassigned
            Reporter: Marouane RAJI (rmarou)