Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.7.1
- Fix Version/s: None
- Component/s: None
Description
Hi,
We are running a YARN cluster (for Samza jobs) on AWS, in HA mode with yarn.resourcemanager.ha.automatic-failover.enabled=true.
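For reference, a minimal sketch of the relevant yarn-site.xml HA settings for the setup described below. The rm-ids (rm1, rm2) and cluster-id value are illustrative assumptions; only the hostnames come from this report:

```xml
<configuration>
  <!-- Enable ResourceManager HA with automatic fail-over -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- cluster-id value is a placeholder -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- The two RMs are addressed by DNS name, not IP -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>yarnrm1.me.local</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>yarnrm2.me.local</value>
  </property>
</configuration>
```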
To reproduce the issue:
- Have a running cluster with 2 NodeManagers and 2 ResourceManagers in HA mode, with automatic fail-over enabled.
- The ResourceManagers need DNS entries defined and set in the config, e.g. yarnrm1.me.local and yarnrm2.me.local.
- Stop the active ResourceManager (yarnrm1.me.local) and retire its instance. (The NodeManagers fail over to the standby, yarnrm2.me.local.)
- Provision a new ResourceManager with a new IP, and make sure the DNS entry yarnrm1.me.local now points to it.
- Stop the currently active ResourceManager (yarnrm2.me.local).
- Check the NodeManager logs: they fail to reach the newly provisioned ResourceManager, because they keep trying to connect through its old IP instead of re-resolving the DNS entry.
I can provide config files, yarn-site and core-site if needed.
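The failure mode above can be sketched in plain Java. This is not the actual Hadoop RPC code, just an illustration of the underlying pitfall: `InetSocketAddress` resolves a hostname once at construction time, so a long-lived client that caches such an address never sees a later DNS change. The hostname and port here are illustrative:

```java
import java.net.InetSocketAddress;

public class StaleResolution {
    // Resolve a hostname once, the way a long-lived RPC client might cache it.
    static InetSocketAddress resolveOnce(String host, int port) {
        // InetSocketAddress resolves at construction time and never again;
        // a DNS change for `host` after this point is invisible to this object.
        return new InetSocketAddress(host, port);
    }

    // Re-resolve by building a fresh address from the original hostname,
    // which is what the NodeManager would need to do on reconnect.
    static InetSocketAddress reResolve(InetSocketAddress stale) {
        return new InetSocketAddress(stale.getHostString(), stale.getPort());
    }

    public static void main(String[] args) {
        InetSocketAddress cached = resolveOnce("localhost", 8032);
        System.out.println("cached:      " + cached.getAddress());

        // If DNS now maps the name to a new IP, only a fresh lookup sees it.
        InetSocketAddress fresh = reResolve(cached);
        System.out.println("re-resolved: " + fresh.getAddress());
    }
}
```

If the retry logic reconnects with the cached address instead of re-resolving, it keeps hammering the retired instance's old IP, which matches the NodeManager behaviour described above.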