Details
- Bug
- Status: Closed
- Major
- Resolution: Duplicate
Description
Background:
----------------
I started Hadoop 2.3 on my Mac on my office network and submitted a few jobs successfully. When I went home (a different network), I submitted another job and it abruptly brought down the RM service.
Error in RM log:
2014-03-29 12:28:56,754 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing RMDelegation token with sequence number: 3
2014-03-29 12:28:57,256 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: mislam-mn.<MY.OOFICE.DOMAIN>
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1294)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1342)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1208)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1167)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:868)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:642)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:556)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:696)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:740)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:88)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:543)
    at java.lang.Thread.run(Thread.java:695)
Caused by: java.net.UnknownHostException: mislam-mn.linkedin.biz
    ... 15 more
2014-03-29 12:28:57,259 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2014-03-29 12:28:57,297 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:8088
2014-03-29 12:28:57,401 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2014-03-29 12:28:57,473 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
.....
Proposal:
---------------
I believe the root cause is that I moved my machine from one network to another while the same RM service was still running, so the hostname from the office domain could no longer be resolved.
My point is: whatever the cause, the RM is a long-running core service and it should not exit this way. Logging an appropriate error message should be sufficient.
If there is a consensus (or no disagreement), I can work on a patch.
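To make the proposal concrete, here is a minimal, standalone sketch of the behaviour I have in mind: a failed hostname lookup is logged and reported to the caller instead of propagating as a fatal exception. It is not the actual Hadoop code and not a patch. The class SafeTokenService, the Resolver interface, the method tryBuildTokenService and the port 8041 are all hypothetical names and values chosen for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class SafeTokenService {

    /** Hypothetical lookup abstraction so that DNS resolution can be stubbed. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Returns "ip:port" for the given host, or an empty Optional if the host
     * cannot be resolved. The failure is logged rather than rethrown, so a
     * long-running caller can skip this node and keep serving.
     */
    static Optional<String> tryBuildTokenService(String host, int port, Resolver resolver) {
        try {
            InetAddress addr = resolver.resolve(host);
            return Optional.of(addr.getHostAddress() + ":" + port);
        } catch (UnknownHostException e) {
            // Log and continue instead of letting the exception kill the service.
            System.err.println("WARN: cannot resolve host " + host
                    + "; skipping token creation: " + e);
            return Optional.empty();
        }
    }

    private SafeTokenService() {
    }
}
```

With the real resolver this would be called as SafeTokenService.tryBuildTokenService(host, port, InetAddress::getByName), and the scheduler would simply not assign a container on that node when the result is empty.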
Issue Links
- is duplicated by: YARN-713 ResourceManager can exit unexpectedly if DNS is unavailable (Closed)