Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.1.0-beta
Fix Version/s: None
Labels: None
Environment: CentOS 6.3, Hortonworks vendor distro based on Hadoop 2.1
Description
Last night DNS briefly stopped resolving on our cluster.
Our ResourceManager appears to have crashed because it could not resolve the hostname of a NodeManager. This is clearly not the right behavior: anyone can crash the ResourceManager by advertising a NodeManager with an unresolvable hostname. It also makes the RM fragile in the face of transient network issues.
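One way to keep a single bad node from taking down the whole scheduler is to contain resolution failures to the node update that caused them, rather than letting them escape the event loop. The sketch below is hypothetical: `NodeUpdateHandler`, `Resolver`, and `handleNodeUpdate` are illustrative names, not YARN APIs, and this is not the fix that was applied upstream. It catches the checked `UnknownHostException` from a pluggable resolver so it can run without real DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not YARN code: a DNS failure for one node is contained
// to that node's update instead of escaping and killing the event loop.
public class NodeUpdateHandler {

    /** Pluggable resolver (an assumption) so the logic can be exercised without real DNS. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final Resolver resolver;
    private final List<String> skippedNodes = new ArrayList<>();
    private int containersAssigned = 0;

    public NodeUpdateHandler(Resolver resolver) {
        this.resolver = resolver;
    }

    /**
     * Handles one node heartbeat. Returns true if a (simulated) container was
     * assigned, false if the node's hostname did not resolve and it was skipped.
     */
    public boolean handleNodeUpdate(String nodeHost) {
        try {
            // A real scheduler would use the resolved address to build a container token.
            resolver.resolve(nodeHost);
            containersAssigned++;
            return true;
        } catch (UnknownHostException e) {
            // Skip this node only; a later heartbeat can retry once DNS recovers.
            skippedNodes.add(nodeHost);
            return false;
        }
    }

    public List<String> getSkippedNodes() {
        return skippedNodes;
    }

    public int getContainersAssigned() {
        return containersAssigned;
    }

    public static void main(String[] args) {
        // Fake resolver: only "good-node" resolves; anything else fails as during a DNS outage.
        Resolver fake = host -> {
            if (host.equals("good-node")) {
                return InetAddress.getByAddress(host, new byte[] {10, 0, 0, 1});
            }
            throw new UnknownHostException(host);
        };
        NodeUpdateHandler handler = new NodeUpdateHandler(fake);
        for (String host : new String[] {"good-node", "bad-node", "good-node"}) {
            handler.handleNodeUpdate(host);
        }
        // Prints: assigned=2 skipped=[bad-node]
        System.out.println("assigned=" + handler.getContainersAssigned()
                + " skipped=" + handler.getSkippedNodes());
    }
}
```

The trade-off in this design is a missed allocation on the affected node for one heartbeat in exchange for keeping the scheduler thread alive for every other node.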
Here is the stack trace:
2013-08-28 05:06:24,703 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: <hostname removed>
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
    at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:243)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.createContainer(AppSchedulable.java:160)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:237)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:338)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:364)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:160)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:149)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:907)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:980)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:110)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.net.UnknownHostException: <hostname removed>
    ... 14 more
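The trace shows the DNS failure surfacing as an unchecked `IllegalArgumentException` whose cause is the `UnknownHostException`, thrown from `SecurityUtil.buildTokenService`. The sketch below mimics that wrapping with a hypothetical helper, `buildTokenServiceLike` (it is not Hadoop's implementation), and shows how a caller could recognize the DNS case by walking the cause chain instead of treating every `IllegalArgumentException` as fatal. The hostname and port are arbitrary placeholders.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class TokenServiceSketch {

    // Hypothetical stand-in for the pattern in the trace: an unresolved address is
    // reported as an IllegalArgumentException wrapping an UnknownHostException.
    public static String buildTokenServiceLike(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(new UnknownHostException(addr.getHostString()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // True if the exception or any cause in its chain is an UnknownHostException.
    public static boolean isDnsFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // createUnresolved never performs a lookup, so this mimics a DNS outage offline.
        InetSocketAddress nm = InetSocketAddress.createUnresolved("nm-host.example", 8041);
        try {
            buildTokenServiceLike(nm);
        } catch (IllegalArgumentException e) {
            // Prints: dnsFailure=true message=java.net.UnknownHostException: nm-host.example
            System.out.println("dnsFailure=" + isDnsFailure(e) + " message=" + e.getMessage());
        }
    }
}
```

The message format matches the first line of the trace because `IllegalArgumentException(Throwable)` uses the cause's `toString()` as its message.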
Here is our version information (from the Hortonworks distro):
Hadoop 2.1.0.2.0.4.0-38
Subversion git@github.com:hortonworks/hadoop.git -r 1c6feea9d537846789eb3337dc5b1a8911cfd60a
Compiled by jenkins on 2013-07-08T10:29Z
From source with checksum d1403d7842ef98c85d5f3d1332fa4
This command was run using /usr/lib/hadoop/hadoop-common-2.1.0.2.0.4.0-38.jar
Issue Links
- is duplicated by: YARN-713 ResourceManager can exit unexpectedly if DNS is unavailable (Closed)