  1. Hadoop YARN
  2. YARN-1114

Resource Manager Failure Due to Unreachable DNS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0-beta
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None
    • Environment: CentOS 6.3, Hortonworks vendor distro based on Hadoop 2.1

    Description

      Last night, DNS was briefly unresolvable on our cluster.

      Our resource manager appears to have crashed because the hostname of a node manager could not be resolved. This is definitely not the right behavior: anyone can crash the resource manager by advertising a node manager with an unresolvable hostname. It also makes the RM not very robust against transient network issues.

      Here is the stack trace:

      2013-08-28 05:06:24,703 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
      java.lang.IllegalArgumentException: java.net.UnknownHostException: <hostname removed>
              at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
              at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:243)
              at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.createContainer(AppSchedulable.java:160)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:237)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:338)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:364)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:160)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:149)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:907)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:980)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:110)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
              at java.lang.Thread.run(Thread.java:724)
      Caused by: java.net.UnknownHostException: <hostname removed>
              ... 14 more
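
      The trace shows an UnknownHostException wrapped in an IllegalArgumentException that propagates out of the scheduler's NODE_UPDATE handling and is logged as FATAL. The following is a minimal sketch of that failure mode and of one possible way to contain it per node. It is not the Hadoop code and not the fix adopted upstream: the class NodeUpdateSketch, the Resolver interface, and both methods are hypothetical names introduced for illustration, and the wrapping in buildTokenServiceSketch is only assumed to resemble what SecurityUtil.buildTokenService does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class NodeUpdateSketch {

    /** Hypothetical stand-in for a hostname lookup; not a Hadoop interface. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Assumed shape of the failing call: a lookup failure surfaces as an
     * IllegalArgumentException whose cause is an UnknownHostException,
     * matching the exception chain in the stack trace above.
     */
    static String buildTokenServiceSketch(String host, int port, Resolver resolver) {
        try {
            return resolver.resolve(host).getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /**
     * Processes each node on its own. A node whose hostname cannot be
     * resolved is logged and skipped; any other IllegalArgumentException
     * is rethrown unchanged.
     */
    static List<String> handleNodeUpdates(List<String> hosts, int port, Resolver resolver) {
        List<String> services = new ArrayList<>();
        for (String host : hosts) {
            try {
                services.add(buildTokenServiceSketch(host, port, resolver));
            } catch (IllegalArgumentException e) {
                if (!(e.getCause() instanceof UnknownHostException)) {
                    throw e;
                }
                System.err.println("Skipping node with unresolvable hostname: " + host);
            }
        }
        return services;
    }
}
```

      Passing InetAddress::getByName as the Resolver would use the system's DNS configuration. The lookup is injected here only so that the behavior can be exercised without depending on a real DNS outage.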
      

      The following is our version information (from the Hortonworks distro):

      Hadoop 2.1.0.2.0.4.0-38
      Subversion git@github.com:hortonworks/hadoop.git -r 1c6feea9d537846789eb3337dc5b1a8911cfd60a
      Compiled by jenkins on 2013-07-08T10:29Z
      From source with checksum d1403d7842ef98c85d5f3d1332fa4
      This command was run using /usr/lib/hadoop/hadoop-common-2.1.0.2.0.4.0-38.jar
      

            People

              Assignee: Unassigned
              Reporter: Ed Kohlwey (ekohlwey)
              Votes: 0
              Watchers: 3