Hadoop YARN / YARN-1894

RM shutdown due to java.net.UnknownHostException


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: resourcemanager
    • Labels: None

    Description

      Background:
      ----------------
      I started Hadoop 2.3 on my Mac on my office network and submitted a few jobs successfully. When I went home (a new network) and submitted another job, the RM service abruptly shut down.

      Error in RM log:

      2014-03-29 12:28:56,754 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing RMDelegation token with sequence number: 3
      2014-03-29 12:28:57,256 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
      java.lang.IllegalArgumentException: java.net.UnknownHostException: mislam-mn.<MY.OFFICE.DOMAIN>
              at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
              at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
              at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1294)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1342)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1208)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1167)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:868)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:642)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:556)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:696)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:740)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:88)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:543)
              at java.lang.Thread.run(Thread.java:695)
      Caused by: java.net.UnknownHostException: mislam-mn.linkedin.biz
              ... 15 more
      2014-03-29 12:28:57,259 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
      2014-03-29 12:28:57,297 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:8088
      2014-03-29 12:28:57,401 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
      2014-03-29 12:28:57,473 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
      .....
      

      Proposal:
      ---------------
      I believe the root cause is that I moved my machine from one network to another while the same RM service kept running, so the hostname from the office network could no longer be resolved.
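The failure mode in the trace can be reproduced without Hadoop. The following is a minimal sketch, assuming the token-building step rejects an address whose hostname has no resolved IP; `TokenServiceSketch`, `buildService`, the hostname, and the port are hypothetical names chosen for illustration and are not Hadoop's code.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class TokenServiceSketch {
    // Hypothetical stand-in for the step that fails at the top of the trace:
    // an address without a resolved IP is rejected with an unchecked exception
    // that wraps UnknownHostException.
    static String buildService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // createUnresolved skips DNS, simulating a hostname that was valid
        // on the old network but cannot be resolved on the new one.
        InetSocketAddress stale =
            InetSocketAddress.createUnresolved("mislam-mn.example.invalid", 8041);
        try {
            buildService(stale);
        } catch (IllegalArgumentException e) {
            // Prints: java.lang.IllegalArgumentException:
            //   java.net.UnknownHostException: mislam-mn.example.invalid
            System.out.println(e);
        }
    }
}
```

Because the exception is unchecked, nothing forces the callers in between to handle it, so it can travel all the way up to the thread that processes scheduler events.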

      My point is: whatever the cause, the RM is a long-running core service and should not exit this way. Logging an appropriate error message should be sufficient.

      If there is consensus (or no disagreement), I can work on a patch.
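The proposed behavior could look roughly like the sketch below: an event loop that logs and skips an event whose handler fails because of an unresolvable host, instead of shutting the service down. This is only an illustration of the idea under that assumption. `ResilientDispatchSketch`, `dispatchAll`, and `causedByUnknownHost` are hypothetical names, and this is neither the ResourceManager's dispatcher nor an actual patch.

```java
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class ResilientDispatchSketch {
    // Returns true if t, or any exception in its cause chain,
    // is an UnknownHostException.
    static boolean causedByUnknownHost(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    // Runs the handler on each event. Events that fail because of an
    // unresolvable host are logged and collected; any other runtime
    // failure is rethrown unchanged.
    static <E> List<E> dispatchAll(List<E> events, Consumer<? super E> handler) {
        List<E> skipped = new ArrayList<>();
        for (E event : events) {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                if (!causedByUnknownHost(e)) {
                    throw e;
                }
                System.err.println("ERROR skipping event " + event + ": " + e.getMessage());
                skipped.add(event);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("NODE_UPDATE host-a", "NODE_UPDATE host-b");
        List<String> skipped = dispatchAll(events, ev -> {
            if (ev.endsWith("host-b")) {
                throw new IllegalArgumentException(new UnknownHostException("host-b"));
            }
        });
        // Prints: skipped=[NODE_UPDATE host-b]
        System.out.println("skipped=" + skipped);
    }
}
```

Limiting the catch to failures caused by `UnknownHostException` keeps other unexpected errors visible, which matters if a fatal exit is still the safer response for them.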

          People

            Assignee: kamrul Mohammad Islam
            Reporter: kamrul Mohammad Islam