Hadoop YARN / YARN-6837

Null LocalResource visibility or resource type can crash the nodemanager

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha4
    • Fix Version/s: 2.9.0, 3.0.0-beta1, 2.8.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When writing a YARN application, I created a LocalResource like this:

      LocalResource resource = Records.newRecord(LocalResource.class);

      Because I forgot to set its visibility, the job failed when I submitted it.
      At the same time, however, the NodeManagers shut down one after another, and the NodeManager log showed a NullPointerException:

      2017-07-18 17:54:09,289 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop IP=10.43.156.177 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1499221670783_0067 CONTAINERID=container_1499221670783_0067_02_000003
      2017-07-18 17:54:09,292 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceSet.addResources(ResourceSet.java:84)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:868)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:819)
      at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
      at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
      at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
      at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1684)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:96)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1418)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1411)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
      at java.lang.Thread.run(Thread.java:745)
      2017-07-18 17:54:09,292 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1499221670783_0067_02_000002 by user hadoop
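The trace shows the exception escaping into the AsyncDispatcher thread, which logs it as FATAL. The body of ResourceSet.addResources is not shown in this report, so the following is only a hypothetical model of the failure: it assumes the visibility value is used as a switch selector (in Java, a switch on a null enum reference throws NullPointerException) and that the dispatcher stops handling every later event after one uncaught exception. NullVisibilityDemo, Resource, Visibility, bucketFor and dispatchAll are illustrative names, not Hadoop classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

public class NullVisibilityDemo {
    // Stand-in for Hadoop's LocalResourceVisibility (same constants, not the real class).
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    // Stand-in for a LocalResource; visibility is null when the client never set it.
    static final class Resource {
        final String name;
        final Visibility visibility;

        Resource(String name, Visibility visibility) {
            this.name = name;
            this.visibility = visibility;
        }
    }

    // Sorts a resource into a per-visibility bucket. Switching on a null enum
    // reference throws NullPointerException before any case label is checked.
    static String bucketFor(Resource r) {
        switch (r.visibility) {
            case PUBLIC:
                return "public";
            case PRIVATE:
                return "private";
            case APPLICATION:
                return "application";
            default:
                throw new IllegalStateException("unexpected visibility " + r.visibility);
        }
    }

    // Toy single-threaded dispatcher that treats any uncaught exception as fatal:
    // after the first failure it stops, so later events are never handled.
    static List<String> dispatchAll(List<Resource> events) {
        List<String> handled = new ArrayList<>();
        for (Resource r : events) {
            try {
                handled.add(bucketFor(r));
            } catch (RuntimeException e) {
                handled.add("FATAL: " + e.getClass().getSimpleName());
                break; // models the process exiting; remaining events are dropped
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<Resource> events = List.of(
                new Resource("job.jar", Visibility.APPLICATION),
                new Resource("conf.xml", null),              // visibility never set
                new Resource("lib.jar", Visibility.PUBLIC)); // never reached
        System.out.println(dispatchAll(events));
    }
}
```

In this model, one resource without a visibility stops the whole loop before lib.jar is handled, which is the same shape as the report: a single malformed start request took down the NodeManager instead of failing only its own container.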

      Then I changed my code to use LocalResource.newInstance, but still passed null as the visibility:

      LocalResource resource = LocalResource.newInstance(
      URL.fromURI(dst.toUri()),
      LocalResourceType.FILE, null,
      fileStatus.getLen(), fileStatus.getModificationTime());

      The error still happened.
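Because LocalResource.newInstance accepted the null on the client side, the mistake only surfaced inside the NodeManager. A minimal client-side guard is sketched here, assuming a hypothetical newChecked factory and local stand-in types: the enum constants mirror Hadoop's LocalResourceType and LocalResourceVisibility, but these are not the Hadoop classes. It uses Objects.requireNonNull so that a missing type or visibility fails at construction time, before the job is submitted.

```java
import java.util.Objects;

public class CheckedResources {
    // Stand-ins with the same constants as Hadoop's LocalResourceType and
    // LocalResourceVisibility; these local enums are not the Hadoop classes.
    enum ResourceType { ARCHIVE, FILE, PATTERN }
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    // Hypothetical stand-in for LocalResource, holding the same five values
    // passed to LocalResource.newInstance(...) in the report.
    static final class Resource {
        final String url;
        final ResourceType type;
        final Visibility visibility;
        final long size;
        final long timestamp;

        private Resource(String url, ResourceType type, Visibility visibility,
                         long size, long timestamp) {
            this.url = url;
            this.type = type;
            this.visibility = visibility;
            this.size = size;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical factory: refuses to build a resource with a null url, type or
    // visibility, so the mistake fails on the client before the job is submitted.
    static Resource newChecked(String url, ResourceType type, Visibility visibility,
                               long size, long timestamp) {
        Objects.requireNonNull(url, "resource url must be set");
        Objects.requireNonNull(type, "resource type must be set");
        Objects.requireNonNull(visibility, "resource visibility must be set");
        return new Resource(url, type, visibility, size, timestamp);
    }

    public static void main(String[] args) {
        Resource ok = newChecked("hdfs:///tmp/app.jar", ResourceType.FILE,
                Visibility.APPLICATION, 1024L, 0L);
        System.out.println("accepted " + ok.url);
        try {
            // Same mistake as in the report: visibility left null.
            newChecked("hdfs:///tmp/app.jar", ResourceType.FILE, null, 1024L, 0L);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on the client protects only clients that use such a wrapper; a NodeManager still has to cope with requests from clients that do not.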

      Finally, I set the visibility to a valid value, and the error no longer occurred.
      So I think a LocalResource with a null visibility causes the NodeManager to shut down.
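The contents of the attached patches are not reproduced in this report, so the following is not the committed fix. It sketches one general approach, assuming the NodeManager can inspect a container's local resources when the start request arrives: reject any request whose resource has a null type or visibility with a per-container error, instead of letting the NullPointerException reach the dispatcher thread. StartRequestValidator, validate, handleStartRequests and InvalidResourceRequestException are hypothetical names, and the enums are local stand-ins rather than the Hadoop classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StartRequestValidator {
    // Local stand-ins; constants mirror LocalResourceType / LocalResourceVisibility.
    enum ResourceType { ARCHIVE, FILE, PATTERN }
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    static final class Resource {
        final ResourceType type;
        final Visibility visibility;

        Resource(ResourceType type, Visibility visibility) {
            this.type = type;
            this.visibility = visibility;
        }
    }

    // Hypothetical checked exception: the caller turns it into a failed response
    // for one container instead of an uncaught error in the dispatcher thread.
    static final class InvalidResourceRequestException extends Exception {
        InvalidResourceRequestException(String msg) {
            super(msg);
        }
    }

    // Checks every local resource of one request before any event is dispatched.
    static void validate(Map<String, Resource> resources)
            throws InvalidResourceRequestException {
        for (Map.Entry<String, Resource> e : resources.entrySet()) {
            Resource r = e.getValue();
            if (r == null || r.type == null || r.visibility == null) {
                throw new InvalidResourceRequestException("Invalid local resource '"
                        + e.getKey() + "': null resource, type or visibility");
            }
        }
    }

    // Toy request handler: each container gets its own result, so one bad
    // request is rejected while the others still start.
    static List<String> handleStartRequests(Map<String, Map<String, Resource>> requests) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, Map<String, Resource>> req : requests.entrySet()) {
            try {
                validate(req.getValue());
                results.add(req.getKey() + ": started");
            } catch (InvalidResourceRequestException ex) {
                results.add(req.getKey() + ": rejected (" + ex.getMessage() + ")");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Resource>> requests = new LinkedHashMap<>();
        requests.put("container_A",
                Map.of("job.jar", new Resource(ResourceType.FILE, Visibility.APPLICATION)));
        requests.put("container_B",
                Map.of("conf.xml", new Resource(ResourceType.FILE, null)));
        handleStartRequests(requests).forEach(System.out::println);
    }
}
```

The design choice here is to validate at the request boundary, where an error can be returned to the submitting client, rather than deep inside the container state machine, where an exception is fatal to the whole process.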

        Attachments

        1. YARN-6837.patch
          2 kB
          Jinjiang Ling
        2. YARN-6837-1.patch
          12 kB
          Jinjiang Ling


              People

              • Assignee: Jinjiang Ling
              • Reporter: Jinjiang Ling
              • Votes: 0
              • Watchers: 3
