  Hadoop YARN / YARN-6403

Invalid local resource request can raise NPE and make NM exit

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We recently hit this problem in our test environment. The application that triggered it added an invalid local resource request (one with no location) to its ContainerLaunchContext, like this:

          // 'location' is unexpectedly null here (a bug in the application)
          localResources.put("test", LocalResource.newInstance(location,
              LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100,
              System.currentTimeMillis()));
          ContainerLaunchContext amContainer =
              ContainerLaunchContext.newInstance(localResources, environment,
                vargsFinal, null, securityTokens, acls);
      

      The actual value of location was null, which the app did not expect. This mistake caused several NMs to exit with the NPE below, and they could not restart until the NM recovery directories were deleted.

      FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660)
              at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
              at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
              at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
              at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
              at java.lang.Thread.run(Thread.java:745)
      

      The NPE occurred while creating a LocalResourceRequest instance for the invalid resource request:

        public LocalResourceRequest(LocalResource resource)
            throws URISyntaxException {
          this(resource.getResource().toPath(),  // NPE here: getResource() returns null
              resource.getTimestamp(),
              resource.getType(),
              resource.getVisibility(),
              resource.getPattern());
        }
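      A minimal sketch of the null check the constructor could perform before dereferencing the location. It is written against hypothetical stand-in classes (`NullSafeRequestSketch`, `Resource`, `Request`, `tryCreate`), not the real YARN types, so that it compiles without Hadoop. It assumes, as the existing `throws URISyntaxException` clause suggests, that callers already treat that exception as a malformed request; under that assumption a missing location is reported through the same path instead of surfacing as an NPE in the dispatcher thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical, simplified model of the NM-side request constructor.
// These are NOT the real org.apache.hadoop.yarn classes.
final class NullSafeRequestSketch {

    // Stand-in for LocalResource: only the fields needed to show the bug.
    static final class Resource {
        final String location; // null models the application's bad request
        final long timestamp;

        Resource(String location, long timestamp) {
            this.location = location;
            this.timestamp = timestamp;
        }
    }

    // Stand-in for LocalResourceRequest.
    static final class Request {
        final URI uri;
        final long timestamp;

        Request(Resource resource) throws URISyntaxException {
            if (resource == null || resource.location == null) {
                // Report the bad request through the checked exception the
                // constructor already declares, instead of throwing an NPE.
                throw new URISyntaxException("<null>", "local resource has no location");
            }
            this.uri = new URI(resource.location);
            this.timestamp = resource.timestamp;
        }
    }

    // Hypothetical caller: returns null for a rejected resource instead of
    // letting an exception escape into an event-dispatch loop.
    static Request tryCreate(Resource resource) {
        try {
            return new Request(resource);
        } catch (URISyntaxException e) {
            System.err.println("Rejecting local resource: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(new Resource("hdfs://nn:8020/tmp/test", 100L)) != null); // true
        System.out.println(tryCreate(new Resource(null, 100L)) != null);                      // false
    }
}
```

      Reusing the already-declared checked exception keeps every caller's signature unchanged; whether the real callers in ContainerImpl handle it gracefully would still need to be confirmed.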
      

      We can't currently guarantee that local resource requests are valid, but we could at least prevent an invalid one from damaging the cluster. Perhaps we could verify the resource in both ContainerLaunchContext and LocalResourceRequest? Suggestions are welcome.
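      The launch-context side of the suggestion above could look roughly like the following sketch, which validates every local resource when the container start request arrives. Everything here is hypothetical: `LaunchContextValidator`, its nested `Resource` class and the `checkBeforeStart` hook are simplified stand-ins, not the real `ContainerLaunchContext` or `ContainerManagerImpl` code, and `IllegalArgumentException` stands in for whatever error the real RPC would return. The idea is to reject a bad request in the request-handling path, where the exception goes back to the caller, before any container event reaches the NM's `AsyncDispatcher`; rejecting it that early would presumably also keep it out of the NM recovery state that blocked the restarts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical up-front validation of a launch context's local resources.
// Simplified stand-in types; not the real org.apache.hadoop.yarn API.
final class LaunchContextValidator {

    // Stand-in for LocalResource; a null field models a malformed request.
    static final class Resource {
        final String location;
        final String type;
        final String visibility;

        Resource(String location, String type, String visibility) {
            this.location = location;
            this.type = type;
            this.visibility = visibility;
        }
    }

    // Collects every problem so the caller sees all of them at once.
    static List<String> validate(Map<String, Resource> localResources) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Resource> e : localResources.entrySet()) {
            Resource r = e.getValue();
            if (r == null) {
                errors.add(e.getKey() + ": resource is null");
                continue;
            }
            if (r.location == null) {
                errors.add(e.getKey() + ": location is null");
            }
            if (r.type == null) {
                errors.add(e.getKey() + ": type is null");
            }
            if (r.visibility == null) {
                errors.add(e.getKey() + ": visibility is null");
            }
        }
        return errors;
    }

    // Hypothetical hook run while handling the start request, before any
    // container event is dispatched; the exception goes back to the caller.
    static void checkBeforeStart(Map<String, Resource> localResources) {
        List<String> errors = validate(localResources);
        if (!errors.isEmpty()) {
            throw new IllegalArgumentException("Invalid local resources: " + errors);
        }
    }

    public static void main(String[] args) {
        Map<String, Resource> resources = new LinkedHashMap<>();
        // Mirrors the report: resource "test" has a null location.
        resources.put("test", new Resource(null, "FILE", "PRIVATE"));
        try {
            checkBeforeStart(resources);
        } catch (IllegalArgumentException e) {
            // prints "Invalid local resources: [test: location is null]"
            System.out.println(e.getMessage());
        }
    }
}
```

      Collecting all errors instead of failing on the first one is a design choice for friendlier client diagnostics; a fail-fast check would protect the NM equally well.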

        Attachments

        1. YARN-6403.branch-2.8.004.patch
          11 kB
          Jason Lowe
        2. YARN-6403.004.patch
          10 kB
          Tao Yang
        3. YARN-6403.branch-2.8.004.patch
          11 kB
          Tao Yang
        4. YARN-6403.branch-2.8.003.patch
          8 kB
          Tao Yang
        5. YARN-6403.002.patch
          7 kB
          Tao Yang
        6. YARN-6403.001.patch
          3 kB
          Tao Yang

              People

              • Assignee:
                Tao Yang
              • Reporter:
                Tao Yang
              • Votes:
                0
              • Watchers:
                7
