  Hadoop YARN / YARN-9164

Shutdown NM may cause NPE when opportunistic container scheduling is enabled

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.4, 3.1.2, 3.3.0, 3.2.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We hit a NullPointerException (NPE) that can crash the whole cluster:

      2018-12-31 22:18:11,924 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
      java.lang.NullPointerException
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:696)
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1123)
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1827)
      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
      at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
      at java.lang.Thread.run(Thread.java:745)
      
      

      This bug also reproduces on the latest trunk.

       

      The workload is:

      $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$VERSION.jar pi -Dmapreduce.job.num-opportunistic-maps-percent="100" 50 100
      

      While the job is running, shut down one NM.

      Reproducing the NPE also requires injecting a sleep before AbstractYarnScheduler.getNode().
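      The stack trace and the getNode() hint suggest a check-then-use race: a container-completion path looks up a node that a concurrent NM removal has already dropped, then dereferences the null result. The sketch below models only that pattern. NodeRemovalRaceSketch, NodeStub, beforeGetNode and both completedContainer* methods are hypothetical stand-ins, not the AbstractYarnScheduler API, and the null check shown is one plausible guard, not necessarily what the attached patches do. The injected sleep is modeled as a callback that removes the node, so the race is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the NPE pattern; not real YARN scheduler code.
public class NodeRemovalRaceSketch {

    // Hypothetical stand-in for a scheduler node; only counts allocated containers.
    static final class NodeStub {
        private int allocatedContainers;

        NodeStub(int allocatedContainers) {
            this.allocatedContainers = allocatedContainers;
        }

        void releaseContainer() {
            allocatedContainers--;
        }

        int getAllocatedContainers() {
            return allocatedContainers;
        }
    }

    // Hypothetical stand-in for the scheduler's node table.
    final Map<String, NodeStub> nodes = new ConcurrentHashMap<>();

    // Runs just before the node lookup; models the injected sleep window
    // during which a concurrent NM shutdown removes the node.
    Runnable beforeGetNode = () -> { };

    NodeStub getNode(String nodeId) {
        return nodes.get(nodeId); // returns null once the node is removed
    }

    // Unguarded path: dereferences the lookup result directly.
    void completedContainerUnguarded(String nodeId) {
        beforeGetNode.run();
        NodeStub node = getNode(nodeId);
        node.releaseContainer(); // throws NPE if the node vanished in the window
    }

    // Guarded path: treats a missing node as already cleaned up and skips it.
    boolean completedContainerGuarded(String nodeId) {
        beforeGetNode.run();
        NodeStub node = getNode(nodeId);
        if (node == null) {
            System.out.println("Node " + nodeId + " already removed; skipping release");
            return false;
        }
        node.releaseContainer();
        return true;
    }

    public static void main(String[] args) {
        NodeRemovalRaceSketch scheduler = new NodeRemovalRaceSketch();
        scheduler.nodes.put("nm-1:8041", new NodeStub(1));
        // Simulate the NM being removed while the lookup is delayed.
        scheduler.beforeGetNode = () -> scheduler.nodes.remove("nm-1:8041");

        try {
            scheduler.completedContainerUnguarded("nm-1:8041");
        } catch (NullPointerException e) {
            System.out.println("Unguarded path threw NullPointerException");
        }

        scheduler.nodes.put("nm-1:8041", new NodeStub(1));
        boolean released = scheduler.completedContainerGuarded("nm-1:8041");
        System.out.println("Guarded path released container: " + released);
    }
}
```

      Using a callback instead of a real Thread.sleep keeps the window reproducible in a single thread; in the RM the removal would arrive from another event instead.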

        Attachments

        1. hadoop-hires-resourcemanager-hadoop11.log
          71 kB
          lujie
        2. YARN-9164-0.patch
          1 kB
          lujie
        3. YARN-9164-1.patch
          12 kB
          lujie
        4. YARN-9164-2.patch
          12 kB
          lujie


              People

              • Assignee:
                xiaoheipangzi lujie
              • Reporter:
                xiaoheipangzi lujie
              • Votes:
                0
              • Watchers:
                6
