Hadoop YARN / YARN-4152

NM crash with NPE when LogAggregationService#stopContainer called for absent container

Details

    • Reviewed

    Description

      The NodeManager (NM) crashes during log aggregation.
      Reproduced by running a Pi job with 500 containers and killing the application midway.

      Logs

      2015-09-12 18:44:25,597 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e51_1442063466801_0001_01_000099 is : 143
      2015-09-12 18:44:25,670 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101
      2015-09-12 18:44:25,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_e51_1442063466801_0001_01_000101 from application application_1442063466801_0001
      2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
              at java.lang.Thread.run(Thread.java:745)
      2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1442063466801_0001
      2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
      2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf       OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1442063466801_0001    CONTAINERID=container_e51_1442063466801_0001_01_000100
      
      

      Analysis

      It looks like stopContainer is also called for an absent container:

            case CONTAINER_FINISHED:
              LogHandlerContainerFinishedEvent containerFinishEvent =
                  (LogHandlerContainerFinishedEvent) event;
              stopContainer(containerFinishEvent.getContainerId(),
                  containerFinishEvent.getExitCode());
              break;
      

      Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101

      stopContainer should be skipped when context.getContainers().get(containerId) returns null.
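
      The proposed guard can be illustrated with a minimal, standalone sketch. It is not the code from the attached patches and does not use the Hadoop classes: AbsentContainerGuardSketch, its containers map (a stand-in for context.getContainers()), and the String container types are all hypothetical names introduced only for illustration. The point is that a lookup that may return null is checked before it is used, so a CONTAINER_FINISHED event for an absent container is skipped rather than killing the dispatcher thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the log aggregation handler; NOT the Hadoop class.
class AbsentContainerGuardSketch {
    // Assumed stand-in for context.getContainers(): container ID -> container type.
    static final Map<String, String> containers = new ConcurrentHashMap<>();

    // Returns true if log aggregation would start, false if the container is absent.
    static boolean stopContainer(String containerId, int exitCode) {
        String containerType = containers.get(containerId);
        if (containerType == null) {
            // Absent container: skip instead of dereferencing a null lookup result.
            System.out.println("Skipping log aggregation for absent container "
                + containerId);
            return false;
        }
        System.out.println("Aggregating logs for " + containerId
            + " (type " + containerType + "), exit code " + exitCode);
        return true;
    }

    public static void main(String[] args) {
        // Only container ...000099 is registered; ...000101 is absent, as in the log.
        containers.put("container_e51_1442063466801_0001_01_000099", "TASK");
        stopContainer("container_e51_1442063466801_0001_01_000099", 143);
        stopContainer("container_e51_1442063466801_0001_01_000101", 143);
    }
}
```

      In this sketch the second call returns false and logs a message instead of throwing, which is the behaviour the analysis asks for. Whether the real fix should log, warn, or silently return is a design choice left to the patch.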

      Attachments

        1. 0001-YARN-4152.patch
          1 kB
          Bibin Chundatt
        2. 0002-YARN-4152.patch
          2 kB
          Bibin Chundatt
        3. 0003-YARN-4152.patch
          4 kB
          Bibin Chundatt

            People

              bibinchundatt Bibin Chundatt