  Hadoop YARN / YARN-4152

NM crash with NPE when LogAggregationService#stopContainer called for absent container

    Details

    • Hadoop Flags:
      Reviewed

      Description

      The NodeManager (NM) crashes during log aggregation.
      Reproduced by running a Pi job with 500 containers and killing the application while it was still running.

      Logs

      2015-09-12 18:44:25,597 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e51_1442063466801_0001_01_000099 is : 143
      2015-09-12 18:44:25,670 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101
      2015-09-12 18:44:25,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_e51_1442063466801_0001_01_000101 from application application_1442063466801_0001
      2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
              at java.lang.Thread.run(Thread.java:745)
      2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1442063466801_0001
      2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
      2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf       OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1442063466801_0001    CONTAINERID=container_e51_1442063466801_0001_01_000100
      
      

      Analysis

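      It looks like stopContainer is called even for an absent container: LogAggregationService#handle passes every CONTAINER_FINISHED event straight to stopContainer, without checking whether the container is still known to the NM.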

            case CONTAINER_FINISHED:
              LogHandlerContainerFinishedEvent containerFinishEvent =
                  (LogHandlerContainerFinishedEvent) event;
              stopContainer(containerFinishEvent.getContainerId(),
                  containerFinishEvent.getExitCode());
              break;
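To illustrate why the unguarded call fails, here is a simplified, hypothetical Java model (not Hadoop code). It assumes, as the NPE and the proposed null check suggest, that stopContainer dereferences the result of context.getContainers().get(containerId) directly. The `containers` map, the `Container` class and `stopContainerUnguarded` are illustrative stand-ins, and the removal-before-handling ordering is an assumption about the race.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the failing path (not Hadoop code). The map stands in
 * for context.getContainers(); stopContainerUnguarded mirrors a handler that
 * dereferences the lookup result without a null check.
 */
public class AbsentContainerNpe {

  /** Illustrative stand-in for the NM's per-container state. */
  static final class Container {
    private final String type;

    Container(String type) {
      this.type = type;
    }

    String getType() {
      return type;
    }
  }

  /** Unguarded lookup: throws NullPointerException if the container is absent. */
  static String stopContainerUnguarded(Map<String, Container> containers, String containerId) {
    return containers.get(containerId).getType();
  }

  public static void main(String[] args) {
    Map<String, Container> containers = new HashMap<>();
    String id = "container_e51_1442063466801_0001_01_000101";
    containers.put(id, new Container("TASK"));

    // Assumed ordering: the container leaves the map before the
    // CONTAINER_FINISHED event reaches the log aggregation handler.
    containers.remove(id);

    try {
      stopContainerUnguarded(containers, id);
      System.out.println("no exception");
    } catch (NullPointerException e) {
      System.out.println("NullPointerException for " + id);
    }
  }
}
```

In the log above, the NPE on the AsyncDispatcher thread is followed by "Exiting, bbye..", i.e. a single absent container was enough to shut down the whole NM.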
      

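      The NM log shows that container_e51_1442063466801_0001_01_000101 was already absent when the kill event arrived:

      Event EventType: KILL_CONTAINER sent to absent container container_e51_1442063466801_0001_01_000101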

      stopContainer should return early when context.getContainers().get(containerId) returns null, instead of dereferencing the missing container.
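A minimal sketch of the suggested guard, using a simplified, hypothetical model rather than the committed patch. The `Container` class, the map standing in for context.getContainers(), the boolean return value and the use of java.util.logging instead of Hadoop's own logging are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the proposed guard (not the committed patch): skip an
 * absent container instead of dereferencing a null lookup result.
 */
public class GuardedStopContainer {

  private static final Logger LOG =
      Logger.getLogger(GuardedStopContainer.class.getName());

  /** Illustrative stand-in for the NM's per-container state. */
  static final class Container {
    final String type;

    Container(String type) {
      this.type = type;
    }
  }

  /**
   * Returns true if log aggregation would be started for the container,
   * false if the container is absent and was skipped.
   */
  static boolean stopContainer(Map<String, Container> containers,
      String containerId, int exitCode) {
    Container container = containers.get(containerId);
    if (container == null) {
      // Equivalent of null == context.getContainers().get(containerId):
      // warn and return instead of throwing on the dispatcher thread.
      LOG.warning("Skipping log aggregation for absent container " + containerId);
      return false;
    }
    // Placeholder for handing the container over to the per-application aggregator.
    LOG.info("Starting log aggregation for " + containerId
        + " (type " + container.type + ", exit code " + exitCode + ")");
    return true;
  }

  public static void main(String[] args) {
    Map<String, Container> containers = new HashMap<>();
    String present = "container_e51_1442063466801_0001_01_000100";
    String absent = "container_e51_1442063466801_0001_01_000101";
    containers.put(present, new Container("TASK"));

    System.out.println(stopContainer(containers, present, 0)); // prints true
    System.out.println(stopContainer(containers, absent, 0));  // prints false
  }
}
```

Returning early keeps an absent container from taking down the dispatcher thread, while the warning keeps the skipped container visible in the NM log.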

        Attachments

        1. 0003-YARN-4152.patch
          4 kB
          Bibin A Chundatt
        2. 0002-YARN-4152.patch
          2 kB
          Bibin A Chundatt
        3. 0001-YARN-4152.patch
          1 kB
          Bibin A Chundatt


              People

              • Assignee: Bibin A Chundatt (bibinchundatt)
                Reporter: Bibin A Chundatt (bibinchundatt)
              • Votes: 0
                Watchers: 10
