Hadoop YARN / YARN-2617

NM does not need to send finished container whose APP is not running to RM


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.0
    • Component/s: nodemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We (chenchun) were testing RM work-preserving restart and found the following logs when running a simple MapReduce job, "PI". The NM kept reporting completed containers whose application had already finished, even after the AM had finished.

      2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      

      In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee that already completed applications are cleaned up. However, it only removes the appId from 'app.context.getApplications()' when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED', and the NM might not receive this event for a long time, or might never receive it:

      • NonAggregatingLogHandler waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 seconds by default, before it is scheduled to delete the application logs and send the event.
      • LogAggregationService might fail (e.g. if the user does not have HDFS write permission), in which case it does not send the event.
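
      The idea in the issue title can be sketched as follows: before heartbeating, the NM drops completed containers whose application is no longer running, instead of waiting for the log-handling event. This is only an illustrative model, not the attached patch. The class CompletedContainerFilter, the type ContainerReport and the method filterForHeartbeat are hypothetical names, and plain strings stand in for the real YARN ID types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; none of these types are real YARN classes.
public class CompletedContainerFilter {

    /** Minimal stand-in for the status of a completed container. */
    public static final class ContainerReport {
        public final String containerId;
        public final String applicationId;

        public ContainerReport(String containerId, String applicationId) {
            this.containerId = containerId;
            this.applicationId = applicationId;
        }
    }

    /**
     * Returns only the completed containers whose application is still in
     * the set of running applications. Containers that belong to finished
     * applications are skipped, so they are not reported to the RM again.
     */
    public static List<ContainerReport> filterForHeartbeat(
            List<ContainerReport> completed, Set<String> runningAppIds) {
        List<ContainerReport> toReport = new ArrayList<>();
        for (ContainerReport c : completed) {
            if (runningAppIds.contains(c.applicationId)) {
                toReport.add(c);
            }
        }
        return toReport;
    }
}
```

      With a filter of this kind, the set of running applications decides what is reported, so a delayed or missing APPLICATION_LOG_HANDLING_FINISHED event no longer causes repeated reports for finished applications.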

      Attachments

        1. YARN-2617.patch
          3 kB
          Jun Gong
        2. YARN-2617.2.patch
          7 kB
          Jun Gong
        3. YARN-2617.3.patch
          7 kB
          Jun Gong
        4. YARN-2617.4.patch
          7 kB
          Jun Gong
        5. YARN-2617.5.patch
          7 kB
          Jian He
        6. YARN-2617.5.patch
          7 kB
          Jian He
        7. YARN-2617.5.patch
          7 kB
          Jian He
        8. YARN-2617.6.patch
          7 kB
          Jian He

            People

              Assignee: hex108 Jun Gong
              Reporter: hex108 Jun Gong