  1. Hadoop YARN
  2. YARN-2617

NM does not need to send finished container whose APP is not running to RM

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.0
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We (Chun Chen) were testing RM work-preserving restart and found the following logs when we ran a simple MapReduce job, "PI". The NM kept reporting completed containers whose application had already finished, even after the AM had finished.

      2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
      

      In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee that already completed applications are cleaned up. However, it removes the appId from 'app.context.getApplications()' only when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED', and the NM might not receive this event for a long time, or might never receive it.

      • For NonAggregatingLogHandler, it waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS (3 * 60 * 60 seconds by default) before it is scheduled to delete the application logs and send the event.
      • For LogAggregationService, log aggregation might fail (e.g. if the user does not have HDFS write permission), in which case it will not send the event.
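
The behaviour the issue title asks for can be pictured with the minimal sketch below. It is not the attached patch and does not use YARN classes: `CompletedContainer`, `FinishedContainerFilter`, and `containersToReport` are hypothetical names, and the sketch assumes the NM has some set of running application IDs that is maintained independently of log handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a completed container report; not a YARN class. */
final class CompletedContainer {
    final String containerId;
    final String appId;

    CompletedContainer(String containerId, String appId) {
        this.containerId = containerId;
        this.appId = appId;
    }
}

/** Sketch only: picks the completed containers that are still worth reporting to the RM. */
final class FinishedContainerFilter {

    /**
     * Returns the completed containers whose application is still in the
     * running set. Containers of applications that are no longer running
     * are dropped, so they are not re-sent on every heartbeat.
     */
    static List<CompletedContainer> containersToReport(
            List<CompletedContainer> completed, Set<String> runningAppIds) {
        List<CompletedContainer> toReport = new ArrayList<>();
        for (CompletedContainer c : completed) {
            if (runningAppIds.contains(c.appId)) {
                toReport.add(c);
            }
        }
        return toReport;
    }

    public static void main(String[] args) {
        // Assumed example data: app_1 has finished, app_2 is still running.
        List<CompletedContainer> completed = Arrays.asList(
                new CompletedContainer("container_1", "app_1"),
                new CompletedContainer("container_2", "app_2"));
        Set<String> running = new HashSet<>(Arrays.asList("app_2"));

        for (CompletedContainer c : containersToReport(completed, running)) {
            System.out.println(c.containerId); // prints "container_2"
        }
    }
}
```

The point of the sketch is that the decision depends on whether the application is running, not on whether APPLICATION_LOG_HANDLING_FINISHED has arrived, so a slow or failed log handler would no longer cause repeated reports.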

        Attachments

        1. YARN-2617.patch
          3 kB
          Jun Gong
        2. YARN-2617.2.patch
          7 kB
          Jun Gong
        3. YARN-2617.3.patch
          7 kB
          Jun Gong
        4. YARN-2617.4.patch
          7 kB
          Jun Gong
        5. YARN-2617.5.patch
          7 kB
          Jian He
        6. YARN-2617.5.patch
          7 kB
          Jian He
        7. YARN-2617.5.patch
          7 kB
          Jian He
        8. YARN-2617.6.patch
          7 kB
          Jian He

              People

              • Assignee:
                hex108 Jun Gong
              • Reporter:
                hex108 Jun Gong
              • Votes:
                0
              • Watchers:
                3
