[YARN-2617] NM does not need to send finished container whose APP is not running to RM - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.6.0
Component/s: nodemanager
Labels: None
Target Version/s: 2.6.0
Hadoop Flags: Reviewed

Description

While testing RM work-preserving restart, we (chenchun) found the following logs after running a simple MapReduce "PI" job. The NM kept reporting completed containers whose application had already finished, even after the AM had finished.

2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...

In the patch for YARN-1372, ApplicationImpl on the NM is supposed to guarantee that already completed applications are cleaned up. However, it removes the appId from 'app.context.getApplications()' only when ApplicationImpl receives the event 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED', and the NM might not receive this event for a long time, or might never receive it.

For NonAggregatingLogHandler, it waits for YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 seconds (3 hours) by default, before it is scheduled to delete the application logs and send the event.
For LogAggregationService, aggregation might fail (e.g. if the user does not have HDFS write permission), in which case it never sends the event.
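
The idea in the issue title (do not report finished containers whose application is no longer running) can be illustrated with a small standalone sketch. This is not the committed patch: the class FinishedContainerFilter, the CompletedContainer type, and the runningAppIds set are hypothetical stand-ins for the NM's internal state, assumed here only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these names are not taken from the Hadoop code base.
public class FinishedContainerFilter {

    /** Minimal stand-in for a completed container and its owning application. */
    static final class CompletedContainer {
        final String containerId;
        final String appId;

        CompletedContainer(String containerId, String appId) {
            this.containerId = containerId;
            this.appId = appId;
        }
    }

    /**
     * Returns the IDs of completed containers that are still worth reporting,
     * i.e. those whose application is in the set of running applications.
     * Containers of finished applications are skipped, so they are not
     * re-sent on every heartbeat while the application entry lingers.
     */
    static List<String> containersToReport(List<CompletedContainer> completed,
                                           Set<String> runningAppIds) {
        List<String> toReport = new ArrayList<>();
        for (CompletedContainer c : completed) {
            if (runningAppIds.contains(c.appId)) {
                toReport.add(c.containerId);
            }
        }
        return toReport;
    }

    public static void main(String[] args) {
        List<CompletedContainer> completed = Arrays.asList(
            new CompletedContainer("container_1", "app_1"),  // app_1 already finished
            new CompletedContainer("container_2", "app_2")); // app_2 still running
        Set<String> running = new HashSet<>(Arrays.asList("app_2"));

        // Only container_2 is reported, because app_1 is no longer running.
        System.out.println(containersToReport(completed, running)); // prints [container_2]
    }
}
```

The point of the sketch is that the filter depends on whether the application is running, not on whether APPLICATION_LOG_HANDLING_FINISHED has arrived, so a delayed or missing log-handling event no longer causes repeated reports.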

Attachments

YARN-2617.2.patch (29/Sep/14 13:32, 7 kB, Jun Gong)
YARN-2617.3.patch (30/Sep/14 13:27, 7 kB, Jun Gong)
YARN-2617.4.patch (01/Oct/14 02:36, 7 kB, Jun Gong)
YARN-2617.5.patch (02/Oct/14 00:59, 7 kB, Jian He)
YARN-2617.5.patch (01/Oct/14 23:53, 7 kB, Jian He)
YARN-2617.5.patch (01/Oct/14 20:03, 7 kB, Jian He)
YARN-2617.6.patch (02/Oct/14 01:22, 7 kB, Jian He)
YARN-2617.patch (29/Sep/14 02:09, 3 kB, Jun Gong)

Issue Links

is duplicated by: YARN-2612 Some completed containers are not reported to NM (Closed)

People

Assignee: Jun Gong

Reporter: Jun Gong

Votes: 0

Watchers: 4

Dates

Created: 29/Sep/14 02:08

Updated: 01/Dec/14 03:09

Resolved: 02/Oct/14 17:10