[MAPREDUCE-3926] No information of unfinished map task in Job History, if all attempts of another map task fail. - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.20.205.0
Fix Version/s: None
Component/s: jobtracker
Labels: None

Description

Job History contains no information about an unfinished map task if all attempts of another map task fail.

For example:
1. The first attempt of the first map task, m_000000_0, is still running and making progress.

2. The second map task fails 4 times before the first map task's attempt completes.

3. As a result, a job cleanup task is launched and completes before the first map task's attempt completes.

4. After the job cleanup task completes, runningMapCache is cleared:

completedTask() -> jobComplete() -> garbageCollect() ->  this.runningMapCache = null;
           |-----> retireMap() -> if (runningMapCache == null) "Running cache for maps missing!! Job details are missing."

5. As a result, retireMap(), which is called after jobComplete(), logs the error
"Running cache for maps missing!! Job details are missing." and nothing
further is added to Job History. The first map task's information is
therefore missing from the Job History page.
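
The ordering described in the steps above can be modelled with a small, self-contained sketch. This is not the actual JobTracker code: JobSketch, launchMap(), the history and log lists, and the tracker names are hypothetical stand-ins, and only runningMapCache, garbageCollect(), retireMap() and the error message are taken from the report. It only illustrates why an attempt retired after garbageCollect() leaves no record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the job bookkeeping described above.
// It is NOT the real JobTracker/JobInProgress implementation.
class JobSketch {
    // Running map attempts keyed by attempt id; set to null by garbageCollect().
    private Map<String, String> runningMapCache = new HashMap<>();
    // Stand-in for the entries that would be recorded for retired attempts.
    final List<String> history = new ArrayList<>();
    // Stand-in for messages written to the JobTracker log.
    final List<String> log = new ArrayList<>();

    // Hypothetical helper: remember which tracker runs an attempt.
    void launchMap(String attemptId, String tracker) {
        runningMapCache.put(attemptId, tracker);
    }

    // Mirrors step 4: completing the job drops the running-map cache.
    void garbageCollect() {
        runningMapCache = null;
    }

    // Mirrors step 5: once the cache is gone, only an error is logged
    // and nothing is recorded for the attempt.
    void retireMap(String attemptId) {
        if (runningMapCache == null) {
            log.add("Running cache for maps missing!! Job details are missing.");
            return;
        }
        String tracker = runningMapCache.remove(attemptId);
        history.add(attemptId + " ran on " + tracker);
    }

    public static void main(String[] args) {
        JobSketch job = new JobSketch();
        job.launchMap("m_000000_0", "tracker-a"); // long-running first map task
        job.launchMap("m_000001_0", "tracker-b"); // failing second map task

        job.retireMap("m_000001_0"); // retired while the job is still live
        job.garbageCollect();        // job fails, cleanup done, cache dropped
        job.retireMap("m_000000_0"); // too late: nothing is recorded

        System.out.println("history: " + job.history);
        System.out.println("log: " + job.log);
    }
}
```

In this model the first map attempt never reaches the history list, which corresponds to the missing entry on the Job History page.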

I have created a sample streaming MR job to reproduce this issue.

mapper.sh

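#!/bin/bash
# Read the first line of the input split to decide how this map task behaves.
read line
if [[ "$line" == "sleep" ]]
then
    # Long-running map task: sleep 3 x 5 seconds, then succeed.
    for i in 1 2 3
    do
        echo "Sleeping" >&2
        sleep 5
    done
    exit 0
else
    # Failing map task: a non-zero exit status makes the attempt fail.
    echo "Exiting" >&2
    exit -1
fi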

Input file in1.txt drives the long-running map task (here, the first map task):

/user/mitesh/input/in1.txt

sleep

Input file in2.txt drives the failing map task (here, the second map task):

/user/mitesh/input/in2.txt

exit

Running the sample streaming MR job.

$ hadoop fs -rmr -skipTrash xyz
$ hadoop jar $HADOOP_INSTALL/hadoop-streaming.jar \
    -Dmapred.map.max.attempts=2 \
    -Dmapred.min.split.size=7 \
    -Dmapred.map.tasks=2 \
    -mapper "mapper.sh" \
    -file mapper.sh \
    -reducer NONE \
    -input /user/mitesh/input/in1.txt \
    -input /user/mitesh/input/in2.txt \
    -output xyz

Job History web UI

Hadoop Job job_201201310454_542302 on History Viewer
User: mitesh
JobName: streamjob7439640883203077520.jar
JobConf: hdfs://nn:port/user/mitesh/.staging/job_201201310454_542302/job.xml
Job-ACLs:
    mapreduce.job.acl-view-job: No users are allowed
    mapreduce.job.acl-modify-job: No users are allowed
Submitted At: 27-Feb-2012 12:56:02
Launched At: 27-Feb-2012 12:56:11 (8sec)
Finished At: 27-Feb-2012 12:56:31 (20sec)
Status: FAILED
Failure Info: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201201310454_542302_m_000001
Analyse This Job
Kind	Total Tasks(successful+failed+killed)	Successful tasks	Failed tasks	Killed tasks	Start Time	Finish Time
Setup 	1 	1 	0 	0 	27-Feb-2012 12:56:12 	27-Feb-2012 12:56:16 (4sec)
Map 	2 	0 	2 	0 	27-Feb-2012 12:56:16 	27-Feb-2012 12:56:26 (10sec)
Reduce 	0 	0 	0 	0 		
Cleanup 	1 	1 	0 	0 	27-Feb-2012 12:56:26 	27-Feb-2012 12:56:31 (4sec)

The table above shows only 2 failed tasks, both belonging to the second map task; nothing is shown for the first map task.
The task tracker that ran the first map task can be found only in the JobTracker (JT) logs.

People

Assignee: Unassigned

Reporter: Mitesh Singh Jat

Votes: 0

Watchers: 1

Dates

Created: 27/Feb/12 13:11

Updated: 28/Feb/12 10:29