Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.20.205.0
None
None
Description
Job History contains no information about an unfinished map task if all attempts of another map task fail.
For example:
1. The first attempt of the first map task, m_000000_0, was making progress.
2. The second map task failed 4 times before the first map task's attempt completed.
3. Hence, a job cleanup task was launched and completed before the first map task's attempt completed.
4. After the job cleanup task completes, runningMapCache is cleared:
completedTask() -> jobComplete() -> garbageCollect() -> this.runningMapCache = null;
      |-----> retireMap() -> if (runningMapCache == null) "Running cache for maps missing!! Job details are missing."
5. Hence, retireMap(), which is called after jobComplete(), logs the error "Running cache for maps missing!! Job details are missing." and adds nothing further to Job History. Therefore, the first map task's information is missing from the Job History page.
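The ordering in steps 4 and 5 can be modeled with a small, self-contained Java sketch. This is not the actual JobTracker code: the classes JobModel and RetireAfterCleanupSketch, the history and errors lists, launchMap(), and the attempt IDs used for the second map task are assumptions made for illustration. Only jobComplete(), garbageCollect(), retireMap(), runningMapCache, and the error message are taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical driver that replays the sequence of events from the report.
class RetireAfterCleanupSketch {
    static JobModel run() {
        JobModel job = new JobModel();
        job.launchMap("m_000000_0");   // long-running first map task
        job.launchMap("m_000001_0");   // second map task, first attempt (assumed id)
        job.retireMap("m_000001_0");   // attempt fails and is retired normally
        job.launchMap("m_000001_1");   // second attempt (assumed id)
        job.retireMap("m_000001_1");   // fails again, job is now doomed
        job.jobComplete();             // cleanup task finishes, cache is dropped
        job.retireMap("m_000000_0");   // first map task ends only now
        return job;
    }

    public static void main(String[] args) {
        JobModel job = run();
        System.out.println("history: " + job.history);
        System.out.println("errors:  " + job.errors);
    }
}

// Simplified stand-in for the per-job state; NOT the real Hadoop class.
class JobModel {
    // Stand-in for runningMapCache: attempt id -> state.
    private Map<String, String> runningMapCache = new HashMap<>();
    // Stand-in for the entries written to Job History.
    final List<String> history = new ArrayList<>();
    // Stand-in for the JobTracker log.
    final List<String> errors = new ArrayList<>();

    void launchMap(String attemptId) {
        runningMapCache.put(attemptId, "RUNNING");
    }

    // Runs once the job cleanup task has completed.
    void jobComplete() {
        garbageCollect();
    }

    private void garbageCollect() {
        this.runningMapCache = null;
    }

    // Runs when a map attempt finishes; records it only if the cache still exists.
    void retireMap(String attemptId) {
        if (runningMapCache == null) {
            errors.add("Running cache for maps missing!! Job details are missing.");
            return; // nothing is recorded for this attempt
        }
        runningMapCache.remove(attemptId);
        history.add(attemptId);
    }
}
```

In this model the two attempts of the second map task reach the history list, while m_000000_0 only produces the error entry, which mirrors the missing first map task on the Job History page.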
I have created a sample streaming MR job to reproduce this issue. The mapper script, mapper.sh:
#!/bin/bash
read line
if [[ "$line" == "sleep" ]]
then
  for i in 1 2 3
  do
    echo "Sleeping" >&2
    sleep 5
  done
  exit 0
else
  echo "Exiting" >&2
  exit -1
fi
Input file in1.txt, for the long-running map task (here, the first map task):
sleep
Input file in2.txt, for the failing map task (here, the second map task):
exit
Running the sample streaming MR job:
$ hadoop fs -rmr -skipTrash xyz
$ hadoop jar $HADOOP_INSTALL/hadoop-streaming.jar \
    -Dmapred.map.max.attempts=2 \
    -Dmapred.min.split.size=7 \
    -Dmapred.map.tasks=2 \
    -mapper "mapper.sh" \
    -file mapper.sh \
    -reducer NONE \
    -input /user/mitesh/input/in1.txt \
    -input /user/mitesh/input/in2.txt \
    -output xyz
Job History web UI
Hadoop Job job_201201310454_542302 on History Viewer
User: mitesh
JobName: streamjob7439640883203077520.jar
JobConf: hdfs://nn:port/user/mitesh/.staging/job_201201310454_542302/job.xml
Job-ACLs:
    mapreduce.job.acl-view-job: No users are allowed
    mapreduce.job.acl-modify-job: No users are allowed
Submitted At: 27-Feb-2012 12:56:02
Launched At: 27-Feb-2012 12:56:11 (8sec)
Finished At: 27-Feb-2012 12:56:31 (20sec)
Status: FAILED
Failure Info: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201201310454_542302_m_000001
Analyse This Job

Kind    | Total Tasks (successful+failed+killed) | Successful tasks | Failed tasks | Killed tasks | Start Time           | Finish Time
Setup   | 1                                      | 1                | 0            | 0            | 27-Feb-2012 12:56:12 | 27-Feb-2012 12:56:16 (4sec)
Map     | 2                                      | 0                | 2            | 0            | 27-Feb-2012 12:56:16 | 27-Feb-2012 12:56:26 (10sec)
Reduce  | 0                                      | 0                | 0            | 0            |                      |
Cleanup | 1                                      | 1                | 0            | 0            | 27-Feb-2012 12:56:26 | 27-Feb-2012 12:56:31 (4sec)
The history above shows only 2 failed tasks, both belonging to the second map task; the first map task does not appear at all.
The task tracker that ran the first map task can be found only from the JobTracker (JT) logs.