  1. Hadoop Map/Reduce
  2. MAPREDUCE-3926

Job History has no information about an unfinished map task if all attempts of another map task fail.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels: None

      Description

      When all attempts of one map task fail while another map task is still running, Job History contains no information about the unfinished task.

      For example,
      1. The first attempt of the first map task, m_000000_0, was still making progress.

      2. The second map task failed 4 times before the first map task's attempt completed.

      3. Hence, a job cleanup task was launched and completed before the first map task's attempt finished.

      4. After the job cleanup task completes, runningMapCache is set to null:

      completedTask() -> jobComplete() -> garbageCollect() ->  this.runningMapCache = null;
                 |-----> retireMap() -> if (runningMapCache == null) "Running cache for maps missing!! Job details are missing."
      

      5. Hence, retireMap(), which is called after jobComplete(), logs the error
      "Running cache for maps missing!! Job details are missing." and nothing
      further is added to Job History. As a result, the first map task's
      information is missing from the Job History page.

      I have created a sample streaming MR job to reproduce this issue.

      mapper.sh
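      #!/bin/bash
      # Read the first (and only) line of the input split.
      read -r line
      if [[ "$line" == "sleep" ]]
      then
          # Long-running map task: stay alive for about 15 seconds, then succeed.
          for i in 1 2 3
          do
              echo "Sleeping" >&2
              sleep 5
          done
          exit 0
      else
          # Failing map task: exit non-zero so the attempt is marked as failed.
          echo "Exiting" >&2
          exit 1
      fi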
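
Before submitting the job, it may help to confirm locally that the mapper succeeds on "sleep" and fails on any other input. The sketch below assumes mapper.sh is in the current directory; mapper_status is a hypothetical helper introduced here for illustration, not part of Hadoop or of the original report.

```shell
#!/bin/bash
# mapper_status is a hypothetical helper (an assumption, not a Hadoop tool).
# It feeds one input line to a mapper script and prints the script's exit status.
mapper_status() {
    local script=$1 input=$2 status=0
    # The mapper's stdout is discarded; its stderr messages remain visible.
    printf '%s\n' "$input" | bash "$script" >/dev/null || status=$?
    echo "$status"
}

# Run the checks only when mapper.sh is present in the current directory.
if [ -f ./mapper.sh ]; then
    echo "sleep -> $(mapper_status ./mapper.sh sleep)"   # takes about 15 seconds
    echo "exit  -> $(mapper_status ./mapper.sh exit)"
fi
```

The "sleep" input should report status 0 and the "exit" input a non-zero status, which is what causes the second map task's attempts to fail on the cluster.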
      

      Input file in1.txt drives the long-running map task (here, the first map task):

      /user/mitesh/input/in1.txt
      sleep
      

      Input file in2.txt drives the failing map task (here, the second map task):

      /user/mitesh/input/in2.txt
      exit
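
The two input files could be prepared roughly as follows. This is a sketch under assumptions: each file is taken to hold a single newline-terminated line, the local directory ./input and the helper make_inputs are hypothetical names, and the upload step runs only if a hadoop client is on the PATH.

```shell
#!/bin/bash
# make_inputs is a hypothetical helper: it writes the two one-line input files
# described in the report into the given local directory.
make_inputs() {
    local dir=$1
    mkdir -p "$dir"
    printf 'sleep\n' > "$dir/in1.txt"   # input for the long-running map task
    printf 'exit\n'  > "$dir/in2.txt"   # input for the failing map task
}

make_inputs ./input

# Upload to HDFS only when a hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir /user/mitesh/input
    hadoop fs -put ./input/in1.txt ./input/in2.txt /user/mitesh/input/
fi
```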
      

      Running the sample streaming MR job.

      $ hadoop fs -rmr -skipTrash xyz
      $ hadoop jar $HADOOP_INSTALL/hadoop-streaming.jar \
            -Dmapred.map.max.attempts=2 \
            -Dmapred.min.split.size=7 \
            -Dmapred.map.tasks=2 \
            -mapper "mapper.sh" \
            -file mapper.sh \
            -reducer NONE \
            -input /user/mitesh/input/in1.txt \
            -input /user/mitesh/input/in2.txt \
            -output xyz
      

      Job History web UI

      Hadoop Job job_201201310454_542302 on History Viewer
      User: mitesh
      JobName: streamjob7439640883203077520.jar
      JobConf: hdfs://nn:port/user/mitesh/.staging/job_201201310454_542302/job.xml
      Job-ACLs:
          mapreduce.job.acl-view-job: No users are allowed
          mapreduce.job.acl-modify-job: No users are allowed
      Submitted At: 27-Feb-2012 12:56:02
      Launched At: 27-Feb-2012 12:56:11 (8sec)
      Finished At: 27-Feb-2012 12:56:31 (20sec)
      Status: FAILED
      Failure Info: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201201310454_542302_m_000001
      Analyse This Job
      Kind	Total Tasks(successful+failed+killed)	Successful tasks	Failed tasks	Killed tasks	Start Time	Finish Time
      Setup 	1 	1 	0 	0 	27-Feb-2012 12:56:12 	27-Feb-2012 12:56:16 (4sec)
      Map 	2 	0 	2 	0 	27-Feb-2012 12:56:16 	27-Feb-2012 12:56:26 (10sec)
      Reduce 	0 	0 	0 	0 		
      Cleanup 	1 	1 	0 	0 	27-Feb-2012 12:56:26 	27-Feb-2012 12:56:31 (4sec)
      

      The table above shows only 2 failed map task attempts, both belonging to the second map task.
      The task tracker that ran the first map task can be found only in the JobTracker (JT) logs.
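
To check whether a JobTracker hit this code path, the logs can be searched for the error message quoted in step 5. The sketch below is an assumption-laden example: the log directory ($HADOOP_LOG_DIR, falling back to $HADOOP_INSTALL/logs) depends on the installation, and find_cache_errors is a hypothetical helper name.

```shell
#!/bin/bash
# find_cache_errors is a hypothetical helper: it lists log lines containing
# the error message quoted in the report.
find_cache_errors() {
    local log_dir=$1
    # -r: recurse into log_dir, -n: print line numbers, -F: match the text literally
    grep -rnF 'Running cache for maps missing!!' "$log_dir"
}

# Assumption: logs live under $HADOOP_LOG_DIR, else under $HADOOP_INSTALL/logs.
log_dir=${HADOOP_LOG_DIR:-${HADOOP_INSTALL:-.}/logs}
if [ -d "$log_dir" ]; then
    find_cache_errors "$log_dir" || echo "no matching log lines in $log_dir"
fi
```

Searching the same directory for the first map task's attempt ID (m_000000_0 within the job's ID) should show which task tracker ran it.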

            People

            • Assignee: Unassigned
            • Reporter: Mitesh Singh Jat (miteshsjat)