Hadoop Map/Reduce / MAPREDUCE-3362

Job stays in 'Pending' status for several days and never finishes

Details

    Description

      Our jobs stay in 'pending' status for several days. We checked the JobTracker log and found that one task attempt failed because the job history could not be stored to HDFS.

      The problem starts when another job removes the HDFS folder into which this job is writing its history log. In that case a 'ConcurrentModificationException' is thrown in JobHistory#log(ArrayList<PrintWriter> writers, RecordTypes recordType, Keys[] keys, String[] values, JobID id). One thread checks the writers for output errors and, because the history folder on HDFS has been removed, removes the failing output from 'writers'. Another thread, working on the same 'writers' list, then gets a 'ConcurrentModificationException' because the list has been emptied underneath it.
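
      The failure mode described above can be reproduced in miniature. The sketch below is not Hadoop code: the class name WritersCmeSketch and the string stand-ins for PrintWriter objects are assumptions made for illustration, and the removal happens inside the same loop rather than on a second thread so that the result is deterministic. It shows that an ArrayList iteration is aborted when an element is removed mid-iteration, and that a snapshot-iterating list such as CopyOnWriteArrayList is one possible way to avoid the exception (not necessarily the fix appropriate for JobHistory).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of Hadoop.
class WritersCmeSketch {

    // Walks the list the way a logging loop would, while one writer is
    // dropped mid-iteration. Returns true if the iteration was aborted
    // by a ConcurrentModificationException.
    static boolean iterateWhileRemoving(List<String> writers) {
        try {
            for (String w : writers) {
                if (w.equals("writer-1")) {
                    // Stand-in for the error check that discards a broken writer.
                    writers.remove(w);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("writer-1", "writer-2", "writer-3");
        // ArrayList iterators are fail-fast: the structural change is detected.
        System.out.println(iterateWhileRemoving(new ArrayList<>(names)));            // true
        // CopyOnWriteArrayList iterates over a snapshot, so no exception is thrown.
        System.out.println(iterateWhileRemoving(new CopyOnWriteArrayList<>(names))); // false
    }
}
```

      With two real threads the exception is timing dependent, which would be consistent with the problem appearing only occasionally.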

      Unfortunately, nothing catches this exception, so it propagates up to the TaskTracker handling and skips the bookkeeping that updates 'finishedMapTask'. Another task attempt is then started from 'failedMap' and succeeds, but the total 'finishedMapTask' count no longer covers all finished map tasks. JobCleanupTask cannot start, and the job stays in 'pending' status.

      The root cause:
      The first task attempt fails with the exception; the task is added to 'failedMap' and the 'finishedMap' counter is decremented. The next task attempt runs successfully and increments 'finishedMap' by one. Because of the earlier failure, the total 'finishedMap' is less than the actual number of finished maps, so the cleanup task cannot run.
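
      The counter drift can be illustrated with a small model. This is a simplified sketch under assumptions, not the JobTracker implementation: the class FinishedMapCounterSketch, its fields, and the rule "cleanup may start only when finishedMaps equals the total number of maps" are hypothetical stand-ins for the bookkeeping described above.

```java
import java.util.ConcurrentModificationException;

// Hypothetical model of the counters; names do not come from Hadoop.
class FinishedMapCounterSketch {
    final int totalMaps;
    int finishedMaps = 0;
    int failedMaps = 0;

    FinishedMapCounterSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Completion handling: the history is logged first, then the counter is updated.
    // If logging throws, the increment is skipped.
    void completeAttempt(boolean historyLogThrows) {
        if (historyLogThrows) {
            throw new ConcurrentModificationException("history writers changed");
        }
        finishedMaps++;
    }

    // Failure handling as described in the report: record the failure
    // and decrement the finished counter.
    void failAttempt() {
        failedMaps++;
        finishedMaps--;
    }

    // Assumed gate for launching the cleanup task.
    boolean canLaunchCleanup() {
        return finishedMaps == totalMaps;
    }

    // Replays the scenario for a job with three map tasks.
    static FinishedMapCounterSketch replayScenario() {
        FinishedMapCounterSketch job = new FinishedMapCounterSketch(3);
        job.completeAttempt(false);     // map 0 finishes normally
        job.completeAttempt(false);     // map 1 finishes normally
        try {
            job.completeAttempt(true);  // map 2, attempt 1: history logging throws
        } catch (ConcurrentModificationException e) {
            job.failAttempt();          // counter is decremented although it was never incremented
        }
        job.completeAttempt(false);     // map 2, attempt 2 succeeds
        return job;
    }

    public static void main(String[] args) {
        FinishedMapCounterSketch job = replayScenario();
        // All three maps have finished, but the counter says two.
        System.out.println("finishedMaps=" + job.finishedMaps + " of " + job.totalMaps
                + ", cleanup allowed: " + job.canLaunchCleanup());
    }
}
```

      In this model all three maps have finished, yet finishedMaps ends at 2, so the cleanup gate never opens. Catching the exception around the history logging, or not decrementing a counter that was never incremented, would each keep the count consistent in the sketch.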

          People

            Assignee: Unassigned
            Reporter: Denny Ye (dennyy)
            Votes: 0
            Watchers: 4
