  1. Hadoop Map/Reduce
  2. MAPREDUCE-5369

Progress for jobs with multiple splits in local mode is wrong

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.20.2
    • 1.2.2
    • None
    • None

    Description

      When a job with multiple splits is executed in local mode (LocalJobRunner), its reported map progress is wrong.
      After the first split is processed, progress jumps to 100%, then drops back to 50%, and so on.

      The reason lies in the progress calculation in LocalJobRunner:

            float taskIndex = mapIds.indexOf(taskId);
            if (taskIndex >= 0) {                       // mapping
              float numTasks = mapIds.size();
              status.setMapProgress(taskIndex/numTasks + taskStatus.getProgress()/numTasks);
            } else {
              status.setReduceProgress(taskStatus.getProgress());
            }
      

      The problem is that mapIds is filled lazily in run(). run() loops over all splits; in each iteration, the split's task ID is added to mapIds and only then is the split processed. As a result, numTasks is 1 while the first split is processed, 2 while the second is processed, and so on. With two splits, the first split finishes at 0/1 + 1/1 = 100%, and the second starts at 1/2 + 0/2 = 50%.
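
      The effect can be reproduced outside Hadoop with a minimal sketch. This is not the actual LocalJobRunner code: the class LocalProgressSketch, the method simulate, and the "map_N" task IDs are hypothetical names used only for illustration. The sketch applies the formula quoted above at the start and end of each split, once with mapIds filled lazily (as described) and once with mapIds filled before the loop, which is one possible way to make the progress monotonic, assuming all task IDs are known up front.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalProgressSketch {

    // Same formula as the quoted snippet: share of earlier tasks plus the
    // current task's share, both relative to the current size of mapIds.
    static float mapProgress(List<String> mapIds, String taskId, float taskProgress) {
        float taskIndex = mapIds.indexOf(taskId);
        float numTasks = mapIds.size();
        return taskIndex / numTasks + taskProgress / numTasks;
    }

    // Simulates the loop over splits and records the progress reported at the
    // start (task progress 0.0) and end (task progress 1.0) of each split.
    // lazy == true:  each task ID is added right before its split runs.
    // lazy == false: all task IDs are added before the loop (hypothetical fix).
    static List<Float> simulate(int numSplits, boolean lazy) {
        List<String> mapIds = new ArrayList<>();
        List<Float> reported = new ArrayList<>();
        if (!lazy) {
            for (int i = 0; i < numSplits; i++) {
                mapIds.add("map_" + i);
            }
        }
        for (int i = 0; i < numSplits; i++) {
            String taskId = "map_" + i;
            if (lazy) {
                mapIds.add(taskId);
            }
            reported.add(mapProgress(mapIds, taskId, 0.0f));
            reported.add(mapProgress(mapIds, taskId, 1.0f));
        }
        return reported;
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + simulate(2, true));
        System.out.println("eager: " + simulate(2, false));
    }
}
```

      For two splits, the lazy variant reports 0.0, 1.0, 0.5, 1.0, which matches the jump to 100% and back to 50% described above. The eager variant reports 0.0, 0.5, 0.5, 1.0.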

      I tried Hadoop 0.20.2, 1.0.3, 1.1.2 and cdh-4.1. All show the same behaviour.

          People

            Assignee: Unassigned
            Reporter: Johannes Zillmann (oae)
            Votes: 0
            Watchers: 5
