Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.20.2
None
None
Description
When a job with multiple splits is executed in local mode (LocalJobRunner), its progress calculation is wrong.
After the first split is processed, the reported map progress jumps to 100%, then falls back to 50%, and so on.
The reason lies in the progress calculation in LocalJobRunner:
    float taskIndex = mapIds.indexOf(taskId);
    if (taskIndex >= 0) { // mapping
      float numTasks = mapIds.size();
      status.setMapProgress(taskIndex/numTasks + taskStatus.getProgress()/numTasks);
    } else {
      status.setReduceProgress(taskStatus.getProgress());
    }
The problem is that mapIds is filled lazily in run(). There is a loop over all splits. In each iteration, the split's task ID is added to mapIds, and only then is the split processed. That means numTasks is 1 while the first split is processed, 2 while the second split is processed, and so on, so the denominator does not reflect the total number of splits until the last one is running.
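To make the effect concrete, here is a minimal standalone sketch that reproduces the arithmetic outside of Hadoop. It is not the LocalJobRunner code: the class LocalProgressSketch and its methods are hypothetical names made up for this illustration, and only the formula is taken from the snippet above. It compares the progress reported at the end of each split when task IDs are registered lazily with what would be reported if all IDs were known before processing starts (one possible fix, assumed here, not confirmed against any patch).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class LocalProgressSketch {

    // Same formula as in the report: finished tasks plus the current task's share.
    static float mapProgress(int taskIndex, int numTasks, float taskProgress) {
        return (float) taskIndex / numTasks + taskProgress / numTasks;
    }

    // Lazy registration: each task ID is added right before its split is processed,
    // so the list size equals the number of splits started so far.
    static List<Float> lazyProgressAtSplitEnd(List<String> splitIds) {
        List<String> mapIds = new ArrayList<>();
        List<Float> reported = new ArrayList<>();
        for (String id : splitIds) {
            mapIds.add(id);
            // Progress reported when this split has just finished (task progress = 1.0).
            reported.add(mapProgress(mapIds.indexOf(id), mapIds.size(), 1.0f));
        }
        return reported;
    }

    // Eager registration (assumed fix): all task IDs are known before the loop starts,
    // so the denominator is the total number of splits from the beginning.
    static List<Float> eagerProgressAtSplitEnd(List<String> splitIds) {
        List<String> mapIds = new ArrayList<>(splitIds);
        List<Float> reported = new ArrayList<>();
        for (String id : splitIds) {
            reported.add(mapProgress(mapIds.indexOf(id), mapIds.size(), 1.0f));
        }
        return reported;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            ids.add("map_" + i);
        }
        System.out.println("lazy:  " + lazyProgressAtSplitEnd(ids));
        System.out.println("eager: " + eagerProgressAtSplitEnd(ids));
    }
}
```

With lazy registration every split ends at roughly 100%, because the current task is always the last entry in the list. With all IDs registered up front, the value grows in equal steps and reaches 100% only after the last split.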
I tested Hadoop 0.20.2, 1.0.3, 1.1.2 and cdh-4.1. All show the same behaviour.