|
Attaching patch for 0.18
Patch looks good. Its good to use TaskCommitThread for doing the FileSystem operations. The only concern is that PENDING tasks that are waiting for commit will now contend with KILLED/FAILED tasks and we might run into the risk of slowing down the commit rate. The case will be visible when one job (job1) finishes while others (job2) are in middle where job1's speculative tasks might now fill up the commit queue. Hence I feel this should be benchmarked or tested to see if the interference level is less and PENDING tasks are not kept waiting for long slowing down the commit rate. The other question is whether or not we want to do it for 0.18?
If job1 has finished, it will be a no-op in TaskCommitThread, since the higher lever directory will be deleted in garbage collect. It will still do a cleanup for corner cases described at http://issues.apache.org/jira/browse/HADOOP-2391?focusedCommentId=12566183#action_12566183 I also ran sort benchmark with and without the patch. And I didnt see any performance difference, For 0.19 (and the trunk), I am worried about the approach where we spawn a commit task for every successful task. The load on the JT in terms of memory among other things would be simply doubled. Of course, there are other issues like the need to associate the commit task with the real task (since a task's success/failure would be governed by the success/failure of its commit task), and so on..
So for 0.19 (and the trunk), I propose that we keep the existing model of running the commit task (OutputCommitter.commitTask) in the same task context and just spawn new tasks for cleaning up the outputs of failed/killed tasks (for running OutputCommitter.abortTask). Presumably the number of failed tasks won't be many and hence the load on the JT shouldn't increase that much. Also, we can probably stop launching cleanup tasks for each failed/killed task once the job succeeds/fails (which also means that the job level cleanup task (OutputCommitted.cleanupJob) has run), with the assumption that the job level cleanup has cleaned all garbage up. Thoughts? Should this jira be used for 18.3 alone? and shall we have another jira for 19.1 and trunk?
+1 to making a separate jira for 0.19/trunk.
Attaching patch after removing unnecessary fs. exists call from earlier patch.
test-patch result on branch 0.18: [exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
All core and contrib tests passed on 0.18. I just committed this to the 0.18 branch. Thanks, Amareshwari!
I created
Hi Amareshwari, do you think that this patch is applicable to the hadoop 0.17 branch as well? (This is just a question and not a request to apply it to any branch).
It applies to 0.17, if you replace src/mapred with src/java in the patch.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Till 0.18.X, the task commit is done by TaskCommitThread in JT. We can have functionality added here if needed for 0.18.
After
HADOOP-3150, it is exposed to the user. Now, each task does commit at the end of it's execution. To remove temporary output directory of failed/killed tasks as soon as they fail, we should consider the following:1. Failure/Kill can be anywhere between 'launching the task' to 'commiting the task'
2. Failure/Kill can be because of KillTaskAction or Exception/Error
Owen's suggestion on
to have task commit as separate task looks like the right approach here.
HADOOP-3150at http://issues.apache.org/jira/browse/HADOOP-3150?focusedCommentId=12626372#action_12626372and http://issues.apache.org/jira/browse/HADOOP-3150?focusedCommentId=12628736#action_12628736
For successful tasks, a commit task will be launched for tha task commit. For failed/killed tasks, an abort task will be launched for the task cleanup.
Thoughts?