Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      For jobs that do not use OutputCommitter.abort(), it should be possible to turn the task-cleanup tasks off.
      This improves job latency, because failed tasks are often the bottleneck of a job.
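
A minimal sketch of the idea, not the committed patch: a per-job boolean decides whether a failed attempt waits for a task-cleanup task or is marked failed right away. The property key `example.job.task.cleanup.needed` and the class `TaskCleanupSwitch` are hypothetical names chosen for illustration; the issue text does not name the actual setting. The state names are only labels for the two outcomes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a job-level switch for task-cleanup tasks (not Hadoop code). */
class TaskCleanupSwitch {
    // Hypothetical property key, used only in this sketch.
    static final String CLEANUP_NEEDED_KEY = "example.job.task.cleanup.needed";

    /** Cleanup stays on unless the job explicitly opts out, which keeps the old behavior. */
    static boolean cleanupNeeded(Map<String, String> jobConf) {
        return Boolean.parseBoolean(jobConf.getOrDefault(CLEANUP_NEEDED_KEY, "true"));
    }

    /** State a failed attempt moves to: wait for a cleanup task, or fail immediately. */
    static String stateAfterFailure(Map<String, String> jobConf) {
        return cleanupNeeded(jobConf) ? "FAILED_UNCLEAN" : "FAILED";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(stateAfterFailure(conf));  // default: cleanup task still runs

        conf.put(CLEANUP_NEEDED_KEY, "false");
        System.out.println(stateAfterFailure(conf));  // opted out: no cleanup task
    }
}
```

Defaulting to "cleanup needed" is a design assumption here: jobs whose committer relies on abort would keep working unchanged, and only jobs that opt out skip the extra task.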

          Activity

          Scott Chen added a comment -

          The other problem we have observed is that task-cleanup tasks always go to the same node as the failed tasks.
          This is bad because the cleanup may land on the same bad node.
          We have seen many times that a task and its cleanup task both timed out on the same node.

          dhruba borthakur added a comment -

          From my understanding, the cleanup task is machine-specific and is supposed to go to the same machine where the user's task ran. Maybe a mapred guru can clarify this.

          Joydeep Sen Sarma added a comment -

          AFAIK from looking at the code, there's no requirement for the cleanup to go to the same machine. It happens to go to the same machine because whenever a task reports failed/killed, a slot is freed up and the JT schedules the newly created cleanup task on the same TT. But there's no hard requirement for this in the code, and it's possible that the JT does not schedule it on the same machine (for example, if the TT was previously oversubscribed).

          If the failure was caused by problems with task localization (for example), the results are truly miserable. I have hit scenarios where two 10-minute task timeouts were required to fail a task on a bad node (one for the task failure and one for its cleanup).
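
The scheduling behavior described above can be sketched as a toy model. This is an illustration under stated assumptions, not JobTracker code: the class `CleanupPlacementSketch`, its `heartbeat` method, and the tracker and attempt names are all hypothetical. It shows why the cleanup usually lands on the tracker that reported the failure (that heartbeat also frees a slot), why it can land elsewhere when that tracker has no free slot, and the worst-case wait when both the attempt and its cleanup hang until the task timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of heartbeat-driven cleanup assignment (not JobTracker code). */
class CleanupPlacementSketch {
    private final Deque<String> pendingCleanups = new ArrayDeque<>();

    /**
     * One heartbeat from a tracker: queue a cleanup for any attempt it reports
     * as failed, then hand the oldest pending cleanup to this same tracker if
     * it has a free slot. Returns the attempt whose cleanup was assigned, or
     * null if nothing was assigned.
     */
    String heartbeat(String tracker, String failedAttempt, int freeSlots) {
        if (failedAttempt != null) {
            pendingCleanups.addLast(failedAttempt);
        }
        return freeSlots > 0 ? pendingCleanups.pollFirst() : null;
    }

    /** Worst case when the attempt and (if enabled) its cleanup both run into the timeout. */
    static long worstCaseMinutes(long taskTimeoutMinutes, boolean cleanupEnabled) {
        return cleanupEnabled ? 2 * taskTimeoutMinutes : taskTimeoutMinutes;
    }

    public static void main(String[] args) {
        CleanupPlacementSketch jt = new CleanupPlacementSketch();

        // The failing tracker frees a slot in the same heartbeat, so it gets the cleanup.
        System.out.println("tt-bad gets: " + jt.heartbeat("tt-bad", "attempt_1", 1));

        // An oversubscribed tracker has no free slot, so the cleanup waits for another tracker.
        System.out.println("tt-bad gets: " + jt.heartbeat("tt-bad", "attempt_2", 0));
        System.out.println("tt-good gets: " + jt.heartbeat("tt-good", null, 1));

        System.out.println("worst case with cleanup: " + worstCaseMinutes(10, true) + " min");
        System.out.println("worst case without cleanup: " + worstCaseMinutes(10, false) + " min");
    }
}
```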

          Amareshwari Sriramadasu added a comment -

          Joydeep is right in his analysis. There is no requirement for the task-cleanup to go to the same machine, because task-cleanup does its cleanup on HDFS. For example, if the task failed due to a lost tracker, the cleanup cannot go to the same machine.

          Scott Chen added a comment -

          I think we should avoid always scheduling the cleanup task on the same node. I will also make that change in this patch.

          Scott Chen added a comment -

          "I will also make that change in this patch."

          I take that back. I think this is an independent problem, so I will open another issue for it.

          Scott Chen added a comment -

          I have opened MAPREDUCE-2207 for the problem that the task-cleanup task always goes to the same node that failed the task.

          Joydeep Sen Sarma added a comment -

          +1 from my side.

          One thing is that very few people will be aware that this can be turned off. In particular, I think the default output formats don't need a task cleanup. I am wondering how skipping the cleanup can be enabled automatically for more use cases.

          • We can make the setting a default in hive-default.xml. I will file a JIRA for that.
          • How about Hadoop streaming? Can we turn task cleanup off if Hadoop streaming is used with the (default) FileOutputFormat?
          Scott Chen added a comment -

          test-patch

          
               [exec] BUILD SUCCESSFUL
               [exec] Total time: 2 minutes 2 seconds
               [exec]
               [exec]
               [exec]
               [exec]
               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
               [exec]     +1 system test framework.  The patch passed system test framework compile.
               [exec]
               [exec]
               [exec]
               [exec]
               [exec] ======================================================================
               [exec] ======================================================================
               [exec]     Finished build.
               [exec] ======================================================================
               [exec] ======================================================================
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12465003/MAPREDUCE-2206.txt
          against trunk revision 1074251.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/49//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/49//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/49//console

          This message is automatically generated.

          Scott Chen added a comment -

          The failed contrib tests are unrelated.

          I have committed this. Thanks for the review, Joydeep.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #616 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/616/)
          MAPREDUCE-2206. The task-cleanup tasks should be optional. (schen)

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)


            People

            • Assignee: Scott Chen
            • Reporter: Scott Chen
            • Votes: 0
            • Watchers: 3
