Hadoop Map/Reduce
MAPREDUCE-1398

TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed TaskLauncher to stop waiting for blocking slots, for a TIP that is killed / failed while it is in queue.

      Description

      Tasks can be assigned to trackers for slots that are still running other tasks in a commit-pending state. This is an optimization that pipelines task assignment and launch. When such a task reaches the tracker, it waits in the TaskLauncher thread until sufficient slots become free. If the task is killed externally while it waits (for example, because the job finishes), the TaskLauncher is not notified, so it continues to wait for slots on behalf of the killed task. If slots do not become free for a long time, the TaskLauncher thread stays blocked for a considerable time. If the waiting task is a high-RAM task, this is also wasteful: had the TaskLauncher woken up, it could have made way for normal tasks that can run on the slots already available.
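
      The behaviour asked for here can be sketched roughly as follows. This is a minimal illustration, not the TaskTracker code or the attached patch: SlotWaitSketch, QueuedTask, SlotPool and all of their methods are hypothetical names, and the sketch assumes the waiting thread can simply be notified on the same monitor that guards the free-slot count.

```java
/** Illustrative only: hypothetical names, not the TaskTracker/TaskLauncher source. */
final class SlotWaitSketch {

    /** Stand-in for a queued task (TIP) that needs a number of slots. */
    static final class QueuedTask {
        final int slotsNeeded;
        private boolean killed; // guarded by the SlotPool monitor

        QueuedTask(int slotsNeeded) { this.slotsNeeded = slotsNeeded; }
    }

    /** Stand-in for the tracker's free-slot bookkeeping. */
    static final class SlotPool {
        private int freeSlots;

        SlotPool(int freeSlots) { this.freeSlots = freeSlots; }

        /** Blocks until enough slots are free; returns false if the task is killed first. */
        synchronized boolean acquire(QueuedTask task) throws InterruptedException {
            while (!task.killed && freeSlots < task.slotsNeeded) {
                wait(); // woken by release() or kill()
            }
            if (task.killed) {
                // Log where the wait is abandoned, so the early exit is visible.
                System.out.println("Task was killed while waiting for slots; giving up the wait");
                return false;
            }
            freeSlots -= task.slotsNeeded;
            return true;
        }

        /** Returns slots to the pool and wakes any waiter. */
        synchronized void release(int slots) { freeSlots += slots; notifyAll(); }

        /** Marks the task killed and wakes the waiter so it can stop waiting. */
        synchronized void kill(QueuedTask task) { task.killed = true; notifyAll(); }
    }

    /** Runs the scenario from the description: a task is killed while it waits for slots. */
    static boolean killedWaiterGivesUp() {
        SlotPool pool = new SlotPool(0);      // no free slots, so acquire() has to wait
        QueuedTask task = new QueuedTask(2);  // e.g. a high-RAM task needing two slots
        boolean[] acquired = {true};
        Thread launcher = new Thread(() -> {
            try {
                acquired[0] = pool.acquire(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        launcher.start();
        pool.kill(task);
        try {
            launcher.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // True only if the launcher thread finished and did not take any slots.
        return !launcher.isAlive() && !acquired[0];
    }
}
```

      Without the notifyAll() call in kill(), the launcher thread in this sketch would stay blocked until some other task released slots, which is the delay the description complains about.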

      1. patch-1398-ydist.txt
        11 kB
        Amareshwari Sriramadasu
      2. patch-1398-2.txt
        12 kB
        Amareshwari Sriramadasu
      3. patch-1398-1.txt
        11 kB
        Amareshwari Sriramadasu
      4. patch-1398.txt
        11 kB
        Amareshwari Sriramadasu
      5. mr-1398-y20.patch
        11 kB
        Hemanth Yamijala

        Activity

        Tom White made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314045 ]
        Fix Version/s 0.22.0 [ 12314184 ]
        Hemanth Yamijala made changes -
        Attachment mr-1398-y20.patch [ 12436724 ]
        Hemanth Yamijala added a comment -

        Updated patch for earlier version of Hadoop. Not for commit here.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #242 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/242/).
        Fix TaskLauncher to stop waiting for slots on a TIP that is killed / failed. Contributed by Amareshwari Sriramadasu.

        Amareshwari Sriramadasu made changes -
        Release Note Fixed TaskLauncher to stop waiting for blocking slots, for a TIP that is killed / failed while it is in queue.
        Hemanth Yamijala made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Amareshwari!

        Hemanth Yamijala added a comment -

        "The patch did not introduce any new unsynchronized code."

        I verified this. Also, per Chris's comment in MAPREDUCE-1497, IndexCache is probably already thread-safe. Hence, this might be a legitimate case for suppressing the findbugs warning. Based on this, I think the patch is ready for commit.

        Amareshwari Sriramadasu made changes -
        Attachment patch-1398-ydist.txt [ 12435964 ]
        Amareshwari Sriramadasu added a comment -

        Patch for Yahoo! distribution.
        Ran ant test and test-patch. test-patch failed because of MAPREDUCE-1497. All unit tests passed except TestNodeRefresh (due to MAPREDUCE-677). TestNodeRefresh passed when I reran the test.

        Amareshwari Sriramadasu added a comment -

        -1 findbugs.

        The patch did not introduce any new unsynchronized code. TaskTracker.indexCache is already accessed without synchronization from MapOutputServlet. The patch introduced a method, setIndexCache, which is called from a synchronized method; findbugs complains because the percentage of synchronized accesses to the field changed. I raised MAPREDUCE-1497 to address this.
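
        The access pattern described above can be sketched as follows. This is only an illustration with hypothetical names (MixedSyncSketch, Cache, initialize, lookup); it is not the TaskTracker or MapOutputServlet source. It assumes the warning comes from a detector that compares how often a field is accessed with and without the object's lock held, which is what the remark about the percentage of synchronization suggests.

```java
/** Illustrative only: hypothetical class, not the TaskTracker source. */
final class MixedSyncSketch {

    /** Stand-in for a cache that is assumed to be thread-safe internally. */
    static final class Cache {
        private final java.util.concurrent.ConcurrentHashMap<String, String> entries =
            new java.util.concurrent.ConcurrentHashMap<>();

        void put(String key, String value) { entries.put(key, value); }

        String get(String key) { return entries.get(key); }
    }

    private Cache cache;

    /** Package-private setter; in this sketch it is only called with the monitor held. */
    void setCache(Cache cache) { this.cache = cache; }

    /** Synchronized path: the field is written while holding the lock. */
    synchronized void initialize() {
        setCache(new Cache());
        cache.put("map_0", "index");
    }

    /** Unsynchronized path: the same field is read without the lock, e.g. from a servlet thread. */
    String lookup(String key) { return cache.get(key); }
}
```

        Adding a locked access such as initialize() changes the ratio of locked to unlocked accesses of the field, even though the unlocked read in lookup() was there all along. The sketch shows the pattern being flagged, not a recommended way to share the field between threads.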

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12435943/patch-1398-2.txt
        against trunk revision 910223.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/455/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/455/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/455/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/455/console

        This message is automatically generated.

        Amareshwari Sriramadasu made changes -
        Fix Version/s 0.22.0 [ 12314184 ]
        Amareshwari Sriramadasu made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Amareshwari Sriramadasu made changes -
        Attachment patch-1398-2.txt [ 12435943 ]
        Amareshwari Sriramadasu added a comment -

        Patch with comments incorporated.

        "The default value for taskMemoryManagerEnabled was changed in the patch which seemed unnecessary. Can we instead override isTaskMemoryManagerEnabled, if we just want to short circuit this in the test case?"

        Instead of overriding isTaskMemoryManagerEnabled(), I made the setTaskMemoryManagerEnabledFlag() method package-private and called it from the test case to turn off memory management.

        Amareshwari Sriramadasu made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Hemanth Yamijala added a comment -

        Looks good overall. A few minor nits:

        • The default value for taskMemoryManagerEnabled was changed in the patch, which seems unnecessary. Can we instead override isTaskMemoryManagerEnabled if we just want to short-circuit this in the test case?
        • We can use the setIndexCache API to initialize the index cache in the main TT also.
        • It would be really helpful to add a log line where we break out of the TaskLauncher's loop on detecting the killed condition.

        Can you please make these changes and run the patch through Hudson?

        Amareshwari Sriramadasu made changes -
        Attachment patch-1398-1.txt [ 12435935 ]
        Amareshwari Sriramadasu added a comment -

        Added one more assertion to the testcase.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12435570/patch-1398.txt
        against trunk revision 909241.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/317/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/317/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/317/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/317/console

        This message is automatically generated.

        Amareshwari Sriramadasu made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Assignee Amareshwari Sriramadasu [ amareshwari ]
        Amareshwari Sriramadasu made changes -
        Field Original Value New Value
        Attachment patch-1398.txt [ 12435570 ]
        Amareshwari Sriramadasu added a comment -

        Patch fixing the bug. Added a testcase which fails without the patch and passes with the patch.

        Hemanth Yamijala created issue -

          People

          • Assignee: Amareshwari Sriramadasu
          • Reporter: Hemanth Yamijala
          • Votes: 0
          • Watchers: 2
