Hadoop Map/Reduce
MAPREDUCE-1845

FairScheduler.tasksToPreempt() can return a negative number

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0, 0.22.0
    • Component/s: contrib/fair-share
    • Labels:
      None

      Description

      This method can return a negative number, which causes the scheduler to under-preempt.
      The bug was discovered by Joydeep.

      1. MAPREDUCE-1845.20100717.txt
        4 kB
        Scott Chen
      2. MAPREDUCE-1845-v2.txt
        5 kB
        Scott Chen

        Activity

        Scott Chen added a comment -

        I am changing this to major because this is a serious problem for preemption.
        Jobs that are supposed to be preempted (running tasks > fair share) actually generate a negative tasksToPreempt.
        That reduces the total number of tasks to preempt.
        In effect, it lets those jobs escape preemption.
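A minimal sketch of the effect described above, assuming the scheduler adds up one tasksToPreempt value per job. The class and method names are hypothetical and are not taken from the patch; the numbers are made up for illustration.

```java
// Hypothetical illustration: how one negative per-job value shrinks the total.
class PreemptionTotalSketch {
    // Sums the per-job values as-is, so a negative entry cancels real demand.
    static int rawTotal(int... perJob) {
        int sum = 0;
        for (int n : perJob) {
            sum += n;
        }
        return sum;
    }

    // Sums the per-job values after clamping each one at zero.
    static int clampedTotal(int... perJob) {
        int sum = 0;
        for (int n : perJob) {
            sum += Math.max(0, n);
        }
        return sum;
    }

    public static void main(String[] args) {
        // A starved job is owed 3 tasks; a job running 2 tasks over its share yields -2.
        System.out.println(rawTotal(3, -2));     // prints 1
        System.out.println(clampedTotal(3, -2)); // prints 3
    }
}
```

With the unclamped sum, only one task would be preempted even though the starved job is owed three.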

        Scott Chen added a comment -

        The patch simply checks whether tasksToPreempt is negative and, if so, sets it back to zero.
        Without the fix, the included unit test produces a negative tasksToPreempt.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12447520/MAPREDUCE-1845.20100717.txt
        against trunk revision 956171.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/579/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/579/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/579/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/579/console

        This message is automatically generated.

        Scott Chen added a comment -

        I am submitting this to Hudson again because the testReport is gone.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12447520/MAPREDUCE-1845.20100717.txt
        against trunk revision 957126.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/261/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/261/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/261/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/261/console

        This message is automatically generated.

        Scott Chen added a comment -

        The failed tests are not on the code path of the change.

        Matei Zaharia added a comment -

        This looks good. Thanks for adding the unit tests too. We should check this into 0.21 as well, if that's not out yet.

        The only concern I have is that the existing unit tests, such as testMinAndFairSharePreemption, work correctly. This seems to be because those pools either only test one type of preemption (min-share or fair-share), or they place the over-scheduled jobs in pools that have no min share set. This means that one of the values in max(tasksDueToMinShare, tasksDueToFairShare) is zero. Would you mind creating a second copy of testMinAndFairSharePreemption where job 1 is in a pool with a min share set (i.e. not in the default pool)?

        A minor comment on clarity: rather than adding the line "tasksToPreempt = tasksToPreempt < 0 ? 0 : tasksToPreempt", it would be better to make sure that tasksDueToMinShare and tasksDueToFairShare are themselves never negative. You can do it by adding a max(0, ...) on the lines where they are computed (for example, tasksDueToMinShare = Math.max(0, target - sched.getRunningTasks())).
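The two styles discussed above can be sketched side by side. This is an illustration under assumptions, not the committed patch: the class name, method names, and plain int parameters are hypothetical stand-ins for values the scheduler would read from the schedulable.

```java
// Hypothetical comparison of the two ways to keep the result non-negative.
class ClampStylesSketch {
    // First patch style: combine the two values, then clamp the result.
    static int clampResult(int tasksDueToMinShare, int tasksDueToFairShare) {
        int tasksToPreempt = Math.max(tasksDueToMinShare, tasksDueToFairShare);
        return tasksToPreempt < 0 ? 0 : tasksToPreempt;
    }

    // Suggested style: clamp each value where it is computed.
    static int clampEach(int minTarget, int fairTarget, int running) {
        int tasksDueToMinShare = Math.max(0, minTarget - running);
        int tasksDueToFairShare = Math.max(0, fairTarget - running);
        return Math.max(tasksDueToMinShare, tasksDueToFairShare);
    }

    public static void main(String[] args) {
        // Over-scheduled job: 6 running tasks against targets of 2 and 4.
        System.out.println(clampResult(2 - 6, 4 - 6)); // prints 0
        System.out.println(clampEach(2, 4, 6));        // prints 0
        // Starved job: 1 running task against targets of 5 and 3.
        System.out.println(clampEach(5, 3, 1));        // prints 4
    }
}
```

Both styles return the same number; the second makes it explicit that neither intermediate value can be negative.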

        Scott Chen added a comment -

        Thanks.
        Good suggestions. I will update the patch.

        Scott Chen added a comment -

        I have read the code carefully, and I think this bug is not that easy to trigger.
        Before it updates tasksDueToMinShare and tasksDueToFairShare, tasksToPreempt checks

        curTime - sched.getLastTimeAtMinShare() > minShareTimeout
        curTime - sched.getLastTimeAtHalfFairShare() > fairShareTimeout
        

        So they can turn negative only when the timeout has expired and the job is over-scheduled at the same time.
        And these two things must happen between update() and preemptIfNecessary():

        update()
        // a lot of time passes here, so the timeout expires
        // the job becomes over-scheduled
        preemptIfNecessary()
        

        It is unlikely to happen unless the starvation timeout is set very short.
        But it is still good to fix it.
        I will write a unit test to simulate this situation.
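The condition described above can be sketched as follows. This is a reconstruction of the pre-fix logic from the checks quoted in this thread, not the actual FairScheduler source: the class name and the flattened parameter list are hypothetical, and the real method reads these values from the schedulable and the pool manager.

```java
// Hypothetical reconstruction of the unclamped computation and its timeout gates.
class TimeoutGateSketch {
    static int unclampedTasksToPreempt(long curTime,
                                       long lastTimeAtMinShare,
                                       long lastTimeAtHalfFairShare,
                                       long minShareTimeout,
                                       long fairShareTimeout,
                                       int minTarget,
                                       int fairTarget,
                                       int running) {
        int tasksDueToMinShare = 0;
        int tasksDueToFairShare = 0;
        // Each value is computed only after its starvation timeout has expired.
        if (curTime - lastTimeAtMinShare > minShareTimeout) {
            tasksDueToMinShare = minTarget - running;
        }
        if (curTime - lastTimeAtHalfFairShare > fairShareTimeout) {
            tasksDueToFairShare = fairTarget - running;
        }
        return Math.max(tasksDueToMinShare, tasksDueToFairShare);
    }

    public static void main(String[] args) {
        // Both timeouts expired and the job is over both targets: negative result.
        System.out.println(unclampedTasksToPreempt(100, 0, 0, 10, 10, 2, 4, 6)); // prints -2
        // Only the min-share timeout expired: the other value stays 0, so max() hides the bug.
        System.out.println(unclampedTasksToPreempt(100, 0, 95, 10, 10, 2, 4, 6)); // prints 0
    }
}
```

The second call shows why the result stays at zero whenever one of the two values is never computed: max() then always has a zero to pick.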

        Scott Chen added a comment -

        Updated the patch according to Matei's suggestions.

        Scott Chen added a comment -

        Submitting to Hudson again.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448140/MAPREDUCE-1845-v2.txt
        against trunk revision 958279.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/270/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/270/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/270/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/270/console

        This message is automatically generated.

        Matei Zaharia added a comment -

        +1, looks good to me. Let me know if you are ready to have it committed (or just have Dhruba commit it).

        Scott Chen added a comment -

        @Matei, The patch is ready. Could you help me commit it? Thanks.

        Matei Zaharia added a comment -

        I've committed the patch to both trunk and 0.21. Thanks Scott!

        Scott Chen added a comment -

        Thanks for the help, Matei


          People

          • Assignee:
            Scott Chen
            Reporter:
            Scott Chen
          • Votes:
            0
            Watchers:
            4
