Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1089

Fair Scheduler preemption triggers NPE when tasks are scheduled but not running

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: contrib/fair-share
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We see exceptions like this when preemption runs when a task has been scheduled on a TT but has not yet started running.

      2009-10-09 14:30:53,989 INFO org.apache.hadoop.mapred.FairScheduler: Should preempt 2 MAP tasks for job_200910091420_0006: tasksDueToMinShare = 2, tasksDueToFairShare = 0
      2009-10-09 14:30:54,036 ERROR org.apache.hadoop.mapred.FairScheduler: Exception in fair scheduler UpdateThread
      java.lang.NullPointerException
      at org.apache.hadoop.mapred.FairScheduler$2.compare(FairScheduler.java:1015)
      at org.apache.hadoop.mapred.FairScheduler$2.compare(FairScheduler.java:1013)
      at java.util.Arrays.mergeSort(Arrays.java:1270)
      at java.util.Arrays.sort(Arrays.java:1210)
      at java.util.Collections.sort(Collections.java:159)
      at org.apache.hadoop.mapred.FairScheduler.preemptTasks(FairScheduler.java:1013)
      at org.apache.hadoop.mapred.FairScheduler.preemptTasksIfNecessary(FairScheduler.java:911)
      at org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:286)

        Activity

        Hide
        Todd Lipcon added a comment -

        Here's a patch to fix this issue. Unfortunately it's not triggered by the unit tests, since it uses a fake JobTracker which doesn't have quite the same behavior as the real one with regard to TaskStatus population. I'd like to get a unit test in that catches this issue before committing, but wanted to gather feedback on this patch now.

        Show
        Todd Lipcon added a comment - Here's a patch to fix this issue. Unfortunately it's not triggered by the unit tests, since it uses a fake JobTracker which doesn't have quite the same behavior as the real one with regard to TaskStatus population. I'd like to get a unit test in that catches this issue before committing, but wanted to gather feedback on this patch now.
        Hide
        Matei Zaharia added a comment -

        The patch looks good, but I agree you should try to include a unit test.

        Show
        Matei Zaharia added a comment - The patch looks good, but I agree you should try to include a unit test.
        Hide
        Todd Lipcon added a comment -

        Hi Matei,

        I spent about 4 hours this morning attempting to get a unit test to demonstrate this behavior. Unfortunately it's very difficult since the current test doesn't separate the scheduling of a task and the beginning of the execution of that task. Despite my best efforts to get this working, I haven't been able to. At this point I can't dedicate any more time to it - how do you feel about committing as is without a test, or giving me some pointers on how to proceed with testing?

        -Todd

        Show
        Todd Lipcon added a comment - Hi Matei, I spent about 4 hours this morning attempting to get a unit test to demonstrate this behavior. Unfortunately it's very difficult since the current test doesn't separate the scheduling of a task and the beginning of the execution of that task. Despite my best efforts to get this working, I haven't been able to. At this point I can't dedicate any more time to it - how do you feel about committing as is without a test, or giving me some pointers on how to proceed with testing? -Todd
        Hide
        Todd Lipcon added a comment -

        Manual test plan: set mapred.fairscheduler.update.interval and mapred.fairscheduler.preemption.interval both to very small numbers like 10ms, and then run some simple jobs. You will see NPEs in the logs with this patch not present.

        Show
        Todd Lipcon added a comment - Manual test plan: set mapred.fairscheduler.update.interval and mapred.fairscheduler.preemption.interval both to very small numbers like 10ms, and then run some simple jobs. You will see NPEs in the logs with this patch not present.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421790/mapreduce-1089.txt
        against trunk revision 827884.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421790/mapreduce-1089.txt against trunk revision 827884. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/196/console This message is automatically generated.
        Hide
        Matei Zaharia added a comment -

        Hey Todd,

        I was going to commit your patch, but I'm getting a lot of fair scheduler test failures even without the patch due to "java.lang.RuntimeException: No queues defined" when initializing the JobTracker. I will wait until whoever broke this fixes it. Do you have any idea what patch caused this? I remember seeing a JIRA about this no queues issue before, but I can't find it.

        Show
        Matei Zaharia added a comment - Hey Todd, I was going to commit your patch, but I'm getting a lot of fair scheduler test failures even without the patch due to "java.lang.RuntimeException: No queues defined" when initializing the JobTracker. I will wait until whoever broke this fixes it. Do you have any idea what patch caused this? I remember seeing a JIRA about this no queues issue before, but I can't find it.
        Hide
        Aaron Kimball added a comment -

        Matei,

        They changed the queues.xml format you need to run a 'git clean -f conf/' or an equivalent svn command to remove non-versioned files from your working directory. 'ant clean' doesn't cut it. Do that then run 'ant compile-core' again and you should be good to go again.

        Show
        Aaron Kimball added a comment - Matei, They changed the queues.xml format you need to run a 'git clean -f conf/' or an equivalent svn command to remove non-versioned files from your working directory. 'ant clean' doesn't cut it. Do that then run 'ant compile-core' again and you should be good to go again.
        Hide
        Matei Zaharia added a comment -

        Thanks Aaron and Todd! I've committed this to trunk and to the 0.21 branch.

        Show
        Matei Zaharia added a comment - Thanks Aaron and Todd! I've committed this to trunk and to the 0.21 branch.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #101 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/101/)
        . Fix NPE in fair scheduler preemption when tasks are
        scheduled but not running. Contributed by Todd Lipcon.

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #101 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/101/ ) . Fix NPE in fair scheduler preemption when tasks are scheduled but not running. Contributed by Todd Lipcon.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #127 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/127/)
        . Fix NPE in fair scheduler preemption when tasks are
        scheduled but not running. Contributed by Todd Lipcon.

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #127 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/127/ ) . Fix NPE in fair scheduler preemption when tasks are scheduled but not running. Contributed by Todd Lipcon.

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development