Hadoop Map/Reduce
MAPREDUCE-2638

Create a simple stress test for the fair scheduler

    Details

    • Type: Test
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/fair-share
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This would be a test that runs against a cluster, typically with settings that allow preemption to be exercised.

      Attachments

      1. MAPREDUCE-2638.patch (19 kB) Tom White
      2. MAPREDUCE-2638.patch (8 kB) Tom White

        Activity

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485714/MAPREDUCE-2638.patch
        against trunk revision 82db334.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5254//console

        This message is automatically generated.

        vito256 David Ginzburg added a comment -

        I'm not sure this is related to this issue, but I suspect that fair scheduler preemption causes inconsistent job results when it is enabled.
        I have reported it at http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201201.mbox/%3CSNT135-W231F0B5F0D6A3AE0A5EDC9B79F0@phx.gbl%3E

        Maybe this stress test can help reproduce it easily, on clusters other than the one I'm running.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485714/MAPREDUCE-2638.patch
        against trunk revision 1148421.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 11 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestMRCLI
        org.apache.hadoop.fs.TestFileSystem
        org.apache.hadoop.mapred.TestDebugScript

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/483//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/483//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/483//console

        This message is automatically generated.

        matei Matei Zaharia added a comment -

        OK, that makes sense. +1 to commit this then.

        tomwhite Tom White added a comment -

        Thanks Matei. The preemption intervals are indeed very low - they are set like this in order to trigger preemption in a pseudo-distributed cluster and so stress the scheduler. For larger clusters the settings you suggest are entirely appropriate, as is increasing the sleep time in the jobs by setting test.fairscheduler.sleepTime to a higher value.
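
        For illustration, the sleep time could be raised in the job configuration like this (the property name test.fairscheduler.sleepTime comes from the patch; the value shown is only an example):

```xml
<!-- A higher sleep time keeps the test's tasks alive longer, which suits
     larger clusters. The value below is illustrative, not a recommendation. -->
<property>
  <name>test.fairscheduler.sleepTime</name>
  <value>10000</value>
</property>
```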

        matei Matei Zaharia added a comment -

        Hi Tom,

        This looks good, except that the preemption intervals you've set are very low. In my experience it can take a while for Hadoop to preempt a task; when you call killTask(), it must wait for the next heartbeat from the task's node, send it a KillTaskAction, and then wait for another heartbeat back to hear that it's gone. In a larger cluster, this might be more than 10 seconds. I would set this to 30 seconds. This is the same thing I recommend for users.
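
        For reference, the preemption timeouts live in the fair scheduler allocations file. A minimal sketch of the 30-second setting suggested above (element names follow the classic fair scheduler allocations format; values are in seconds):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Wait 30s below min share before preempting, per the suggestion above. -->
  <defaultMinSharePreemptionTimeout>30</defaultMinSharePreemptionTimeout>
  <!-- Wait 30s below half of fair share before preempting. -->
  <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
</allocations>
```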

        tomwhite Tom White added a comment -

        Thanks for having a look, Matei. The goal is to exercise the scheduler with lots of small jobs, although in the future we could make this more complex by adding larger jobs. I managed to use the test to trigger preemption with a suitable allocations file (described in the class javadoc).

        Here's an updated patch which refactors TestFairSchedulerSystem so that the two tests share code. I exposed properties for the number of threads, jobs, pools, etc., to make the test parameterized.
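
        A sketch of the kind of allocations file that can trigger preemption on a pseudo-distributed cluster (the pool names, min shares, and deliberately low timeouts here are illustrative; the actual file is described in the class javadoc):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Very low timeouts (in seconds) so preemption fires quickly under stress. -->
  <defaultMinSharePreemptionTimeout>1</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>1</fairSharePreemptionTimeout>
  <!-- Two pools with small min shares; competing jobs in either pool
       can fall below min share and trigger preemption. -->
  <pool name="pool1">
    <minMaps>1</minMaps>
    <minReduces>1</minReduces>
  </pool>
  <pool name="pool2">
    <minMaps>1</minMaps>
    <minReduces>1</minReduces>
  </pool>
</allocations>
```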

        matei Matei Zaharia added a comment -

        This looks like a good start, but what will you do to test other situations (e.g. preemption or jobs with different numbers of tasks)? Is the goal to have a single test that goes through all of this, or do you just want to stress test one part of the system with a lot of small jobs?

        tomwhite Tom White added a comment -

        Here's some starting code, based on TestFairSchedulerSystem. Still need to check that all jobs complete without errors and produce the expected output (even when preempted).


  People

  • Assignee: tomwhite Tom White
  • Reporter: tomwhite Tom White
  • Votes: 1
  • Watchers: 7