Hadoop Map/Reduce: MAPREDUCE-1606

TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed TestJobACLs test timeout failure because of no slots for launching JOB_CLEANUP task.

      Description

      TestJobACLs may time out because there are no slots for launching the JOB_CLEANUP task: the test starts a MiniMRCluster with 0 TaskTrackers. In trunk, we can set the config property mapreduce.job.committer.setup.cleanup.needed to false so that we don't get into this issue.
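
A toy model of the starvation described above may help. It is not Hadoop code: the class, the method and the "heartbeat" budget are hypothetical, and the only rule it encodes is the one stated in the description (a killed job that needs cleanup must get one free slot for JOB_CLEANUP, and a cluster with 0 TaskTrackers has none).

```java
// Toy model only; none of these names come from Hadoop.
final class CleanupSlotModel {
    /**
     * Returns true if a killed job can reach a terminal state within
     * maxHeartbeats scheduling rounds.
     */
    static boolean killCompletes(int taskTrackers, int slotsPerTracker,
                                 boolean setupCleanupNeeded, int maxHeartbeats) {
        if (!setupCleanupNeeded) {
            return true; // no JOB_CLEANUP task has to be scheduled
        }
        int freeSlots = taskTrackers * slotsPerTracker;
        for (int beat = 0; beat < maxHeartbeats; beat++) {
            if (freeSlots > 0) {
                return true; // JOB_CLEANUP gets a slot, the job can finish
            }
            // With zero slots nothing changes between rounds.
        }
        return false; // modelled as the test timing out
    }

    public static void main(String[] args) {
        System.out.println(killCompletes(0, 2, true, 1000));  // false
        System.out.println(killCompletes(0, 2, false, 1000)); // true
        System.out.println(killCompletes(2, 2, true, 1000));  // true
    }
}
```

The second call corresponds to setting mapreduce.job.committer.setup.cleanup.needed to false; the third corresponds to giving the cluster TaskTrackers.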

      1. MAPREDUCE-1606-20100610.txt
        2 kB
        Vinod Kumar Vavilapalli
      2. 1606.v2.patch
        2 kB
        Ravi Gummadi
      3. 1606.v1.1.patch
        2 kB
        Ravi Gummadi
      4. 1606.v1.patch
        2 kB
        Ravi Gummadi
      5. 1606.patch
        1 kB
        Ravi Gummadi
      6. MR1606.20S.1.patch
        2 kB
        Arun C Murthy
      7. MR1606.patch
        0.7 kB
        Ravi Gummadi
      8. MR1606.20S.patch
        0.8 kB
        Ravi Gummadi

        Activity

        Ravi Gummadi added a comment -

        Attaching a patch for an earlier version of Hadoop. Not for commit here.

        Ravi Gummadi added a comment -

        Attaching patch for trunk.

        Ravi Gummadi added a comment -

        As Amareshwari pointed out offline, the trunk patch would probably need more changes. Let me investigate more.

        Arun C Murthy added a comment -

        Updated patch on behalf of Ravi.

        The problem with the previous patch was that the test could still time out because the setup/cleanup tasks could finish before the kill, so a single map has been added to the job.

        Ravi Gummadi added a comment -

        Attaching updated patch for trunk with the fix.

        This patch should be committed only after MAPREDUCE-1727, because the testcase fails anyway because of MAPREDUCE-1727.

        Amareshwari Sriramadasu added a comment -

        The comments/javadoc added in the 20 patch are not in the trunk patch. Can you add them and make it patch available?

        Ravi Gummadi added a comment -

        Attaching patch with the comments added.

        Amareshwari Sriramadasu added a comment -

        One minor nit: in the Javadoc, ".... a long time(2000 sec)" should be changed to 2000 milliseconds or 2 seconds.

        Ravi Gummadi added a comment -

        It is actually 2000 sec, because it is 2000 millisec * 1000 iterations.

        Ravi Gummadi added a comment -

        My understanding was wrong. It is 2000 millisec (2 sec) only. I would like to increase it to at least 1 min.
        Attaching a new patch with the change.

        Ravi Gummadi added a comment -

        Earlier it was only 2 sec. Changed it to 60 sec in the latest patch uploaded.

        Amareshwari Sriramadasu added a comment -

        Patch looks fine. Submitting for Hudson.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12443059/1606.v1.1.patch
        against trunk revision 938805.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/153/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/153/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/153/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/153/console

        This message is automatically generated.

        Ravi Gummadi added a comment -

        The unit test failures are caused by a zipException and are not related to this patch.

        Though I couldn't reproduce the TestJobACLs failure on trunk without this patch, I suspect it can fail sometimes (maybe very rarely), and I would like this fix to get committed.

        Vinod/Amareshwari, what do you say?

        Ravi Gummadi added a comment -

        I could reproduce the issue in trunk by placing a 1 sec sleep before the killJob() call in verifyACLPersistence(). So once the job is initialized, the problem of "not having slot for JOB_CLEANUP task" arises when killJob() is done.

        Ravi Gummadi added a comment -

        Attaching patch for trunk fixing the issue by
        (1) making submitJobAsUser() wait till the job goes into RUNNING state
        (2) having a long running map task in the job so that job won't finish before we kill it
        (3) and having 2 TaskTrackers in the cluster
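
Step (1) above amounts to polling the job state with a deadline. A minimal sketch of that idea follows; it is not the code from the patch. JobStateWaiter, its JobState enum and the Supplier-based state source are hypothetical stand-ins for whatever the test uses to query the job, and the 60 sec timeout is only an assumed value.

```java
import java.util.function.Supplier;

// Sketch only; these names are illustrative and not taken from Hadoop.
final class JobStateWaiter {
    /** Hypothetical job states for this sketch. */
    enum JobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    /**
     * Polls stateSource until it reports RUNNING.
     * Returns false if the job reaches a terminal state first,
     * the timeout elapses, or the waiting thread is interrupted.
     */
    static boolean waitForRunning(Supplier<JobState> stateSource,
                                  long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            JobState state = stateSource.get();
            if (state == JobState.RUNNING) {
                return true;
            }
            if (state != JobState.PREP) {
                return false; // terminal state, it will never be RUNNING
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Simulated job: PREP for the first two polls, RUNNING afterwards.
        boolean running = waitForRunning(
            () -> ++polls[0] < 3 ? JobState.PREP : JobState.RUNNING,
            60_000, 10);
        System.out.println(running + " after " + polls[0] + " polls");
        // prints: true after 3 polls
    }
}
```

Waiting like this only helps if the job stays in RUNNING long enough to be killed, which is what steps (2) and (3) are for.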

        Vinod Kumar Vavilapalli added a comment -

        Can you please upload the difference between test execution times before and after the patch? Just checking to see if we are bloating the execution time by an order of magnitude.

        Ravi Gummadi added a comment -

        For the whole of TestJobACLs, the execution time on my local machine increased from 5 sec to 37 sec.
        But are we really worried about execution time here? I think the correctness of the testcases in TestJobACLs is the issue, and we want the view-job and modify-job checks to be done after the job has been initialized. Right?

        Amareshwari Sriramadasu added a comment -

        I agree with Ravi that it is correctness that matters here.

        Vinod Kumar Vavilapalli added a comment -

        Here's a patch which reduces the test time down to 8-10 seconds.

        Ravi, can you please look at it?

        Ravi Gummadi added a comment -

        Patch looks good. Reduced the execution time to 8sec.
        +1

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12446759/MAPREDUCE-1606-20100610.txt
        against trunk revision 953490.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/562/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/562/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/562/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/562/console

        This message is automatically generated.

        Ravi Gummadi added a comment -

        TestCopyFiles failures are not related to this patch. This patch changes only testcase TestJobACLs.java.

        Vinod Kumar Vavilapalli added a comment -

        TestSimulatorDeterministicReplay failure is because of MAPREDUCE-1834.

        TestCopyFiles fails consistently, tracked at MAPREDUCE-1858.

        I am checking this in.

        Vinod Kumar Vavilapalli added a comment -

        The test was the same in 0.21 and hence the failure too.

        Vinod Kumar Vavilapalli added a comment -

        I just committed this to trunk and 0.21. Thanks Ravi!


          People

          • Assignee:
            Ravi Gummadi
            Reporter:
            Ravi Gummadi
          • Votes:
            0
            Watchers:
            0