Hadoop Map/Reduce
MAPREDUCE-1062

MRReliability test does not work with retired jobs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Ensure that MRReliability works with retired-jobs feature turned on.

      Description

      Currently, MRReliability uses the job client's get-all-jobs API, whose result also includes retired jobs.

      When the cluster has retired jobs, they are appended at the end of the job list. As a result, the test always picks up a completed job and never spawns the KillTask and KillTracker threads.
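
      The failure mode described above can be sketched in isolation. This is a minimal model, not the test's actual code: the `Job` class, the `State` enum, and the `allJobs` and `lastJob` helpers are hypothetical stand-ins, and the only behavior taken from the description is that retired jobs land at the end of the full listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the job listing; these are not Hadoop classes.
class JobListSketch {
    enum State { PREP, RUNNING, SUCCEEDED }

    static final class Job {
        final String id;
        final State state;

        Job(String id, State state) {
            this.id = id;
            this.state = state;
        }

        boolean isComplete() {
            return state == State.SUCCEEDED;
        }
    }

    // Models a "get all jobs" listing: live jobs first, retired jobs appended at the end.
    static List<Job> allJobs(List<Job> live, List<Job> retired) {
        List<Job> all = new ArrayList<>(live);
        all.addAll(retired);
        return all;
    }

    // Models the assumption that the last entry is the job that was just submitted.
    static Job lastJob(List<Job> jobs) {
        return jobs.get(jobs.size() - 1);
    }

    public static void main(String[] args) {
        List<Job> live = Arrays.asList(new Job("job_3", State.RUNNING));
        List<Job> retired = Arrays.asList(
                new Job("job_1", State.SUCCEEDED),
                new Job("job_2", State.SUCCEEDED));

        Job picked = lastJob(allJobs(live, retired));
        // The running job_3 is skipped; a retired, completed job is picked instead.
        System.out.println(picked.id + " complete=" + picked.isComplete());
    }
}
```

      With a completed job in hand, a caller that only starts its kill threads for an incomplete job would never start them, which matches the symptom reported here.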

      1. mapreduce-ydist-20-1.patch
        2 kB
        Sreekanth Ramakrishnan
      2. mapreduce-1062-4.patch
        3 kB
        Sreekanth Ramakrishnan
      3. mapreduce-1062-3-ydist.patch
        3 kB
        Sreekanth Ramakrishnan
      4. mapreduce-1062-3.patch
        3 kB
        Sreekanth Ramakrishnan
      5. mapreduce-1062-2.patch
        2 kB
        Sreekanth Ramakrishnan
      6. mapreduce-1062-1.patch
        2 kB
        Sreekanth Ramakrishnan

          Activity

          Sreekanth Ramakrishnan created issue -
          Sreekanth Ramakrishnan added a comment -

          Attaching the Yahoo! distribution patch. Tested it by running the test on a cluster with retired jobs.

          Sreekanth Ramakrishnan made changes -
          Field Original Value New Value
          Attachment mapreduce-ydist-20-1.patch [ 12421389 ]
          Sreekanth Ramakrishnan added a comment -

          Attaching patch for 21 and trunk.

          This uses the scheduler's job list, which contains only waiting and running jobs, in the order in which the scheduler looks at them. That puts the latest job in the last position.

          Sreekanth Ramakrishnan made changes -
          Attachment mapreduce-1062-1.patch [ 12421390 ]
          Ramya Sunil added a comment -

          This is a duplicate of MAPREDUCE-1053

          Hemanth Yamijala added a comment -

          I started looking at the patch. Unfortunately, I think the current algorithm makes assumptions about how the scheduler works. So, while it works perfectly well for the CapacityTaskScheduler, it may not work correctly with the FairshareScheduler, because the latter removes jobs it maintains per pool lazily. Hence, there may be a case where the number of jobs returned by getJobsFromQueue is non-zero, but it doesn't mean the current job is submitted.

          I think there is already an assumption that this test is run independently on a cluster, because it kills tasktrackers etc. and could affect other jobs if they are run in parallel. For the same reason, jobs within the reliability test are run one after the other. So, wouldn't it be right to use jobsToComplete instead of getJobsFromQueue and, as long as it returns a non-zero number of jobs, assume that the job is the one most recently submitted?

          Some other minor points:

          • Can we update the documentation to say how the reliability test should be run? For instance, we have to run it on a cluster that is not running other jobs, as stated above.
          • Also, I would suggest we fail noisily if the last job we get is not in the PREP or RUNNING state, so that we wouldn't have false positive runs of the MRReliability test.
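
          The suggestion above can be sketched as follows. This is only an illustration under stated assumptions: `Job`, `State`, and `currentJob` are hypothetical names, and the input list stands in for whatever a jobsToComplete-style call returns on a cluster that runs nothing but the reliability test.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model; these are not Hadoop classes.
class IncompleteJobSketch {
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    static final class Job {
        final String id;
        final State state;

        Job(String id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Returns the last job in the incomplete-jobs list, or null if the list is
    // empty (the job may not have been submitted yet, so the caller can poll again).
    // Fails noisily if that job is not in PREP or RUNNING, so that a run against
    // an already finished job cannot pass silently.
    static Job currentJob(List<Job> incompleteJobs) {
        if (incompleteJobs.isEmpty()) {
            return null;
        }
        Job last = incompleteJobs.get(incompleteJobs.size() - 1);
        if (last.state != State.PREP && last.state != State.RUNNING) {
            throw new IllegalStateException(
                    "Expected a PREP or RUNNING job but got " + last.id + " in state " + last.state);
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(currentJob(Collections.<Job>emptyList()));
        System.out.println(currentJob(Arrays.asList(new Job("job_7", State.RUNNING))).id);
    }
}
```

          Throwing rather than logging is a deliberate choice in this sketch: it turns a possible false positive into a visible failure.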
          Sreekanth Ramakrishnan made changes -
          Attachment mapreduce-1062-2.patch [ 12421948 ]
          Sreekanth Ramakrishnan added a comment -

          The attached patch uses jobsToComplete() instead of getJobsFromQueue().
          Modified the javadoc and usage to mention that the tests should be run on a free cluster.
          The test now also fails noisily if the job returned from jobsToComplete() is complete.

          Sreekanth Ramakrishnan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Sreekanth Ramakrishnan added a comment -

          I had missed Hemanth's comment about removing prevJobNum. Also corrected a typo in the documentation based on Hemanth's comment.

          Sreekanth Ramakrishnan made changes -
          Attachment mapreduce-1062-3.patch [ 12422086 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12421948/mapreduce-1062-2.patch
          against trunk revision 825055.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/72/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/72/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/72/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/72/console

          This message is automatically generated.

          Hemanth Yamijala added a comment -

          Looks OK to me. Sreekanth, can you confirm that the reliability test now works when run with the retired jobs feature turned on?

          Sreekanth Ramakrishnan made changes -
          Link This issue is blocked by HADOOP-6269 [ HADOOP-6269 ]
          Sreekanth Ramakrishnan added a comment -

          Marking HADOOP-6269 as a blocker for this issue: client.jobsToComplete() throws a ConcurrentModificationException because two threads in the code modify the same static resource object.

          The same behavior is not exhibited in 20, though.

          Sreekanth Ramakrishnan added a comment -

          Attaching the Yahoo! distribution patch. The trunk patch is blocked by HADOOP-6269: on trunk, JobClient.jobsToComplete() creates a new JobConf while job submission is happening at the same time, which causes a ConcurrentModificationException because defaultResources in the Configuration object is being modified.
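
          The kind of exception described above can be reproduced in a small, self-contained sketch. It is not Hadoop's Configuration code: the class, the method, and the resource strings are illustrative assumptions. To keep the outcome deterministic, a single thread adds to the list in the middle of its own iteration, which stands in for a second thread modifying a shared list while the first is still iterating over it. The CopyOnWriteArrayList variant shows one common way to avoid the exception; whether HADOOP-6269 took that route is not stated in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; this is not Hadoop's Configuration class.
class SharedResourceListSketch {

    // Iterates over the list and adds an element partway through, mimicking a
    // concurrent writer. Returns true if the iteration finished, false if the
    // iterator detected the modification and threw.
    static boolean iterateWhileModifying(List<String> resources) {
        try {
            for (String resource : resources) {
                if (resource.equals("first.xml")) {
                    resources.add("added-during-iteration.xml");
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>(Arrays.asList("first.xml", "second.xml"));
        List<String> copyOnWrite = new CopyOnWriteArrayList<>(Arrays.asList("first.xml", "second.xml"));

        // ArrayList iterators are fail-fast, so the first call reports a failure.
        System.out.println("ArrayList finished: " + iterateWhileModifying(plain));
        // CopyOnWriteArrayList iterators work on a snapshot, so iteration completes.
        System.out.println("CopyOnWriteArrayList finished: " + iterateWhileModifying(copyOnWrite));
    }
}
```

          In a real two-thread race the failure is timing dependent, which is consistent with the behavior showing up on trunk but not in 20.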

          Sreekanth Ramakrishnan made changes -
          Attachment mapreduce-1062-3-ydist.patch [ 12422201 ]
          Arun C Murthy made changes -
          Release Note Ensure that MRReliability works with retired-jobs feature turned on.
          Sreekanth Ramakrishnan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sreekanth Ramakrishnan added a comment -

          Rerunning through Hudson.

          Sreekanth Ramakrishnan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422201/mapreduce-1062-3-ydist.patch
          against trunk revision 884628.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/274/console

          This message is automatically generated.

          Sreekanth Ramakrishnan added a comment -

          Attaching mapreduce-1062-3.patch as mapreduce-1062-4.patch to run it through Hudson.

          Sreekanth Ramakrishnan made changes -
          Attachment mapreduce-1062-4.patch [ 12427293 ]
          Sreekanth Ramakrishnan added a comment -

          Cancelling the patch and rerunning through Hudson with the latest patch.

          Sreekanth Ramakrishnan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sreekanth Ramakrishnan added a comment -

          Rerunning the patch through Hudson.

          Sreekanth Ramakrishnan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427293/mapreduce-1062-4.patch
          against trunk revision 888269.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/301/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/301/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/301/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/301/console

          This message is automatically generated.

          Chris Douglas added a comment -

          I committed this. Thanks, Sreekanth!

          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.22.0 [ 12314184 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #291 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/291/)
          MAPREDUCE-1062. Fix ReliabilityTest to work with retired jobs. Contributed by Sreekanth Ramakrishnan

          Tom White made changes -
          Fix Version/s 0.21.0 [ 12314045 ]
          Fix Version/s 0.22.0 [ 12314184 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee: Sreekanth Ramakrishnan
            • Reporter: Sreekanth Ramakrishnan
            • Votes: 0
            • Watchers: 0
