Hadoop Common / HADOOP-4220

Job Restart tests take 10 minutes, can time out very easily

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HADOOP-3245 added job restart and tests for it, but the tests take a long time (run times in seconds):

      TestJobTrackerRestart 667.682 sec
      TestJobTrackerRestartWithLostTracker 322.223 sec

      Something needs to be done to speed them up to keep the test cycle viable.

      1. HADOOP-4220-v1.9.patch
        7 kB
        Amar Kamat
      2. HADOOP-4220-v1.8.patch
        7 kB
        Amar Kamat
      3. HADOOP-4220-v1.4.patch
        7 kB
        Amar Kamat
      4. HADOOP-4220-v1.1.patch
        11 kB
        Amar Kamat
      5. HADOOP-4220-v1.patch
        13 kB
        Amar Kamat

          Activity

          Hudson added a comment -

          Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )

          Devaraj Das added a comment -

          I just committed this. Thanks, Amar!

          Amar Kamat added a comment -

          Attaching a patch that applies to the trunk.

          Devaraj Das added a comment -

          Sorry, this patch doesn't apply cleanly. Could you please generate a new patch?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12400081/HADOOP-4220-v1.8.patch
          against trunk revision 743513.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3839/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3839/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3839/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3839/console

          This message is automatically generated.

          Amar Kamat added a comment -

          Attaching a new patch that fixes the failure of TestJobTrackerRestart.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12396736/HADOOP-4220-v1.4.patch
          against trunk revision 743045.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3832/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3832/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3832/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3832/console

          This message is automatically generated.

          Amar Kamat added a comment -

          Resubmitting.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12396736/HADOOP-4220-v1.4.patch
          against trunk revision 742937.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3827/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3827/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3827/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3827/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          Overall, the patch looks good.

          Amar Kamat added a comment -

          Attaching a patch updated to trunk.

          Devaraj Das added a comment -

          Let's get HADOOP-4880 committed first. This patch depends on that.

          Amar Kamat added a comment -

          Attaching a patch that is updated to trunk. The test times on my box were 2 min 55 secs for TestJobTrackerRestart and 1 min 39 secs for TestJobTrackerRestartWithLostTracker.

          Hemanth Yamijala added a comment -

          Given the number of times this is failing, I think it makes sense to address this issue for Hadoop 0.20.

          Tsz Wo Nicholas Sze added a comment -

          TestJobTrackerRestart failed in hudson build #3722. See http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3722/testReport/

          Amar Kamat added a comment -

          Attaching a patch that brings down the runtime of TestJobTrackerRestart and TestJobTrackerRestartWithLostTracker.

          Test Run-time
          TestJobTrackerRestart 276.645 sec
          TestJobTrackerRestartWithLostTracker 136.736 sec

          Trying to optimize it further.

          steve_l added a comment -

          One cause of the delay is the 60-second wait:
          // Wait for a minute before submitting a job
          waitFor(60 * 1000);
          Would it be possible to spin and poll for whatever state change is required before starting the service? Relying on fixed delays is very brittle.
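
          A minimal sketch of the spin-and-poll idea, for illustration only. The class name PollingWait and the method waitUntil are hypothetical and are not taken from the Hadoop test code or from any of the attached patches; the condition passed in stands for whatever state change the test actually needs to observe. It uses Java 8 lambdas, so it would need adapting for the Java version Hadoop targeted at the time.

```java
import java.util.function.BooleanSupplier;

public class PollingWait {
    /**
     * Polls the condition every intervalMs milliseconds until it holds
     * or timeoutMs milliseconds have elapsed.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    public static boolean waitUntil(BooleanSupplier condition,
                                    long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.max(1, Math.min(intervalMs, remainingMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(
            () -> System.currentTimeMillis() - start >= 200, 60_000, 50);
        System.out.println("ready=" + ready);
    }
}
```

          With a helper like this, the test proceeds as soon as the condition holds, and the 60 seconds becomes an upper bound rather than a fixed cost.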

          Amar Kamat added a comment -

          I can think of two options:
          1) Split the test TestJobTrackerRestart into 2 test cases. It currently comprises 3 test cases.
          2) Reduce the timeout to well under 1 minute.

          I will check if the test case can be improved further.


            People

            • Assignee: Amar Kamat
            • Reporter: Steve Loughran
            • Votes: 0
            • Watchers: 2