Hadoop Common
  1. Hadoop Common
  2. HADOOP-9330

Add custom JUnit4 test runner with configurable timeout

    Details

    • Type: Test Test
    • Status: Patch Available
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: test
    • Labels:
      None

      Description

      HADOOP-9112 has added a requirement for all new test methods to declare a timeout, so that jenkins/maven builds will have better information on a timeout.

      Hard coding timeouts into tests is dangerous as it will generate spurious failures on slower machines/networks and when debugging a test.

      I propose providing a custom JUnit4 test runner that test cases can declare as their test runner; this can provide timeouts specified at run-time, rather than in-source.

        Issue Links

          Activity

          Steve Loughran created issue -
          Steve Loughran made changes -
          Field Original Value New Value
          Affects Version/s 3.0.0 [ 12320357 ]
          Hide
          Steve Loughran added a comment -

          Discussions with the Maven Surefire team (https://issues.apache.org/jira/browse/HADOOP-9112) imply that we can use the @RunWith attribute in a test class to define a custom test runner for it.

          This means we can

          1. Write our own test runner that picks up a default timeout from a system property.
          2. Define a base test class, HadoopTestBase, that declares its use of this.
          3. Set the maven build up to propagate a timeout via a system property.
          4. Have jenkins choose a timeout appropriate to it.

          Doing this will obviate the need to place brittle test timeouts in source, and be easy to retrofit to existing test classes.

          Assuming all future test cases get built off a new base class (possibly with tuned YarnTestBase, HdfsTestBase classes), the base classes would have to go back into branch-1 if there was any goal of backporting new tests from trunk. Unless Ant can be set up to switch to the new test runner, the @RunWith attribute would have to stripped from the backported test cases.

          Show
          Steve Loughran added a comment - Discussions with the Maven Surefire team ( https://issues.apache.org/jira/browse/HADOOP-9112 ) imply that we can use the @RunWith attribute in a test class to define a custom test runner for it. This means we can Write our own test runner that picks up a default timeout from a system property. Define a base test class, HadoopTestBase , that declares its use of this. Set the maven build up to propagate a timeout via a system property. Have jenkins choose a timeout appropriate to it. Doing this will obviate the need to place brittle test timeouts in source, and be easy to retrofit to existing test classes. Assuming all future test cases get built off a new base class (possibly with tuned YarnTestBase, HdfsTestBase classes), the base classes would have to go back into branch-1 if there was any goal of backporting new tests from trunk. Unless Ant can be set up to switch to the new test runner, the @RunWith attribute would have to stripped from the backported test cases.
          Steve Loughran made changes -
          Link This issue relates to HADOOP-9112 [ HADOOP-9112 ]
          Hide
          Konstantin Boudnik added a comment -

          As I have commented elsewhere setting a time out in the runtime can be achievable through surefire.timeout property, can it not?

          Show
          Konstantin Boudnik added a comment - As I have commented elsewhere setting a time out in the runtime can be achievable through surefire.timeout property, can it not?
          Hide
          Luke Lu added a comment -

          setting a time out in the runtime can be achievable through surefire.timeout property, can it not?

          surefire.timeout is a test process timeout or per test class if fork mode is always, not per test, which is the rationale for HADOOP-9112 in the first place.

          we can use the @RunWith attribute in a test class to define a custom test runner for it.

          There is no need to use a custom test runner. org.junit.rules.Timeout is what you want. We can create a common base test class like this.

          public class TestBase {
            @Rule public Timeout defaultTimeout = new Timeout(Integer.parseInt(
                System.getProperty("test.default.timeout", 100000)));
            ...
          }
          
          Show
          Luke Lu added a comment - setting a time out in the runtime can be achievable through surefire.timeout property, can it not? surefire.timeout is a test process timeout or per test class if fork mode is always, not per test , which is the rationale for HADOOP-9112 in the first place. we can use the @RunWith attribute in a test class to define a custom test runner for it. There is no need to use a custom test runner. org.junit.rules.Timeout is what you want. We can create a common base test class like this. public class TestBase { @Rule public Timeout defaultTimeout = new Timeout( Integer .parseInt( System .getProperty( "test. default .timeout" , 100000))); ... }
          Hide
          Konstantin Boudnik added a comment -

          Avoiding subclassing of a common test class is one of the main reason people are migrating away from Junit3, IMO. Do we really want to enforce subclassing?

          As for 'per class' vs 'per testcase': if your test case has timed out then it will terminate the whole class run, won't it?

          Show
          Konstantin Boudnik added a comment - Avoiding subclassing of a common test class is one of the main reason people are migrating away from Junit3, IMO. Do we really want to enforce subclassing? As for 'per class' vs 'per testcase': if your test case has timed out then it will terminate the whole class run, won't it?
          Hide
          Luke Lu added a comment -

          Do we really want to enforce subclassing?

          No, you can use the @Test(timeout=seconds) just fine. You can add the default per test timeout rule to any test class as well. TestBase with a per test default timeout is just a convenience.

          if your test case has timed out then it will terminate the whole class run, won't it?

          Please read the comments on HADOOP-9112 to find out why this is not desirable.

          Show
          Luke Lu added a comment - Do we really want to enforce subclassing? No, you can use the @Test(timeout=seconds) just fine. You can add the default per test timeout rule to any test class as well. TestBase with a per test default timeout is just a convenience. if your test case has timed out then it will terminate the whole class run, won't it? Please read the comments on HADOOP-9112 to find out why this is not desirable.
          Hide
          Konstantin Boudnik added a comment -

          Oh right... this is @Rule. Sorry, I misread your comment. Yes, this sounds like a proper way to go. Plus, we need to lose the enforcement in the test-patch - it is no go in the first place.

          Show
          Konstantin Boudnik added a comment - Oh right... this is @Rule . Sorry, I misread your comment. Yes, this sounds like a proper way to go. Plus, we need to lose the enforcement in the test-patch - it is no go in the first place.
          Hide
          Steve Loughran added a comment -

          @Luke: if that's all we need to do, that's all we need.

          1. We should be able to put this base test case into hadoop-common test & have it imported by the other projects -which all import the -test JAR.
          2. What name & package?

          Konstantin: I thought the switch from JUnit3 to JUnit4 was driven by the goal of adding beforeclass/afterclass methods and so setup/teardown miniclusters only once per test class, not per test.

          The added benefit we get is being able to skip, not just in attributes, but by raising AssumptionViolatedException, either from the assume(), method, or in your own code. I'm using that in some of my tests already.

          Show
          Steve Loughran added a comment - @Luke: if that's all we need to do, that's all we need. We should be able to put this base test case into hadoop-common test & have it imported by the other projects -which all import the -test JAR. What name & package? Konstantin: I thought the switch from JUnit3 to JUnit4 was driven by the goal of adding beforeclass/afterclass methods and so setup/teardown miniclusters only once per test class, not per test. The added benefit we get is being able to skip, not just in attributes, but by raising AssumptionViolatedException , either from the assume() , method, or in your own code. I'm using that in some of my tests already.
          Hide
          Alejandro Abdelnur added a comment -

          If using a @Rule/MethodRule Timeout I'd suggest 2 additional things:

          • Have a @TestTimeout annotation as well to be able to set a timeout other than the default for particular test method.
          • Instead having a system property set the default timeout, lets specify a timeout.ratio, with default 1, by doing this the ratio can be also applied to tests that have a custom timeout
          Show
          Alejandro Abdelnur added a comment - If using a @Rule/MethodRule Timeout I'd suggest 2 additional things: Have a @TestTimeout annotation as well to be able to set a timeout other than the default for particular test method. Instead having a system property set the default timeout, lets specify a timeout.ratio, with default 1, by doing this the ratio can be also applied to tests that have a custom timeout
          Hide
          Konstantin Boudnik added a comment -

          Alejandro, the functionality of @TestTimeout exists in JUnit4. It is done through @Test{timeout=<number>} as far as I remember.

          Show
          Konstantin Boudnik added a comment - Alejandro, the functionality of @TestTimeout exists in JUnit4. It is done through @Test{timeout=<number> } as far as I remember.
          Hide
          Konstantin Boudnik added a comment -

          Konstantin: I thought the switch from JUnit3 to JUnit4 was driven by the goal of adding beforeclass/afterclass methods

          Partially. However, the most pain has been caused by the need to subclass TestCase all the time. Oftentimes it blocked having a common pieces of the code being isolated in the same superclass for later use in the children, IIRC.

          Show
          Konstantin Boudnik added a comment - Konstantin: I thought the switch from JUnit3 to JUnit4 was driven by the goal of adding beforeclass/afterclass methods Partially. However, the most pain has been caused by the need to subclass TestCase all the time. Oftentimes it blocked having a common pieces of the code being isolated in the same superclass for later use in the children, IIRC.
          Hide
          Steve Loughran added a comment -

          @Konstantin: I see. Having a std test base with a timeout rule doesn't stop this provided all test cases add the same timeout rule. We could isolate the timeout extraction logic into a static method in the planned base class; other test cases could use that an their own @Rule declaration

          Show
          Steve Loughran added a comment - @Konstantin: I see. Having a std test base with a timeout rule doesn't stop this provided all test cases add the same timeout rule. We could isolate the timeout extraction logic into a static method in the planned base class; other test cases could use that an their own @Rule declaration
          Hide
          Alejandro Abdelnur added a comment - - edited

          Cos, @Test{timeout = <> ) exist, yes, if we can tweak that timeout to take a global ratio into account then we are good.

          Agree with Steve that subclassing should be a convenience, not a requirement. If you subclass you get the @Rule setting for free, otherwise you have to do it yourself.

          Show
          Alejandro Abdelnur added a comment - - edited Cos, @Test{timeout = <> ) exist, yes, if we can tweak that timeout to take a global ratio into account then we are good. Agree with Steve that subclassing should be a convenience, not a requirement. If you subclass you get the @Rule setting for free, otherwise you have to do it yourself.
          Hide
          Steve Loughran added a comment -

          first cut: test class (without any tests underneath to verify that it works)

          Show
          Steve Loughran added a comment - first cut: test class (without any tests underneath to verify that it works)
          Steve Loughran made changes -
          Attachment HADOOP-9330-timeouts-1.patch [ 12573149 ]
          Steve Loughran made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12573149/HADOOP-9330-timeouts-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 tests included appear to have a timeout.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2310//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2310//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573149/HADOOP-9330-timeouts-1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 tests included appear to have a timeout. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2310//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2310//console This message is automatically generated.
          Hide
          Robert Joseph Evans added a comment -

          The concept seems fine, but the Timeout rule and @Test(timeout=XXX) are not aware of each other. This means that the effective timeout of any test is which ever is smaller. I don't think that this is a real problem, just that the comments and the name of the member variable defaultTimout is slightly misleading. I also don't know if we have any tests that are intended to run for more than 100s. If so they will always timeout after 100s unless they do not extend the HadoopBase, or we set the default to be higher.

          Also, I don't know if there is anything we can do about this or not, but when we use both timeouts, the Timeout rule's backtrace, when it fails is close to useless.

          testSleep(org.apache.hadoop.test.TestSomething)  Time elapsed: 1091 sec  <<< ERROR!
          java.lang.Exception: test timed out after 1000 milliseconds
                  at java.lang.Object.wait(Native Method)
                  at java.lang.Thread.join(Thread.java:1194)
                  at org.junit.internal.runners.statements.FailOnTimeout.evaluate(FailOnTimeout.java:36)
                  at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
          

          It simply says that the code that "timed out" was a thread waiting for the actual test to finish running This is because there are actually two threads monitoring the test, instead of just one.

          I realize that a lot of my complaints are perhaps things that need to just be addressed by the JUnit, I just want us to be fully aware of them as we go into this and document things appropriately, so we know what is happening when issues arise.

          Show
          Robert Joseph Evans added a comment - The concept seems fine, but the Timeout rule and @Test(timeout=XXX) are not aware of each other. This means that the effective timeout of any test is which ever is smaller. I don't think that this is a real problem, just that the comments and the name of the member variable defaultTimout is slightly misleading. I also don't know if we have any tests that are intended to run for more than 100s. If so they will always timeout after 100s unless they do not extend the HadoopBase, or we set the default to be higher. Also, I don't know if there is anything we can do about this or not, but when we use both timeouts, the Timeout rule's backtrace, when it fails is close to useless. testSleep(org.apache.hadoop.test.TestSomething) Time elapsed: 1091 sec <<< ERROR! java.lang.Exception: test timed out after 1000 milliseconds at java.lang. Object .wait(Native Method) at java.lang. Thread .join( Thread .java:1194) at org.junit.internal.runners.statements.FailOnTimeout.evaluate(FailOnTimeout.java:36) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) It simply says that the code that "timed out" was a thread waiting for the actual test to finish running This is because there are actually two threads monitoring the test, instead of just one. I realize that a lot of my complaints are perhaps things that need to just be addressed by the JUnit, I just want us to be fully aware of them as we go into this and document things appropriately, so we know what is happening when issues arise.
          Hide
          Steve Loughran added a comment -

          I'm happy with having a very long default timeout -as it's purpose is to stop Jenkins hanging.

          I personally think having any timeouts in individual tests is incredibly brittle for testing on different machines, so people should not be explicitly setting timeouts in tests except in specific cases, where somehow they can't just set the test runner properties to change the global default.

          1. What default do you think the base class should have? 100s is <2 minutes, which should be enough for most tests -are there any which regularly come close to that time on anyone's system? (that's excluding minicluster setup/teardown)/
          2. What documentation are you thinking of? Is there something on writing and running tests? If not, it may be time.
          Show
          Steve Loughran added a comment - I'm happy with having a very long default timeout -as it's purpose is to stop Jenkins hanging. I personally think having any timeouts in individual tests is incredibly brittle for testing on different machines, so people should not be explicitly setting timeouts in tests except in specific cases, where somehow they can't just set the test runner properties to change the global default. What default do you think the base class should have? 100s is <2 minutes, which should be enough for most tests -are there any which regularly come close to that time on anyone's system? (that's excluding minicluster setup/teardown)/ What documentation are you thinking of? Is there something on writing and running tests? If not, it may be time.
          Hide
          Robert Joseph Evans added a comment -

          I agree that there should be something about writing and running tests, but I am not aware of it either. I was thinking of just the javadocs for HadoopTestBase, but a dedicated wiki page or a subsection of HowToContribute would probably be better. I agree that having timeout= in the code is brittle, and we probably want to start removing it once this goes in (along with the changes to test-patch.sh).

          But in a follow on JIRA I was thinking we probably could support something similar to what Luke Lu proposed. It should not be that hard to add in our own timeout test runner that can look for an @Test annotation with a timeout, output a warning about the timeout, and then allow JUnit to run with that timeout. We could also provide an @Timeout annotation that would let us specify a timeout multiplier that is X times the configured base timeout. That way we can keep a 100s timeout and adjust it for tests that do take longer.

          I am not tied to the idea though, and if it feels like too much work compared simply upping the default to something like 600s works we could do that instead.

          Show
          Robert Joseph Evans added a comment - I agree that there should be something about writing and running tests, but I am not aware of it either. I was thinking of just the javadocs for HadoopTestBase, but a dedicated wiki page or a subsection of HowToContribute would probably be better. I agree that having timeout= in the code is brittle, and we probably want to start removing it once this goes in (along with the changes to test-patch.sh). But in a follow on JIRA I was thinking we probably could support something similar to what Luke Lu proposed. It should not be that hard to add in our own timeout test runner that can look for an @Test annotation with a timeout, output a warning about the timeout, and then allow JUnit to run with that timeout. We could also provide an @Timeout annotation that would let us specify a timeout multiplier that is X times the configured base timeout. That way we can keep a 100s timeout and adjust it for tests that do take longer. I am not tied to the idea though, and if it feels like too much work compared simply upping the default to something like 600s works we could do that instead.

            People

            • Assignee:
              Unassigned
              Reporter:
              Steve Loughran
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:

                Development