HBASE-5114: TestMetaReaderEditor can time out during test execution


Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.94.0
    • None
    • test
    • None
    • Reviewed

    Description

      Likely because it belongs to the overly long list of flaky tests. A test should not fail randomly. Here is the list of builds, from 2574 to 2597, in which TestMetaReaderEditor timed out:
      https://builds.apache.org/job/HBase-TRUNK/2597/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/HBase-TRUNK/2588/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/HBase-TRUNK/2585/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/HBase-TRUNK/2584/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/HBase-TRUNK/2582/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/HBase-TRUNK/2577/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/HBase-TRUNK/2574/testReport/org.apache.hadoop.hbase.catalog/

      It happens on hadoop-qa as well:
      https://builds.apache.org/job/PreCommit-HBASE-Build/640/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/PreCommit-HBASE-Build/638/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/PreCommit-HBASE-Build/630/testReport/org.apache.hadoop.hbase.catalog/
      https://builds.apache.org/job/PreCommit-HBASE-Build/622/testReport/org.apache.hadoop.hbase.catalog/

      Looking at the source code, the main issues are:

      • in testRetrying, some threads are started, but they will not be stopped if an assertion fails. As a consequence, the process will not stop.
      • still in testRetrying, it's not possible to use @Test(timeout=180000), because if this timeout is reached, the thread won't be stopped.

      This does not explain why the test sometimes fails, but it explains why we get a timeout instead of a clean failure.
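The two issues above come down to worker threads that outlive a failed assertion. The following is a minimal sketch of one common way to avoid that, not the content of the attached patch: the worker is marked as a daemon and is always stopped in a `finally` block. The names `StoppableWorkerSketch`, `Worker`, `requestStop`, and `runWithWorker` are hypothetical and exist only for illustration; the sketch uses plain Java threads and no HBase or JUnit classes.

```java
class StoppableWorkerSketch {

    /** Hypothetical stand-in for a background thread started by a test. */
    static class Worker extends Thread {
        private volatile boolean stopRequested = false;

        Worker(String name) {
            super(name);
            // Assumption: a daemon thread is acceptable here, so that a leaked
            // worker can no longer keep the JVM alive on its own.
            setDaemon(true);
        }

        @Override
        public void run() {
            while (!stopRequested) {
                try {
                    Thread.sleep(10); // placeholder for the real work
                } catch (InterruptedException e) {
                    return; // interrupted by requestStop(): exit promptly
                }
            }
        }

        /** Asks the worker to exit, waking it up if it is sleeping. */
        void requestStop() {
            stopRequested = true;
            interrupt();
        }
    }

    /**
     * Starts the worker, runs the test body, and always stops the worker,
     * whether the body returns normally or throws (e.g. an AssertionError).
     */
    static void runWithWorker(Worker worker, Runnable body) {
        worker.start();
        try {
            body.run();
        } finally {
            worker.requestStop();
            try {
                worker.join(5000); // bounded wait so cleanup cannot hang forever
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
    }
}
```

With this shape, a failing assertion inside `body` still propagates to the caller, but the worker is stopped first, so the failure shows up as a clean test failure and not as a hung process.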

      I also fixed some stuff that seems unrelated, and I cannot reproduce the error anymore locally after 10 tries. So I will try the patch on hadoop-qa.

      The patch also removes TestAdmin#testHundredsOfTable, because this test takes 2 minutes to execute but proves nothing on success, as there is no check on the resources used.

      Attachments

        1. 5114.patch
          7 kB
          Nicolas Liochon


          People

            Assignee: Nicolas Liochon (nkeywal)
            Reporter: Nicolas Liochon (nkeywal)
            Votes: 0
            Watchers: 0
