HBASE-10281

TestMultiParallel.testFlushCommitsNoAbort fails frequently in 0.94

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.16
    • Component/s: None
    • Labels: None

      Description

      Here's a run (with JDK7, but I've seen it with 0.96 as well).
      https://builds.apache.org/job/HBase-0.94-JDK7/17/testReport/junit/org.apache.hadoop.hbase.client/TestMultiParallel/testFlushCommitsNoAbort/

      Error Message
      
      Count of regions=10
      
      Stacktrace
      
      java.lang.AssertionError: Count of regions=10
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.apache.hadoop.hbase.client.TestMultiParallel.doTestFlushCommits(TestMultiParallel.java:289)
      	at org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsNoAbort(TestMultiParallel.java:222)
              ...
      

      This might be a side-effect of: HBASE-10259

      1. 10281-0.94.txt
        1 kB
        Lars Hofhansl

        Activity

        Lars Hofhansl added a comment -

        So looking at the failure, I do not actually understand what the test is trying to do.
        The offending part is here (in doTestFlushCommits called from testFlushCommitsNoAbort):

            for (JVMClusterUtil.RegionServerThread t: liveRSs) {
              int regions = t.getRegionServer().getOnlineRegions().size();
              Assert.assertTrue("Count of regions=" + regions, regions > 10);
            }
        

        So this verifies that each RS has more than 10 regions. We start out by creating 25 regions, and we have two RSs in this test. So this part would only pass if the cluster is nicely balanced. (It tries to balance before each test, but either that has not finished or it is not doing it perfectly at all times.)

        In any case, we're not testing the balancer. So in the end we only need to check that we have at least 25 regions in total, if we want to check that at all. Before we get here we have already verified that all the rows we are expecting are in fact present.
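
To illustrate the difference between the two checks, here is a minimal, HBase-free sketch. It is not the attached patch: the class and method names are hypothetical, and the 10/17 split across the two region servers is an assumed example of an unbalanced cluster, not a distribution taken from the failing run.

```java
import java.util.Arrays;
import java.util.List;

public class RegionCountCheck {

    // Per-server check, in the style of the failing assertion:
    // every region server must hold MORE than perServerMin regions.
    public static boolean everyServerAbove(List<Integer> regionsPerServer, int perServerMin) {
        for (int regions : regionsPerServer) {
            if (regions <= perServerMin) {
                return false;
            }
        }
        return true;
    }

    // Relaxed check: only the total across all servers matters,
    // so the result does not depend on how well the balancer did.
    public static boolean totalAtLeast(List<Integer> regionsPerServer, int expectedTotal) {
        int total = 0;
        for (int regions : regionsPerServer) {
            total += regions;
        }
        return total >= expectedTotal;
    }

    public static void main(String[] args) {
        // Assumed, illustrative distribution over two region servers.
        List<Integer> skewed = Arrays.asList(10, 17);
        System.out.println(everyServerAbove(skewed, 10)); // false: one server has exactly 10
        System.out.println(totalAtLeast(skewed, 25));     // true: 27 regions overall
    }
}
```

With the skewed distribution the per-server check fails even though no region is missing, which is the kind of failure reported above; the total-based check only fails if regions are actually absent.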

        Lars Hofhansl added a comment -

        And that observation would lead me to this fix.

        Andrew Purtell added a comment -

        Seems reasonable to me. +1

        Lars Hofhansl added a comment -

        Looks like 0.96 and later have better mechanisms to wait for a balance to finish.
        I'll only apply this to 0.94.

        Lars Hofhansl added a comment -

        Committed to 0.94.
        Thanks for taking a look, Andrew Purtell.


          People

          • Assignee:
            Lars Hofhansl
            Reporter:
            Lars Hofhansl
          • Votes:
            0
            Watchers:
            2
