  1. HBase
  2. HBASE-14420 Zombie Stomping Session
  3. HBASE-14883

TestSplitTransactionOnCluster#testFailedSplit flakey

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0
    • Fix Version/s: 1.2.0, 1.3.0
    • Component/s: test
    • Labels: None

    Description

      Only in branch-1 and branch-1.2.

      Failures look like this:

      https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.3/jdk=latest1.8,label=Hadoop/397/

      TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml.<init>

      If I look in the xml, I see this:

        <testcase name="testFailedSplit" classname="org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster" time="8.275">
          <flakyFailure type="java.lang.AssertionError:">java.lang.AssertionError: null
      	at org.junit.Assert.fail(Assert.java:86)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.junit.Assert.assertTrue(Assert.java:52)
      	at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testFailedSplit(TestSplitTransactionOnCluster.java:1339)
      
            <system-err><![CDATA[
      

      ... the xml is cut off.

      testFailedSplit seems to be the culprit.

      If I look in the -output.txt I see:

      ....
      
      2015-11-25 09:00:37,816 DEBUG [asf905.gq1.ygridcore.net,48894,1448441976062_ChoreService_1] balancer.BaseLoadBalancer$Cluster(838):  Lowest locality region index is 0 and its region server contains 3 regions
      2015-11-25 09:00:37,816 DEBUG [asf905.gq1.ygridcore.net,48894,1448441976062_ChoreService_1] balancer.BaseLoadBalancer$Cluster(813): Lowest locality region server with non zero regions is asf905.gq1.ygridcore.net with locality 0.0
      2015-11-25 09:00:37,816 DEBUG [asf905.gq1.ygridcore.net,48894,1448441976062_ChoreService_1] balancer.BaseLoadBalancer$Cluster(838):  Lowest locality region index is 0 and its region server contains 3 regions
      2015-11-25 09:00:37,817 DEBUG [asf905.gq1.ygridcore.net,48894,1448441976062_ChoreService_1] balancer.BaseLoadBalancer$Cluster(813): Lowest locality region server with non zero regions is asf905.gq1.ygridcore.net with locality 0.0
      
      ...
      

      ... and it keeps spewing lines like these.

      This test was added here:

      kalashnikov:hbase.git.commit stack$ git log -S testFailedSplit  ./hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
      commit 871444cb0a733b82af843952253b4545a407979a
      Author: Andrew Purtell <apurtell@apache.org>
      Date:   Mon Dec 15 17:31:33 2014 -0800
      
          HBASE-12686 Failures in split before PONR not clearing the daughter regions from regions in transition during rollback (Vandana Ayyalasomayajula)
      

      The balancer call is not coming back true: the assert at line #1339 fails with a null message, according to the stack trace above.

      ...
      1337       regions = TESTING_UTIL.getHBaseAdmin().getTableRegions(tableName);
      1338       assertTrue(regions.size() == 1);
      1339       assertTrue(admin.balancer());
      ...
      

      Line #1339 was not in the original test. It was added later:

      commit 46f993b19fa11d1a8880d08045be43e38017b46a
      Author: Virag Kothari <virag@yahoo-inc.com>
      Date:   Wed Jan 7 10:58:32 2015 -0800
      
          HBASE-12694 testTableExistsIfTheSpecifiedTableRegionIsSplitParent in TestSplitTransactionOnCluster class leaves regions in transition (Vandana Ayyalasomayajula)
      
      

      We are having trouble achieving a balance.... Let me see.
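
      For illustration only, here is a minimal sketch of one way a test could tolerate a balancer call that does not return true on the first try: poll the call until it succeeds or a timeout expires, instead of asserting on a single invocation. This is not taken from the attached patch and may not match the actual fix. The class name RetryUntilTrue, the waitFor helper, and the timeout values are all hypothetical, and the lambda in main is only a stand-in for admin.balancer().

```java
import java.util.function.BooleanSupplier;

public class RetryUntilTrue {

    /**
     * Hypothetical helper: polls the condition until it returns true or
     * timeoutMs elapses. Returns false on timeout or if interrupted.
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for admin.balancer(): returns false twice, then true.
        int[] calls = {0};
        boolean ok = waitFor(() -> ++calls[0] >= 3, 1000, 10);
        System.out.println(ok + " after " + calls[0] + " calls"); // prints "true after 3 calls"
    }
}
```

      In the test, the idea would be to replace the bare assertTrue(admin.balancer()) with an assertTrue around a polling call like the one above. Whether retrying is the right fix, as opposed to dropping the assert or waiting on some other condition first, depends on why the balancer declines to run, which this report has not yet pinned down.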

      Attachments

        1. 14883-branch-1.txt
          2 kB
          Michael Stack

          People

            Assignee: Michael Stack (stack)
            Reporter: Michael Stack (stack)