HBase
  1. HBase
  2. HBASE-5726

TestSplitTransactionOnCluster occasionally failing

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When I ran TestSplitTransactionOnCluster, some times tests are failing.

      java.lang.AssertionError: expected:<1> but was:<0>
      at org.junit.Assert.fail(Assert.java:93)
      at org.junit.Assert.failNotEquals(Assert.java:647)
      at org.junit.Assert.assertEquals(Assert.java:128)
      at org.junit.Assert.assertEquals(Assert.java:472)
      at org.junit.Assert.assertEquals(Assert.java:456)
      at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.getAndCheckSingleTableRegion(TestSplitTransactionOnCluster.java:89)
      at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit(TestSplitTransactionOnCluster.java:298)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)

      Seems like test is flaky, random other cases also fails.

        Activity

        Hide
        stack added a comment -

        Marking closed.

        Show
        stack added a comment - Marking closed.
        Hide
        Ted Yu added a comment -

        Trunk build has succeeded 5 times.

        Show
        Ted Yu added a comment - Trunk build has succeeded 5 times.
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #49 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/49/)
        HBASE-5726 TestSplitTransactionOnCluster occasionally failing (Revision 1347852)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #49 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/49/ ) HBASE-5726 TestSplitTransactionOnCluster occasionally failing (Revision 1347852) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #3001 (See https://builds.apache.org/job/HBase-TRUNK/3001/)
        HBASE-5726 TestSplitTransactionOnCluster occasionally failing (Revision 1347852)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #3001 (See https://builds.apache.org/job/HBase-TRUNK/3001/ ) HBASE-5726 TestSplitTransactionOnCluster occasionally failing (Revision 1347852) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
        Hide
        Ted Yu added a comment -

        Integrated to trunk.

        Will resolve after at least 5 trunk builds where this test passes.

        Thanks for the review, Stack.

        Show
        Ted Yu added a comment - Integrated to trunk. Will resolve after at least 5 trunk builds where this test passes. Thanks for the review, Stack.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12531334/5726.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531334/5726.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2126//console This message is automatically generated.
        Hide
        stack added a comment -

        +1 Lets try it Ted.

        Show
        stack added a comment - +1 Lets try it Ted.
        Hide
        Ted Yu added a comment -

        I ran TestSplitTransactionOnCluster#testShutdownFixupWhenDaughterHasSplit 5 times with the patch - they passed.

        Show
        Ted Yu added a comment - I ran TestSplitTransactionOnCluster#testShutdownFixupWhenDaughterHasSplit 5 times with the patch - they passed.
        Hide
        Ted Yu added a comment -

        From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/3000/testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/:

        2012-06-07 18:33:22,794 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZKUtil(1142): master:49315-0x137c838bfa60000 Retrieved 103 byte(s) of data from znode /hbase/unassigned/73830568ee93434ba97f7b5ade48ae30 and set watcher; region=ephemeral,,1339093997065.73830568ee93434ba97f7b5ade48ae30., state=RS_ZK_REGION_SPLITTING, servername=juno.apache.org,39424,1339093992166, createTime=1339094002792, payload.length=0
        ...
        2012-06-07 18:33:47,887 DEBUG [Thread-941] regionserver.TestSplitTransactionOnCluster(482): Waiting on region to split
        2012-06-07 18:33:47,922 DEBUG [RegionServer:8;juno.apache.org,43570,1339094025325-splits-1339094027483] regionserver.HRegion(463): Instantiated testMasterRestartAtRegionSplitPendingCatalogJanitor,,1339094027484.23694c0a5312f5801dfd5a2857cc3556.
        2012-06-07 18:33:23,648 DEBUG [RegionServer:0;juno.apache.org,39424,1339093992166-splits-1339094002786] regionserver.HRegion(463): Instantiated ephemeral,mnk,1339094002786.b5c2d9c3e0939c583f874e3efd51b478.
        2012-06-07 18:33:23,680 INFO  [RegionServer:0;juno.apache.org,39424,1339093992166-splits-1339094002786] catalog.MetaEditor(191): Offlined parent region ephemeral,,1339093997065.73830568ee93434ba97f7b5ade48ae30. in META
        

        We can see that region 73830568ee93434ba97f7b5ade48ae30 didn't finish splitting after the last 'Waiting on region to split' was printed.
        In split() method:

            while (ProtobufUtil.getOnlineRegions(server).size() <= regionCount) {
              LOG.debug("Waiting on region to split");
        

        I think the above method should be improved: if a region is moved onto server, the loop would exit but number of daughter regions wouldn't be 2.

        Show
        Ted Yu added a comment - From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/3000/testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/: 2012-06-07 18:33:22,794 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZKUtil(1142): master:49315-0x137c838bfa60000 Retrieved 103 byte (s) of data from znode /hbase/unassigned/73830568ee93434ba97f7b5ade48ae30 and set watcher; region=ephemeral,,1339093997065.73830568ee93434ba97f7b5ade48ae30., state=RS_ZK_REGION_SPLITTING, servername=juno.apache.org,39424,1339093992166, createTime=1339094002792, payload.length=0 ... 2012-06-07 18:33:47,887 DEBUG [ Thread -941] regionserver.TestSplitTransactionOnCluster(482): Waiting on region to split 2012-06-07 18:33:47,922 DEBUG [RegionServer:8;juno.apache.org,43570,1339094025325-splits-1339094027483] regionserver.HRegion(463): Instantiated testMasterRestartAtRegionSplitPendingCatalogJanitor,,1339094027484.23694c0a5312f5801dfd5a2857cc3556. 2012-06-07 18:33:23,648 DEBUG [RegionServer:0;juno.apache.org,39424,1339093992166-splits-1339094002786] regionserver.HRegion(463): Instantiated ephemeral,mnk,1339094002786.b5c2d9c3e0939c583f874e3efd51b478. 2012-06-07 18:33:23,680 INFO [RegionServer:0;juno.apache.org,39424,1339093992166-splits-1339094002786] catalog.MetaEditor(191): Offlined parent region ephemeral,,1339093997065.73830568ee93434ba97f7b5ade48ae30. in META We can see that region 73830568ee93434ba97f7b5ade48ae30 didn't finish splitting after the last 'Waiting on region to split' was printed. In split() method: while (ProtobufUtil.getOnlineRegions(server).size() <= regionCount) { LOG.debug( "Waiting on region to split" ); I think the above method should be improved: if a region is moved onto server, the loop would exit but number of daughter regions wouldn't be 2.
        Hide
        ramkrishna.s.vasudevan added a comment -

        This happens because the tests before the failed test does some regionserver abort.
        What happens is sometimes the HLog split takes time and it happens only when testShutdownFixupWhenDaughterHasSplit is going on. So the regions count changes thus failing the testcase.
        One simple way is to stop and start the cluster every time.

        Show
        ramkrishna.s.vasudevan added a comment - This happens because the tests before the failed test does some regionserver abort. What happens is sometimes the HLog split takes time and it happens only when testShutdownFixupWhenDaughterHasSplit is going on. So the regions count changes thus failing the testcase. One simple way is to stop and start the cluster every time.
        Hide
        Ted Yu added a comment -
        Show
        Ted Yu added a comment - The test failure has become almost consistent in Hadoop QA builds. Here is one recent example: https://builds.apache.org/job/PreCommit-HBASE-Build/1894//testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/
        Hide
        Uma Maheswara Rao G added a comment -

        Was this seen on trunk build ?

        Yes, Ted. this is on trunk.

        Show
        Uma Maheswara Rao G added a comment - Was this seen on trunk build ? Yes, Ted. this is on trunk.
        Hide
        Uma Maheswara Rao G added a comment -

        Hi Ted,
        I just ran this test locally several times. Tests failed 2 times and all the other times they passed.

        I attached the log when running this test.

        I am also looking into it. If you get some clue, please let me know.

        Show
        Uma Maheswara Rao G added a comment - Hi Ted, I just ran this test locally several times. Tests failed 2 times and all the other times they passed. I attached the log when running this test. I am also looking into it. If you get some clue, please let me know.
        Hide
        Ted Yu added a comment -

        @Uma:
        Can you attach test output to this issue ?
        Was this seen on trunk build ?

        Show
        Ted Yu added a comment - @Uma: Can you attach test output to this issue ? Was this seen on trunk build ?

          People

          • Assignee:
            Ted Yu
            Reporter:
            Uma Maheswara Rao G
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development