Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: test
    • Labels: None

    Description

      In TestMetaWithReplicas, the mini cluster is started at the beginning and shut down at the end of every test in the class, which makes the test class take longer to complete. Instead, we can start and stop the mini cluster only once for the whole class.
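
As a rough illustration of the trade-off described above, the following plain-Java sketch counts how often a cluster would be started under each lifecycle. `FakeMiniCluster`, `startsPerTest`, and `startsPerClass` are hypothetical names invented for this sketch; they are not HBase or JUnit APIs, and the real test class would use the test framework's per-test and per-class hooks instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation only: counts cluster startups, does not start anything real.
final class ClusterLifecycleSketch {

    // Stand-in for an expensive mini cluster (assumption, not the HBase class).
    static final class FakeMiniCluster {
        int starts = 0;
        void start() { starts++; }
        void shutdown() { /* nothing to release in the simulation */ }
    }

    // Per-test lifecycle: start and shut down around every single test.
    static int startsPerTest(List<Runnable> tests) {
        FakeMiniCluster cluster = new FakeMiniCluster();
        for (Runnable test : tests) {
            cluster.start();
            try {
                test.run();
            } finally {
                cluster.shutdown();
            }
        }
        return cluster.starts;
    }

    // Per-class lifecycle: start once, run all tests, shut down once.
    static int startsPerClass(List<Runnable> tests) {
        FakeMiniCluster cluster = new FakeMiniCluster();
        cluster.start();
        try {
            for (Runnable test : tests) {
                test.run();
            }
        } finally {
            cluster.shutdown();
        }
        return cluster.starts;
    }

    public static void main(String[] args) {
        List<Runnable> tests = Arrays.<Runnable>asList(() -> {}, () -> {}, () -> {});
        System.out.println("per-test starts:  " + startsPerTest(tests));
        System.out.println("per-class starts: " + startsPerClass(tests));
    }
}
```

The per-class variant saves startup time, but every test then sees whatever state the previous tests left behind in the shared cluster, so it only works if the tests do not rely on a fresh cluster.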

      Attachments

        1. HBASE-13659-branch-1.1-v1.patch
          4 kB
          Ashish Singhi
        2. HBASE-13659-branch-1.1.patch
          5 kB
          Ashish Singhi
        3. org.apache.hadoop.hbase.client.TestMetaWithReplicas-output.txt
          655 kB
          Nick Dimiduk
        4. HBASE-13659.patch
          3 kB
          Ashish Singhi

        Activity

          ashish singhi Ashish Singhi added a comment -

          I have closed this as Won't Fix; if it should be something else, just let me know.
          Thanks.

          ddas Devaraj Das added a comment -

          I guess the test depends on resetting the cluster each round

          I think so too but it has been a while since I wrote those tests..

          ndimiduk Nick Dimiduk added a comment -

          I've run this test without the patch a bunch of times locally and it's passing consistently. I guess the test depends on resetting the cluster each round.

          ndimiduk Nick Dimiduk added a comment -

          adding

              RegionLocator metaLoc = TEST_UTIL.getConnection().getRegionLocator(TableName.META_TABLE_NAME);
              LOG.info("testMetaAddressChange -- metaLocator says "
                      + metaLoc.getRegionLocation(null).getServerName().getServerName());
          

          It now seems that the meta location parsed by the test and the location returned by the RegionLocator instance disagree. Maybe RegionLocator is not looking specifically for the first replica?

          2015-07-01 17:57:53,135 INFO  [main] client.TestMetaWithReplicas(340): testMetaAddressChange -- starting test.
          2015-07-01 17:57:53,136 INFO  [main] client.TestMetaWithReplicas(350): testMetaAddressChange -- parsed meta location is 10.0.0.110,59702,1435798645712
          2015-07-01 17:57:53,136 INFO  [main] client.TestMetaWithReplicas(352): testMetaAddressChange -- metaLocator says 10.0.0.110,59570,1435798598482
          

          Test looks for meta with

              ZooKeeperWatcher zkw = TEST_UTIL.getZooKeeperWatcher();
              String baseZNode = conf.get(HConstants.ZOOKEEPER_ZNODE_PARENT,
                  HConstants.DEFAULT_ZOOKEEPER_ZNODE_PARENT);
              String primaryMetaZnode = ZKUtil.joinZNode(baseZNode,
                  conf.get("zookeeper.znode.metaserver", "meta-region-server"));
          

          while ZooKeeperWatcher appears to use

                str = ZKUtil.joinZNode(baseZNode,
                    conf.get("zookeeper.znode.metaserver", "meta-region-server") + "-" + replicaId);
          

          Looking at logic in MetaTableLocator, it seems to specify a default replicaId of 1, which means it'll always be going to a "-" + replicaId location instead of the bare location used in the test.
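
To make the difference between the two path shapes concrete, here is a minimal sketch of the string construction only. `MetaZNodeSketch` and its methods are hypothetical stand-ins, not HBase source; the `/` separator and the `/hbase` parent are assumptions, and the sketch takes no position on which replica ID HBase actually uses by default.

```java
// Illustrative stand-in for the znode naming discussed above; not HBase source code.
final class MetaZNodeSketch {

    // Joins a parent znode and a child name with "/" (assumed separator).
    static String joinZNode(String parent, String child) {
        return parent + "/" + child;
    }

    // Path shape built by the test: the bare meta znode name.
    static String bareMetaZNode(String baseZNode, String metaName) {
        return joinZNode(baseZNode, metaName);
    }

    // Path shape with a "-<replicaId>" suffix, as in the second snippet above.
    static String suffixedMetaZNode(String baseZNode, String metaName, int replicaId) {
        return joinZNode(baseZNode, metaName + "-" + replicaId);
    }

    public static void main(String[] args) {
        String base = "/hbase";             // assumed znode parent
        String name = "meta-region-server"; // default name from the snippets above
        System.out.println(bareMetaZNode(base, name));
        System.out.println(suffixedMetaZNode(base, name, 1));
    }
}
```

If one side reads the bare path and the other reads a suffixed path, they are looking at two different znodes, which would be consistent with the two different server names in the log excerpt.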

          ddas Devaraj Das added a comment -

          Let me take a look at it later tonight. I think when I wrote the tests I deliberately did it this way - start/shutdown cluster for each test, maybe because I was mucking around with ZK. Not sure why..

          ndimiduk Nick Dimiduk added a comment -

          Ashish Singhi I think your patch makes things better than they were, certainly the conf.setInt(ServerManager.WAIT_ON_REGIONSERVERS_MINTOSTART, 2); bit.

          I'm seeing consistent failure at line 370 as well. Adding some extra logging,

          2015-07-01 17:36:34,023 INFO  [main] client.TestMetaWithReplicas(350): testMetaAddressChange -- i think meta is on 10.0.0.110,59059,1435797366643
          ...
          2015-07-01 17:36:35,686 INFO  [main] client.TestMetaWithReplicas(367): testMetaAddressChange -- sending move request of 1588230740 to 10.0.0.110,58926,1435797319567
          2015-07-01 17:36:35,687 DEBUG [B.defaultRpcServer.handler=0,queue=0,port=59142] master.HMaster(1402): Skipping move of region hbase:meta,,1.1588230740 because region already assigned to the same server 10.0.0.110,58926,1435797319567.
          

          Between those two log lines there's no mention of 1588230740. This is failing consistently for me locally.

          No comment here Devaraj Kavali, Enis Soztutar?

          ashish singhi Ashish Singhi added a comment -

          There is still a flaky test:

          Flaked tests: 
          org.apache.hadoop.hbase.client.TestMetaWithReplicas.testMetaAddressChange(org.apache.hadoop.hbase.client.TestMetaWithReplicas)
            Run 1: TestMetaWithReplicas.testMetaAddressChange:370 null
            Run 2: PASS
          
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12740364/HBASE-13659-branch-1.1-v1.patch
          against branch-1.1 branch at commit 51b606cd185437802f0a7a4620f1434e8e2d9c74.
          ATTACHMENT ID: 12740364

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 checkstyle. The applied patch does not increase the total number of checkstyle errors

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn post-site goal to fail.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportExport
          org.apache.hadoop.hbase.util.TestProcessBasedCluster

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14460//testReport/
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14460//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14460//artifact/patchprocess/checkstyle-aggregate.html

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14460//console

          This message is automatically generated.

          ashish singhi Ashish Singhi added a comment -

          I have attached another patch, v1, for branch-1.1. In this patch, instead of starting the RS again at the end of testShutdownOfReplicaHolder, I have set the conf hbase.master.wait.on.regionservers.mintostart to 2. With this, all the tests in this class pass 5 out of 5 times. With the earlier patch for branch-1.1, some tests were flaky:

          java.lang.AssertionError: null
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testMetaAddressChange(TestMetaWithReplicas.java:368)
          
          testHBaseFsckWithMetaReplicas(org.apache.hadoop.hbase.client.TestMetaWithReplicas)  Time elapsed: 0.234 sec  <<< FAILURE!
          java.lang.AssertionError: expected:<[]> but was:<[MULTI_META_REGION, UNKNOWN]>
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.junit.Assert.failNotEquals(Assert.java:743)
          	at org.junit.Assert.assertEquals(Assert.java:118)
          	at org.junit.Assert.assertEquals(Assert.java:144)
          	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.assertNoErrors(HbckTestingUtil.java:91)
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testHBaseFsckWithMetaReplicas(TestMetaWithReplicas.java:279)
          
          testHBaseFsckWithExcessMetaReplicas(org.apache.hadoop.hbase.client.TestMetaWithReplicas)  Time elapsed: 1.29 sec  <<< FAILURE!
          java.lang.AssertionError: expected:<[UNKNOWN, SHOULD_NOT_BE_DEPLOYED]> but was:<[UNKNOWN, SHOULD_NOT_BE_DEPLOYED, MULTI_META_REGION]>
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.junit.Assert.failNotEquals(Assert.java:743)
          	at org.junit.Assert.assertEquals(Assert.java:118)
          	at org.junit.Assert.assertEquals(Assert.java:144)
          	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.assertErrors(HbckTestingUtil.java:99)
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testHBaseFsckWithExcessMetaReplicas(TestMetaWithReplicas.java:412)
          
          testHBaseFsckWithFewerMetaReplicas(org.apache.hadoop.hbase.client.TestMetaWithReplicas)  Time elapsed: 1.265 sec  <<< FAILURE!
          java.lang.AssertionError: expected:<[UNKNOWN, NO_META_REGION]> but was:<[UNKNOWN, NO_META_REGION, SHOULD_NOT_BE_DEPLOYED, MULTI_META_REGION]>
          	at org.junit.Assert.fail(Assert.java:88)
          	at org.junit.Assert.failNotEquals(Assert.java:743)
          	at org.junit.Assert.assertEquals(Assert.java:118)
          	at org.junit.Assert.assertEquals(Assert.java:144)
          	at org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.assertErrors(HbckTestingUtil.java:99)
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testHBaseFsckWithFewerMetaReplicas(TestMetaWithReplicas.java:292)
          
          testMetaLookupThreadPoolCreated(org.apache.hadoop.hbase.client.TestMetaWithReplicas)  Time elapsed: 1.301 sec  <<< ERROR!
          org.apache.hadoop.hbase.TableNotFoundException: Table 'testMetaLookupThreadPoolCreated' was not found, got: hbase:namespace.
          	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1274)
          	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
          	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1126)
          	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1110)
          	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1132)
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testMetaLookupThreadPoolCreated(TestMetaWithReplicas.java:234)
          

          I could not find a fix for this, as I am not very familiar with the read replica feature.
          Devaraj Kavali, Enis Soztutar, Nick Dimiduk... can you help with this?

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12740315/HBASE-13659-branch-1.1.patch
          against branch-1.1 branch at commit 41d9e8d9b4895d0711f006d926a39e5ae3bd7c9d.
          ATTACHMENT ID: 12740315

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 checkstyle. The applied patch does not increase the total number of checkstyle errors

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn post-site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14452//testReport/
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14452//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14452//artifact/patchprocess/checkstyle-aggregate.html

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14452//console

          This message is automatically generated.

          ashish singhi Ashish Singhi added a comment -

          Thanks Nick Dimiduk for looking into this.
          testShutdownHandling was failing for the reason you described here. I then checked why only 2 RS were online in the cluster and found that testShutdownOfReplicaHolder kills an RS but does not start it again. That leaves only 2 RS online, while the master keeps waiting for a minimum of 3 RS to come online.
          Attached a patch for branch-1.1.
          It does not appear to be failing on the master branch, but it would be better to commit the same branch-1.1 patch to master as well.

          Please review.
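
The hang described in this comment can be reduced to a single comparison. The sketch below is a deliberately simplified model, assuming the master only checks the number of online region servers against a configured minimum; `MinRegionServersSketch` and `masterCanProceed` are hypothetical names, and the real master's wait logic involves more conditions than shown here.

```java
// Simplified, hypothetical model of "master waits for a minimum number of region servers".
final class MinRegionServersSketch {

    // True once enough region servers are online for the simulated master to proceed.
    static boolean masterCanProceed(int onlineRegionServers, int minToStart) {
        return onlineRegionServers >= minToStart;
    }

    public static void main(String[] args) {
        int started = 3;
        int killedAndNotRestarted = 1;
        int online = started - killedAndNotRestarted;

        // With a minimum of 3, two online servers never satisfy the condition.
        System.out.println("minToStart=3: " + masterCanProceed(online, 3));
        // Lowering the minimum to 2 lets the simulated master proceed.
        System.out.println("minToStart=2: " + masterCanProceed(online, 2));
    }
}
```

Under this model there are two ways out: bring the killed region server back so that three are online again, or lower the minimum to match the number of servers that remain.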

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12740165/org.apache.hadoop.hbase.client.TestMetaWithReplicas-output.txt
          against master branch at commit 623fd63827b2953c150597f24c7205737119bebe.
          ATTACHMENT ID: 12740165

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 179 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14451//console

          This message is automatically generated.

          ndimiduk Nick Dimiduk added a comment -

          Pushing out of 1.1.1 for now.

          ndimiduk Nick Dimiduk added a comment -

          Looks like it's getting stuck replaying logs to recover a killed RS, meanwhile master just hangs waiting for minimum number of RS's to rejoin cluster.

          Attaching test run log.

          ndimiduk Nick Dimiduk added a comment -

          Hi Ashish Singhi I applied your patch here to master, works fine. Brought it back to branch-1 and I'm seeing it consistently hang.

          From jstack

          "main" prio=5 tid=0x00007fe8e980b800 nid=0x1903 waiting on condition [0x000000010cd5e000]
             java.lang.Thread.State: TIMED_WAITING (sleeping)
                  at java.lang.Thread.sleep(Native Method)
                  at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146)
                  at org.apache.hadoop.hbase.MiniHBaseCluster.waitForActiveAndReadyMaster(MiniHBaseCluster.java:485)
                  at org.apache.hadoop.hbase.HBaseCluster.waitForActiveAndReadyMaster(HBaseCluster.java:205)
                  at org.apache.hadoop.hbase.client.TestMetaWithReplicas.shutdownMetaAndDoValidations(TestMetaWithReplicas.java:221)
                  at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:145)
          

          From the test logs, I see

          2015-06-17 10:53:09,602 WARN  [main] regionserver.HRegionServer(2063): Unable to report fatal error to master
          com.google.protobuf.ServiceException: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to /10.0.0.110:50399 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to /10.0.0.110:50399 is closing. Call id=47, waitTime=1
          	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:224)
          	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:288)
          	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:9006)
          	at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2060)
          	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174)
          	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$200(MiniHBaseCluster.java:108)
          	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:356)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
          	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:306)
          	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165)
          	at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2072)
          	at org.apache.hadoop.hbase.regionserver.HRegionServer.kill(HRegionServer.java:2087)
          	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.kill(MiniHBaseCluster.java:161)
          	at org.apache.hadoop.hbase.MiniHBaseCluster.killRegionServer(MiniHBaseCluster.java:246)
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.shutdownMetaAndDoValidations(TestMetaWithReplicas.java:201)
          	at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:145)
          
          ndimiduk Nick Dimiduk added a comment -

          LGTM. What say you, Devaraj Kavali and Enis Soztutar?

          ashish singhi Ashish Singhi added a comment -

          In build #14007 the test took 1 min 50 secs to complete, whereas in the patch build (#14008) it took 1 min 3 secs.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12731937/HBASE-13659.patch
          against master branch at commit 9aeafe30b7d932e562f803fd071812cd27aebaf8.
          ATTACHMENT ID: 12731937

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 checkstyle. The applied patch does not increase the total number of checkstyle errors

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14008//testReport/
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14008//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14008//artifact/patchprocess/checkstyle-aggregate.html

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14008//console

          This message is automatically generated.


          People

            Assignee: Unassigned
            Reporter: Ashish Singhi