HBASE-12072

Standardize retry handling for master operations

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.6
    • Fix Version/s: 1.0.0, 0.99.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      For master requests, two retry mechanisms are in effect. The first is in HBaseAdmin.executeCallable():

        private <V> V executeCallable(MasterCallable<V> callable) throws IOException {
          RpcRetryingCaller<V> caller = rpcCallerFactory.newCaller();
          try {
            return caller.callWithRetries(callable);
          } finally {
            callable.close();
          }
        }
      

      The second is nested inside the first, in StubMaker.makeStub():

        /**
         * Create a stub against the master.  Retry if necessary.
         * @return A stub to do <code>intf</code> against the master
         * @throws MasterNotRunningException
         */
        @edu.umd.cs.findbugs.annotations.SuppressWarnings (value="SWL_SLEEP_WITH_LOCK_HELD")
        Object makeStub() throws MasterNotRunningException {
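
      To make the interaction concrete, here is a minimal sketch of two nested retry loops. It does not use HBase classes: callWithRetries, countAttempts, and the always-failing lookup are hypothetical stand-ins for RpcRetryingCaller.callWithRetries() wrapping StubMaker.makeStub(), and sleeping is left out so the demo finishes at once.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: none of these names are HBase APIs. */
public class NestedRetryDemo {

    /** Calls op up to maxAttempts times and rethrows the last IOException. */
    static <V> V callWithRetries(Callable<V> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Counts how often the innermost operation runs when two retry loops are nested. */
    static int countAttempts(int outerAttempts, int innerAttempts) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for the master lookup: it always fails, like the null znode case.
        Callable<String> lookup = () -> {
            calls.incrementAndGet();
            throw new IOException("master address unavailable");
        };
        // Inner loop plays the role of makeStub(); outer loop plays callWithRetries().
        Callable<String> inner = () -> callWithRetries(lookup, innerAttempts);
        try {
            callWithRetries(inner, outerAttempts);
        } catch (Exception expected) {
            // Every attempt fails by construction.
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countAttempts(35, 35)); // prints 1225
        System.out.println(countAttempts(35, 1));  // prints 35
    }
}
```

      With both loops set to 35 attempts, the lookup runs 35 * 35 = 1225 times before the caller sees a failure. If only one layer retries, the count drops back to 35.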
      

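      Because the two loops are nested, every outer attempt runs a full inner cycle of 35 getMaster attempts, which takes roughly 10 minutes. With 35 outer attempts, the tests just hang for about 10 min * 35 ~= 6 hours. The log below shows one complete inner cycle followed by the start of the next: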

      2014-09-23 16:19:05,151 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 1 of 35 failed; retrying after sleep of 100, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:05,253 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 2 of 35 failed; retrying after sleep of 200, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:05,456 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 3 of 35 failed; retrying after sleep of 300, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:05,759 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 4 of 35 failed; retrying after sleep of 500, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:06,262 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 5 of 35 failed; retrying after sleep of 1008, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:07,273 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 6 of 35 failed; retrying after sleep of 2011, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:09,286 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 7 of 35 failed; retrying after sleep of 4012, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:13,303 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 8 of 35 failed; retrying after sleep of 10033, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:23,343 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 9 of 35 failed; retrying after sleep of 10089, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:33,439 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 10 of 35 failed; retrying after sleep of 10027, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:43,473 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 11 of 35 failed; retrying after sleep of 10004, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:19:53,485 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 12 of 35 failed; retrying after sleep of 20160, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:20:13,656 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 13 of 35 failed; retrying after sleep of 20006, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:20:33,675 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 14 of 35 failed; retrying after sleep of 20076, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:20:53,762 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 15 of 35 failed; retrying after sleep of 20077, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:21:13,852 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 16 of 35 failed; retrying after sleep of 20103, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:21:33,967 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 17 of 35 failed; retrying after sleep of 20136, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:21:54,115 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 18 of 35 failed; retrying after sleep of 20147, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:22:14,274 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 19 of 35 failed; retrying after sleep of 20131, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:22:34,417 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 20 of 35 failed; retrying after sleep of 20171, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:22:54,601 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 21 of 35 failed; retrying after sleep of 20177, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:23:14,790 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 22 of 35 failed; retrying after sleep of 20193, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:23:34,996 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 23 of 35 failed; retrying after sleep of 20195, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:23:55,203 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 24 of 35 failed; retrying after sleep of 20107, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:24:15,322 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 25 of 35 failed; retrying after sleep of 20186, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:24:35,520 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 26 of 35 failed; retrying after sleep of 20106, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:24:55,638 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 27 of 35 failed; retrying after sleep of 20173, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:25:15,824 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 28 of 35 failed; retrying after sleep of 20136, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:25:35,973 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 29 of 35 failed; retrying after sleep of 20188, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:25:56,174 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 30 of 35 failed; retrying after sleep of 20144, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:26:16,330 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 31 of 35 failed; retrying after sleep of 20106, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:26:36,448 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 32 of 35 failed; retrying after sleep of 20003, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:26:56,463 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 33 of 35 failed; retrying after sleep of 20114, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:16,590 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 34 of 35 failed; retrying after sleep of 20154, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:36,756 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 35 of 35 failed; no more retrying.
      java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      	at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:114)
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1554)
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1599)
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1653)
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1860)
      	at org.apache.hadoop.hbase.client.HBaseAdmin$MasterCallable.prepare(HBaseAdmin.java:3359)
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:122)
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:92)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3386)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2201)
      	at org.apache.hadoop.hbase.DistributedHBaseCluster.getClusterStatus(DistributedHBaseCluster.java:74)
      	at org.apache.hadoop.hbase.DistributedHBaseCluster.<init>(DistributedHBaseCluster.java:57)
      	at org.apache.hadoop.hbase.IntegrationTestingUtility.createDistributedHBaseCluster(IntegrationTestingUtility.java:140)
      	at org.apache.hadoop.hbase.IntegrationTestingUtility.initializeCluster(IntegrationTestingUtility.java:75)
      	at org.apache.hadoop.hbase.IntegrationTestManyRegions.setUp(IntegrationTestManyRegions.java:80)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runners.Suite.runChild(Suite.java:127)
      	at org.junit.runners.Suite.runChild(Suite.java:26)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
      	at org.apache.hadoop.hbase.IntegrationTestsDriver.doWork(IntegrationTestsDriver.java:110)
      	at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      	at org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:46)
      2014-09-23 16:27:37,061 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 1 of 35 failed; retrying after sleep of 100, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:37,163 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 2 of 35 failed; retrying after sleep of 200, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:37,365 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 3 of 35 failed; retrying after sleep of 301, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:37,669 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 4 of 35 failed; retrying after sleep of 504, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:38,176 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 5 of 35 failed; retrying after sleep of 1008, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:39,185 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 6 of 35 failed; retrying after sleep of 2018, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:41,207 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 7 of 35 failed; retrying after sleep of 4019, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:45,231 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 8 of 35 failed; retrying after sleep of 10004, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:27:55,241 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 9 of 35 failed; retrying after sleep of 10005, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:28:05,253 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 10 of 35 failed; retrying after sleep of 10099, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:28:15,359 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 11 of 35 failed; retrying after sleep of 10059, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:28:25,425 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 12 of 35 failed; retrying after sleep of 20069, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:28:45,507 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 13 of 35 failed; retrying after sleep of 20006, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:29:05,525 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 14 of 35 failed; retrying after sleep of 20186, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:29:25,723 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 15 of 35 failed; retrying after sleep of 20080, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:29:45,814 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 16 of 35 failed; retrying after sleep of 20001, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:30:05,826 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 17 of 35 failed; retrying after sleep of 20019, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:30:25,857 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 18 of 35 failed; retrying after sleep of 20159, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:30:46,028 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 19 of 35 failed; retrying after sleep of 20170, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:31:06,211 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 20 of 35 failed; retrying after sleep of 20146, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:31:26,368 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 21 of 35 failed; retrying after sleep of 20138, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:31:46,518 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 22 of 35 failed; retrying after sleep of 20140, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:32:06,670 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 23 of 35 failed; retrying after sleep of 20196, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:32:26,878 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 24 of 35 failed; retrying after sleep of 20123, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
      2014-09-23 16:32:47,013 INFO  [main] client.ConnectionManager$HConnectionImplementation: getMaster attempt 25 of 35 failed; retrying after sleep of 20033, exception=java.io.IOException: Can't get master address from ZooKeeper; znode data == null
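
      The sleep times in the log above can be turned into a rough time budget. The sketch below is an estimate under stated assumptions, not HBase code: the multiplier table is read off the logged sleeps (100, 200, 300, 500, 1000, ... ms), the base pause is assumed to be 100 ms, jitter and the time spent in each attempt are ignored, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope estimate of time spent sleeping in nested retry loops. */
public class RetryBudget {

    // Multipliers inferred from the logged sleep times, assuming a 100 ms base pause.
    // Once the table runs out, the last value is reused, as the log suggests.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Total sleep for one cycle of `attempts` failed tries, ignoring jitter. */
    static long cycleSleepMillis(int attempts, long pauseMillis) {
        long total = 0;
        // The final attempt gives up without sleeping, so there are attempts - 1 sleeps.
        for (int i = 0; i < attempts - 1; i++) {
            int idx = Math.min(i, BACKOFF.length - 1);
            total += pauseMillis * BACKOFF[idx];
        }
        return total;
    }

    /** Sleep time when each outer attempt runs one full inner cycle (outer sleeps ignored). */
    static long nestedSleepMillis(int outerAttempts, int innerAttempts, long pauseMillis) {
        return outerAttempts * cycleSleepMillis(innerAttempts, pauseMillis);
    }

    public static void main(String[] args) {
        long cycle = cycleSleepMillis(35, 100);
        long nested = nestedSleepMillis(35, 35, 100);
        System.out.println(cycle);  // prints 508100 (about 8.5 minutes)
        System.out.println(nested); // prints 17783500 (about 4.9 hours)
    }
}
```

      The single-cycle figure of about 508 seconds is close to the roughly 8.5 minutes between the first and last log lines of the first cycle (16:19:05 to 16:27:36). Multiplied by 35 outer attempts it gives a lower bound of about 5 hours, the same order of magnitude as the 6 hour estimate in the description.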
      

      Attachments

        1. 12072-v1.txt
          2 kB
          Ted Yu
        2. 12072-v2.txt
          0.9 kB
          Ted Yu
        3. hbase-12072_v1.patch
          89 kB
          Enis Soztutar
        4. hbase-12072_v2.patch
          97 kB
          Michael Stack
        5. hbase-12072_v2.patch
          97 kB
          Enis Soztutar
        6. hbase-12072_v3.patch
          96 kB
          Enis Soztutar

      People

        Assignee: Enis Soztutar
        Reporter: Enis Soztutar
        Votes: 0
        Watchers: 8