HBase / HBASE-8764

Some MasterMonitorCallable should retry


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.1
    • Fix Version/s: 0.95.2
    • Component/s: IPC/RPC
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Adds retrying of Master operations. This helps when running hbase-it and chaos monkey
      kills the Master, or when the Master is not yet up and ready to take operations.

      Refactors ServerCallable. ServerCallable had a public call() method and, beside it,
      both a withRetries() and a withoutRetries() method, which was confusing. Also, the rpc
      retrying code, with its specific handling of exceptions returned by the server, was
      buried in ServerCallable's internals and could not be reused.

      This patch moves the rpc retrying code out of ServerCallable into a utility class,
      RpcRetryingCaller (a 'Caller' runs the 'Callable'). ServerCallable shrinks, implements
      a new RetryingCallable interface, and becomes RegionServerCallable: a class that is
      only about calling (no rpc and no retries), that is, a Callable with added details on
      where it is to be applied (table name and row).

      This pattern is then applied to Master operations, which were not retried previously.
      The Master operation Callables are now like RegionServerCallable (though they carry far
      less detail), implement RetryingCallable, and are passed to RpcRetryingCaller so they
      are retried. Some exceptions were changed so that they now extend DoNotRetryIOException,
      because not all Master operations should be retried.
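To make the Caller/Callable split concrete, here is a minimal Java sketch. The names RetryingCallable, RetryingCaller, DoNotRetryException, MasterInitializingException and FlakyMasterCallable are hypothetical stand-ins loosely modeled on the release note; they are not HBase's actual classes or signatures, and the attempt limit and linear backoff are assumptions of the sketch. The callable only knows how to make one attempt; the caller owns the retry loop, retrying ordinary IOExceptions (such as a Master that is still initializing) and failing fast on anything marked do-not-retry.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical stand-in for a Callable that knows how to make exactly one attempt.
// It contains no retry logic at all.
interface RetryingCallable<T> {
  T call() throws IOException;
}

// Hypothetical marker: failures of this type are surfaced immediately, never retried.
class DoNotRetryException extends IOException {
  DoNotRetryException(String message) {
    super(message);
  }
}

// Hypothetical retryable failure, standing in for "Master is initializing".
class MasterInitializingException extends IOException {
  MasterInitializingException(String message) {
    super(message);
  }
}

// Hypothetical caller that owns the whole retry policy, so callables stay simple.
final class RetryingCaller<T> {
  private final int maxAttempts;
  private final long pauseMillis;

  RetryingCaller(int maxAttempts, long pauseMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    this.maxAttempts = maxAttempts;
    this.pauseMillis = pauseMillis;
  }

  T callWithRetries(RetryingCallable<T> callable) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return callable.call();
      } catch (DoNotRetryException e) {
        throw e; // Non-retryable: fail fast on the first occurrence.
      } catch (IOException e) {
        lastFailure = e; // Retryable: remember it and try again after a pause.
        if (attempt < maxAttempts) {
          pause(pauseMillis * attempt); // Linear backoff (an assumption of this sketch).
        }
      }
    }
    throw new IOException("Giving up after " + maxAttempts + " attempts", lastFailure);
  }

  private static void pause(long millis) throws InterruptedIOException {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // Preserve the interrupt status for the caller.
      throw new InterruptedIOException("Interrupted while waiting to retry");
    }
  }
}

// Hypothetical status-only callable whose "Master" is still initializing for
// the first few attempts and then answers normally.
final class FlakyMasterCallable implements RetryingCallable<String> {
  private final int failuresBeforeReady;
  private int attempts = 0;

  FlakyMasterCallable(int failuresBeforeReady) {
    this.failuresBeforeReady = failuresBeforeReady;
  }

  @Override
  public String call() throws IOException {
    attempts++;
    if (attempts <= failuresBeforeReady) {
      throw new MasterInitializingException("Master is initializing");
    }
    return "OK";
  }

  int attempts() {
    return attempts;
  }
}

final class RetrySketchDemo {
  public static void main(String[] args) throws IOException {
    RetryingCaller<String> caller = new RetryingCaller<>(5, 100);
    // Fails twice with a retryable error, then succeeds on the third attempt.
    System.out.println(caller.callWithRetries(new FlakyMasterCallable(2)));
  }
}
```

Keeping the loop in a single caller means region server and Master callables can share one retry policy, while the exception type alone decides whether a given operation is retried.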

    Description

      Calls in the admin (HBaseAdmin) that only get status should retry.

      Got a call stack like this:

      org.apache.hadoop.hbase.exceptions.PleaseHoldException: org.apache.hadoop.hbase.exceptions.PleaseHoldException: Master is initializing
      	at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2266)
      	at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1610)
      	at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1646)
      	at org.apache.hadoop.hbase.protobuf.generated.MasterAdminProtos$MasterAdminService$2.callBlockingMethod(MasterAdminProtos.java:20930)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
      	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)
      
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
      	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
      	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
      	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:230)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:2705)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.execute(HBaseAdmin.java:2674)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:524)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:417)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:349)
      	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.createSchema(IntegrationTestBigLinkedList.java:437)
      	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.runGenerator(IntegrationTestBigLinkedList.java:471)
      	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.run(IntegrationTestBigLinkedList.java:505)
      	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.runGenerator(IntegrationTestBigLinkedList.java:698)
      	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.run(IntegrationTestBigLinkedList.java:748)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedListWithChaosMonkey.testContinuousIngest(IntegrationTestBigLinkedListWithChaosMonkey.java:80)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:601)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runners.Suite.runChild(Suite.java:127)
      	at org.junit.runners.Suite.runChild(Suite.java:26)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:138)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:117)
      	at org.apache.hadoop.hbase.IntegrationTestsDriver.doWork(IntegrationTestsDriver.java:111)
      	at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:108)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      	at org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:47)
      

      Attachments

        1. 8764.txt
          80 kB
          Michael Stack
        2. 8764v2.txt
          85 kB
          Michael Stack
        3. 8764v3.txt
          110 kB
          Michael Stack
        4. 8764v4.txt
          113 kB
          Michael Stack
        5. 8796v5.txt
          114 kB
          Michael Stack
        6. 8796v7.txt
          119 kB
          Michael Stack

          People

            Michael Stack (stack)
            Elliott Neil Clark (eclark)
            Votes: 0
            Watchers: 2
