  1. HBase
  2. HBASE-12377

HBaseAdmin#deleteTable fails when META region is moved around the same timeframe

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.98.4
    • Fix Version/s: 0.99.2
    • Component/s: Client
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      This is the same issue that HBASE-10809 tried to address. The fix for HBASE-10809 refetches the latest meta location inside the retry loop. However, two problems remain: (1) inside the retry loop there is another try-catch block that throws the exception before the retry can kick in; (2) HBaseAdmin::getFirstMetaServerForTable() appears to always read the meta location from the meta cache, so if the cache is stale, retrying does not help, because every retry reads the same stale entry.

      Here is the call stack of the issue:

      2014-10-27 10:11:58,495|beaver.machine|INFO|18218|140065036261120|MainThread|org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-172-31-0-48.ec2.internal,60020,1414403435009
      2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
      2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
      2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3156)
      2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
      2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
      2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
      2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
      2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
      2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.Thread.run(Thread.java:745)
      2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|
      2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
      2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
      2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
      2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:306)
      2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:699)
      2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:654)
      2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.IntegrationTestManyRegions.tearDown(IntegrationTestManyRegions.java:99)
      

      The META region was online on RS1 when the delete-table operation started and was moved to RS2 while the operation was in progress. The failure above occurred at that point.
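
      The two problems described above can be illustrated with a minimal, self-contained sketch. It is not the committed patch and does not use real HBase APIs: MetaRetrySketch, MetaLocator, RegionNotServingException and scanWithRetries are hypothetical names chosen for illustration. The sketch assumes that a correct retry loop must (a) catch the "not serving" error without rethrowing it from an inner block and (b) bypass the location cache on every attempt after the first.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names are real HBase classes or methods. */
public class MetaRetrySketch {

    /** Stand-in for NotServingRegionException. */
    static class RegionNotServingException extends IOException {
        RegionNotServingException(String msg) { super(msg); }
    }

    /** Stand-in for a client that caches the location of the meta region. */
    static class MetaLocator {
        private String actual;   // server that really hosts meta
        private String cached;   // client-side cache, possibly stale

        MetaLocator(String initial) { actual = initial; cached = initial; }

        /** Simulates the meta region being moved to another server. */
        void moveMetaTo(String server) { actual = server; }

        /** Returns the cached location, or refreshes it when useCache is false. */
        String locate(boolean useCache) {
            if (!useCache || cached == null) {
                cached = actual;
            }
            return cached;
        }

        /** Fails when the request is sent to a server that no longer hosts meta. */
        void scanMeta(String server) throws RegionNotServingException {
            if (!server.equals(actual)) {
                throw new RegionNotServingException("hbase:meta is not online on " + server);
            }
        }
    }

    /** Retries the scan; returns the attempt number that succeeded. */
    static int scanWithRetries(MetaLocator locator, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Only the first attempt may trust the cache (addresses problem 2).
            String server = locator.locate(attempt == 1);
            try {
                locator.scanMeta(server);
                return attempt;
            } catch (RegionNotServingException e) {
                // Remember the error and keep looping instead of rethrowing (addresses problem 1).
                last = e;
            }
        }
        throw last != null ? last : new IOException("no attempts made");
    }

    public static void main(String[] args) throws IOException {
        MetaLocator locator = new MetaLocator("RS1");
        locator.moveMetaTo("RS2");   // meta moves while the client still caches RS1
        System.out.println("succeeded on attempt " + scanWithRetries(locator, 3));
    }
}
```

      In this sketch the first attempt hits the stale RS1 entry and fails, and the second attempt refreshes the location and succeeds. With only one attempt allowed, the exception still propagates to the caller.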

      Attachments

        1. HBASE-12377.v3-2.0.patch
          6 kB
          Stephen Yuan Jiang
        2. HBASE-12377.v2-2.0.patch
          7 kB
          Stephen Yuan Jiang
        3. HBASE-12377.v1-2.0.patch
          4 kB
          Stephen Yuan Jiang

            People

              Assignee: syuanjiang Stephen Yuan Jiang
              Reporter: syuanjiang Stephen Yuan Jiang