Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-8531

TestZooKeeper fails in trunk/0.95 builds

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.98.0, 0.95.1
    • Zookeeper
    • None
    • Reviewed

    Description

      TestZooKeeper fails on occasion. I caught a good example recently. See below failure stack trace.

      It took me a while. I thought the issue had to do w/ our recent ipc refactorings but it looks like a problem we have always had. In short, MetaScanner is not handling DoNotRetryIOEs – it is letting them out. DNRIOEs when scanning are supposed to force a reset of the scan. HTable#next catches these and does the necessary scanner reset up. MetaScanner is running some subset of what HTable does when it is scanning except the part where it catches a DNRIOE and redoes the scan. Odd.

      TestZooKeeper failed in this instance because the test kills a regionserver at same time as we are trying to create a table. In create table we do a meta scan using MetaScanner passing a Visitor. The scan starts and gets a RegionServerStoppedException (This is NOT a DNRIOE – it should be – but later we convert it into one up in ScannerCallable).

      DNRIOEs are thrown to the upper layers to handle....

      Let me look into having MetaScanner just use HTable scanning. It makes an instance just to find where to start the scan... let me try using this instance for actually scanning.

      TODO: Do this convertion everywhere a DNRIOE could come out.

      Here is the stack trace

      org.apache.hadoop.hbase.exceptions.DoNotRetryIOException: Reset scanner
      	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:209)
      	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:52)
      	at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:170)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:212)
      	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
      	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:131)
      	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:128)
      	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:398)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:128)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
      	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:81)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:448)
      	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:348)
      	at org.apache.hadoop.hbase.TestZooKeeper.testSanity(TestZooKeeper.java:242)
      	at org.apache.hadoop.hbase.TestZooKeeper.testRegionServerSessionExpired(TestZooKeeper.java:203)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.junit.runners.Suite.runChild(Suite.java:127)
      	at org.junit.runners.Suite.runChild(Suite.java:26)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
      	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
      	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
      	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:227)
      	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:175)
      	... 43 more
      Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2310)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:2874)
      	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20577)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2103)
      	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1810)
      
      	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336)
      	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
      	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1587)
      	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:21012)
      	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
      	... 43 more
      
      

      Attachments

        1. 8531.addendum.txt
          12 kB
          Michael Stack
        2. 8531v9.txt
          50 kB
          Michael Stack
        3. 8531v8.txt
          50 kB
          Michael Stack
        4. 8531v8.txt
          50 kB
          Michael Stack
        5. 8531v7.txt
          49 kB
          Michael Stack
        6. 8531v6.txt
          49 kB
          Michael Stack
        7. 8531v5.txt
          29 kB
          Michael Stack
        8. 8531v4.txt
          14 kB
          Michael Stack
        9. 8531.txt
          14 kB
          Michael Stack

        Activity

          People

            stack Michael Stack
            stack Michael Stack
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: