HBASE-9720

TestSplitTransactionOnCluster#testShutdownFixupWhenDaughterHasSplit occasionally times out

Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.96.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      From https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/779/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/ :

      java.lang.AssertionError: Waited too long for split
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.junit.Assert.assertFalse(Assert.java:64)
      	at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.split(TestSplitTransactionOnCluster.java:1065)
      	at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit(TestSplitTransactionOnCluster.java:442)
      ...
      2013-10-05 13:00:18,060 DEBUG [RS:2;quirinus:46584-smallCompactions-1380978003766] regionserver.HRegionFileSystem(338): Committing store file hdfs://localhost:45166/user/jenkins/hbase/data/default/testShutdownFixupWhenDaughterHasSplit/0d7218d1ce3bd629779009821908a3ed/.tmp/8b155b635b304a368e11dbd675d09312 as hdfs://localhost:45166/user/jenkins/hbase/data/default/testShutdownFixupWhenDaughterHasSplit/0d7218d1ce3bd629779009821908a3ed/info/8b155b635b304a368e11dbd675d09312
      2013-10-05 13:00:18,436 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(310): master:48355-0x14188b3d7940000 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/balancer
      2013-10-05 13:00:18,436 DEBUG [AM.ZK.Worker-pool2-t11] master.AssignmentManager(818): Handling RS_ZK_REGION_SPLITTING, server=quirinus.apache.org,46584,1380977990795, region=e29b00c3bdaa3e10f6c4fe252a82399f, current_state={e29b00c3bdaa3e10f6c4fe252a82399f state=SPLITTING, ts=1380978012468, server=quirinus.apache.org,46584,1380977990795}
      

      The following stack traces appear at the end of the test output:

      Potentially hanging thread: RS:2;quirinus:46584-smallCompactions-1380978003766
        java.lang.Object.wait(Native Method)
        java.lang.Object.wait(Object.java:485)
        org.apache.hadoop.ipc.Client.call(Client.java:1333)
        org.apache.hadoop.ipc.Client.call(Client.java:1300)
        org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        com.sun.proxy.$Proxy17.rename(Unknown Source)
        sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke(Method.java:597)
        org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
        org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        com.sun.proxy.$Proxy17.rename(Unknown Source)
        org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:396)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke(Method.java:597)
        org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
        com.sun.proxy.$Proxy22.rename(Unknown Source)
        org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1512)
        org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:528)
        org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:210)
        org.apache.hadoop.hbase.regionserver.HRegionFileSystem.rename(HRegionFileSystem.java:924)
        org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitStoreFile(HRegionFileSystem.java:340)
        org.apache.hadoop.hbase.regionserver.HRegionFileSystem.commitStoreFile(HRegionFileSystem.java:312)
        org.apache.hadoop.hbase.regionserver.HStore.moveFileIntoPlace(HStore.java:1032)
        org.apache.hadoop.hbase.regionserver.HStore.moveCompatedFilesIntoPlace(HStore.java:1018)
        org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1001)
        org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1287)
        org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:475)
      ...
      Potentially hanging thread: RS:2;quirinus:46584-splits-1380978002577
        java.lang.Object.wait(Native Method)
        java.lang.Object.wait(Object.java:485)
        org.apache.hadoop.ipc.Client.call(Client.java:1333)
        org.apache.hadoop.ipc.Client.call(Client.java:1300)
        org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
        sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke(Method.java:597)
        org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
        org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
        org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:467)
        sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke(Method.java:597)
        org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
        com.sun.proxy.$Proxy22.mkdirs(Unknown Source)
        org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2350)
        org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2321)
        org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:828)
        org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:824)
        org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
        org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:824)
        org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:817)
        org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:277)
        org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1929)
        org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createDir(HRegionFileSystem.java:902)
        org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createSplitsDir(HRegionFileSystem.java:505)
        org.apache.hadoop.hbase.regionserver.SplitTransaction.stepsBeforePONR(SplitTransaction.java:322)
        org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:236)
        org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:500)
        org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:82)
      

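      It looks like the timeout was caused by an incomplete compaction: the compaction thread was still blocked in an HDFS rename while the split thread was blocked in mkdirs, so the split took longer than the test was willing to wait.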
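
      To illustrate the failure mode, here is a minimal sketch of a deadline-based polling wait, the general pattern behind an assertion such as "Waited too long for split". It is not the content of the attached patch and it does not use HBase APIs; the class name `SplitWaitSketch`, the helper `waitFor`, and the `compactionDone` flag are hypothetical stand-ins. The idea shown is that a wait whose budget is shorter than the in-flight background work fails, while waiting for that work first (or using a larger budget) succeeds.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from HBase.
class SplitWaitSketch {

    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition held before the deadline, false otherwise.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed: the caller would fail the test here
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a compaction that is still committing its store file.
        AtomicBoolean compactionDone = new AtomicBoolean(false);
        Thread compaction = new Thread(() -> {
            try {
                Thread.sleep(300); // simulated slow file-system call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            compactionDone.set(true);
        });
        compaction.start();

        // A budget shorter than the background work is expected to run out.
        boolean shortBudget = waitFor(100, 10, compactionDone::get);
        // Waiting for the compaction with a generous budget before proceeding.
        boolean longBudget = waitFor(5_000, 10, compactionDone::get);

        System.out.println("short budget satisfied: " + shortBudget);
        System.out.println("long budget satisfied: " + longBudget);
        compaction.join();
    }
}
```

      Under this assumption, a test could either wait for outstanding compactions before requesting the split or allow the split a longer deadline; which of the two the actual fix uses is not stated in this report.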

      Attachments

        1. 9720-v1.txt
          0.9 kB
          Ted Yu

          People

            Assignee: yuzhihong@gmail.com Ted Yu
            Reporter: yuzhihong@gmail.com Ted Yu