Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-14327

TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • None
    • 1.2.0, 0.98.15, 1.0.3, 1.1.3, 2.0.0
    • test
    • None
    • Reviewed

    Description

      I'm looking into some more of the flaky tests on trunk and this one seems to be particularly gross, failing about half the time in recent days. Some probably-relevant output from a recent run:

      2015-08-27 18:50:14,318 INFO  [main] hbase.TestIOFencing(326): Allowing compaction to proceed
      2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions
      2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction...
      2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction...
      2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files.
      2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files.
      2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464
      2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe
      2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
      2015-08-27 18:50:14,342 INFO  [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 2 (all) file(s) in family of tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. into e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total size for store is 146.9 K. This selection was in queue for 0sec, and took 10sec to execute.
      2015-08-27 18:50:14,343 INFO  [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(527): Completed compaction: Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=73.1 K, priority=998, time=525052314434020; duration=10sec
      2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b
      2015-08-27 18:50:14,347 DEBUG [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1405): Updates disabled for region tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
      2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(550): CompactSplitThread Status: compaction_queue=(0:3), split_queue=0, merge_queue=0
      2015-08-27 18:50:14,348 ERROR [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.CompactSplitThread$CompactionRunner(547): Compaction failed Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=51.6 K, priority=998, time=525051341420855
      java.lang.NullPointerException
      	at org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(HStore.java:1343)
      	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1259)
      	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1812)
      	at org.apache.hadoop.hbase.TestIOFencing$CompactionBlockerRegion.compact(TestIOFencing.java:127)
      	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      2015-08-27 18:50:14,349 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.CompactSplitThread$CompactionRunner(550): CompactSplitThread Status: compaction_queue=(0:3), split_queue=0, merge_queue=0
      2015-08-27 18:50:14,351 INFO  [StoreCloserThread-tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.-1] regionserver.HStore(889): Closed family
      2015-08-27 18:50:14,352 ERROR [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1493): Memstore size is 62664
      2015-08-27 18:50:14,353 INFO  [RS_CLOSE_REGION-hemera:35619-0] regionserver.HRegion(1506): Closed tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
      2015-08-27 18:50:14,353 DEBUG [RS_CLOSE_REGION-hemera:35619-0] handler.CloseRegionHandler(122): Closed tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311.
      2015-08-27 18:50:14,361 ERROR [RS:1;hemera:39431-shortCompactions-1440701414271] regionserver.CompactSplitThread$CompactionRunner(541): Compaction failed Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=9, fileSize=289.6 K (25.6 K, 47.0 K, 26.0 K, 68.7 K, 26.1 K, 26.1 K, 26.1 K, 26.1 K, 18.0 K), priority=991, time=525062316035397
      java.io.FileNotFoundException: File does not exist: /user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
      	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
      	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
      	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
      	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1167)
      	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
      	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
      	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
      	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
      	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
      	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
      	at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:160)
      	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
      	at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:106)
      	at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:82)
      	at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:248)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:385)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:492)
      	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:115)
      	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:99)
      	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createFileScanners(Compactor.java:203)
      	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:70)
      	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
      	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1231)
      	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1812)
      	at org.apache.hadoop.hbase.TestIOFencing$CompactionBlockerRegion.compact(TestIOFencing.java:127)
      	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc
      	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
      	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
      	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      
      	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      	at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
      	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      	at com.sun.proxy.$Proxy21.getBlockLocations(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
      	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
      	at com.sun.proxy.$Proxy22.getBlockLocations(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
      	... 29 more
      

      Attachments

        1. HBASE-14327.patch
          2 kB
          Heng Chen

        Activity

          People

            chenheng Heng Chen
            dimaspivak Dima Spivak
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: