HBASE-6479

HFileReaderV1 caching the same parent META block could cause server abort when splitting

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the HFile is version 1, then during a split the two daughter regions call loadBloomfilter concurrently while they are being opened. Because both daughters read the same META block (the parent's META block), the second attempt to cache it fails and the following exception is thrown in HFileReaderV1#getMetaBlock:

      java.io.IOException: Failed null-daughterOpener=af73f8c9a9b409531ac211a9a7f92eba
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:367)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:453)
      	at org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testWholesomeSplit(TestSplitTransaction.java:225)
      	at org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testWholesomeSplitWithHFileV1(TestSplitTransaction.java:203)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
      	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
      	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
      	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
      	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
      	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
      	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
      	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
      	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
      Caused by: java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached an already cached block
      	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:540)
      	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:463)
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3784)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:506)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:486)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: java.lang.RuntimeException: Cached an already cached block
      	at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:424)
      	at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:271)
      	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2918)
      	at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:516)
      	at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	... 1 more
      Caused by: java.lang.RuntimeException: Cached an already cached block
      	at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:271)
      	at org.apache.hadoop.hbase.io.hfile.HFileReaderV1.getMetaBlock(HFileReaderV1.java:258)
      	at org.apache.hadoop.hbase.io.hfile.HFileReaderV1.getGeneralBloomFilterMetadata(HFileReaderV1.java:689)
      	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadBloomfilter(StoreFile.java:1564)
      	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.access$1(StoreFile.java:1558)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:571)
      	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:606)
      	at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:395)
      	at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:1)
      	... 8 more
      
      

      We can reproduce the problem with the attached test patch.

      It can happen when a cluster is upgraded from 0.90.x to 0.92.x or 0.94.x, because store files written by 0.90.x are still in HFile version 1 format.
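The race described above can be illustrated with a small, self-contained sketch. This is not HBase code: `StrictCache`, `openTwoDaughters`, and the cache key are hypothetical stand-ins, and a barrier is used only to force the interleaving in which both daughters miss the cache before either one inserts the block.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class MetaBlockRaceSketch {
    /** Toy stand-in for a block cache that rejects duplicate inserts (assumption, not LruBlockCache). */
    static class StrictCache {
        private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();

        byte[] getBlock(String key) {
            return map.get(key);
        }

        void cacheBlock(String key, byte[] block) {
            if (map.putIfAbsent(key, block) != null) {
                throw new RuntimeException("Cached an already cached block");
            }
        }
    }

    /** Runs two "daughter openers" against one shared cache and returns how many of them failed. */
    static int openTwoDaughters() {
        StrictCache cache = new StrictCache();
        String parentMetaKey = "parentHFile_meta_0"; // hypothetical key shared by both daughters
        CyclicBarrier bothMissed = new CyclicBarrier(2);
        AtomicInteger failures = new AtomicInteger();

        Runnable daughterOpener = () -> {
            try {
                byte[] cached = cache.getBlock(parentMetaKey); // both daughters see a miss
                bothMissed.await();                            // force the unlucky interleaving
                if (cached == null) {
                    cache.cacheBlock(parentMetaKey, new byte[] {1}); // the second insert throws
                }
            } catch (RuntimeException e) {
                failures.incrementAndGet();
            } catch (Exception e) { // InterruptedException or BrokenBarrierException
                throw new IllegalStateException(e);
            }
        };

        Thread a = new Thread(daughterOpener);
        Thread b = new Thread(daughterOpener);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failed daughter opens: " + openTwoDaughters());
    }
}
```

In this toy model exactly one of the two openers fails, which mirrors the single failed DaughterOpener in the trace above.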

      1. test.patch
        3 kB
        chunhui shen
      2. HBASE-6479.patch
        0.8 kB
        chunhui shen
      3. HBASE-6479v2.patch
        5 kB
        chunhui shen
      4. 6479v2.txt
        5 kB
        stack

          Activity

          chunhui shen added a comment -

          An easy way to fix this is to disable caching of the META block in loadBloomfilter(),

          or not to throw the "Cached an already cached block" exception.
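The two options in this comment can be sketched roughly as follows. This is a toy model under stated assumptions, not the committed patch: `ToyCache`, `cacheBlockTolerant`, and this `getMetaBlock` signature are hypothetical names used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

public class MetaBlockFixSketch {
    /** Toy cache used for illustration only; it is not HBase's LruBlockCache. */
    static class ToyCache {
        final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();

        /** Second option: a duplicate insert is ignored instead of raising an exception. */
        boolean cacheBlockTolerant(String key, byte[] block) {
            return map.putIfAbsent(key, block) == null;
        }
    }

    /** First option: the caller decides whether the META block is cached at all. */
    static byte[] getMetaBlock(ToyCache cache, String key, boolean cacheBlock) {
        byte[] cached = cache.map.get(key);
        if (cached != null) {
            return cached;
        }
        byte[] fromDisk = new byte[] {42}; // stand-in for reading the block from the file
        if (cacheBlock) {
            cache.cacheBlockTolerant(key, fromDisk);
        }
        return fromDisk;
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache();
        getMetaBlock(cache, "parentMeta", false); // nothing is cached, so no conflict is possible
        System.out.println("cached after uncached read: " + cache.map.containsKey("parentMeta"));
        System.out.println("first insert accepted: " + cache.cacheBlockTolerant("parentMeta", new byte[] {1}));
        System.out.println("duplicate insert accepted: " + cache.cacheBlockTolerant("parentMeta", new byte[] {2}));
    }
}
```

Skipping the cache avoids the conflict entirely at the cost of re-reading the block, while the tolerant insert keeps the block cached and simply drops the duplicate.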

          Ted Yu added a comment -

          @Chunhui:
          Can you include testWholesomeSplitWithHFileV1 in your patch to show that the problem is fixed?

          Thanks

          chunhui shen added a comment -

          Included the test case in patch v2.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538861/HBASE-6479v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

          -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestFromClientSide
          org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2477//console

          This message is automatically generated.

          Michael Drzal added a comment -

          chunhui shen, any thoughts on the test failures from Hadoop QA?

          chunhui shen added a comment -

          Michael Drzal
          Both failed tests passed in our QA environment, so I think the failures are not related to this patch.

          stack added a comment -

          Committed to trunk. Thanks Chunhui (would need more work getting this into 0.94)

          stack added a comment -

          Here is what I committed. Had to massage a little to get it in. Thanks for the patch Chunhui.

          Hudson added a comment -

          Integrated in HBase-TRUNK #3409 (See https://builds.apache.org/job/HBase-TRUNK/3409/)
          HBASE-6479 HFileReaderV1 caching the same parent META block could cause server abot when splitting (Revision 1393194)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #204 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/204/)
          HBASE-6479 HFileReaderV1 caching the same parent META block could cause server abot when splitting (Revision 1393194)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Hudson added a comment -

          Integrated in HBase-0.94 #879 (See https://builds.apache.org/job/HBase-0.94/879/)
          HBASE-7991 Backport HBASE-6479 'HFileReaderV1 caching the same parent META block could cause server abort when splitting' to 0.94 (Revision 1452465)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #12 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/12/)
          HBASE-7991 Backport HBASE-6479 'HFileReaderV1 caching the same parent META block could cause server abort when splitting' to 0.94 (Revision 1452465)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #116 (See https://builds.apache.org/job/HBase-0.94-security/116/)
          HBASE-7991 Backport HBASE-6479 'HFileReaderV1 caching the same parent META block could cause server abort when splitting' to 0.94 (Revision 1452465)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransaction.java
          stack added a comment -

          Marking closed.


            People

            • Assignee:
              chunhui shen
              Reporter:
              chunhui shen
            • Votes:
              0
              Watchers:
              10
