HBASE-10370

Compaction in out-of-date Store causes region split failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.94.3, 0.98.0, 0.99.0
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0
    • Component/s: Compaction
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In our production cluster, we encountered a problem where two daughter regions could not be opened because of a FileNotFoundException.

      2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
      java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
      at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
      at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
      at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
      ....

      The cause is that a compaction running in an out-of-date Store deletes hfiles that are still referenced by the daughter regions after the split. As a result, the daughter regions can never be opened.

      The timeline is as follows:

      Assumption: Store A in Region R contains two hfiles, a and b.
      t0: A compaction request for Store A(a+b) in Region R is submitted.

      t1: The first split of Region R is attempted, but it times out and is rolled back. During the rollback, the region reinitializes all of its store objects (see SplitTransaction #824). The store in Region R is now A'(a+b).

      t2: The compaction requested at t0 runs (hfiles a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived.

      t3: A second split of Region R is attempted. R splits into two regions, R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a+b).

      t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1 fails with FileNotFoundException.
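
      The timeline above can be replayed with a small toy model. This is only an illustrative sketch, not HBase code: the Store and FileSystem classes below are hypothetical stand-ins, and "archiving" is modelled as simply removing a name from a set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleStoreTimeline {
    /** Hypothetical toy file system: the set of hfile names that still exist. */
    public static final class FileSystem {
        final Set<String> files = new HashSet<>();
    }

    /** Hypothetical toy store: an in-memory list of the hfiles it believes it owns. */
    public static final class Store {
        final List<String> hfiles;

        Store(List<String> hfiles) {
            this.hfiles = new ArrayList<>(hfiles);
        }

        /** Merges all known hfiles into one output file and archives the inputs. */
        void compact(FileSystem fs, String output) {
            fs.files.removeAll(hfiles);
            fs.files.add(output);
            hfiles.clear();
            hfiles.add(output);
        }
    }

    /** Replays t0..t4 and returns the referenced hfiles that no longer exist. */
    public static List<String> run() {
        FileSystem fs = new FileSystem();
        fs.files.addAll(Arrays.asList("a", "b"));

        Store storeA = new Store(Arrays.asList("a", "b"));
        Store requested = storeA;                    // t0: the request captures Store A
        Store current = new Store(storeA.hfiles);    // t1: rollback creates Store A'
        requested.compact(fs, "c");                  // t2: stale Store A archives a and b
        List<String> references = new ArrayList<>(current.hfiles); // t3: split reads A'

        List<String> missing = new ArrayList<>();    // t4: daughters try to open the files
        for (String hfile : references) {
            if (!fs.files.contains(hfile)) {
                missing.add(hfile);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("missing hfiles at daughter open: " + run());
    }
}
```

      In this model the new store A' never learns about the compaction, so the references created from it at t3 point at files that the old store A already archived at t2.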

      I have added a test to identify this problem.

      After searching JIRA, it looks like HBASE-8502 may be the same problem. Dimitri Goldin

      1. 10370-v4.patch
        1.0 kB
        Ted Yu
      2. 10370v2.096.txt
        1.0 kB
        stack
      3. 10370-v3.patch
        4 kB
        Ted Yu
      4. HBASE-10370-v2.diff
        4 kB
        Liu Shaohui
      5. HBASE-10370-v1.diff
        4 kB
        Liu Shaohui

          Activity

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #60 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/60/)
          HBASE-10377 Add test for HBASE-10370 Compaction in out-of-date Store causes region split failure (stack: rev 1559838)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4838 (See https://builds.apache.org/job/HBase-TRUNK/4838/)
          HBASE-10377 Add test for HBASE-10370 Compaction in out-of-date Store causes region split failure (stack: rev 1559838)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Dimitri Goldin added a comment -

          Thanks for the fix. Going by the behaviour and the fix, it is very likely that this was the same issue. Back then there was not enough information to track down the root cause.

          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98 #93 (See https://builds.apache.org/job/HBase-0.98/93/)
          HBASE-10370 Compaction in out-of-date Store causes region split failure, fix only (Tedyu: rev 1559276)

          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

          HBASE-10370 revert due to TestSplitTransactionOnCluster.testSplitFailedCompactionAndSplit failure (Tedyu: rev 1559274)

          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4833 (See https://builds.apache.org/job/HBase-TRUNK/4833/)
          HBASE-10370 Compaction in out-of-date Store causes region split failure, fix only (Tedyu: rev 1559277)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

          HBASE-10370 revert due to TestSplitTransactionOnCluster.testSplitFailedCompactionAndSplit failure (Tedyu: rev 1559275)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #84 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/84/)
          HBASE-10370 Compaction in out-of-date Store causes region split failure, fix only (Tedyu: rev 1559276)

          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

          HBASE-10370 revert due to TestSplitTransactionOnCluster.testSplitFailedCompactionAndSplit failure (Tedyu: rev 1559274)

          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #57 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/57/)
          HBASE-10370 Compaction in out-of-date Store causes region split failure, fix only (Tedyu: rev 1559277)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

          HBASE-10370 revert due to TestSplitTransactionOnCluster.testSplitFailedCompactionAndSplit failure (Tedyu: rev 1559275)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.96-hadoop2 #178 (See https://builds.apache.org/job/hbase-0.96-hadoop2/178/)
          HBASE-10370 Compaction in out-of-date Store causes region split failed (stack: rev 1559226)

          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #56 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/56/)
          HBASE-10370 Compaction in out-of-date Store causes region split failed (Tedyu: rev 1559216)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Ted Yu added a comment -

          Re-integrated the fix into 0.98 and trunk.

          I created HBASE-10377 for restoring the test.

          Once Jenkins builds come back green, will resolve this issue.

          Ted Yu added a comment -

          Sounds good.

          Patch v4 is the fix without test.

          Andrew Purtell added a comment -

          Can we commit this to 0.98 as was committed to 0.96 and open a followup JIRA for a test for both 0.98 and 0.96 (at Stack's option)?

          "I will integrate one more time before 0.98.0 RC0 comes out."

          Ok, the only blocker remaining is HBASE-10322. I'm going to tag the RC as soon as that goes in.

          Ted Yu added a comment -

          Okay.
          I will integrate one more time before 0.98.0 RC0 comes out.

          Let's give Shaohui some time to make the test robust.

          Andrew Purtell added a comment -

          Well this was committed elsewhere without the new test, so I'd prefer that if you don't mind.

          Ted Yu added a comment -

          "or with a fixed test"

          I prefer the above.

          Thanks Andy.

          Andrew Purtell added a comment -

          This could also be handled with an addendum that drops the new test, since this was committed to 0.96 without the new test.

          Andrew Purtell added a comment -

          "I will revert from 0.98 and trunk"

          Thanks Ted.

          +1 to recommit without the new test, or with a fixed test.

          Ted Yu added a comment -

          From https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/82/testReport/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testSplitFailedCompactionAndSplit/ :

          java.lang.AssertionError
          	at org.junit.Assert.fail(Assert.java:86)
          	at org.junit.Assert.assertTrue(Assert.java:41)
          	at org.junit.Assert.assertNotNull(Assert.java:621)
          	at org.junit.Assert.assertNotNull(Assert.java:631)
          	at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSplitFailedCompactionAndSplit(TestSplitTransactionOnCluster.java:335)
          

          The following assertion failed:

              CompactionContext cc = store.requestCompaction();
              assertNotNull(cc);
          

          Since the failing assertion is in the newly added test, I will revert the change from 0.98 and trunk.

          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4831 (See https://builds.apache.org/job/HBase-TRUNK/4831/)
          HBASE-10370 Compaction in out-of-date Store causes region split failed (Tedyu: rev 1559216)

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.98 #91 (See https://builds.apache.org/job/HBase-0.98/91/)
          HBASE-10370 Compaction in out-of-date Store causes region split failed (Tedyu: rev 1559215)

          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.96 #261 (See https://builds.apache.org/job/hbase-0.96/261/)
          HBASE-10370 Compaction in out-of-date Store causes region split failed (stack: rev 1559226)

          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #82 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/82/)
          HBASE-10370 Compaction in out-of-date Store causes region split failed (Tedyu: rev 1559215)

          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623710/10370v2.096.txt
          against trunk revision .
          ATTACHMENT ID: 12623710

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8463//console

          This message is automatically generated.

          stack added a comment -

          This is what I applied to 0.96 (without the unit test, which failed to apply). Thanks, Liu Shaohui.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623704/10370-v3.patch
          against trunk revision .
          ATTACHMENT ID: 12623704

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8462//console

          This message is automatically generated.

          Ted Yu added a comment -

          Integrated to 0.98 and trunk.

          Thanks for the patch, Shaohui.

          Ted Yu added a comment -

          Patch v3 closes the region and table in TestSplitTransactionOnCluster#testSplitFailedCompactionAndSplit

          Andrew Purtell added a comment -

          +1 for 0.98

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623612/HBASE-10370-v2.diff
          against trunk revision .
          ATTACHMENT ID: 12623612

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8459//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          Patch looks good to me. Did not spend time to check the root cause.

          Liu Shaohui added a comment -

          Updated the log message following chunhui shen's advice.

          chunhui shen added a comment - - edited

          The patch makes sense for this issue.
          It would be better to log a clearer message.
          What about 'LOG.warn("Store " + store.getColumnFamilyName() + " on region "+ this + " has been re-instantiated, cancel this compaction request. It may be caused by the roll back of split transaction");' ?

          Liu Shaohui added a comment -

          chunhui shen
          Yes, caching the family name in CompactionRequest is another approach.
          I tried it, but ran into some problems:
          • A compaction may be requested in one store, run in another, and finished in yet another, which may leave the state in Store inconsistent.
          • The getStore method in HRegion may return different objects on different calls.
          I have not thought this through fully, so I just added a check in compact.
          Thanks for your advice.
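
          The check described in this comment can be pictured as an identity comparison between the Store captured by the compaction request and the Store the region currently holds for the same family. The following is a minimal sketch under assumptions, not the committed patch: Region, Store, and reinitializeStores are hypothetical stand-ins for HRegion, its stores, and the store reinitialization done on split rollback.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactionGuardSketch {
    /** Hypothetical stand-in for a store, identified by its column family. */
    public static final class Store {
        final String family;

        Store(String family) {
            this.family = family;
        }
    }

    /** Hypothetical stand-in for a region holding one Store per family. */
    public static final class Region {
        private final Map<String, Store> stores = new HashMap<>();

        public Region(String... families) {
            for (String family : families) {
                stores.put(family, new Store(family));
            }
        }

        public Store getStore(String family) {
            return stores.get(family);
        }

        /** Models the split rollback: every Store object is replaced by a new one. */
        public void reinitializeStores() {
            stores.replaceAll((family, old) -> new Store(family));
        }

        /** Returns true if the compaction may run, false if it is skipped. */
        public boolean compact(Store requested) {
            // Reference comparison on purpose: the question is whether this is
            // the very same object the region currently uses for the family.
            if (requested != getStore(requested.family)) {
                System.out.println("Store " + requested.family
                    + " has been re-instantiated, cancel this compaction request.");
                return false;
            }
            // A real implementation would run the compaction here.
            return true;
        }
    }

    public static void main(String[] args) {
        Region region = new Region("A");
        Store captured = region.getStore("A");  // the request captures the Store
        region.reinitializeStores();            // the rollback replaces it
        System.out.println("stale request runs: " + region.compact(captured));
        System.out.println("fresh request runs: " + region.compact(region.getStore("A")));
    }
}
```

          The guard only skips the stale request; it does not try to redirect the compaction to the new store, which keeps the request, run, and completion of a compaction on a single Store object.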

          chunhui shen added a comment -

          It seems the 'store' object in CompactionRequest may be the wrong one.

          Should we cache the family name in CompactionRequest rather than the store object?
          Then we could get the store object through HRegion#getStore when it is needed.

          Nice find!
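
          The alternative raised in this comment, resolving the store by family name at the time it is needed, might look roughly like the sketch below. It is an assumption-laden illustration only: Region, Store, CompactionRequest, and resolveStore here are hypothetical toy classes and methods, not the HBase ones, and the thread itself notes open concerns about this approach.

```java
import java.util.HashMap;
import java.util.Map;

public class FamilyNameRequestSketch {
    /** Hypothetical stand-in for a store, identified by its column family. */
    public static final class Store {
        final String family;

        Store(String family) {
            this.family = family;
        }
    }

    /** Hypothetical stand-in for a region holding one Store per family. */
    public static final class Region {
        private final Map<String, Store> stores = new HashMap<>();

        public Region(String... families) {
            for (String family : families) {
                stores.put(family, new Store(family));
            }
        }

        public Store getStore(String family) {
            return stores.get(family);
        }

        /** Models the split rollback: every Store object is replaced by a new one. */
        public void reinitializeStores() {
            stores.replaceAll((family, old) -> new Store(family));
        }
    }

    /** Hypothetical request that remembers the family name, not the Store object. */
    public static final class CompactionRequest {
        private final Region region;
        private final String family;

        public CompactionRequest(Region region, String family) {
            this.region = region;
            this.family = family;
        }

        /** Looks the store up at call time, so it always sees the current object. */
        public Store resolveStore() {
            return region.getStore(family);
        }
    }

    public static void main(String[] args) {
        Region region = new Region("A");
        Store before = region.getStore("A");
        CompactionRequest request = new CompactionRequest(region, "A");
        region.reinitializeStores();
        System.out.println("resolves to current store: "
            + (request.resolveStore() == region.getStore("A")));
        System.out.println("resolves to old store: "
            + (request.resolveStore() == before));
    }
}
```

          Because each call to resolveStore may return a different object, a compaction could be requested, run, and completed against different Store instances, which is the inconsistency concern raised in the reply to this comment.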

          Liu Shaohui added a comment -

          stack
          Maybe my wording was not accurate.
          Please see the timeline in the description, which should make the problem clear.

          Liu Shaohui added a comment -

          stack
          After the rollback of a failed region split, the region reinitializes, and the stores in the region are new objects.
          But the compaction request still refers to the old Store, which is the 'out-of-date Store'.

          stack added a comment -

          "...what you mean?"

          Sorry, I meant, "... could you explain some more... I don't follow too well."

          stack added a comment -

          Liu Shaohui The test is great.

          When you say '.. in an out-of-date Store deletes the hfiles...', what you mean?

          Liu Shaohui added a comment -

          Patch for trunk.
          This patch is not perfect, and I think we should fix this problem at a higher level.
          Could someone familiar with region split give some advice?
          stack


            People

            • Assignee: Liu Shaohui
            • Reporter: Liu Shaohui
            • Votes: 0
            • Watchers: 12
