HBase
  1. HBase
  2. HBASE-5568

Multi concurrent flushcache() for one region could cause data loss

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We could call HRegion#flushcache() concurrently now through HRegionServer#splitRegion or HRegionServer#flushRegion by HBaseAdmin.
      However, we find if HRegion#internalFlushcache() is called concurrently by multi thread, HRegion.memstoreSize will be calculated wrong.
      At the end of HRegion#internalFlushcache(), we will do this.addAndGetGlobalMemstoreSize(-flushsize), but the flushsize may not the actual memsize which flushed to hdfs. It cause HRegion.memstoreSize is negative and prevent next flush if we close this region.

      Logs in RS for region e9d827913a056e696c39bc569ea3

      2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
      f99f., current region memstore size 128.0m
      2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
      a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
      2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
      f99f., current region memstore size 134.8m
      2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
      a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
      2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a
      056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
      2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
      f99f., current region memstore size 6.8m
      2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
      a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
      2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
      a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
      2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a
      056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
      2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
      a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
      2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
      a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
      2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a05
      6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true

      1. HBASE-5568.patch
        2 kB
        Ted Yu
      2. HBASE-5568.patch
        2 kB
        chunhui shen
      3. HBASE-5568-90.patch
        3 kB
        chunhui shen
      4. HBASE-5568-92v2.patch
        2 kB
        chunhui shen
      5. HBASE-5568v2.patch
        2 kB
        chunhui shen

        Issue Links

          Activity

          Hide
          Ted Yu added a comment -

          Integrated to 0.92

          Show
          Ted Yu added a comment - Integrated to 0.92
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #329 (See https://builds.apache.org/job/HBase-0.92/329/)
          HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1302270)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #329 (See https://builds.apache.org/job/HBase-0.92/329/ ) HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1302270) Result = FAILURE tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          Hide
          Ted Yu added a comment -

          Integrated 0.92 patch to 0.92 branch.

          Show
          Ted Yu added a comment - Integrated 0.92 patch to 0.92 branch.
          Hide
          ramkrishna.s.vasudevan added a comment -

          Sorry Ted. I did not notice that.

          Show
          ramkrishna.s.vasudevan added a comment - Sorry Ted. I did not notice that.
          Hide
          Ted Yu added a comment -

          Patch for 0.92 hasn't been integrated.

          Show
          Ted Yu added a comment - Patch for 0.92 hasn't been integrated.
          Hide
          ramkrishna.s.vasudevan added a comment -

          Resolving as committed to all versions.

          Show
          ramkrishna.s.vasudevan added a comment - Resolving as committed to all versions.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #140 (See https://builds.apache.org/job/HBase-TRUNK-security/140/)
          HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301639)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #140 (See https://builds.apache.org/job/HBase-TRUNK-security/140/ ) HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301639) Result = SUCCESS tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518769/HBASE-5568-92v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1215//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518769/HBASE-5568-92v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1215//console This message is automatically generated.
          Hide
          chunhui shen added a comment -

          patch for 92 version

          Show
          chunhui shen added a comment - patch for 92 version
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #34 (See https://builds.apache.org/job/HBase-0.94/34/)
          HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301676)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #34 (See https://builds.apache.org/job/HBase-0.94/34/ ) HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301676) Result = SUCCESS tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Hide
          Ted Yu added a comment -

          Integrated to 0.90 branch as well.

          Show
          Ted Yu added a comment - Integrated to 0.90 branch as well.
          Hide
          stack added a comment -

          +1 Good find Chunhui.

          Show
          stack added a comment - +1 Good find Chunhui.
          Hide
          Lars Hofhansl added a comment -

          Thanks Ted.

          Show
          Lars Hofhansl added a comment - Thanks Ted.
          Hide
          Ted Yu added a comment -

          Integrated to 0.94 as well.

          Show
          Ted Yu added a comment - Integrated to 0.94 as well.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2684 (See https://builds.apache.org/job/HBase-TRUNK/2684/)
          HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301639)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2684 (See https://builds.apache.org/job/HBase-TRUNK/2684/ ) HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301639) Result = SUCCESS tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Hide
          Ted Yu added a comment - - edited

          Integrated to TRUNK.

          Thanks for the patch Chunhui.

          Thanks for the review Ramkrishna and Lars.

          Show
          Ted Yu added a comment - - edited Integrated to TRUNK. Thanks for the patch Chunhui. Thanks for the review Ramkrishna and Lars.
          Hide
          Ted Yu added a comment -

          @Chunhui:
          A patch for 0.92 is needed:

          p0 HBASE-5568v2.patch
          ...
          Hunk #1 FAILED at 152.
          1 out of 1 hunk FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java.rej
          

          Please also update patch for 0.90 as well.

          Show
          Ted Yu added a comment - @Chunhui: A patch for 0.92 is needed: p0 HBASE-5568v2.patch ... Hunk #1 FAILED at 152. 1 out of 1 hunk FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java.rej Please also update patch for 0.90 as well.
          Hide
          Ted Yu added a comment -

          Patch v2 looks good.
          Will integrate if there is no objection.

          Show
          Ted Yu added a comment - Patch v2 looks good. Will integrate if there is no objection.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518621/HBASE-5568v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518621/HBASE-5568v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518615/HBASE-5568v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1197//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1197//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1197//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518615/HBASE-5568v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1197//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1197//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1197//console This message is automatically generated.
          Hide
          chunhui shen added a comment -

          patch v2 for test case:
          change TestStore#testDeleteExpiredStoreFiles#ttl from 1 to 4

          Show
          chunhui shen added a comment - patch v2 for test case: change TestStore#testDeleteExpiredStoreFiles#ttl from 1 to 4
          Hide
          chunhui shen added a comment -

          I think it's the problem of test case TestStore#testDeleteExpiredStoreFiles.
          See its code:

          int storeFileNum = 4;
          int ttl = 1;
          ...    
          hcd.setTimeToLive(ttl);
          init(getName(), conf, hcd);
          
              long sleepTime = this.store.scanInfo.getTtl() / storeFileNum;
              ...
              for (int i = 1; i <= storeFileNum; i++) {
                LOG.info("Adding some data for the store file #" + i);
                timeStamp = EnvironmentEdgeManager.currentTimeMillis();
                this.store.add(new KeyValue(row, family, qf1, timeStamp, (byte[]) null));
                this.store.add(new KeyValue(row, family, qf2, timeStamp, (byte[]) null));
                this.store.add(new KeyValue(row, family, qf3, timeStamp, (byte[]) null));
                flush(i);
                Thread.sleep(sleepTime);
              }
          

          Because ttl=1, so sleepTime=250ms, so for the 4 store files , the maxExpiredTimeStamp discrepancy is only 250ms.

          so they may be expired at the same time if you run slowly and then CompactionRequest cr = this.store.requestCompaction(); cr.getFiles().size() return 3.
          We can ensure this from the logs:
          https://builds.apache.org/job/PreCommit-HBASE-Build/1187//testReport/org.apache.hadoop.hbase.regionserver/TestStore/testDeleteExpiredStoreFiles/

          2012-03-14 17:19:14,672 INFO  [pool-1-thread-1] regionserver.Store(1002): Completed compaction of 1 file(s) in family of table,,1331745551742.011a93cc4f763307dc36f577662db4b1. into none, size=none; total size for store is 2.4k
          2012-03-14 17:19:14,923 INFO  [pool-1-thread-1] compactions.CompactSelection(104): Deleting the expired store file by compaction: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/4d75667a-f352-40cc-978a-f36fec71baf2/TestStoretestDeleteExpiredStoreFiles/011a93cc4f763307dc36f577662db4b1/family/489673e6568241c6bb500b34d7ff29ad whose maxTimeStamp is 1331745552397 while the max expired timestamp is 1331745553923
          2012-03-14 17:19:14,923 INFO  [pool-1-thread-1] compactions.CompactSelection(104): Deleting the expired store file by compaction: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/4d75667a-f352-40cc-978a-f36fec71baf2/TestStoretestDeleteExpiredStoreFiles/011a93cc4f763307dc36f577662db4b1/family/7ac26f52fd214aca88bd555f3c82dd91 whose maxTimeStamp is 1331745552671 while the max expired timestamp is 1331745553923
          2012-03-14 17:19:14,923 INFO  [pool-1-thread-1] compactions.CompactSelection(104): Deleting the expired store file by compaction: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/4d75667a-f352-40cc-978a-f36fec71baf2/TestStoretestDeleteExpiredStoreFiles/011a93cc4f763307dc36f577662db4b1/family/7ebf7908f5044fcab65a7327d3e19405 whose maxTimeStamp is 1331745552939 while the max expired timestamp is 1331745553923
          
          for (int i = 1; i <= storeFileNum; i++) {
                // verify the expired store file.
                CompactionRequest cr = this.store.requestCompaction();
                assertEquals(1, cr.getFiles().size());
                assertTrue(cr.getFiles().get(0).getReader().getMaxTimestamp() < 
                    (System.currentTimeMillis() - this.store.scanInfo.getTtl()));
                // Verify that the expired the store has been deleted.
                this.store.compact(cr);
                assertEquals(storeFileNum - i, this.store.getStorefiles().size());
          
                // Let the next store file expired.
                Thread.sleep(sleepTime);
              }

          after the first compaction,
          the next call this.store.requestCompaction(), the other three store files are all expired at one time.

          Show
          chunhui shen added a comment - I think it's the problem of test case TestStore#testDeleteExpiredStoreFiles. See its code: int storeFileNum = 4; int ttl = 1; ... hcd.setTimeToLive(ttl); init(getName(), conf, hcd); long sleepTime = this .store.scanInfo.getTtl() / storeFileNum; ... for ( int i = 1; i <= storeFileNum; i++) { LOG.info( "Adding some data for the store file #" + i); timeStamp = EnvironmentEdgeManager.currentTimeMillis(); this .store.add( new KeyValue(row, family, qf1, timeStamp, ( byte []) null )); this .store.add( new KeyValue(row, family, qf2, timeStamp, ( byte []) null )); this .store.add( new KeyValue(row, family, qf3, timeStamp, ( byte []) null )); flush(i); Thread .sleep(sleepTime); } Because ttl=1, so sleepTime=250ms, so for the 4 store files , the maxExpiredTimeStamp discrepancy is only 250ms. so they may be expired at the same time if you run slowly and then CompactionRequest cr = this.store.requestCompaction(); cr.getFiles().size() return 3. We can ensure this from the logs: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//testReport/org.apache.hadoop.hbase.regionserver/TestStore/testDeleteExpiredStoreFiles/ 2012-03-14 17:19:14,672 INFO [pool-1-thread-1] regionserver.Store(1002): Completed compaction of 1 file(s) in family of table,,1331745551742.011a93cc4f763307dc36f577662db4b1. into none, size=none; total size for store is 2.4k 2012-03-14 17:19:14,923 INFO [pool-1-thread-1] compactions.CompactSelection(104): Deleting the expired store file by compaction: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/4d75667a-f352-40cc-978a-f36fec71baf2/TestStoretestDeleteExpiredStoreFiles/011a93cc4f763307dc36f577662db4b1/family/489673e6568241c6bb500b34d7ff29ad whose maxTimeStamp is 1331745552397 while the max expired timestamp is 1331745553923 2012-03-14 17:19:14,923 INFO [pool-1-thread-1] compactions.CompactSelection(104): Deleting the expired store file by compaction: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/4d75667a-f352-40cc-978a-f36fec71baf2/TestStoretestDeleteExpiredStoreFiles/011a93cc4f763307dc36f577662db4b1/family/7ac26f52fd214aca88bd555f3c82dd91 whose maxTimeStamp is 1331745552671 while the max expired timestamp is 1331745553923 2012-03-14 17:19:14,923 INFO [pool-1-thread-1] compactions.CompactSelection(104): Deleting the expired store file by compaction: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/4d75667a-f352-40cc-978a-f36fec71baf2/TestStoretestDeleteExpiredStoreFiles/011a93cc4f763307dc36f577662db4b1/family/7ebf7908f5044fcab65a7327d3e19405 whose maxTimeStamp is 1331745552939 while the max expired timestamp is 1331745553923 for ( int i = 1; i <= storeFileNum; i++) { // verify the expired store file. CompactionRequest cr = this .store.requestCompaction(); assertEquals(1, cr.getFiles().size()); assertTrue(cr.getFiles().get(0).getReader().getMaxTimestamp() < ( System .currentTimeMillis() - this .store.scanInfo.getTtl())); // Verify that the expired the store has been deleted. this .store.compact(cr); assertEquals(storeFileNum - i, this .store.getStorefiles().size()); // Let the next store file expired. Thread .sleep(sleepTime); } after the first compaction, the next call this.store.requestCompaction(), the other three store files are all expired at one time.
          Hide
          Ted Yu added a comment -

          Yes.

          Show
          Ted Yu added a comment - Yes.
          Hide
          Lars Hofhansl added a comment -

          @Ted: You see the failure only with this patch applied?

          Show
          Lars Hofhansl added a comment - @Ted: You see the failure only with this patch applied?
          Hide
          chunhui shen added a comment -

          I have run org.apache.hadoop.hbase.regionserver.TestStore many times on my local pc, all passed.
          Is there any other problem?

          Show
          chunhui shen added a comment - I have run org.apache.hadoop.hbase.regionserver.TestStore many times on my local pc, all passed. Is there any other problem?
          Hide
          Ted Yu added a comment -

          I can reproduce TestStore failure on my laptop.

          Show
          Ted Yu added a comment - I can reproduce TestStore failure on my laptop.
          Hide
          Ted Yu added a comment -

          In getGetNumCurrentReplicas():

                LOG.info("getNumCurrentReplicas--HDFS-826 not available; hdfs_out=" +
                  os, exception);
          

          I think we should omit exception if exception is instanceof NoSuchMethodException because the log is at info level.

          Show
          Ted Yu added a comment - In getGetNumCurrentReplicas(): LOG.info( "getNumCurrentReplicas--HDFS-826 not available; hdfs_out=" + os, exception); I think we should omit exception if exception is instanceof NoSuchMethodException because the log is at info level.
          Hide
          ramkrishna.s.vasudevan added a comment -
          hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@61736e
          java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
          	at java.lang.Class.getDeclaredMethod(Class.java:1937)
          	at org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:460)
          	at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:443)
          	at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:341)
          	at org.apache.hadoop.hbase.regionserver.TestStore.init(TestStore.java:147)
          	at org.apache.hadoop.hbase.regionserver.TestStore.testDeleteExpiredStoreFiles(TestStore.java:162)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          

          In TestStore failures i saw above exception traces. Its not related to this fix but just wanted to ensure is this expected? or are we missing something?

          Show
          ramkrishna.s.vasudevan added a comment - hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@61736e java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas() at java.lang. Class .getDeclaredMethod( Class .java:1937) at org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:460) at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:443) at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:341) at org.apache.hadoop.hbase.regionserver.TestStore.init(TestStore.java:147) at org.apache.hadoop.hbase.regionserver.TestStore.testDeleteExpiredStoreFiles(TestStore.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) In TestStore failures i saw above exception traces. Its not related to this fix but just wanted to ensure is this expected? or are we missing something?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518339/HBASE-5568.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestStore
          org.apache.hadoop.hbase.replication.TestReplicationPeer
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518339/HBASE-5568.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestStore org.apache.hadoop.hbase.replication.TestReplicationPeer org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1187//console This message is automatically generated.
          Hide
          Ted Yu added a comment -

          Re-attaching Chunhui's patch

          Show
          Ted Yu added a comment - Re-attaching Chunhui's patch
          Hide
          ramkrishna.s.vasudevan added a comment -

          Updated 0.90.7 also in fix versions.

          Show
          ramkrishna.s.vasudevan added a comment - Updated 0.90.7 also in fix versions.
          Hide
          chunhui shen added a comment -

          Submit a patch for 90 version

          Show
          chunhui shen added a comment - Submit a patch for 90 version
          Hide
          Lars Hofhansl added a comment -

          I think we want this in at least 0.92, 0.94, and 0.96.

          Show
          Lars Hofhansl added a comment - I think we want this in at least 0.92, 0.94, and 0.96.
          Hide
          Lars Hofhansl added a comment -

          @chunhui: looked at the patch again. Makes sense now. +1

          Show
          Lars Hofhansl added a comment - @chunhui: looked at the patch again. Makes sense now. +1
          Hide
          ramkrishna.s.vasudevan added a comment -

          As per the comments on HBASE-5312 i feel the patch makes sense.
          I am +1 on it. It is needed in 0.90 also?

          Show
          ramkrishna.s.vasudevan added a comment - As per the comments on HBASE-5312 i feel the patch makes sense. I am +1 on it. It is needed in 0.90 also?
          Hide
          ramkrishna.s.vasudevan added a comment -

          @Chunhui

          Is this some how related with HBASE-5312?

          Show
          ramkrishna.s.vasudevan added a comment - @Chunhui Is this some how related with HBASE-5312 ?
          Hide
          chunhui shen added a comment -

          only set flushing to false if internalFlushCache returned true?

          internalFlushCache() may throws IOException, If we only flushing to false in the case internalFlushCache() returned true, I think no flushcache could happen if throws IOException.

          Show
          chunhui shen added a comment - only set flushing to false if internalFlushCache returned true? internalFlushCache() may throws IOException, If we only flushing to false in the case internalFlushCache() returned true, I think no flushcache could happen if throws IOException.
          Hide
          Lars Hofhansl added a comment -

          Thanks Chunhui. In that case, wouldn't a better solution be to pull <result> out of the try block and then only set flushing to false if internalFlushCache returned true?

          Show
          Lars Hofhansl added a comment - Thanks Chunhui. In that case, wouldn't a better solution be to pull <result> out of the try block and then only set flushing to false if internalFlushCache returned true?
          Hide
          chunhui shen added a comment -

          @Lars
          In current flushcache() logic, we consider three continuous call flushcache().
          1.The first, call internalFlushcache(status) successfully.
          2.The second, because writestate.flushing=true, so it will return false, but it will set writestate.flushing = false finally.
          3.The third, will call internalFlushcache(status) successfully again.

          Show
          chunhui shen added a comment - @Lars In current flushcache() logic, we consider three continuous call flushcache(). 1.The first, call internalFlushcache(status) successfully. 2.The second, because writestate.flushing=true, so it will return false, but it will set writestate.flushing = false finally. 3.The third, will call internalFlushcache(status) successfully again.
          Hide
          Lars Hofhansl added a comment -

          Hi Chunhui. You patch only moves the synchronized block out of the try block. Could you explain why this fixes the issue?

          Show
          Lars Hofhansl added a comment - Hi Chunhui. You patch only moves the synchronized block out of the try block. Could you explain why this fixes the issue?

            People

            • Assignee:
              chunhui shen
              Reporter:
              chunhui shen
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development