HBase
  1. HBase
  2. HBASE-4853

HBASE-4789 does overzealous pruning of seqids

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs'

      At a minimum, these lines need removing:

      diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
      index 623edbe..a0bbe01 100644
      --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
      +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
      @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
             // Cleaning up of lastSeqWritten is in the finally clause because we
             // don't want to confuse getOldestOutstandingSeqNum()
             this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
      -      Long l = this.lastSeqWritten.remove(encodedRegionName);
      -      if (l != null) {
      -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
      -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
      -       }
             this.cacheFlushLock.unlock();
           }
         }
      

      ... but above is no good w/o figuring why WALs are not being rotated off.

      1. 4853-v10.txt
        7 kB
        stack
      2. 4853-v9.txt
        8 kB
        stack
      3. 4853-v9.txt
        8 kB
        stack
      4. 4853-v8.txt
        8 kB
        stack
      5. 4853-v7.txt
        7 kB
        stack
      6. 4853-v6.txt
        4 kB
        stack
      7. 4853-v5.txt
        3 kB
        stack
      8. 4853-v4.txt
        9 kB
        stack
      9. 4853-trunk.txt
        2 kB
        stack
      10. 4853--no-prefix.txt
        2 kB
        stack
      11. 4853.txt
        2 kB
        stack

        Activity

        Hide
        Jean-Daniel Cryans added a comment -

        An example of data loss.

        First, we see a flush with seqid being 1389 but the message "Why is there a raw encodedRegionName in lastSeqWritten" means it deleted 1389.

        2011-11-22 11:21:08,526 WARN [RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] wal.HLog(1364): Why is there a raw encodedRegionName in lastSeqWritten? name=a9b1d96a4554f545a74142bc42f8c48e, seqid=1389
        2011-11-22 11:21:08,526 INFO [RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] regionserver.HRegion(1259): Finished memstore flush of ~654.3k for region test,,1321989625594.a9b1d96a4554f545a74142bc42f8c48e. in 493ms, sequenceid=1388, compaction requested=false

        HLog roll happens, it sees 0 last seq id so it clears all the WALs.

        2011-11-22 11:21:08,932 INFO [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(582): Roll /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490, entries=41, filesize=25596. for /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989668526
        2011-11-22 11:21:08,932 DEBUG [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(593): Last sequenceid written is empty. Deleting all old hlogs

        The last one had max seq id as 1424, which was never flushed.

        2011-11-22 11:21:08,967 INFO [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(823): moving old hlog file /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490 whose highest sequenceid is 1424 to /user/jdcryans/.oldlogs/hbasedev%2C54530%2C1321989619241.1321989667490

        Show
        Jean-Daniel Cryans added a comment - An example of data loss. First, we see a flush with seqid being 1389 but the message "Why is there a raw encodedRegionName in lastSeqWritten" means it deleted 1389. 2011-11-22 11:21:08,526 WARN [RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] wal.HLog(1364): Why is there a raw encodedRegionName in lastSeqWritten? name=a9b1d96a4554f545a74142bc42f8c48e, seqid=1389 2011-11-22 11:21:08,526 INFO [RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] regionserver.HRegion(1259): Finished memstore flush of ~654.3k for region test,,1321989625594.a9b1d96a4554f545a74142bc42f8c48e. in 493ms, sequenceid=1388, compaction requested=false HLog roll happens, it sees 0 last seq id so it clears all the WALs. 2011-11-22 11:21:08,932 INFO [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(582): Roll /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490, entries=41, filesize=25596. for /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989668526 2011-11-22 11:21:08,932 DEBUG [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(593): Last sequenceid written is empty. Deleting all old hlogs The last one had max seq id as 1424, which was never flushed. 2011-11-22 11:21:08,967 INFO [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(823): moving old hlog file /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490 whose highest sequenceid is 1424 to /user/jdcryans/.oldlogs/hbasedev%2C54530%2C1321989619241.1321989667490
        Hide
        stack added a comment -

        Here is the manifestation of our failing to clean up WALs:

        2011-11-23 00:08:31,155 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=36, maxlogs=32; forcing flush of 1 regions(s): 9e22583862f34489f04c42e1f3d16bce
        2011-11-23 00:08:31,155 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 9e22583862f34489f04c42e1f3d16bce, region=null, requester=null
        
        Show
        stack added a comment - Here is the manifestation of our failing to clean up WALs: 2011-11-23 00:08:31,155 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=36, maxlogs=32; forcing flush of 1 regions(s): 9e22583862f34489f04c42e1f3d16bce 2011-11-23 00:08:31,155 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 9e22583862f34489f04c42e1f3d16bce, region= null , requester= null
        Hide
        stack added a comment -

        Here is patch that changes the blanket purge of our region from lastseqwritten map on flush completion to only remove it when no edits have come in in meantime (as it used to be in 0.90.x).

        Show
        stack added a comment - Here is patch that changes the blanket purge of our region from lastseqwritten map on flush completion to only remove it when no edits have come in in meantime (as it used to be in 0.90.x).
        Hide
        Jean-Daniel Cryans added a comment -

        +1

        I'm still confused on how we end up with an id in there even tho we also remove it at the beginning of startCacheFlush.

        Show
        Jean-Daniel Cryans added a comment - +1 I'm still confused on how we end up with an id in there even tho we also remove it at the beginning of startCacheFlush.
        Hide
        Todd Lipcon added a comment -

        Why are our tests not freaking out with this bug?

        Show
        Todd Lipcon added a comment - Why are our tests not freaking out with this bug?
        Hide
        Jean-Daniel Cryans added a comment -

        TestReplication was but it's not its main purpose. It might be doing too much.

        Show
        Jean-Daniel Cryans added a comment - TestReplication was but it's not its main purpose. It might be doing too much.
        Hide
        stack added a comment -

        no prefix version

        Show
        stack added a comment - no prefix version
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12504841/4853--no-prefix.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/342//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504841/4853--no-prefix.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/342//console This message is automatically generated.
        Hide
        stack added a comment -

        Trunk is different here.

        Show
        stack added a comment - Trunk is different here.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12504846/4853-trunk.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestAdmin

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/344//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/344//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/344//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504846/4853-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/344//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/344//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/344//console This message is automatically generated.
        Hide
        stack added a comment -

        Testing, this patch does not fix the issue of our not cleaning up WALs. I'm digging still.

        Show
        stack added a comment - Testing, this patch does not fix the issue of our not cleaning up WALs. I'm digging still.
        Hide
        stack added a comment -

        Working patch. Not done yet. Also has unit test to show hole (an edit is getting in and its seqid is sticking around in lastSeqid for the region w/o being cleared).

        Show
        stack added a comment - Working patch. Not done yet. Also has unit test to show hole (an edit is getting in and its seqid is sticking around in lastSeqid for the region w/o being cleared).
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12504860/4853-v4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/346//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504860/4853-v4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/346//console This message is automatically generated.
        Hide
        stack added a comment -

        Looks like this commit by me broke our memstore sizing: HBASE-4722. It takes memstore flush size outside of an update lock (more edits may have come in in meantime).

        Show
        stack added a comment - Looks like this commit by me broke our memstore sizing: HBASE-4722 . It takes memstore flush size outside of an update lock (more edits may have come in in meantime).
        Hide
        stack added a comment -

        Here's a fix. I need a review given how this patch is actually revert of two commits I've made – one recent and another a couple of months ago.

        Show
        stack added a comment - Here's a fix. I need a review given how this patch is actually revert of two commits I've made – one recent and another a couple of months ago.
        Hide
        stack added a comment -

        Here's some explaination:

        M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        On flush of memstores, we were decrementing the global region
        memory size by the size of the global memstore AT THE TIME OF
        THE DECREMENT rather than decrementing by the flush size (some
        edits may very well have come in in between the setup of flush
        and decrement time). This change undoes a brain-dead change
        of mine in hbase-4722. That broke this.
        M src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        Remove flagging of the original problem, our leaving an old
        edit id in the lastSeqWritten for a region that was offline.

        I tried to write a test but its too tough at mo. You need to get some edits into the memstore AFTER the update lock is freed down in internalFlushCache but BEFORE we decrement memstore size. Only way to make it work would be by mod'ing HRegion to insert a do-nothing method. Too dumb.

        Show
        stack added a comment - Here's some explaination: M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java On flush of memstores, we were decrementing the global region memory size by the size of the global memstore AT THE TIME OF THE DECREMENT rather than decrementing by the flush size (some edits may very well have come in in between the setup of flush and decrement time). This change undoes a brain-dead change of mine in hbase-4722. That broke this. M src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java Remove flagging of the original problem, our leaving an old edit id in the lastSeqWritten for a region that was offline. I tried to write a test but its too tough at mo. You need to get some edits into the memstore AFTER the update lock is freed down in internalFlushCache but BEFORE we decrement memstore size. Only way to make it work would be by mod'ing HRegion to insert a do-nothing method. Too dumb.
        Hide
        stack added a comment -

        Same patch with better variable naming.

        Show
        stack added a comment - Same patch with better variable naming.
        Hide
        Ted Yu added a comment -

        With patch v5, I got the following:

        testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize)  Time elapsed: 11.516 sec  <<< FAILURE!
        java.lang.AssertionError: Server=10.246.204.31,62993,1322086547613, i=0 expected:<0> but was:<608>
        

        Here is tail of test output:

        2011-11-23 14:15:55,955 INFO  [main] regionserver.Store(631): Added hdfs://localhost:62971/user/zhihyu/.META./1028785192/info/6d51d01d9498464eb025ca045e696ce4, entries=47, sequenceid=36, filesize=8.4k
        2011-11-23 14:15:55,956 INFO  [main] regionserver.HRegion(1396): Finished memstore flush of ~17.2k/17608 for region .META.,,1.1028785192 in 44ms, sequenceid=36, compaction requested=false
        2011-11-23 14:15:55,956 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush .META.,,1.1028785192 on 10.246.204.31,62993,1322086547613, false, size=608
        2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,,1322086555196.e2b7276e785c7f6213a5bdd08a54cf8e. on 10.246.204.31,62993,1322086547613, false, size=608
        2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,c,P\xE3+,1322086555201.2c847584e6af6e64f3bae631bd722934. on 10.246.204.31,62993,1322086547613, false, size=608
        2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,q\x83\xCC\xF1{,1322086555217.f5079469f9fa696de61b9db6364cd6e7. on 10.246.204.31,62993,1322086547613, false, size=608
        2011-11-23 14:15:55,957 INFO  [main] hbase.TestGlobalMemStoreSize(101): Post flush on 10.246.204.31,62993,1322086547613
        

        Basically there was no mentioning of flush completion for TestGlobalMemStoreSize table.

        I think we should add a log before the assertion so that we know how long we spent waiting in the while loop:

              assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
                server.getRegionServerAccounting().getGlobalMemstoreSize());
        

        We should increase the wait time beyond 3 seconds.

        Show
        Ted Yu added a comment - With patch v5, I got the following: testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize) Time elapsed: 11.516 sec <<< FAILURE! java.lang.AssertionError: Server=10.246.204.31,62993,1322086547613, i=0 expected:<0> but was:<608> Here is tail of test output: 2011-11-23 14:15:55,955 INFO [main] regionserver.Store(631): Added hdfs: //localhost:62971/user/zhihyu/.META./1028785192/info/6d51d01d9498464eb025ca045e696ce4, entries=47, sequenceid=36, filesize=8.4k 2011-11-23 14:15:55,956 INFO [main] regionserver.HRegion(1396): Finished memstore flush of ~17.2k/17608 for region .META.,,1.1028785192 in 44ms, sequenceid=36, compaction requested= false 2011-11-23 14:15:55,956 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush .META.,,1.1028785192 on 10.246.204.31,62993,1322086547613, false , size=608 2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,,1322086555196.e2b7276e785c7f6213a5bdd08a54cf8e. on 10.246.204.31,62993,1322086547613, false , size=608 2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,c,P\xE3+,1322086555201.2c847584e6af6e64f3bae631bd722934. on 10.246.204.31,62993,1322086547613, false , size=608 2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,q\x83\xCC\xF1{,1322086555217.f5079469f9fa696de61b9db6364cd6e7. on 10.246.204.31,62993,1322086547613, false , size=608 2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(101): Post flush on 10.246.204.31,62993,1322086547613 Basically there was no mentioning of flush completion for TestGlobalMemStoreSize table. I think we should add a log before the assertion so that we know how long we spent waiting in the while loop: assertEquals( "Server=" + server.getServerName() + ", i=" + i++, 0, server.getRegionServerAccounting().getGlobalMemstoreSize()); We should increase the wait time beyond 3 seconds.
        Hide
        stack added a comment -

        hmm... that don't fail for me and the change shouldn't effect this test.

        Show
        stack added a comment - hmm... that don't fail for me and the change shouldn't effect this test.
        Hide
        Ted Yu added a comment -

        We should let TestGlobalMemStoreSize pass consistently. HBASE-4722 tried to solve this issue.

        Show
        Ted Yu added a comment - We should let TestGlobalMemStoreSize pass consistently. HBASE-4722 tried to solve this issue.
        Hide
        stack added a comment -

        Nvm. I ran it a few times. There is something going on here. Says N bytes but we are not flushing. Let me take a look.

        Show
        stack added a comment - Nvm. I ran it a few times. There is something going on here. Says N bytes but we are not flushing. Let me take a look.
        Hide
        stack added a comment -

        I think its something other than timeout and my patch (or my patch uncovers messed-up ness that was going on here). Let me keep at it.

        Show
        stack added a comment - I think its something other than timeout and my patch (or my patch uncovers messed-up ness that was going on here). Let me keep at it.
        Hide
        stack added a comment -

        v7 still needs work; i'd like to nail TestGlobalMemStore before putting it up for review (worst comes to worst, I'll take Ted's trick)

        Show
        stack added a comment - v7 still needs work; i'd like to nail TestGlobalMemStore before putting it up for review (worst comes to worst, I'll take Ted's trick)
        Hide
        stack added a comment -

        The odd failing of TestGlobalMemStoreSize was edits going into meta while other regions were flushing. Moved things around so we flush meta last. That should do it.

        Show
        stack added a comment - The odd failing of TestGlobalMemStoreSize was edits going into meta while other regions were flushing. Moved things around so we flush meta last. That should do it.
        Hide
        Ted Yu added a comment -

        With patch v8, I got the following after 4 iterations:

        testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize)  Time elapsed: 9.521 sec  <<< FAILURE!
        java.lang.AssertionError: Server=10.246.204.31,53367,1322094489131, i=2 expected:<0> but was:<608>
          at org.junit.Assert.fail(Assert.java:93)
          at org.junit.Assert.failNotEquals(Assert.java:647)
          at org.junit.Assert.assertEquals(Assert.java:128)
          at org.junit.Assert.assertEquals(Assert.java:472)
          at org.apache.hadoop.hbase.TestGlobalMemStoreSize.testGlobalMemStore(TestGlobalMemStoreSize.java:128)
        

        Here is tail of test output:

        2011-11-23 16:28:15,599 INFO  [main] regionserver.StoreFile$Reader(1455): Loaded Delete Family Bloom (CompoundBloomFilter) metadata for d080dcfc9b854237b0e8bd884f1e395b
        2011-11-23 16:28:15,600 DEBUG [main] regionserver.Store(610): Renaming flushed file at hdfs://localhost:53343/user/zhihyu/.META./1028785192/.tmp/d080dcfc9b854237b0e8bd884f1e395b to hdfs://localhost:53343/user/zhihyu/.META./1028785192/info/d080dcfc9b854237b0e8bd884f1e395b
        2011-11-23 16:28:15,606 INFO  [main] regionserver.StoreFile$Reader(1455): Loaded Delete Family Bloom (CompoundBloomFilter) metadata for d080dcfc9b854237b0e8bd884f1e395b
        2011-11-23 16:28:15,606 INFO  [main] regionserver.Store(631): Added hdfs://localhost:53343/user/zhihyu/.META./1028785192/info/d080dcfc9b854237b0e8bd884f1e395b, entries=47, sequenceid=36, filesize=8.4k
        2011-11-23 16:28:15,607 INFO  [main] regionserver.HRegion(1394): Finished memstore flush of ~16.6k/17000 for region .META.,,1.1028785192 in 437ms, sequenceid=36, compaction requested=false
        2011-11-23 16:28:15,607 INFO  [main] hbase.TestGlobalMemStoreSize(142): Flush .META.,,1.1028785192 on 10.246.204.31,53367,1322094489131, false, size=608
        2011-11-23 16:28:15,607 INFO  [main] hbase.TestGlobalMemStoreSize(112): Post flush on 10.246.204.31,53367,1322094489131
        2011-11-23 16:28:15,607 INFO  [main] hbase.TestGlobalMemStoreSize(125): .META.,,1.1028785192 608
        2011-11-23 16:28:15,607 INFO  [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,aaaaa,1322094494548.21d8ad184c650c0be744b624a11078c1. 0
        2011-11-23 16:28:15,607 INFO  [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,f\xC2/\xE6\xBF,1322094494555.9db9aa6f0a4f29d6f449209c319cd77a. 0
        2011-11-23 16:28:15,607 INFO  [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,l"\xFEl\x1D,1322094494561.0182399055dc627bb152408d8d2234dd. 0
        2011-11-23 16:28:15,608 INFO  [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,m\xED\xED\xED\xE7,1322094494563.6639f54133a3ef8c6b11a9491030f7b6. 0
        2011-11-23 16:28:15,608 INFO  [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,zzzzz,1322094494579.48c6f4e6614fc095fdfe44469df28b5a. 0
        

        Looks like the following should be performed outside the for loop:

              // If meta, flush it last
              if (meta != null) flush(meta, server);
        

        This means the assertion should be done in a separate for loop.

        Show
        Ted Yu added a comment - With patch v8, I got the following after 4 iterations: testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize) Time elapsed: 9.521 sec <<< FAILURE! java.lang.AssertionError: Server=10.246.204.31,53367,1322094489131, i=2 expected:<0> but was:<608> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hbase.TestGlobalMemStoreSize.testGlobalMemStore(TestGlobalMemStoreSize.java:128) Here is tail of test output: 2011-11-23 16:28:15,599 INFO [main] regionserver.StoreFile$Reader(1455): Loaded Delete Family Bloom (CompoundBloomFilter) metadata for d080dcfc9b854237b0e8bd884f1e395b 2011-11-23 16:28:15,600 DEBUG [main] regionserver.Store(610): Renaming flushed file at hdfs: //localhost:53343/user/zhihyu/.META./1028785192/.tmp/d080dcfc9b854237b0e8bd884f1e395b to hdfs://localhost:53343/user/zhihyu/.META./1028785192/info/d080dcfc9b854237b0e8bd884f1e395b 2011-11-23 16:28:15,606 INFO [main] regionserver.StoreFile$Reader(1455): Loaded Delete Family Bloom (CompoundBloomFilter) metadata for d080dcfc9b854237b0e8bd884f1e395b 2011-11-23 16:28:15,606 INFO [main] regionserver.Store(631): Added hdfs: //localhost:53343/user/zhihyu/.META./1028785192/info/d080dcfc9b854237b0e8bd884f1e395b, entries=47, sequenceid=36, filesize=8.4k 2011-11-23 16:28:15,607 INFO [main] regionserver.HRegion(1394): Finished memstore flush of ~16.6k/17000 for region .META.,,1.1028785192 in 437ms, sequenceid=36, compaction requested= false 2011-11-23 16:28:15,607 INFO [main] hbase.TestGlobalMemStoreSize(142): Flush .META.,,1.1028785192 on 10.246.204.31,53367,1322094489131, false , size=608 2011-11-23 16:28:15,607 INFO [main] hbase.TestGlobalMemStoreSize(112): Post flush on 10.246.204.31,53367,1322094489131 2011-11-23 16:28:15,607 INFO [main] hbase.TestGlobalMemStoreSize(125): .META.,,1.1028785192 608 2011-11-23 16:28:15,607 INFO [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,aaaaa,1322094494548.21d8ad184c650c0be744b624a11078c1. 0 2011-11-23 16:28:15,607 INFO [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,f\xC2/\xE6\xBF,1322094494555.9db9aa6f0a4f29d6f449209c319cd77a. 0 2011-11-23 16:28:15,607 INFO [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,l"\xFEl\x1D,1322094494561.0182399055dc627bb152408d8d2234dd. 0 2011-11-23 16:28:15,608 INFO [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,m\xED\xED\xED\xE7,1322094494563.6639f54133a3ef8c6b11a9491030f7b6. 0 2011-11-23 16:28:15,608 INFO [main] hbase.TestGlobalMemStoreSize(125): TestGlobalMemStoreSize,zzzzz,1322094494579.48c6f4e6614fc095fdfe44469df28b5a. 0 Looks like the following should be performed outside the for loop: // If meta, flush it last if (meta != null ) flush(meta, server); This means the assertion should be done in a separate for loop.
        Hide
        stack added a comment -

        Thanks for the testing Ted. Yeah, I noticed that after uploading patch myself. Here is another patch that waits even less time than the 3 seconds that used to be in place but if we find that not all memstores are flushed, we assert the problem region is meta, tna then we give the meta another flush on way out to see if that'll clear mem. Adds better debugging output too. Works for me ten times in a row.

        Show
        stack added a comment - Thanks for the testing Ted. Yeah, I noticed that after uploading patch myself. Here is another patch that waits even less time than the 3 seconds that used to be in place but if we find that not all memstores are flushed, we assert the problem region is meta, tna then we give the meta another flush on way out to see if that'll clear mem. Adds better debugging output too. Works for me ten times in a row.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12504941/4853-v6.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestAdmin

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/353//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/353//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/353//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504941/4853-v6.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/353//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/353//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/353//console This message is automatically generated.
        Hide
        stack added a comment -

        Same patch. Trigger another build.

        Show
        stack added a comment - Same patch. Trigger another build.
        Hide
        stack added a comment -

        Trigger build on last version.

        Show
        stack added a comment - Trigger build on last version.
        Hide
        Ted Yu added a comment -

        Patch v9 is good.

        Minor comment:

              // it could get edits while other flushes are running since this a
        

        I think the tail should read 'since this is a'

        Show
        Ted Yu added a comment - Patch v9 is good. Minor comment: // it could get edits while other flushes are running since this a I think the tail should read 'since this is a'
        Hide
        stack added a comment -

        What I applied (had to do little wiggling to get it into 0.92). Applied branch and 0.92.

        Show
        stack added a comment - What I applied (had to do little wiggling to get it into 0.92). Applied branch and 0.92.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12504984/4853-v10.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/361//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504984/4853-v10.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/361//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12504960/4853-v9.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestAdmin
        org.apache.hadoop.hbase.client.TestInstantSchemaChange

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/359//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/359//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/359//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504960/4853-v9.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.client.TestInstantSchemaChange Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/359//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/359//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/359//console This message is automatically generated.
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92-security #12 (See https://builds.apache.org/job/HBase-0.92-security/12/)
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids

        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Show
        Hudson added a comment - Integrated in HBase-0.92-security #12 (See https://builds.apache.org/job/HBase-0.92-security/12/ ) HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #7 (See https://builds.apache.org/job/HBase-TRUNK-security/7/)
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-security #7 (See https://builds.apache.org/job/HBase-TRUNK-security/7/ ) HBASE-4853 HBASE-4789 does overzealous pruning of seqids HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #2477 (See https://builds.apache.org/job/HBase-TRUNK/2477/)
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #2477 (See https://builds.apache.org/job/HBase-TRUNK/2477/ ) HBASE-4853 HBASE-4789 does overzealous pruning of seqids HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.92 #163 (See https://builds.apache.org/job/HBase-0.92/163/)
        HBASE-4853 HBASE-4789 does overzealous pruning of seqids

        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Show
        Hudson added a comment - Integrated in HBase-0.92 #163 (See https://builds.apache.org/job/HBase-0.92/163/ ) HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
        Hide
        Jean-Daniel Cryans added a comment -

        Closing as it was committed.

        Show
        Jean-Daniel Cryans added a comment - Closing as it was committed.

          People

          • Assignee:
            stack
            Reporter:
            stack
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development