HBase / HBASE-5778

Fix HLog compression's incompatibilities

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.4, 0.95.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I ran some tests to verify whether WAL compression should be turned on by default.

      For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and CPU usage was 15% higher (150% CPU usage vs 130% when not compressing the WAL).

      When values are smaller than the keys, I saw a 38% improvement in insert run time, and CPU usage was 33% higher (600% CPU usage vs 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and so spend more time per second in the MemStore (because our MemStores are bad when they contain tens of thousands of values).

      Those are two extremes, but they show that for the price of some CPU we can save a lot. My machines have two quad-core CPUs with hyper-threading, so I still had a lot of idle CPUs.
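      For reference, WAL compression in this era is toggled with the hbase.regionserver.wal.enablecompression setting (the value behind HConstants.ENABLE_WAL_COMPRESSION). A minimal sketch of flipping it on programmatically, assuming that key name; setting the same property in hbase-site.xml has the same effect:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;

      public class EnableWalCompression {
        public static void main(String[] args) {
          // Start from the usual HBase configuration (hbase-site.xml on the classpath).
          Configuration conf = HBaseConfiguration.create();
          // Key name assumed from the 0.94-era HConstants.ENABLE_WAL_COMPRESSION constant.
          conf.setBoolean("hbase.regionserver.wal.enablecompression", true);
          System.out.println("WAL compression enabled: "
              + conf.getBoolean("hbase.regionserver.wal.enablecompression", false));
        }
      }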

      Attachments

      1. HBASE-5778-trunk-v7.patch (25 kB, Jean-Daniel Cryans)
      2. HBASE-5778-trunk-v6.patch (22 kB, Jean-Daniel Cryans)
      3. HBASE-5778-0.94-v7.patch (26 kB, Jean-Daniel Cryans)
      4. HBASE-5778-0.94-v6.patch (21 kB, Jean-Daniel Cryans)
      5. HBASE-5778-0.94-v5.patch (21 kB, Jean-Daniel Cryans)
      6. HBASE-5778-0.94-v4.patch (19 kB, Jean-Daniel Cryans)
      7. HBASE-5778-0.94-v3.patch (20 kB, Jean-Daniel Cryans)
      8. HBASE-5778-0.94-v2.patch (13 kB, Jean-Daniel Cryans)
      9. HBASE-5778-0.94.patch (11 kB, Jean-Daniel Cryans)
      10. HBASE-5778.patch (0.8 kB, Jean-Daniel Cryans)
      11. 5778-addendum.txt (3 kB, Lars Hofhansl)
      12. 5778.addendum (1.0 kB, Ted Yu)


          Activity

          Jean-Daniel Cryans added a comment -

          I believe only this line is needed.

          Todd Lipcon added a comment -

          Do we have this in hbase-default.xml as well? if not, +1

          Jean-Daniel Cryans added a comment -

          It's not in there. Do we want it now that we turn it on, or do we act like we always had it?

          Lars Hofhansl added a comment -

          +1 on patch

          stack added a comment -

          +1 Add release note w/ how to turn it off

          Hudson added a comment -

          Integrated in HBase-TRUNK #2749 (See https://builds.apache.org/job/HBase-TRUNK/2749/)
          HBASE-5778 Turn on WAL compression by default (Revision 1325566)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          Hudson added a comment -

          Integrated in HBase-0.94 #109 (See https://builds.apache.org/job/HBase-0.94/109/)
          HBASE-5778 Turn on WAL compression by default (Revision 1325567)

          Result = SUCCESS
          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          Lars Hofhansl added a comment -

          I see a bunch of suspicious test failures now:

          java.lang.NegativeArraySizeException
          	at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:305)
          
          Lars Hofhansl added a comment -

          Yeah... The failures in TestHLog are because of this. Need to rollback or figure out what the problem is. Probably test related.

          Lars Hofhansl added a comment -

          TestHLog.testAppendClose() uses this to read back the WALEdits:

          // Make sure you can read all the content
              SequenceFile.Reader reader
                = new SequenceFile.Reader(this.fs, walPath, this.conf);
          

          Well, dah, that does not work.

          Lars Hofhansl added a comment -

          Then there's FaultySequenceFileLogReader, which does not do the right thing.

          Lars Hofhansl added a comment -

          Have a fix for TestHLog, working on TestHLogSplit

          Ted Yu added a comment -

          Is this what you had for TestHLog ?

          Lars Hofhansl added a comment -

          Similar... There're some other issues. I'll have a patch soon.

          Lars Hofhansl added a comment -

          Fixes the issues I found. It's not too surprising that a compressed HLog is a bit more susceptible to corruption, as there is less redundancy.

          Lars Hofhansl added a comment -

          Running through HadoopQA to see if there are other problems left.

          Ted Yu added a comment -
          +      } catch (IndexOutOfBoundsException iobe) {
          +        // this can happen with a corrupted file, fall through
          +      }
          

          I think we should note down the cause of the failure to retrieve the dictionary entry and provide a clearer message in the IOException below:

                 if (entry == null) {
                   throw new IOException("Missing dictionary entry for index "
                       + dictIdx);
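
           A minimal sketch of what that suggestion could look like, with a hypothetical Dictionary interface standing in for the WAL compression dictionary (illustration only, not the committed code):

           import java.io.IOException;

           // Hedged sketch of the suggestion above: remember why the dictionary lookup
           // failed and surface it in the IOException instead of silently falling through.
           final class DictionaryLookupSketch {
             // Assumed shape of the dictionary accessor, for illustration only.
             interface Dictionary { byte[] getEntry(short idx); }

             static byte[] lookupOrThrow(Dictionary dict, short dictIdx) throws IOException {
               byte[] entry = null;
               IndexOutOfBoundsException cause = null;
               try {
                 entry = dict.getEntry(dictIdx);
               } catch (IndexOutOfBoundsException iobe) {
                 // Can happen with a corrupted file; keep the cause for the message below.
                 cause = iobe;
               }
               if (entry == null) {
                 throw new IOException("Missing dictionary entry for index " + dictIdx
                     + (cause != null ? " (" + cause + ")" : ""), cause);
               }
               return entry;
             }
           }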
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522527/5778-addendum.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
          org.apache.hadoop.hbase.replication.TestMasterReplication

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1507//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1507//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1507//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          mvn failed with an OOME. Let's revert this change, until we track these issues down.

          Ted Yu added a comment -

          The remaining issue is about how the replication sink correctly decompresses WAL.
          From test output, I saw:

          java.io.EOFException
            at java.io.DataInputStream.readFully(DataInputStream.java:180)
            at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2243)
            at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2249)
            at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:129)
            at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1700)
          

          For replication sink, there is no CompressionContext in HLog$Entry which can be used to perform decompression.

          I agree the change should be reverted.

          Hudson added a comment -

          Integrated in HBase-TRUNK-security #169 (See https://builds.apache.org/job/HBase-TRUNK-security/169/)
          HBASE-5778 Turn on WAL compression by default (Revision 1325566)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          stack added a comment -

          I backed it out of 0.94 and trunk.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2752 (See https://builds.apache.org/job/HBase-TRUNK/2752/)
          HBASE-5778 Turn on WAL compression by default (Revision 1325801)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          Hudson added a comment -

          Integrated in HBase-0.94 #113 (See https://builds.apache.org/job/HBase-0.94/113/)
          HBASE-5778 Turn on WAL compression by default: REVERT (Revision 1325803)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          Jean-Daniel Cryans added a comment -

          Sorry for all the trouble guys, I thought the feature was more tested than that

          Lars Hofhansl added a comment -

          NP... I thought it was ready.
          I am surprised that ReplicationSink has a problem, as it is the ReplicationSource that reads from the HLog.

          Ted Yu added a comment -

          In the release notes for 0.94.0, we need to note that WAL compression must be disabled in order for replication to work.

          Lars Hofhansl added a comment -

          I still don't understand why this is a problem with replication. J-D do you have any insights?

          Jean-Daniel Cryans added a comment -

          I haven't had a look, but I'd guess that if we're reading files that are being written then we don't have access to the dict.

          Lars Hofhansl added a comment -

          Oh I see. The KVs are only decompressed when read.

          Hudson added a comment -

          Integrated in HBase-0.94-security #9 (See https://builds.apache.org/job/HBase-0.94-security/9/)
          HBASE-5778 Turn on WAL compression by default: REVERT (Revision 1325803)
          HBASE-5778 Turn on WAL compression by default (Revision 1325567)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java

          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12522527/5778-addendum.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestWALPlayer
          org.apache.hadoop.hbase.coprocessor.TestClassLoading

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1515//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1515//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1515//console

          This message is automatically generated.

          Hudson added a comment -

          Integrated in HBase-TRUNK-security #170 (See https://builds.apache.org/job/HBase-TRUNK-security/170/)
          HBASE-5778 Turn on WAL compression by default (Revision 1325801)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          Lars Hofhansl added a comment -

          This fundamentally breaks replication.

          The problem above is actually that the HLogKey and WALEdit after being read from a compressed HLog have the compression context set, and hence this will be used to compress them when sent over the wire to the sink. Of course the sink does not know how to uncompress.

          So I just set the compression context to null in ReplicationSource.

          With that hurdle out of the way, I find that seeking to a specific position in the HLog (the position stored in ZK) does not work, because the dictionary is not built up (compressed HLogs always need to be read from the beginning).

          Not sure how to fix the 2nd part.
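
           A minimal sketch of that first workaround, assuming the HLog.Entry#setCompressionContext accessor that the patch under review adds (it is quoted later in this thread); an illustration, not the committed ReplicationSource code:

           import org.apache.hadoop.hbase.regionserver.wal.HLog;

           final class ShipEditsSketch {
             // Strip the compression context before shipping entries to the sink so the
             // WALEdits are serialized uncompressed over the wire.
             static void prepareForShipping(HLog.Entry[] entries) {
               for (HLog.Entry entry : entries) {
                 if (entry != null) {
                   entry.setCompressionContext(null);
                 }
               }
             }
           }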

          Ted Yu added a comment -

          I think ReplicationSource now has the additional responsibility of shipping dictionaries to replication sink.
          We just need to find a clean way of exposing SequenceFileLogWriter.compressionContext to ReplicationSource.

          Lars Hofhansl added a comment -

          @Ted: Unfortunately it is not as simple as that. As I tried to explain above, the ReplicationSource reads from the WAL files at offsets that are stored in ZK. This does not work any longer, as you can no longer start reading the WAL at an offset. The files need to be read from the beginning to build up the dictionary.

          Jean-Daniel Cryans added a comment -

          The files need to be read from the beginning to build up the dictionary.

          Aren't the dictionary entries spread out in the log? If so, it should be possible to slowly build it up as we tail the log (that's another feature that's broken, tailing).

          Then if you replay some WAL from another region server, for the first log you'd read from the beginning in order to build up the dict, then when you hit the offset that's in ZK you start shipping.

          Lars Hofhansl added a comment -

          If we could tail the logs it would work. We just cannot seek into an HLog in the middle and start reading from it.

          Jean-Daniel Cryans added a comment -

          I think everything is fine then

          Lars Hofhansl added a comment -

          Hmm... Then that does not explain what I saw. I saw the ReplicationSource trying to read from a position in the file (indicated by ZK) and then the read failing because the dictionary was not built up.

          Lars Hofhansl added a comment -

          Unscheduling for now.

          Li Pi added a comment -

          How far can replication lag behind our writes? If we can guarantee that an entry won't be evicted before replication, we can simply consult the main dictionary to decompress it.

          Lars Hofhansl added a comment -

          We do make that guarantee (J-D, please correct me if I'm wrong).
          The problem - I think - is that replication directly seeks to the position indicated in ZK and starts playing logs from there. That would no longer be possible; instead we'd have to start from the beginning of the WAL file and scan all the way to the position that we want to replicate.
          Again, I think that is what the problem is; J-D will probably know more here.

          Jean-Daniel Cryans added a comment -

          I don't see how in theory the seek can be a problem when tail'ing a log from the start since we read the whole file. The only case where it will need to be handled differently is when a region server needs to replicate a log that another RS started working on but died. In that case we can just read the file up to the last seek position but don't replicate anything.

          Jean-Daniel Cryans added a comment -

          Attaching a first pass on making replication and HLog compression best buddies (here against 0.94).

          Most of the changes are leaks since I need the context all over the place.

          The meatier part is keeping track of the context in ReplicationSource. Basically we get a new one the first time we read the HLog then we just keep passing it back. I made sure to set the context to null when sending WALEdits to the sink.

          The second part was managing the missing dict entries when recovering a log with a last known position. I could think of a few solutions:

          • Reset the last known position back to 0 and resend all the edits. Basically this ignores the problem.
          • Add a "fast forward" method in the code to just read the file up to the last known position.
          • Introduce new checks in order to read the log from 0 (using the normal code path) but then skip all the entries until we get to the last known position.

          I implemented the last one (a rough sketch follows this comment). It adds a lot of new things to track, which I don't like, but it should be "correct".

          I also added a new test which is just enabling WAL compression on TestReplication's master cluster. Everything passes.
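
           A rough sketch of that third option, using the HLog.Reader methods of this era (next/getPosition); treat it as an illustration of the idea rather than the patch itself:

           import java.io.IOException;
           import org.apache.hadoop.hbase.regionserver.wal.HLog;

           final class SkipToPositionSketch {
             // Read the recovered log from offset 0 through the normal reader (so the
             // compression dictionary is rebuilt as a side effect of next()), but only
             // return an entry once we have passed the last position recorded in ZK.
             static HLog.Entry firstEntryAfter(HLog.Reader reader, long lastShippedPosition)
                 throws IOException {
               HLog.Entry entry;
               while ((entry = reader.next()) != null) {
                 if (reader.getPosition() > lastShippedPosition) {
                   return entry;   // first edit that has not been shipped yet
                 }
                 // This entry was already replicated before the previous RS died; skip it.
               }
               return null;        // reached EOF without passing the recorded position
             }
           }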

          Ted Yu added a comment -

          Is the goal to turn on WAL compression by default ?
          If so, do you plan to address the test failure mentioned @ 13/Apr/12 02:53 ?

          testAppendClose(org.apache.hadoop.hbase.regionserver.wal.TestHLog)  Time elapsed: 0.104 sec  <<< ERROR!
          java.lang.NegativeArraySizeException
            at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:303)
            at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1894)
            at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1934)
            at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testAppendClose(TestHLog.java:483)
          
          Jean-Daniel Cryans added a comment -

          If so, do you plan to address the test failure mentioned @ 13/Apr/12 02:53 ?

          Eventually I'd like to turn it on by default but I was mostly interested in making replication work first.

          So I took a look at testAppendClose and it was a simple matter of changing the reader to use the one that HBase provides. In that regard I'd say that the test was doing something wrong. The effect was that the SF reader, knowing nothing about compression, couldn't read compressed HLog entries.

          This v2 patch fixes the test for when WAL compression is enabled.
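
           A minimal sketch of that test fix, assuming the 0.94-era static HLog.getReader() factory (illustration only):

           import org.apache.hadoop.conf.Configuration;
           import org.apache.hadoop.fs.FileSystem;
           import org.apache.hadoop.fs.Path;
           import org.apache.hadoop.hbase.regionserver.wal.HLog;

           final class ReadBackWalSketch {
             // Read the WAL back through HBase's own reader, which knows about WAL
             // compression, instead of a raw SequenceFile.Reader.
             static int countEntries(FileSystem fs, Path walPath, Configuration conf)
                 throws Exception {
               HLog.Reader reader = HLog.getReader(fs, walPath, conf);
               int count = 0;
               try {
                 while (reader.next() != null) {
                   count++;   // each entry is an HLogKey + WALEdit pair
                 }
               } finally {
                 reader.close();
               }
               return count;
             }
           }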

          stack added a comment -

          Adding compression context to the general HLog interface seems incorrect to me. This kind of thing will not make sense for all implementations of HLog. With this patch as is, we are going against the effort to turn HLog into an interface.

          Ditto on ReplicationSource having to know anything about HLog compression and carrying compression context (this seems 'off' having to do in ReplicationSource --> +import org.apache.hadoop.hbase.regionserver.wal.CompressionContext). What happens if HLog has a different kind of compression than our current type? Will it all break?

          This seems wrong having to do this over in ReplicationSource:

          +        // If we're compressing logs and the oldest recovered log's last position is greater
          +        // than 0, we need to rebuild the dictionary up to that point without replicating
          +        // the edits again. The rebuilding part is simply done by reading the log.
          

          Why can't the internal implementation do the skipping if dictionary is empty and we are at an offset > 0?

          Rather than passing compression context to SequenceFileLogReader, can we not have a CompressedSequenceLogReader and internally it manages compression contexts not letting them outside of CSLR?

          Jean-Daniel Cryans added a comment -

          Second pass, probably a stepping stone.

          Adds a ReplicationHLogReader that hides all the dirtiness from HLog and its compression functionality. Right now it's only a dumb extraction of code from ReplicationSource and it doesn't take care of any exception handling. It also has weird semantics like finishCurrentFile not closing the reader.

          Still passes the tests.

          Jean-Daniel Cryans added a comment -

          Attaching a patch that includes the new files, doh.

          stack added a comment -

          This is better. Here's some comments:

          Does CompressionContext class have to be public? Can it stay pkg private? You'll have to move your new class into wal package but that seems fine to me.

          Does the base Reader interface have to know about a compression context? Can this not be internal to the implementation?

          You call it ReplicationHLogReader but is it a replication only class? If so, it does not belong in regionserver package but over in replication package.

          My sense though is that this is a generally useful WAL reader? One that can do compressed or non-compressed WAL? One that can be used by replication but also by fellas who want to index hbase, etc.

          Missing a license

          Can it be in the wal package? Then don't have to open up so much of HLog?

          It's unfortunate that you can't tell it's a compressed WAL from reading, say, some magic or metadata at the head of the file. Having to consult configuration for this seems a bit broken.

          Yeah, why can't an implementation of HLog.Reader manage the compression context internally? Why does it have to be out here in this ReplicationHLogReader class? After all, isn't the dictionary reconstructed on read? You don't save it around?

          So, a HLog.ReaderFactory that looks at configuration and returns a HLog.Reader that either does compressed or not by looking at configs?

          Is this right:

          + if (entry != null) {
          + entry.setCompressionContext(null);

          Jean-Daniel Cryans added a comment -

          Does CompressionContext class have to be public? Can it stay pkg private? You'll have to move your new class into wal package but that seems fine to me.

          My sense though is that this is a generally useful WAL reader? One that can do compressed or non-compressed WAL? One that can be used by replication but also by fellas who want to index hbase, etc.

          Can it be in the wal package? Then don't have to open up so much of HLog?

          You're right that it can be a generally useful WAL reader, for users that need to be able to seek directly into newly open files without having to scan everything that comes before if compression is on. Right now its API is tailored to replication's need, we could make it more general but, unless we have another use case for it right now, I don't see the point.

          So I'll move it to wal and rename the class/methods a bit.

          Does the base Reader interface have to know about a compression context? Can this not be internal to the implementation?

          Until HDFS lets us tail a file under construction we need to pass the dict back when opening the file.

          Yeah, why can't an implementation of HLog.Reader manage the compression context internally? Why it have to be out here in this ReplicationHLogReader class? Afterall, isn't the dictionary reconstructed on read? You don't save it around?

          It would be fine if we didn't have to:

          • seek into a file we never read before (after a region server died and we pick up the queue)
          • reopen files in order to tail them (when normally replicating)

          We could augment HLog.Reader to support reopening of files, basically pushing down even more of what ReplicationHLogReader is doing. That way we could hide all the dirty details? I haven't thought about modifying that before.

          Missing a license

          Oh thanks.

          Its unfortunate that you can't tell its a compressed wal from reading say some magic or metadata at the head of the file. It seems a bit broke consulting configuration.

          Good point, it would simplify a lot of things. HLog compression was implemented at the HLog.Entry level though so technically it's not even the WAL itself that's compressed. My next comment shows what that means.

          Is this right:

          Yes, if you keep the compression context in there it'll replicate the HLog.Entry[] compressed with a dictionary that the slave has no knowledge of. I had this comment in my first patch and I think I forgot to move it over:

          // Setting it to null prevents from sending compressed edits that the sink wouldn't parse

          Jean-Daniel Cryans added a comment -

          This v4 of the patch pushes down the handling of reopened compressed files down to SequenceFileLogReader. The two main changes:

          • SequenceFileLogReader needs a way to be reused across multiple open/seek/close cycles. For this I added a method called "reopen". The name might be confusing.
          • ReplicationSource used to just bluntly reopen whatever currentPath is, but now this doesn't work with SFLR being kept around. To fix it I had to add a little dance in ReplicationHLogReader to verify if the path given was different (although still for the same file that was moved to .oldlogs).

          The HLog and Replication tests pass.
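
           A sketch of the reopen/seek/read cycle this enables, using the reset() name the later review settles on; the code is an assumption-laden illustration, not the committed patch:

           import java.io.IOException;
           import org.apache.hadoop.hbase.regionserver.wal.HLog;

           final class TailWalSketch {
             // Keep a single reader around: reset it to see data appended since the last
             // open, seek back to the last good position, and read whatever is new.
             static long tailOnce(HLog.Reader reader, long lastGoodPosition) throws IOException {
               reader.reset();                  // re-open the underlying file to see its new length;
                                                // the compression context built so far is kept
               reader.seek(lastGoodPosition);   // jump back to where we stopped last time
               while (reader.next() != null) {
                 // ship/process the entry here
               }
               return reader.getPosition();     // remember how far we got for the next cycle
             }
           }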

          stack added a comment -

          reopen may not be too bad. You have to explain the difference between a reopen and a getReader somewhere... as is, there is none. I don't think it would take much to explain why you'd reopen (would 'reset' be a better name, as in 'resetting the reader'... as to what it does, resetting is implementation specific... if it is a compressed WAL, then we'd reopen the file... if not compressed, the reset is a noop – right?)?

          ReplicationHLogReader does not implement WAL HLog.Reader interface. Should it?

          This javadoc is on the wrong method:

          + * if a positionToSkipTo was specified, this method will take care of seeking there

          I think this patch is almost there.

          Jean-Daniel Cryans added a comment -

          You have to explain the difference between a reopen and a getReader somewhere...

          Can do.

          If it is a compressed WAL, then we'd reopen the file... if not compressed, the reset is a noop

          Getting a new WALReader on a file that's being written to will let us see the new length, so it's not a noop. Will add comments on that.

          This javadoc is on the wrong method:

          Actually that method used to have a positionToSkipTo parameter but yeah, removing.

          I think this patch is almost there.

          Thanks for your patience and guidance.

          Jean-Daniel Cryans added a comment -

          Attaching new patch with the following:

          • Added the comments that were missing
          • Removed some stale javadoc
          • Renamed ReplicationHLogReader to ReplicationHLogReaderManager since that's really what it is and it doesn't need to implement HLog.Reader
          • Fixed an unrelated bug in TestReplication where in testDisableInactivePeer the second cluster was restarted with only 1 RS, which meant that in queueFailover() we couldn't kill a RS from that cluster. It was dying on ArrayIndexOutOfBoundsException but never killed the test.
          • Fixed another bug in TestReplication: I saw queueFailover() failing even though replication was still happening. Strangely, the "i" seemed to have 2 different lives. I extracted that variable and verified that it's working. The test does have a JUnit-level timeout, so while(true) is safe here.
          stack added a comment -

          On commit fix this comment:

          "+ * Get a reader for the WAL. If you are reading from a file that's being written to
          + * and need to reopen it multiple times, use

          {@link HLog.Reader#reset()}

          instead of this method
          + * then just seek back to the last known good position."

          It has too much about the implementation...

          This comment on reset is good... maybe use some of it:

          + // Resetting the reader lets us see newly added data if the file is being written to
          + // We also keep the same compressionContext which was previously populated for this file

          Or the stuff in openReader is good.... too... makes sense

          +1 on commit

          Jean-Daniel Cryans added a comment -

          This is the patch I want to commit to 0.94. I refreshed it again and fixed one other thing regarding the ReplicationSource.currentSize attribute, because it wasn't being set and refreshed properly.

          The patch for trunk enables HLog compression by default and removes the compression-related tests. I'm currently running all the tests.

          Ted Yu added a comment -

          TestWALReplayCompressed passes with the rest of trunk patch:

          Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
          2012-12-14 14:56:28.308 java[85149:1903] Unable to load realm mapping info from SCDynamicStore
          Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.388 sec
          

          Shall we keep this test ?

          Jean-Daniel Cryans added a comment -

          Shall we keep this test ?

          TestWALReplayCompressed is TestWALReplay with compression enabled which in trunk will now be default, so it would be redundant to keep it.

          Ted Yu added a comment -

          Thanks for the reminder, J-D.
          My question becomes: shall we introduce TestWALReplayUncompressed ?
          Running the patch on Linux I got:

          testSimplePutDelete(org.apache.hadoop.hbase.replication.TestMasterReplication)  Time elapsed: 0.12 sec  <<< FAILURE!
          java.lang.AssertionError: Waited too much time for put replication
            at org.junit.Assert.fail(Assert.java:93)
            at org.apache.hadoop.hbase.replication.TestMasterReplication.putAndWait(TestMasterReplication.java:276)
            at org.apache.hadoop.hbase.replication.TestMasterReplication.testSimplePutDelete(TestMasterReplication.java:213)
          queueFailover(org.apache.hadoop.hbase.replication.TestReplication)  Time elapsed: 0.119 sec  <<< FAILURE!
          java.lang.AssertionError: Waited too much time for queueFailover replication. Waited 17533ms.
            at org.junit.Assert.fail(Assert.java:93)
            at org.apache.hadoop.hbase.replication.TestReplication.queueFailover(TestReplication.java:765)
          

          For ReplicationHLogReaderManager.java:

          +public class ReplicationHLogReaderManager {
          

          Please add annotation for audience and stability.
          For readNextAndSetPosition():

          +   * Get the next entry, returned and also added in the array
          

          Please rephrase the above line so that it is easier to understand.
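
          Taken together, the two review asks above amount to something like the following on ReplicationHLogReaderManager (sketch only, assuming the Hadoop classification annotations used elsewhere in HBase; the method body is elided with a placeholder return):

            @InterfaceAudience.Private
            @InterfaceStability.Evolving
            public class ReplicationHLogReaderManager {

              /**
               * Reads the next entry from the WAL, stores it in the given slot of
               * entriesArray, and records the reader's position so replication can
               * resume from there later.
               */
              public HLog.Entry readNextAndSetPosition(HLog.Entry[] entriesArray,
                                                       int currentNbEntries)
                  throws IOException {
                return null; // real implementation reads from the underlying HLog.Reader
              }
            }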

          Ted Yu added a comment -
          +  public HLog.Entry readNextAndSetPosition(HLog.Entry[] entriesArray, int currentNbEntries) throws IOException {
          ...
          +    HLog.Entry entry = this.repLogReader.readNextAndSetPosition(this.entriesArray, this.currentNbEntries);
          

          nit: line too long.

          Jean-Daniel Cryans added a comment -

          My question becomes: shall we introduce TestWALReplayUncompressed

          That makes sense.

          Running the patch on Linux I got:

          If you change SLEEP_TIME to 1500, does it still fail? If not, that's the IPv6 problem.

          Please add annotation for audience and stability.

          Thanks, forgot about that.

          Please phase the above line so that it is easier to understand.

          It works the same way as reader.next, is there anything in particular you think needs more explanation?

          Ted Yu added a comment -

          is there anything in particular you think needs more explanation?

          No.

          If you change SLEEP_TIME to 1500, does it still fail? If not, that's the IPv6 problem.

          Running org.apache.hadoop.hbase.replication.TestMasterReplication
          Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 80.499 sec
          
          Jean-Daniel Cryans added a comment -

          Turns out the patch fails in trunk with TestHLogSplit on 4 tests. Probably code that is messing with the HLog internals and isn't expecting compression. Investigating.

          Jean-Daniel Cryans added a comment -

          Figured it out.

          First there's Compressor.uncompressIntoArray, which doesn't protect itself against bad dictionary indexes: a bad index comes out as an IndexOutOfBoundsException and kills log splitting.

          Then there's FaultySequenceFileLogReader, which doesn't speak compression and basically just needs to pass the compressionContext down to the HLog.Entry, otherwise it fails with a NegativeArraySizeException.

          The test passes now with those fixes. Will post new patches later.
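
          To make the first fix concrete, the guard being described is essentially a check around the dictionary lookup, along these lines (illustrative only, not the literal patch; the helper name is made up):

            // Turn a bad dictionary index (possible with a corrupt or truncated WAL)
            // into an IOException the log splitter can handle, instead of an
            // IndexOutOfBoundsException that aborts splitting outright.
            private static byte[] lookupOrFail(Dictionary dict, short dictIdx)
                throws IOException {
              try {
                byte[] entry = dict.getEntry(dictIdx);
                if (entry == null) {
                  throw new IOException("Missing dictionary entry for index " + dictIdx);
                }
                return entry;
              } catch (IndexOutOfBoundsException e) {
                throw new IOException("Invalid dictionary index " + dictIdx, e);
              }
            }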

          stack added a comment -

          Good finds Jean-Daniel Cryans. Sounds like WAL compression could do w/ some more testing especially if it becomes default in 0.96.

          Jean-Daniel Cryans added a comment -

          New patches for 0.94 and trunk that pass TestHLogSplit; I also added TestHLogSplitCompressed for 0.94 and incorporated Ted's comments.

          Sounds like WAL compression could do w/ some more testing especially if it becomes default in 0.96.

          Agreed although the same can be said of many other features in trunk. May I suggest we commit this then open new jiras if other issues are found during further testing?

          ramkrishna.s.vasudevan added a comment -

          @JD http://mail-archives.apache.org/mod_mbox/hbase-dev/201205.mbox/%3C00bc01cd31e6$7caf1320$760d3960$%25vasudevan@huawei.com%3E . Do we still get the OOME with WAL compression?

          stack added a comment -

          I'm ok w/ committing it but I think it should be off in 0.96. It looks too flakey to be on by default (thanks for the OOME reminder Ram).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12561641/HBASE-5778-0.94-v7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 16 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3611//console

          This message is automatically generated.

          Jean-Daniel Cryans added a comment -

          ramkrishna.s.vasudevan I wasn't aware of it so I guess it's still an issue. Will open a Jira.

          stack Ok thanks, also I'll change the title.

          Jean-Daniel Cryans added a comment -

          Changing the title to reflect what's being committed. Trunk won't have compression on and it will keep the compression tests (plus new ones). I opened HBASE-7391 for the OOME.

          Jean-Daniel Cryans added a comment -

          Committed to branch and trunk, will continue working on making HLog compression better in other jiras. Thanks everyone for the comments, testing, and reviewing.

          Hudson added a comment -

          Integrated in HBase-0.94 #646 (See https://builds.apache.org/job/HBase-0.94/646/)
          HBASE-5778 Fix HLog compression's incompatibilities (Revision 1424172)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Lars Hofhansl added a comment -

          TestReplicationWithCompression is timing out it seems (also happens locally).

          Hudson added a comment -

          Integrated in HBase-TRUNK #3640 (See https://builds.apache.org/job/HBase-TRUNK/3640/)
          HBASE-5778 Fix HLog compression's incompatibilities (Revision 1424174)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Jean-Daniel Cryans added a comment -

          Lars Hofhansl It doesn't for me; it also passed on https://builds.apache.org/job/HBase-TRUNK/3640/

          It's as if the IPv4 fix didn't work for you and Ted? If you set SLEEP_TIME to 1500, does it always pass?

          Lars Hofhansl added a comment -

          What SLEEP_TIME is this?

          Also failed on 0.94 jenkins, let's wait for the next 0.94 build.
          TestReplication also fails locally. I'm trying on another machine now.

          Andrew Purtell added a comment - - edited

          I've found I need to add -Djava.net.preferIPv4Stack=true on the Maven command line if running tests individually with 'mvn test -Dtest=foo'. Maybe that's it?

          Furthermore, with recent Hadoop 1.0.x or 1.1.x no test that requires the HDFS minicluster will start up without this on my Ubuntu 12 laptop. It looks to me like a rat hole of increasing depth.

          Edit: Wrong species of problem

          Lars Hofhansl added a comment -

          I tried on two machines with two different JVMs, with and without this setting passed on the command line; in all cases both TestReplicationWithCompression and TestReplication time out.

          Something's up

          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #304 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/304/)
          HBASE-5778 Fix HLog compression's incompatibilities (Revision 1424174)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Jean-Daniel Cryans added a comment -

          Bumped TestReplication's SLEEP_TIME to 1.5s to see if this is really just the ipv4 issue.
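
          For anyone not familiar with the test, SLEEP_TIME is just a constant in TestReplication that, together with the retry count, bounds how long the test waits for edits to show up on the slave cluster; the bump amounts to something like this (declaration shown is approximate):

            // In TestReplication: the test retries a fixed number of times,
            // sleeping SLEEP_TIME ms between attempts, before giving up.
            private static final long SLEEP_TIME = 1500; // bumped from the previous value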

          Ted Yu added a comment -

          From https://builds.apache.org/job/HBase-TRUNK/3641/, TestReplicationWithCompression tests failed.

          Jean-Daniel Cryans added a comment -

          I have a long flight today, I'll try to repro, but it passes all the time for me.

          Lars Hofhansl added a comment -

          If I revert this change, TestReplication always passes locally in 0.94, whereas with this patch it never passed (so far it has passed only once in many runs, and that was with the increased SLEEP_TIME).

          I would be more comfortable if the patch was reverted from 0.94. I know this is frustrating, but I would like to spin 0.94.4 soon (hopefully by tomorrow). We can put this back into 0.94.5.

          Andrew Purtell added a comment -

          TestReplication was flapping on a private Jenkins at a previous employer over a span of 6 months. We triaged the problem by increasing SLEEP_TIME and by increasing the number of retries. The result still was not 100% effective.

          TestReplication sets up two miniclusters and runs replication between them. Whenever we change replication itself, the master, HTable/HConnection, etc., etc., the timing of various actions changes underneath it through complex interactions. Maybe we should move this out of LargeTests into an integration test instead?

          Ted Yu added a comment -

          I ran TestReplication locally and it failed on second run:

          testVerifyRepJob(org.apache.hadoop.hbase.replication.TestReplication)  Time elapsed: 16.781 sec  <<< FAILURE!
          java.lang.AssertionError: Waited too much time for truncate
            at org.junit.Assert.fail(Assert.java:93)
            at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:180)
          ...
          queueFailover(org.apache.hadoop.hbase.replication.TestReplication)  Time elapsed: 14.587 sec  <<< FAILURE!
          java.lang.AssertionError: Waited too much time for truncate
            at org.junit.Assert.fail(Assert.java:93)
            at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:180)
          

          If I remember correctly, testVerifyRepJob used to pass.

          Lars Hofhansl added a comment -

          Ok. Now it's failing locally again, even with the patch reverted. Sigh.
          Let's not revert then. I triggered another 0.94 jenkins build.

          Lars Hofhansl added a comment -

          I also do not see anything obvious in the patch that would cause new problems in the tests.

          Lars Hofhansl added a comment -

          Fact is, since this patch we have not had a single Jenkins run where these tests did not fail.

          So here's what I am going to do. I'll revert this from 0.94, to see whether the tests pass.
          If they don't we're none the wiser. If they do, we can regroup for 0.94.5.

          Unless I hear objections I'll do that within the next hour or so.

          stack added a comment -

          +1 on trying anything to get a green test.

          +1 on test replication going over to integration tests. Even when it does fail, only J-D can make sense of it (smile). It has found issues in the past though....

          Lars Hofhansl added a comment - - edited

          LOL... I reverted but now @%#^#$ Jenkins is down. I can't win.

          Ted Yu added a comment -

          I think Jenkins is on vacation again, Lars

          Andrew Purtell added a comment -

          Given the above logic, I think the revert is the right call.

          Hudson added a comment -

          Integrated in HBase-0.94 #650 (See https://builds.apache.org/job/HBase-0.94/650/)
          HBASE-5778 Revert, to check on test failures potentially caused by this. (Revision 1424702)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Lars Hofhansl added a comment -

          Latest run still failed in TestReplication.queueFailover, and that is despite the increased SLEEP_TIME that I did not revert. Hmm...

          Lars Hofhansl added a comment -

          The other changes that coincided with the failures are: HBASE-7331 and HBASE-7374.
          I'll try reverting those locally.

          Andrew Purtell added a comment - - edited

          Lars Hofhansl I'm -1 on reverting those changes for a flapper. Of course if there's a real issue exposed here, that's a different story.

          Lars Hofhansl added a comment -

          Didn't mean to revert them from SVN, just trying it locally.

          TestReplication.queueFailover has now failed in 6 consecutive runs (since 645) and has not failed a single time in the 20 runs prior.

          Something's up and we should not just write this off as a flapper.

          I think this patch should be reapplied, though. It did not cause this problem I think.

          Andrew Purtell added a comment -

          I went one commit further back and TestReplication#queueFailover just hung for me:

          $ git checkout 0ee8b7b  # HBASE-7357
          $ mvn -PlocalTests clean test -Dtest=TestReplication -Djava.net.preferIPv4Stack=true
          [...]
          Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 448.042 sec <<< FAILURE!
          
          Results :
          
          Tests in error: 
            queueFailover(org.apache.hadoop.hbase.replication.TestReplication): test timed out after 300000 milliseconds
          

          jstack shows, I think, an HTable waiting for a region to open that never does.

          Andrew Purtell added a comment -

          Something's up and we should not just write this off as a flapper.

          Yes, I agree, that's why I clarified my comment with an edit. Sorry for any confusion.

          Andrew Purtell added a comment -

          Let me try a bisect.

          Lars Hofhansl added a comment -

          Thanks Andrew Purtell. Might even be an env issue on jenkins. I find that a successful run on my machine takes 245s or so, and I can see it taking longer in a virtual env. Maybe up the timeout from 300s to 360s?
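
          Upping the timeout would just be the standard JUnit annotation on the test method, for example (values are in milliseconds; the test name is the one failing above):

            @Test(timeout = 360000) // up from 300000, the 300s seen in the timeout message
            public void queueFailover() throws Exception {
              // test body unchanged
            }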

          I assume there're no objections reapplying this patch.

          Andrew Purtell added a comment -

          It may take a while. I got a fast failure on the head of branch-0.94, but then a clean run on the HBASE-7331 commit after a failure on the HBASE-7357 one. Looks like I'll want up to 100 iterations for every checkout...

          Lars Hofhansl added a comment -

          Wow, that'll take a bit.

          Andrew Purtell added a comment -

          Yeah I will have to settle for just a few repetitions until there's a candidate.

          Lars Hofhansl added a comment -

          queueFailover does not appear to fail (for me, locally) when run on its own.

          Lars Hofhansl added a comment -

          I reapplied the patch.

          Hudson added a comment -

          Integrated in HBase-0.94 #652 (See https://builds.apache.org/job/HBase-0.94/652/)
          HBASE-5778 Reapply, Test failures not caused by this. Sorry for the noise. (Revision 1424810)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #87 (See https://builds.apache.org/job/HBase-0.94-security/87/)
          HBASE-5778 Reapply, Test failures not caused by this. Sorry for the noise. (Revision 1424810)
          HBASE-5778 Revert, to check on test failures potentially caused by this. (Revision 1424702)
          HBASE-5778 Fix HLog compression's incompatibilities (Revision 1424172)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java

          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java

          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #10 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/10/)
          HBASE-5778 Reapply, Test failures not caused by this. Sorry for the noise. (Revision 1424810)
          HBASE-5778 Revert, to check on test failures potentially caused by this. (Revision 1424702)
          HBASE-5778 Fix HLog compression's incompatibilities (Revision 1424172)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java

          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java

          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationHLogReaderManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/FaultySequenceFileLogReader.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplitCompressed.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithCompression.java
          Liu Shaohui added a comment -

          HLog compression is also incompatible with replication from a cluster that has HLog compression enabled to a peer cluster without HBASE-5778.

          How can this problem be solved?

          Jean-Daniel Cryans added a comment -

          Upgrade?

          Lars Hofhansl added a comment -

          Heh

          Why would that be the case, though? We're reading the HLogs on the source and replicating to the slave via RPC; whether or not the logs were compressed at the source should make no difference, no?


            People

            • Assignee: Jean-Daniel Cryans
            • Reporter: Jean-Daniel Cryans
            • Votes: 0
            • Watchers: 15
