Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Adds a custom dictionary-based compression on WAL. Off by default. To enable, set hbase.regionserver.wal.enablecompression to true in hbase-site.xml.
      Note that replication is currently broken when WAL compression is enabled.

      Description

      The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. Current plan involves using a dictionary to compress table name, region id, cf name, and possibly other bits of repeated data. Also, HLog format may be changed in other ways to produce a smaller HLog.
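The dictionary idea in the description can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patches: the class name `SimpleWalDictionary`, the one-byte marker values, and the fixed 2-byte index are assumptions made for the example. The first time a repeated field (table name, region id, column family name) is seen, it is written in full and remembered; later occurrences are written as a short reference.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the HBase implementation. */
class SimpleWalDictionary {
  private static final int LITERAL = 0;   // assumed marker: length and raw bytes follow
  private static final int REFERENCE = 1; // assumed marker: 2-byte dictionary index follows

  private final Map<String, Integer> indexByValue = new HashMap<>();

  /** Writes one field, using a dictionary reference when the value was seen before. */
  void write(String field, ByteArrayOutputStream out) {
    Integer index = indexByValue.get(field);
    if (index != null) {
      out.write(REFERENCE);
      writeShort(out, index);
      return;
    }
    byte[] raw = field.getBytes(StandardCharsets.UTF_8);
    out.write(LITERAL);
    writeShort(out, raw.length); // assumes fields shorter than 65536 bytes
    out.write(raw, 0, raw.length);
    // Only add while indices still fit in 2 bytes; a real dictionary needs eviction.
    if (indexByValue.size() < 0x10000) {
      indexByValue.put(field, indexByValue.size());
    }
  }

  private static void writeShort(ByteArrayOutputStream out, int value) {
    out.write(value >>> 8); // high byte
    out.write(value);       // low byte
  }

  /** Encodes the same field n times and returns the total encoded size in bytes. */
  static int encodedSize(String field, int n) {
    SimpleWalDictionary dict = new SimpleWalDictionary();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < n; i++) {
      dict.write(field, out);
    }
    return out.size();
  }
}
```

With this layout, a 9-byte table name costs 12 bytes the first time it is written and 3 bytes on each repeat, which is where the saving on repeated fields would come from.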

      1. hbase-4608-v28-delta.txt
        25 kB
        Todd Lipcon
      2. hbase-4608-v28.txt
        56 kB
        Todd Lipcon
      3. hbase-4608-v28.txt
        56 kB
        stack
      4. 4608v8fixed.txt
        37 kB
        Li Pi
      5. 4608v7.txt
        32 kB
        Li Pi
      6. 4608v6.txt
        32 kB
        Li Pi
      7. 4608v5.txt
        33 kB
        Li Pi
      8. 4608v30.txt
        57 kB
        stack
      9. 4608v29.txt
        56 kB
        stack
      10. 4608v27.txt
        52 kB
        stack
      11. 4608v25.txt
        52 kB
        stack
      12. 4608v24.txt
        52 kB
        stack
      13. 4608v23.txt
        51 kB
        stack
      14. 4608-v22.txt
        42 kB
        Ted Yu
      15. 4608-v20.txt
        42 kB
        Ted Yu
      16. 4608-v19.txt
        41 kB
        Ted Yu
      17. 4608v18.txt
        39 kB
        Ted Yu
      18. 4608v17.txt
        39 kB
        Ted Yu
      19. 4608v16.txt
        39 kB
        Ted Yu
      20. 4608v15.txt
        39 kB
        Ted Yu
      21. 4608v14.txt
        40 kB
        Li Pi
      22. 4608v13.txt
        39 kB
        Li Pi
      23. 4608v13.txt
        39 kB
        Li Pi
      24. 4608v1.txt
        11 kB
        Li Pi

          Activity

          Ted Yu added a comment -

          We need to figure out how a compressed HLog.Entry is delivered to the replication sink.

          Lars Francke added a comment -

          This seems to be missing documentation, no?

          Shouldn't the hbase.regionserver.wal.enablecompression key at least be in hbase-default.xml?

          Hudson added a comment -

          Integrated in HBase-TRUNK-security #139 (See https://builds.apache.org/job/HBase-TRUNK-security/139/)
          HBASE-4608 HLog Compression (Revision 1301165)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #2683 (See https://builds.apache.org/job/HBase-TRUNK/2683/)
          HBASE-4608 HLog Compression (Revision 1301165)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          Li Pi added a comment -

          Woohoo! It's in!

          Hudson added a comment -

          Integrated in HBase-0.94 #32 (See https://builds.apache.org/job/HBase-0.94/32/)
          HBASE-4608 HLog Compression (Revision 1301167)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Bytes.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          Lars Hofhansl added a comment -

          Yeah! And, yes, time for an RC.

          stack added a comment -

          Now this is in, does that mean we can cut a 0.94RC0?

          stack added a comment -

          Committed 0.94 branch and trunk. Thanks for the patch Li and to all others who hacked on this patch and reviewed it.

          stack added a comment -

          I'll commit v30 then.

          Thanks all for reviews, etc.

          Ted Yu added a comment -

          w.r.t. Lars' comment: https://issues.apache.org/jira/browse/HBASE-4608?focusedCommentId=13229010&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229010

          I think it makes sense.
          How about introducing an enum CompressionType with the values NONE and DICTIONARY?
          HConstants.ENABLE_WAL_COMPRESSION would be replaced by another String: "hbase.regionserver.wal.compressiontype"
          If "hbase.regionserver.wal.compressiontype" doesn't appear in conf, CompressionType.NONE is assumed.
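A minimal sketch of the proposal above, under stated assumptions: a plain `Map<String, String>` stands in for the HBase configuration object, and the `fromConf` helper and `CONF_KEY` constant are hypothetical names introduced for illustration. Only the enum values and the key string come from the comment.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the proposed enum; not code from the patch. */
enum CompressionType {
  NONE, DICTIONARY;

  // Key name as proposed in the comment above.
  static final String CONF_KEY = "hbase.regionserver.wal.compressiontype";

  /** Returns the configured type, or NONE when the key is absent. */
  static CompressionType fromConf(Map<String, String> conf) {
    String value = conf.get(CONF_KEY);
    if (value == null) {
      return NONE;
    }
    // valueOf throws IllegalArgumentException for unknown names, so typos fail loudly.
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}
```

Compared with a boolean switch, an enum-valued key leaves room for further compression schemes later without adding another configuration property.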

          Li Pi added a comment -

          On the other compression options: I looked into those.

          Plugging into LZMA was the first thing I thought about doing, but its performance rules it out.

          There are other optimizations we can make, such as modifying the dictionary to take frequency into account, assigning the highest-probability entries to the lowest numbers, and then using vints rather than 2 bytes for everything. Note that we shouldn't expect to beat LZMA, because we compress neither the values nor the SequenceFile overhead. On some workloads, those overheads might be substantial, although I haven't checked.
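To illustrate why variable-length integers would help once frequent entries get low numbers, here is a small sketch. It uses a generic 7-bits-per-byte varint, which is an assumption for the example and is not Hadoop's own vint format; the class name `VarIntSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper; a generic varint, not Hadoop's vint encoding. */
class VarIntSketch {
  /** Encodes a non-negative index using 7 payload bits per byte, low bits first. */
  static byte[] encode(int index) {
    if (index < 0) {
      throw new IllegalArgumentException("index must be non-negative");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((index & ~0x7F) != 0) {
      out.write((index & 0x7F) | 0x80); // high bit set: more bytes follow
      index >>>= 7;
    }
    out.write(index); // final byte, high bit clear
    return out.toByteArray();
  }

  /** Decodes a value produced by encode. */
  static int decode(byte[] bytes) {
    int result = 0;
    int shift = 0;
    for (byte b : bytes) {
      result |= (b & 0x7F) << shift;
      shift += 7;
    }
    return result;
  }
}
```

Under this encoding, indices below 128 take one byte instead of a fixed two, while indices of 16384 and above take three, so the gain depends on the most frequent entries actually receiving the smallest indices.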

          This is actually pretty close to the challenge displayed by caching, in that we want to keep the most likely to be repeated entries in our dictionary, and evict the rest. I used LRU because LRU was simple, and like caching, pretty much anything results in a substantial performance increase over nothing.
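The LRU policy mentioned above could look roughly like the following. This is a sketch under assumptions, not the LRUDictionary class from the patch: the class name `LruDictionarySketch`, the `findOrAdd` method, and the slot-reuse scheme are invented for illustration, and `java.util.LinkedHashMap` in access order does the recency bookkeeping.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an LRU-evicting dictionary; not the patch's LRUDictionary. */
class LruDictionarySketch {
  static final int NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // accessOrder=true keeps the least recently used entry first in iteration order.
  private final LinkedHashMap<String, Integer> indexByValue =
      new LinkedHashMap<>(16, 0.75f, true);

  LruDictionarySketch(int capacity) {
    if (capacity < 1) {
      throw new IllegalArgumentException("capacity must be at least 1");
    }
    this.capacity = capacity;
  }

  /**
   * Returns the index of value if it is present (refreshing its recency).
   * Otherwise adds it, evicting the least recently used entry when full,
   * and returns NOT_IN_DICTIONARY so the caller writes the value literally.
   */
  int findOrAdd(String value) {
    Integer existing = indexByValue.get(value); // get() counts as an access
    if (existing != null) {
      return existing;
    }
    int slot;
    if (indexByValue.size() < capacity) {
      slot = indexByValue.size();
    } else {
      // Evict the eldest entry and reuse its slot for the new value.
      Iterator<Map.Entry<String, Integer>> it = indexByValue.entrySet().iterator();
      slot = it.next().getValue();
      it.remove();
    }
    indexByValue.put(value, slot);
    return NOT_IN_DICTIONARY;
  }
}
```

A reader that replays the same sequence of `findOrAdd` calls while decoding would evict the same entries in the same order, which is how writer and reader could stay in sync without storing the dictionary itself.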

          I'm pretty happy with cutting the WAL size in half on optimal workloads, though as always, it's nice to work towards future performance goals. I have other ideas, but they involve changing the HLog substantially in order to be more compact. In that case, we might end up abandoning the Hadoop SequenceFile format altogether, and this thing becomes a bit more complex.

          Li Pi added a comment -

          If a dictionary file gets cut up, you'll be able to read all the way to the end.

          Lars Hofhansl added a comment -

          "I'm not wondering if this patch is worth adding? If compressible stuff is only shrinking by half, is that big enough win? What do you lot think? LZMA is not viable because it takes forever compressing though it's turning SU WALs into 11-14% original size."

          You mean you are now wondering? IMHO: The WAL is probably the greatest source of synchronous IO that we generate, cutting this in half seems quite valuable (maybe this will be less valuable in the future if/when HDFS can do parallel replication instead of chaining - but it is now).
          I agree that none of the block based compression schemes would be good options... Was merely curious about HLog archiving, which is quite unrelated to this issue.

          +1, let's commit this.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518416/4608v30.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestMetaScanner

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1191//console

          This message is automatically generated.

          Ted Yu added a comment -

          w.r.t. adding javadoc for offset and length of writeCompressed(), I searched our code base for '@param offset ' and found 48 occurrences.

          I like this snippet from HFileReaderV2.java:

               * @param key key byte array
               * @param offset key offset in the key byte array
               * @param length key length
          

          Even an empty javadoc is better than missing parameter:

             * @param offset
          
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 23:54:37, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java, line 28
          > <https://reviews.apache.org/r/4328/diff/3/?file=92429#file92429line28>
          >
          > Should we label this class @InterfaceAudience.Private?

          Unless a class is public, it doesn't need an interface audience annotation.

          - Todd

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5972
          -----------------------------------------------------------

          On 2012-03-14 22:26:34, Michael Stack wrote:

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/
          -----------------------------------------------------------

          (Updated 2012-03-14 22:26:34)

          Review request for hbase.

          Summary
          -------
          See issue

          This addresses bug hbase-4608.
          https://issues.apache.org/jira/browse/hbase-4608

          Diffs
          -----
          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing
          -------

          Thanks,

          Michael

          stack added a comment -

          @Ted Would suggest that in future you not piecemeal your reviews. Bulk them up. When a review comes in dribs and drabs, the whole process takes way longer.

          @Lars "What portion of the WAL storage do the current WALs represent?"

          Do you mean, how much of our footprint is made up of WAL logs? Not sure. I thought the intent of this issue was to speed up syncs because there'd be fewer bytes to shuttle across the datanode replica pipeline.

          I'm not wondering if this patch is worth adding? If compressible stuff is only shrinking by half, is that big enough win? What do you lot think? LZMA is not viable because it takes forever compressing though it's turning SU WALs into 11-14% original size.

          Let me try adding lzo numbers, but we wouldn't want to use lzo anyway because we could lose a bunch of edits off the end if the compression block was not closed off (That's my understanding. I could be wrong).

          Li, what happens if we cut the end off a dictionary-compressed file? Will we be able to read up to the last byte or word or so?

          Good stuff.

          stack added a comment -

          Accommodates about 50% of Ted's last review (ignoring the trivial).

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 164

          > <https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line164>

          >

          > Please add javadoc for offset and length.

          Are you joking? On a protected method with parameter names such as these that follow a byte array argument?

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 36

          > <https://reviews.apache.org/r/4328/diff/3/?file=92427#file92427line36>

          >

          > IllegalArgumentException is not needed here.

          > I removed it, compiled and ran TestCompressor - it passed.

          Removed.

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108

          > <https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line108>

          >

          > A closing ) should be placed either on this line or on line 109.

          done

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 143

          > <https://reviews.apache.org/r/4328/diff/3/?file=92428#file92428line143>

          >

          > Should read 'byte of index to the ...'

          done

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 55

          > <https://reviews.apache.org/r/4328/diff/3/?file=92431#file92431line55>

          >

          > I don't quite get what the second sentence is supposed to convey ?

          > It seems to be same as first sentence.

          >

          > This version is the minimum version that supports compression.

          Leaving as is. The second sentence is to emphasize that only the dictionary compression was introduced in version -2.

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 33

          > <https://reviews.apache.org/r/4328/diff/3/?file=92433#file92433line33>

          >

          > Can we remove 'silly' here ?

          > Some user may actually reach this size.

          Then they are being silly.

          On 2012-03-14 23:54:37, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 202

          > <https://reviews.apache.org/r/4328/diff/3/?file=92434#file92434line202>

          >

          > Setting reader to null would be desirable after the close() call.

          Done

          - Michael

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5972
          -----------------------------------------------------------

          On 2012-03-14 22:26:34, Michael Stack wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4328/

          -----------------------------------------------------------

          (Updated 2012-03-14 22:26:34)

          Review request for hbase.

          Summary

          -------

          See issue

          This addresses bug hbase-4608.

          https://issues.apache.org/jira/browse/hbase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing

          -------

          Thanks,

          Michael

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518388/4608v29.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestImportTsv

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1189//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          I'm still +1

          The lzma numbers are interesting. Maybe a nice (future) solution would be to dictionary-compress the HLog while writing, and then, when the log is rolled, compress it with lzma (since we know the file won't change any more, we can compress it wholesale).
          This raises the next question: What portion of the WAL storage do the current WALs represent?
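The "compress it wholesale once rolled" idea can be sketched as follows. The JDK has no LZMA codec, so this sketch swaps in GZIP from `java.util.zip` purely as a stand-in; the class and method names are hypothetical, and the log is modelled as an in-memory byte array rather than a file on HDFS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Second-stage compression of a closed (rolled) log; GZIP stands in for lzma. */
final class RolledLogArchiver {
    /** Compresses the whole log in one pass; safe only because it no longer changes. */
    static byte[] compressWholesale(byte[] rolledLog) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(rolledLog);
        } // closing the stream writes the GZIP trailer
        return sink.toByteArray();
    }

    /** Restores the original bytes of an archived log. */
    static byte[] decompress(byte[] archived) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(archived))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, read);
            }
        }
        return sink.toByteArray();
    }
}
```

Because the second stage runs only after the roll, its cost stays off the write path, and the truncation concern for block codecs does not apply to a file that is already complete.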

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5972
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          <https://reviews.apache.org/r/4328/#comment12946>

          IllegalArgumentException is not needed here.
          I removed it, compiled and ran TestCompressor - it passed.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12947>

          A closing ) should be placed either on this line or on line 109.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12948>

          Should read 'byte of index to the ...'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12949>

          Should read 'an array of bytes'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12950>

          Please add javadoc for offset and length.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
          <https://reviews.apache.org/r/4328/#comment12958>

          Should we label this class @InterfaceAudience.Private ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/4328/#comment12951>

          I don't quite get what the second sentence is supposed to convey ?
          It seems to be same as first sentence.

          This version is the minimum version that supports compression.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/4328/#comment12952>

          A (slightly) long line.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/4328/#comment12954>

          Long line.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/4328/#comment12955>

          Can we remove 'silly' here ?
          Some user may actually reach this size.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/4328/#comment12956>

          'initiate' is used to start an action or message.
          'initialize' should be used here.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/4328/#comment12957>

          Setting reader to null would be desirable after the close() call.

          - Ted


          stack added a comment -

          Here are some WALs comparing compression with patch v29 vs. lzma, and then the dictionary-compressed file itself lzma'd (Todd's request). LZMA'ing the dictionary-compressed file makes it smaller than the lzma'd original. lzma'ing the compressed file makes it roughly 1/4 the size of the dictionary-compressed file. I didn't get a chance to lzo it....

          ....
          -rw-r--r--   1 stack  staff  64589199 Mar 13 20:24 sv4r21s12%3A60020.1331685637452
          -rwxrwxrwx   1 stack  staff  28906432 Mar 14 15:34 sv4r21s12%3A60020.1331685637452.compressed
          -rw-r--r--   1 stack  staff   7417213 Mar 14 16:25 sv4r21s12%3A60020.1331685637452.compressed.lzma
          -rw-r--r--   1 stack  staff   8511618 Mar 14 16:24 sv4r21s12%3A60020.1331685637452.lzma
          -rw-r--r--   1 stack  staff  63755620 Mar 13 20:24 sv4r21s12%3A60020.1331687005652
          -rwxrwxrwx   1 stack  staff  28804928 Mar 14 15:34 sv4r21s12%3A60020.1331687005652.compressed
          -rw-r--r--   1 stack  staff   6866107 Mar 14 16:28 sv4r21s12%3A60020.1331687005652.compressed.lzma
          -rw-r--r--   1 stack  staff   8328771 Mar 14 16:27 sv4r21s12%3A60020.1331687005652.lzma
          -rw-r--r--   1 stack  staff  63755688 Mar 13 20:24 sv4r21s12%3A60020.1331688224458
          -rwxrwxrwx   1 stack  staff  27701052 Mar 14 15:34 sv4r21s12%3A60020.1331688224458.compressed
          -rw-r--r--   1 stack  staff   6614637 Mar 14 16:31 sv4r21s12%3A60020.1331688224458.compressed.lzma
          -rw-r--r--   1 stack  staff   8462991 Mar 14 16:31 sv4r21s12%3A60020.1331688224458.lzma
          -rw-r--r--   1 stack  staff  64024836 Mar 13 20:24 sv4r21s12%3A60020.1331689518188
          -rwxrwxrwx   1 stack  staff  28851435 Mar 14 15:34 sv4r21s12%3A60020.1331689518188.compressed
          -rw-r--r--   1 stack  staff   6677112 Mar 14 16:35 sv4r21s12%3A60020.1331689518188.compressed.lzma
          -rw-r--r--   1 stack  staff   8158847 Mar 14 16:34 sv4r21s12%3A60020.1331689518188.lzma
          -rw-r--r--   1 stack  staff  63757131 Mar 13 20:24 sv4r21s12%3A60020.1331690608900
          -rwxrwxrwx   1 stack  staff  28201506 Mar 14 15:34 sv4r21s12%3A60020.1331690608900.compressed
          -rw-r--r--   1 stack  staff   6941982 Mar 14 16:38 sv4r21s12%3A60020.1331690608900.compressed.lzma
          -rw-r--r--   1 stack  staff   8513895 Mar 14 16:37 sv4r21s12%3A60020.1331690608900.lzma
          -rw-r--r--   1 stack  staff  63754114 Mar 13 20:24 sv4r21s12%3A60020.1331691711502
          -rwxrwxrwx   1 stack  staff  28318314 Mar 14 15:34 sv4r21s12%3A60020.1331691711502.compressed
          -rw-r--r--   1 stack  staff   7392701 Mar 14 16:42 sv4r21s12%3A60020.1331691711502.compressed.lzma
          -rw-r--r--   1 stack  staff   9136798 Mar 14 16:41 sv4r21s12%3A60020.1331691711502.lzma
          -rw-r--r--   1 stack  staff  63756667 Mar 13 20:24 sv4r21s12%3A60020.1331692886725
          -rwxrwxrwx   1 stack  staff  28309792 Mar 14 15:34 sv4r21s12%3A60020.1331692886725.compressed
          -rw-r--r--   1 stack  staff   7139965 Mar 14 16:44 sv4r21s12%3A60020.1331692886725.compressed.lzma
          -rw-r--r--   1 stack  staff   8968155 Mar 14 16:43 sv4r21s12%3A60020.1331692886725.lzma
          -rw-r--r--   1 stack  staff  63755003 Mar 13 20:24 sv4r21s12%3A60020.1331694049033
          -rwxrwxrwx   1 stack  staff  28127053 Mar 14 15:35 sv4r21s12%3A60020.1331694049033.compressed
          -rw-r--r--   1 stack  staff   6498486 Mar 14 16:45 sv4r21s12%3A60020.1331694049033.compressed.lzma
          -rw-r--r--   1 stack  staff   8175618 Mar 14 16:45 sv4r21s12%3A60020.1331694049033.lzma
          -rw-r--r--   1 stack  staff  23441144 Mar 13 20:24 sv4r21s12%3A60020.1331695045194
          -rwxrwxrwx   1 stack  staff  10561645 Mar 14 15:35 sv4r21s12%3A60020.1331695045194.compressed
          -rw-r--r--   1 stack  staff   2922204 Mar 14 16:46 sv4r21s12%3A60020.1331695045194.compressed.lzma
          -rw-r--r--   1 stack  staff   3228837 Mar 14 16:46 sv4r21s12%3A60020.1331695045194.lzma
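The "roughly 1/4" and "11-14%" figures quoted in this thread can be checked against the first WAL in the listing above. This is a minimal sketch: the class and method names are hypothetical, and the byte counts are copied from the listing.

```java
import java.util.Locale;

/** Size ratios for the first WAL in the listing (byte counts copied from it). */
final class WalSizeRatios {
    /** Returns part as a percentage of whole. */
    static double percentOf(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long original = 64589199L;       // sv4r21s12%3A60020.1331685637452
        long dictionary = 28906432L;     // .compressed
        long dictionaryLzma = 7417213L;  // .compressed.lzma
        long lzma = 8511618L;            // .lzma

        System.out.printf(Locale.ROOT, "dictionary vs original:      %.1f%%%n",
            percentOf(dictionary, original));
        System.out.printf(Locale.ROOT, "lzma vs original:            %.1f%%%n",
            percentOf(lzma, original));
        System.out.printf(Locale.ROOT, "dictionary+lzma vs original: %.1f%%%n",
            percentOf(dictionaryLzma, original));
        System.out.printf(Locale.ROOT, "dictionary+lzma vs dictionary: %.1f%%%n",
            percentOf(dictionaryLzma, dictionary));
    }
}
```

For this file the dictionary-compressed copy is about 45% of the original, the lzma'd original about 13%, and the lzma'd dictionary-compressed copy about 11% of the original, or about 26% of the dictionary-compressed size.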
          
          stack added a comment -

          Patch that addresses Ted and Lars' last set of comments (diff between v28 and v29 is just extra comments and javadoc)

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/
          -----------------------------------------------------------

          (Updated 2012-03-14 22:26:34.755767)

          Review request for hbase.

          Changes
          -------

          Updated patch. Adds comments and javadoc to address Ted and Lars' comments.

          Summary
          -------

          See issue

          This addresses bug hbase-4608.
          https://issues.apache.org/jira/browse/hbase-4608

          Diffs (updated)
          ---------------

          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing
          -------

          Thanks,

          Michael

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 107

          > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line107>

          >

          > Nit: Comment here that the status byte is the higher order byte of the dict entry.

          Done in next version.
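
To make the status-byte convention in Lars' nit concrete, here is a minimal sketch. It is not the code from the patch: the class `StatusByteSketch`, its method names, the `-1` sentinel value, and the plain 4-byte length prefix are all assumptions made for illustration. The idea shown is that a dictionary hit is written as a two-byte index with the high-order byte first, so the reader can treat that first byte as a status byte. Because the index stays below 2^15, the high-order byte is never negative and cannot collide with the sentinel.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical sketch; names and wire layout are illustrative assumptions.
class StatusByteSketch {
    // Assumed sentinel: a first byte of -1 means "raw bytes follow".
    static final byte NOT_IN_DICTIONARY = -1;

    // Dictionary hit: two bytes, high-order byte first.
    // That first byte doubles as the status byte on the read side.
    static void writeIndex(DataOutputStream out, short index) throws IOException {
        out.writeByte(index >> 8);
        out.writeByte(index & 0xFF);
    }

    // Dictionary miss: sentinel, then length, then the raw bytes.
    static void writeLiteral(DataOutputStream out, byte[] data) throws IOException {
        out.writeByte(NOT_IN_DICTIONARY);
        out.writeInt(data.length);
        out.write(data);
    }

    // Reads one entry, adding literals to the reader's dictionary
    // so that later index references can be resolved.
    static byte[] read(DataInputStream in, List<byte[]> dict) throws IOException {
        byte status = in.readByte();
        if (status == NOT_IN_DICTIONARY) {
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            dict.add(data);
            return data;
        }
        // status is the high-order byte of the index; the next byte is the low-order byte.
        int index = ((status & 0xFF) << 8) | (in.readByte() & 0xFF);
        return dict.get(index);
    }
}
```

With this layout a repeated value costs two bytes on the wire instead of its full length plus a length prefix.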

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108

          > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line108>

          >

          > I assume we're entirely sure that a dictionary will never have > 2^15 entries.

          Li Pi wrote:

          It'll start evicting once it hits its max size, which is currently 2 ^ 15.

          Added a comment to LRUDictionary on what happens when it hits the limit, as well as a comment on the max expected size of the dictionary for any one WAL.
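
The eviction behaviour Li describes can be sketched as follows. This is an illustration only, not the LRUDictionary from the patch: the class `BoundedLruDictionarySketch`, its methods, and the use of `LinkedHashMap` in access order are assumptions. It shows a dictionary that hands out indexes until it reaches capacity and then evicts the least recently used entry and reuses its index, so the index range never grows past the cap (2^15 in the discussion above).

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a size-bounded LRU dictionary; not the patch's class.
class BoundedLruDictionarySketch {
    static final short NOT_IN_DICTIONARY = -1;

    private final int capacity;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<ByteBuffer, Short> indexByEntry =
        new LinkedHashMap<ByteBuffer, Short>(16, 0.75f, true);
    private final byte[][] entryByIndex;
    private int nextFree = 0;

    BoundedLruDictionarySketch(int capacity) {
        this.capacity = capacity;
        this.entryByIndex = new byte[capacity][];
    }

    // Returns the index of data, or NOT_IN_DICTIONARY. A hit counts as a use.
    short findEntry(byte[] data) {
        Short idx = indexByEntry.get(ByteBuffer.wrap(data));
        return idx == null ? NOT_IN_DICTIONARY : idx;
    }

    // Adds data and returns its index, evicting the LRU entry when full.
    short addEntry(byte[] data) {
        Short existing = indexByEntry.get(ByteBuffer.wrap(data));
        if (existing != null) {
            return existing;
        }
        short idx;
        if (indexByEntry.size() < capacity) {
            idx = (short) nextFree++;
        } else {
            // Evict the least recently used entry and reuse its index.
            Iterator<Map.Entry<ByteBuffer, Short>> it = indexByEntry.entrySet().iterator();
            idx = it.next().getValue();
            it.remove();
        }
        byte[] copy = data.clone();
        entryByIndex[idx] = copy;
        indexByEntry.put(ByteBuffer.wrap(copy), idx);
        return idx;
    }

    byte[] getEntry(short idx) {
        return entryByIndex[idx];
    }
}
```

In a scheme like this the reader has to replay the same find and add calls in the same order as the writer, otherwise the two sides would disagree on which index was reused after an eviction.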

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 128

          > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line128>

          >

          > Nit: The naming convention is a bit strange.

          > This one is called uncompress... whereas the method returning a new byte[] is called readCompressed

          It's not the worst. It's descriptive, I think.

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1678

          > <https://reviews.apache.org/r/4328/diff/2/?file=92104#file92104line1678>

          >

          > Have a constructor that takes a compression context too?

          > It seems like once anything has been written to the HLog this should be immutable.

          That won't work for the write case: WAL compression is internal to the wal package, and the HLog.Entry used for writing is made outside of HLog, so for the write case we need the above method. A constructor might work on the read side, but there we allow 'reuse' of the shell HLog.Entry, so we would need the above method on the read side too.

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53

          > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53>

          >

          > COMPRESSED is a bit of a strange name.

          > It happens to be a version of the WAL that supports compression, but it is not necessarily compressed.

          Added a comment that this enum value means 'The WAL version that first had compression'.

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 303

          > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line303>

          >

          > ugly whitespace

          Fixed in next version.

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32

          > <https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32>

          >

          > I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?

          > I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

          Li Pi wrote:

          65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.

          If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.

          Li Pi wrote:

          Actually halve those amounts, 2^15, not 2^16.

          Added the above as a class comment.

          • Michael
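
Li's worst-case estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the class and method names are made up, and it counts only the raw entry bytes, ignoring per-entry object and map overhead. It multiplies the entry cap by the five dictionaries named in the thread (region name, row key, CF, column qualifier, table) and by an assumed entry size.

```java
// Hypothetical helper; counts raw entry bytes only, no JVM overhead.
class DictionaryMemoryEstimate {
    static long worstCaseBytes(int entriesPerDictionary, int dictionaries, int bytesPerEntry) {
        return (long) entriesPerDictionary * dictionaries * bytesPerEntry;
    }

    public static void main(String[] args) {
        int dictionaries = 5; // region name, row key, CF, column qualifier, table
        // Figure first quoted in the thread, using 2^16 entries of 100 bytes.
        System.out.println(worstCaseBytes(1 << 16, dictionaries, 100));  // 32768000
        // Corrected cap of 2^15 entries halves it.
        System.out.println(worstCaseBytes(1 << 15, dictionaries, 100));  // 16384000
        // "Silly" case: 1000-byte entries at the 2^15 cap.
        System.out.println(worstCaseBytes(1 << 15, dictionaries, 1000)); // 163840000
    }
}
```

So with the 2^15 cap the raw entries come to roughly 16 MB at 100 bytes each and roughly 160 MB at 1 kB each, which matches "halve those amounts".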

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5951
          -----------------------------------------------------------

          On 2012-03-14 07:34:58, Michael Stack wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4328/

          -----------------------------------------------------------

          (Updated 2012-03-14 07:34:58)

          Review request for hbase.

          Summary

          -------

          See issue

          This addresses bug hbase-4608.

          https://issues.apache.org/jira/browse/hbase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing

          -------

          Thanks,

          Michael

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53

          > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53>

          >

          > Introducing enum is a good idea.

          > I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.

          Michael Stack wrote:

          HLogKey does not need to know about 'type' of compression.

          Adding comments around the versions to give some context on why enums are named so.

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189

          > <https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189>

          >

          > Hiding LRUDictionary.class is desirable.

          > Shall we pass this.getMetadata() to the CompressionContext ctor, where the selection of compression type is made?

          Michael Stack wrote:

          Out of scope.

          Yeah, adding a factory to choose between different compression context types when we have only one compression type available is out of scope for this issue.

          • Michael

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5929
          -----------------------------------------------------------


          stack added a comment -

          Li asked me to lzma some of my logs from the wild. I did. With lzma --best, they compress down to 12% of their original size.

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108

          > <https://reviews.apache.org/r/4328/diff/2/?file=92102#file92102line108>

          >

          > I assume we're entirely sure that a dictionary will never have > 2^15 entries.

          It'll start evicting once it hits its max size, which is currently 2 ^ 15.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5951
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32

          > <https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32>

          >

          > I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?

          > I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

          Li Pi wrote:

          65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.

          If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.

          Actually halve those amounts, 2^15, not 2^16.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5951
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 17:42:21, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 32

          > <https://reviews.apache.org/r/4328/diff/2/?file=92107#file92107line32>

          >

          > I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?

          > I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

          65536 * 5 ( Regionname, Row key, CF, Column qual, table) * 100 bytes (these are some big names) = 32768000 bytes. Or 32 megabytes.

          If you want to get silly, even at 1kb entries (wtf are you naming things?), it maxes out at 320 megabytes.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5951
          -----------------------------------------------------------


          Ted Yu added a comment -

          I just thought we should encapsulate LRUDictionary in CompressionContext:

          +    boolean compression = reader.isWALCompressionEnabled();
          +    if (compression) {
          +      try {
          +        if (compressionContext == null) {
          +          compressionContext = new CompressionContext(LRUDictionary.class);
          

          In my opinion CompressionContext shouldn't just be a holder of multiple dictionaries.
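
To make the suggestion concrete, here is a rough sketch of what hiding the dictionary class inside the context could look like. Everything in it is hypothetical: the Dict interface, LruDict, and every field name except regionDict (which appears in the patch) are invented for illustration, and the real CompressionContext constructor takes a Class argument rather than a Supplier.

```java
import java.util.function.Supplier;

final class ContextEncapsulationSketch {

    /** Stand-in for the patch's Dictionary interface (hypothetical, methods omitted). */
    interface Dict { }

    /** Stand-in for LRUDictionary (hypothetical). */
    static final class LruDict implements Dict { }

    /** The context owns the choice of dictionary implementation. */
    static final class Context {
        final Dict regionDict;
        final Dict tableDict;
        final Dict familyDict;
        final Dict qualifierDict;
        final Dict rowDict;

        /** Callers such as a log reader or writer never name the dictionary class. */
        Context() {
            this(LruDict::new);
        }

        /** Lets tests, or a later generalization, swap in another implementation. */
        Context(Supplier<? extends Dict> factory) {
            regionDict = factory.get();
            tableDict = factory.get();
            familyDict = factory.get();
            qualifierDict = factory.get();
            rowDict = factory.get();
        }
    }

    public static void main(String[] args) {
        Context context = new Context();
        System.out.println(context.regionDict.getClass().getSimpleName());
    }
}
```

The trade-off is the one debated in this thread: the call sites get simpler, but the context takes on a policy decision that the current patch leaves to its callers.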

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5951
          -----------------------------------------------------------

          Ship it!

          Some comments and nits inside. Some extraneous whitespace (can be fixed at commit).

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12915>

          Nit: Comment here that the status byte is the higher order byte of the dict entry.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12916>

          I assume we're entirely sure that a dictionary will never have > 2^15 entries.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/4328/#comment12914>

          Nit: The naming convention is a bit strange.
          This one is called uncompress... whereas the method returning a new byte[] is called readCompressed
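
For readers following the Compressor comments above without the diff open, this is a rough sketch of the general technique being discussed, not the code under review. It assumes a field is written either as a two-byte dictionary index or as a marker byte followed by the literal bytes, and that the reader tells the two apart from the first (high-order) byte. All names here are made up, the 4-byte length prefix is a simplification, and eviction is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DictCodecSketch {
    static final byte NOT_IN_DICTIONARY = -1;
    // Indexes stay in [0, 2^15), so the high-order byte of an index is never 0xFF (-1).
    static final int MAX_ENTRIES = 1 << 15;

    /** Append-only dictionary; writer and reader must add entries in the same order. */
    static final class SimpleDict {
        private final Map<ByteBuffer, Short> toIndex = new HashMap<>();
        private final List<byte[]> toBytes = new ArrayList<>();

        short find(byte[] data) {
            Short idx = toIndex.get(ByteBuffer.wrap(data));
            return idx == null ? NOT_IN_DICTIONARY : idx;
        }

        void add(byte[] data) {
            if (toBytes.size() >= MAX_ENTRIES) {
                return; // sketch only: stop adding instead of evicting
            }
            byte[] copy = data.clone();
            toIndex.put(ByteBuffer.wrap(copy), (short) toBytes.size());
            toBytes.add(copy);
        }

        byte[] get(short idx) {
            return toBytes.get(idx);
        }
    }

    static void write(byte[] data, DataOutput out, SimpleDict dict) throws IOException {
        short idx = dict.find(data);
        if (idx == NOT_IN_DICTIONARY) {
            out.writeByte(NOT_IN_DICTIONARY); // marker: a literal follows
            out.writeInt(data.length);
            out.write(data);
            dict.add(data);
        } else {
            out.writeShort(idx); // big-endian: the high-order byte goes first
        }
    }

    static byte[] read(DataInput in, SimpleDict dict) throws IOException {
        byte status = in.readByte(); // either the marker or the index's high-order byte
        if (status == NOT_IN_DICTIONARY) {
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            dict.add(data);
            return data;
        }
        short idx = (short) ((status << 8) | (in.readByte() & 0xFF));
        return dict.get(idx);
    }

    public static void main(String[] args) throws IOException {
        byte[] region = "region-a".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        SimpleDict writerDict = new SimpleDict();
        write(region, out, writerDict); // first time: marker + length + literal
        write(region, out, writerDict); // second time: two-byte index only
        System.out.println("encoded bytes: " + bytes.size());

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        SimpleDict readerDict = new SimpleDict();
        System.out.println(new String(read(in, readerDict), StandardCharsets.UTF_8));
        System.out.println(new String(read(in, readerDict), StandardCharsets.UTF_8));
    }
}
```

In a scheme like this the 2^15 limit matters because a larger index would have a negative high-order byte, and an index whose high byte is 0xFF could not be told apart from the marker.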

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/4328/#comment12917>

          Have a constructor that takes a compression context too?
          It seems like once anything has been written to the HLog this should be immutable.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/4328/#comment12919>

          COMPRESSED is a bit of a strange name.
          It happens to be a version of the WAL that supports compression, but it is not necessarily compressed.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/4328/#comment12920>

          ugly whitespace

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/4328/#comment12921>

          I think I had that question to Li Pi... How much memory do we expect this dictionary to take worst case?
          I guess since there is one WAL per region server and it is rolled periodically it is not a problem at all.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/4328/#comment12922>

          I'll trust you folks that a PriorityQueue would not work here.

          • Lars


          Ted Yu added a comment -

          we don't compress values yet.

          Looks like we have something to do in V2

          Li Pi added a comment -

          Not that we'd compress a random value well at all anyways.

          Li Pi added a comment -

          Also, figured out why Ted's benchmarks differed from the rest of ours.

          The PE tool tests with random writes to a million rows; each row has a single column whose value is 1000 randomly generated bytes.

          This is pretty difficult to compress. The number of rows means that rownames won't fit in the dictionary, and we don't compress values yet.
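
A quick way to see the effect is to simulate it. The sketch below is an assumption-laden toy, not a measurement of the patch: it models the row dictionary as a plain LRU cache of 2^15 entries (built on LinkedHashMap in access order) and draws row ids uniformly at random, roughly as the PE random-write test does. The class name and parameters are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class RowKeyHitRateSketch {

    /** Fraction of writes whose row id is already in an LRU cache of the given capacity. */
    static double hitRate(int distinctRows, int capacity, int writes, long seed) {
        // accessOrder = true makes get() refresh an entry, which gives LRU eviction order.
        Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < writes; i++) {
            int row = random.nextInt(distinctRows);
            if (lru.get(row) != null) {
                hits++;
            } else {
                lru.put(row, Boolean.TRUE);
            }
        }
        return (double) hits / writes;
    }

    public static void main(String[] args) {
        int capacity = 1 << 15;
        System.out.println("1,000,000 distinct rows: " + hitRate(1_000_000, capacity, 200_000, 42L));
        System.out.println("1,000 distinct rows:     " + hitRate(1_000, capacity, 200_000, 42L));
    }
}
```

With uniformly random rows, the chance that the next row is already cached can never exceed capacity divided by the number of distinct rows, so a million-row key space leaves the row dictionary almost useless, while a small key space hits nearly every time.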

          Li Pi added a comment -

          That code is just writing the output for the regionname, using the regiondict.

          I guess if the dictionary behavior were to change, it could be problematic. But when we have more than 1 dictionary, we can deal with it then.

          Ted Yu added a comment -

          HLogKey does not need to know about 'type' of compression.

          I agree. But see this code:

          +      Compressor.writeCompressed(this.encodedRegionName, 0,
          +          this.encodedRegionName.length, out,
          +          compressionContext.regionDict);
          
          Li Pi added a comment -

          +1 from here.

          Agree w/ Stack. Compression can be generalized later. We can just bump up the version in that case.

          Right now, this works, passes tests, and provides a very substantial improvement in certain cases. (See Stack's workload).

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53

          > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53>

          >

          > Introducing enum is a good idea.

          > I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.

          HLogKey does not need to know about 'type' of compression.

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 306

          > <https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line306>

          >

          > How about passing compressionContext and type of field we're reading to Compressor.readCompressed() ?

          Generalization is out of scope.

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189

          > <https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189>

          >

          > Hiding LRUDictionary.class is desirable.

          > Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ?

          Out of scope.

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 110

          > <https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line110>

          >

          > We introduced compression type in Metadata, how about allowing user to specify compression type using conf ?

          > Default is dictionary compression.

          Customization is out of scope. "How about..." should have attendant justification. You can justify generalization of this compression in a new jira.

          On 2012-03-14 11:46:10, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 139

          > <https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line139>

          >

          > Hiding LRUDictionary.class is desirable.

          > How about passing conf to CompressionContext ctor ?

          The generalization that would require hiding the type of compression being done is out of scope.

          This is not a software project that fellas are working on for casual amusement. New facility should be justified by real-world needs. This feature is experimental. It could help w/ our WAL writes. It may not. We need to get a basic facility into a release so we can try it. If it proves its worth, we can spend more time down this avenue.

          • Michael

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5929
          -----------------------------------------------------------


          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-14 11:46:10, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 53 > < https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line53 > > > Introducing enum is a good idea. > I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar. HLogKey does not need to know about 'type' of compression. On 2012-03-14 11:46:10, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 306 > < https://reviews.apache.org/r/4328/diff/2/?file=92105#file92105line306 > > > How about passing compressionContext and type of field we're reading to Compressor.readCompressed() ? Generalization is out of scope. On 2012-03-14 11:46:10, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 189 > < https://reviews.apache.org/r/4328/diff/2/?file=92108#file92108line189 > > > Hiding LRUDictionary.class is desirable. > Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ? Out of scope. On 2012-03-14 11:46:10, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 110 > < https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line110 > > > We introduced compression type in Metadata, how about allowing user to specify compression type using conf ? > Default is dictionary compression. Customization is out of scope. "How about..." should have attendant justification. You can justify generalization of this compression in a new jira. On 2012-03-14 11:46:10, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 139 > < https://reviews.apache.org/r/4328/diff/2/?file=92109#file92109line139 > > > Hiding LRUDictionary.class is desirable. > How about passing conf to CompressionContext ctor ? 
The generalization that would require hiding the type of compression being done is out of scope. This is not a software project that fellas are working on for casual amusement. New facility should be justified by real-world needs. This feature is experimental. It could help w/ our WAL writes. It may not. We need to get a basic facility into a release so we can try it. If it proves its worth, we can spend more time down this avenue. Michael ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4328/#review5929 ----------------------------------------------------------- On 2012-03-14 07:34:58, Michael Stack wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4328/ ----------------------------------------------------------- (Updated 2012-03-14 07:34:58) Review request for hbase. Summary ------- See issue This addresses bug hbase-4608. 
https://issues.apache.org/jira/browse/hbase-4608 Diffs ----- src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3 src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1 src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION Diff: https://reviews.apache.org/r/4328/diff Testing ------- Thanks, Michael
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/#review5929
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/4328/#comment12894>

          Introducing enum is a good idea.
          I would suggest changing this to COMPRESSED_WITH_DICTIONARY or something similar.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/4328/#comment12895>

          How about passing compressionContext and type of field we're reading to Compressor.readCompressed() ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/4328/#comment12891>

          Hiding LRUDictionary.class is desirable.
          Shall we pass this.getMetadata() to CompressionContext ctor where selection of compression type is made ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          <https://reviews.apache.org/r/4328/#comment12892>

          We introduced compression type in Metadata, how about allowing user to specify compression type using conf ?
          Default is dictionary compression.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          <https://reviews.apache.org/r/4328/#comment12893>

          Hiding LRUDictionary.class is desirable.
          How about passing conf to CompressionContext ctor ?

          • Ted

          On 2012-03-14 07:34:58, Michael Stack wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4328/

          -----------------------------------------------------------

          (Updated 2012-03-14 07:34:58)

          Review request for hbase.

          Summary

          -------

          See issue

          This addresses bug hbase-4608.

          https://issues.apache.org/jira/browse/hbase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing

          -------

          Thanks,

          Michael

          Li Pi added a comment -

          @Stack nvm, just read upwards. That's in line with the other results from Todd and me.

          Li Pi added a comment -

          @Stack, what ratios did you achieve?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518303/hbase-4608-v28.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestClassLoading
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1184//console

          This message is automatically generated.

          stack added a comment -

          I reran the compress, decompress, compress cycle over my 40-odd random WALs from prod and it seems fine w/ v28. Sizes look right. No errors.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/
          -----------------------------------------------------------

          (Updated 2012-03-14 07:34:58.002687)

          Review request for hbase.

          Changes
          -------

          Uploading v28 for lars to take a looksee

          Summary
          -------

          See issue

          This addresses bug hbase-4608.
          https://issues.apache.org/jira/browse/hbase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestKeyValueCompression.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing
          -------

          Thanks,

          Michael

          stack added a comment -

          I like your changes Todd. Nice fixup. Lars, let me post v28 for you up on rb.

          stack added a comment -

          Trying hadoopqa on v28.

          stack added a comment -

          Reuploading Todd's v28 so I can run hadoopqa on it (it needs to be the most recent file posted).

          Todd Lipcon added a comment -

          btw, +1 on this new patch after you've double-checked with your logs and run it through the full suite. Lars, did you want to take a look tomorrow before it's committed?

          Todd Lipcon added a comment -

          I reviewed the latest patch and made some improvements:

          • added a new true unit test for KeyValueCompression
          • addressed one of my pieces of review feedback from earlier about the API for uncompressIntoArray
          • renamed some methods for clarity
          • removed some spurious whitespace changes
          • added an enum for HLogKey version so that the comparisons are clearer
          • renamed MAXSIZE to MAX_SIZE
          • redid the linked list inside BidirectionalLRUMap, since the nomenclature was previously backwards and I found the code hard to read. ("next" is supposed to point towards the tail, not towards the head)
          • changed getEntry() to throw an error if you pass an index larger than the current size

          I ran the related unit tests and they passed, but did not try the full suite.
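
The linked-list and getEntry() changes listed above can be pictured with a minimal sketch. This is not the BidirectionalLRUMap from the patch: the class name `LruDictionarySketch`, the use of `String` instead of `byte[]`, and the `int` indices are all assumptions made to keep the example short. It only illustrates the convention described in the list, where the head is the most recently used entry, `next` points towards the tail, and getEntry() rejects an index at or beyond the current size.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an index-addressed LRU dictionary; not the patch's code. */
final class LruDictionarySketch {
  private static final class Node {
    final int index;  // slot number handed out to callers
    String value;     // the patch stores byte[]; String keeps the sketch short
    Node prev;        // towards the head (more recently used)
    Node next;        // towards the tail (less recently used)
    Node(int index) { this.index = index; }
  }

  private final Node[] slots;
  private final Map<String, Node> byValue = new HashMap<>();
  private Node head;  // most recently used
  private Node tail;  // least recently used, evicted first
  private int size;

  LruDictionarySketch(int maxSize) {
    if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
    slots = new Node[maxSize];
  }

  /** Returns the index of value, or -1 if absent; a hit refreshes its recency. */
  int findEntry(String value) {
    Node n = byValue.get(value);
    if (n == null) return -1;
    moveToHead(n);
    return n.index;
  }

  /** Adds value (evicting the tail when full) and returns the index it lives at. */
  int addEntry(String value) {
    Node n = byValue.get(value);
    if (n != null) {           // already present: just refresh it
      moveToHead(n);
      return n.index;
    }
    if (size < slots.length) { // free slot available
      n = new Node(size);
      slots[size++] = n;
    } else {                   // full: reuse the least recently used slot
      n = tail;
      byValue.remove(n.value);
      unlink(n);
    }
    n.value = value;
    byValue.put(value, n);
    linkAtHead(n);
    return n.index;
  }

  /** Looks up by index; throws if the index is not currently in use. */
  String getEntry(int idx) {
    if (idx < 0 || idx >= size) {
      throw new IndexOutOfBoundsException("index " + idx + ", size " + size);
    }
    Node n = slots[idx];
    moveToHead(n);
    return n.value;
  }

  private void moveToHead(Node n) {
    if (n != head) {
      unlink(n);
      linkAtHead(n);
    }
  }

  private void unlink(Node n) {
    if (n.prev != null) n.prev.next = n.next; else head = n.next;
    if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
    n.prev = null;
    n.next = null;
  }

  private void linkAtHead(Node n) {
    n.prev = null;
    n.next = head;
    if (head != null) head.prev = n;
    head = n;
    if (tail == null) tail = n;
  }
}
```

Throwing on an out-of-range index, rather than returning null, makes a reader that has drifted out of step with the writer's dictionary fail loudly instead of silently producing wrong bytes.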

          Lars Hofhansl added a comment -

          @Stack: fair enough. Let's get this one done. +1 on generalization only when needed and in another jira.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4328/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          See issue

          This addresses bug hbase-4608.
          https://issues.apache.org/jira/browse/hbase-4608

          Diffs


          src/main/java/org/apache/hadoop/hbase/HConstants.java 045c6f3
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b5049b1
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 311ea1b
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java ff63a5f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java 01ebb5c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java d8f317c
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java de8e40b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCompressor.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/4328/diff

          Testing
          -------

          Thanks,

          Michael

          stack added a comment -

          Lars, I can't upload a patch to someone else's issue. Made a new rb at https://reviews.apache.org/r/4328/

          stack added a comment -

          @Lars Generalizing the compression done here is out of scope for this issue. The patch was not written that way from the get-go. The reviews done up to v22 or so made no mention of supporting other compression types. I'd suggest we do it in another issue if and when it's wanted.

          Let me put v27 up on rb.

          "I forget, do we also SNAPPY/LZO/GZ compress the HLogs?"

          We don't, because these compression algorithms work in blocks of 32k or so. If a block is not tied off properly at the end, we could lose up to 32k of edits.
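
The buffering concern can be seen in miniature with the JDK's own `java.util.zip.Deflater`. This is only an illustration of the general behaviour, not SNAPPY or LZO and not anything from the patch; the class name `BufferedCompressionSketch` and the sample edit string are made up. After one small edit is fed in, a block compressor in its default mode hands back little or nothing, so a crash at that point would lose the edit unless the stream is explicitly flushed.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical demo: how much compressed output exists right after one small write. */
final class BufferedCompressionSketch {

  /** Feeds one edit to a fresh compressor and returns the bytes it emits for the given flush mode. */
  static int bytesEmitted(byte[] edit, int flushMode) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(edit);
      byte[] out = new byte[edit.length + 64];
      return deflater.deflate(out, 0, out.length, flushMode);
    } finally {
      deflater.end(); // release native resources
    }
  }

  public static void main(String[] args) {
    byte[] edit = "row-1/cf:qual/1331025586905/Put/value".getBytes(StandardCharsets.UTF_8);
    // Without a flush the edit stays inside the compressor's internal buffer.
    System.out.println("NO_FLUSH emitted:   " + bytesEmitted(edit, Deflater.NO_FLUSH));
    // A sync flush forces the pending data out, at some cost in compression ratio.
    System.out.println("SYNC_FLUSH emitted: " + bytesEmitted(edit, Deflater.SYNC_FLUSH));
  }
}
```

Flushing after every append would close the durability gap but gives up much of what block compression buys, which is one way to read the preference for a per-field dictionary here.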

          Lars Hofhansl added a comment -

          @Ted: The HLog compression we're doing here is less complicated and has (far) fewer implications on other modules compared to HBASE-4218. I don't think that is a good comparison.

          LRUDictionary.class is passed to the context

          You do have a point.
          Maybe instead of saying

          boolean compression = reader.isWALCompressionEnabled();
          if (compression) {
            ...
          }

          it could be something like

          HLogCompressionType type = reader.getCompressionType();
          if (type == ...) {
            ...
          }

          (just made that up, but you get the idea, and should be an easy change)
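
A slightly fuller sketch of that idea, with the same caveat that it is made up for illustration: the enum `WalCompressionType`, the metadata key `"compression.type"`, and the plain `Map` standing in for the log file's metadata are all assumptions, not names taken from the patch.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical compression-type enum; the patch itself uses a boolean check. */
enum WalCompressionType {
  NONE,
  DICTIONARY;

  /** Reads the type from file metadata; a missing key means the log is uncompressed. */
  static WalCompressionType fromMetadata(Map<String, String> metadata) {
    String raw = metadata.get("compression.type"); // assumed key name
    if (raw == null) {
      return NONE;
    }
    try {
      return valueOf(raw.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      // An unknown type must fail the read rather than be treated as uncompressed.
      throw new IllegalStateException("Unknown WAL compression type: " + raw, e);
    }
  }
}

/** Hypothetical reader-side dispatch on the type instead of on a boolean. */
final class ReaderSketch {
  static String describe(Map<String, String> metadata) {
    WalCompressionType type = WalCompressionType.fromMetadata(metadata);
    switch (type) {
      case DICTIONARY:
        return "build a dictionary-backed compression context";
      case NONE:
      default:
        return "read entries uncompressed";
    }
  }
}
```

With this shape, adding a second scheme later means one new enum constant and one new case, while callers keep asking the reader for a type rather than for a yes/no flag.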

          @Stack: Is v27 up on RB? I looked at the earlier versions but haven't kept track recently. I promise I'll do a review tomorrow. I find it a bit big to just look at the diff.
          44% space saving is pretty awesome. I forget, do we also SNAPPY/LZO/GZ compress the HLogs?

          stack added a comment -

          Can I get a +1 from someone else? It's not a big patch. Should be a quick review. Thanks.

          Ted Yu added a comment -

          Out of scope for this issue.

          This reminds me of HBASE-4218: from Aug 17th 2011 to Feb 17th 2012, the development took 6 months.

          This JIRA doesn't have as many algorithms as those in HBASE-4218. But we should follow a similar goal:
          From Jacek @ 17/Aug/11 21:47:

          Once we have common interface you would be able to reuse some of my tests and benchmarks.

          stack added a comment -

          This would make developing a new compression scheme hard.

          Out of scope for this issue.

          stack added a comment -

          Addresses the failed TestLRUDictionary test and includes the javadoc fix Ted suggested.

          Ted Yu added a comment -

          I feel the dictionary compression implementation is pervasive throughout the patch.
          e.g.:

          +    boolean compression = reader.isWALCompressionEnabled();
          +    if (compression) {
          +      try {
          +        if (compressionContext == null) {
          +          compressionContext = new CompressionContext(LRUDictionary.class);
          

          While isWALCompressionEnabled() sounds general, LRUDictionary.class is passed to the context.
          This would make developing a new compression scheme hard.

          stack added a comment -

          We need a facility in the WAL, like the one we have in HFile, for printing statistics on the load carried. Our frontend is loads of counters; I've not verified. Table and region naming should be random enough, though, that this is giving the compression code a bit of exercise.

          I'm game for committing this as a first cut if I can get a +1.

          stack added a comment -

          I took a random set of 40 logs off our front end, ran the compress, decompress cycle multiple times in a row, and verified that the compressed version always ends up the same size. No errors.

          The compressed logs come out at a pretty consistent 44-45% of original size:

          pynchon:sv4r21s12,60020,1331025586905 stack$ echo "10561645/23441144"|bc -l
          .45056013477840501299
          pynchon:sv4r21s12,60020,1331025586905 stack$ echo "28127053/63755003"|bc -l
          .44117405186225150048
          pynchon:sv4r21s12,60020,1331025586905 stack$ echo "28309792/63756667"|bc -l
          .44402873192853697951
          pynchon:sv4r21s12,60020,1331025586905 stack$ echo "28318314/63754114"|bc -l
          .44418018263103773977
          ...
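
For anyone wanting to repeat the arithmetic without `bc`, here is a small sketch using the four size pairs from the transcript above. The class and method names are made up. Each quotient is compressed size over original size, so a value near 0.44 means the compressed log is about 44% of the original, which works out to roughly 56% of the space saved.

```java
/** Hypothetical helper reproducing the bc calculations from the transcript. */
final class CompressionRatioSketch {

  /** Compressed size as a fraction of the original size. */
  static double ratio(long compressedBytes, long originalBytes) {
    if (originalBytes <= 0) {
      throw new IllegalArgumentException("originalBytes must be positive");
    }
    return (double) compressedBytes / originalBytes;
  }

  public static void main(String[] args) {
    // {compressed, original} byte counts copied from the shell session above.
    long[][] samples = {
      {10561645L, 23441144L},
      {28127053L, 63755003L},
      {28309792L, 63756667L},
      {28318314L, 63754114L},
    };
    for (long[] s : samples) {
      double r = ratio(s[0], s[1]);
      System.out.printf("%d/%d = %.4f of original (%.1f%% saved)%n",
          s[0], s[1], r, (1.0 - r) * 100.0);
    }
  }
}
```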
          
          stack added a comment -

          But we don't know if the current dictionary compression API is general enough to cover the new compression type.

          Agree that we don't know what the future will bring. Not going to try.

          But the last paragraph above hinges on the scenario of keeping the same WAL version when new compression type is added.

          Yes, that's one possible scenario. There are others where we need to change the version. Can deal when we get there.

          Suppose we find a way to improve dictionary compression after the integration of this JIRA. Would WAL version increase or stay at 1 ?

          If API doesn't change, no need to up the global file version. Could add new improved dictionary compression type.

          If we need to change the API, then we'll need to change the global version. At the same time we might add some other facility that has nought to do w/ compression – say, we might decide to intersperse markers for when we flush or compact. We'd likely bump the version one point only though. This new version would indicate, say, that the WAL can now do the extended compression API AND includes flush and compaction markers. We could bump the version once per feature added but that buys us nothing; it's the version we ship that counts, the accumulation of features since the last time we shipped.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518291/4608v25.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1182//console

          This message is automatically generated.

          Ted Yu added a comment -

          In HLogKey.java:

          +   * Enables compression.
          +   * 
          +   * @param tableDict
          +   *          dictionary used for compressing table
          +   * @param regionDict
          +   *          dictionary used for compressing region
          +   */
          +  public void setCompressionContext(CompressionContext compressionContext) {
          

          Please adjust the javadoc above.

          Ted Yu added a comment -

          and if all else is equal – same API, etc. – then we don't need to up the global version.

          True. But we don't know if the current dictionary compression API is general enough to cover the new compression type.

          wal version and compression type. They are not the same thing.

          Agreed. But the last paragraph above hinges on the scenario of keeping the same WAL version when new compression type is added.

          Suppose we find a way to improve dictionary compression after the integration of this JIRA. Would WAL version increase or stay at 1 ?

          stack added a comment -

          It implies that WAL_VERSION is the same as COMPRESSION_VERSION.

          Yes. That's right. The current global version is the version that introduces WAL compression.

          As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of PREFIX_COMPRESSION_VERSION

          You are conflating wal version and compression type. They are not the same thing.

          If we introduce a new compression type only, and if all else is equal – same API, etc. – then we don't need to up the global version. We are just adding a new compression type. Either we support it or we don't. If we don't we'll throw unsupported compression type (the dictionary compression type is currently called DICTIONARY_COMPRESSION_TYPE).

          stack added a comment -

          This includes Ted's review comments (including the suggestion that I shorten a line in HConstants). Also fixed an issue where an NPE was hiding the real issue when bad paths were passed to the Compressor tool.

          stack added a comment -

          I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different from that of sv4r25s8%3A60020.1331661889339

          Because it has metadata the original doesn't have. When I compress it, it compresses down to the same size. Notice that the decompressed and decompressed.again are the same size because they both have the new metadata.

          The sentence involving COMPRESSION_VERSION was in past tense but I don't see it in patch v23.

          Pardon me. Should have uploaded v24.

          Testing has turned up a minor issue... will upload v25 soon.

          In my opinion, this version corresponds to the major version in my comment @ 13/Mar/12 01:37

          Nope. This is the global version that introduces compression. No need of major/minor granularity, and in particular major/minor on the compression feature itself. It's overkill.

          I agree that both version and compression type should be checked. However, the order should be checking compression type followed by checking version.

          Nope. First figure if we have a file that does compression. Then figure what type of compression the file does.
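The ordering described here (global version first, compression type second) can be sketched in plain Java. This is only an illustration under assumptions: a `Map<String, String>` stands in for the SequenceFile metadata, the key strings and the class name `WalHeaderCheck` are invented, and only the constant names and the "throw on unsupported type" behaviour come from this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class WalHeaderCheck {
    // Hypothetical key strings; the real patch defines its own metadata keys.
    static final String WAL_VERSION_KEY = "version";
    static final String WAL_COMPRESSION_TYPE_KEY = "compression.type";
    // Per the thread, version 1 is the global version that introduces compression.
    static final int COMPRESSION_VERSION = 1;
    static final String DICTIONARY_COMPRESSION_TYPE = "dictionary";

    static boolean isWALCompressionEnabled(Map<String, String> metadata) {
        // Step 1: is this a file format version that can carry compression at all?
        String version = metadata.get(WAL_VERSION_KEY);
        if (version == null || Integer.parseInt(version) < COMPRESSION_VERSION) {
            return false;
        }
        // Step 2: which compression type, if any, does the file declare?
        String type = metadata.get(WAL_COMPRESSION_TYPE_KEY);
        if (type == null) {
            return false;
        }
        if (!type.equals(DICTIONARY_COMPRESSION_TYPE)) {
            throw new IllegalArgumentException("Unsupported compression type: " + type);
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(WAL_VERSION_KEY, "1");
        metadata.put(WAL_COMPRESSION_TYPE_KEY, DICTIONARY_COMPRESSION_TYPE);
        System.out.println("compressed: " + isWALCompressionEnabled(metadata));
    }
}
```

With this shape, adding a new compression type later only touches step 2, while a format change that older readers cannot parse would raise the global version checked in step 1.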

          Ted Yu added a comment - - edited

          Please wrap long line:

          +  public static final String ENABLE_WAL_COMPRESSION = "hbase.regionserver.wal.enablecompression";
          

          w.r.t. the following code:

          +  static final int VERSION = COMPRESSION_VERSION;
          +  static final Text WAL_VERSION = new Text("" + VERSION);
          

          It implies that WAL_VERSION is the same as COMPRESSION_VERSION.
          As I explained earlier, we would likely have another compression scheme for WAL in the future, resulting in the introduction of PREFIX_COMPRESSION_VERSION e.g.
          Then we face a choice: what value would WAL_VERSION carry ?

          I propose naming the above COMPRESSION_VERSION constant DICTIONARY_COMPRESSION_VERSION and decouple it from WAL_VERSION.
          In the future, WAL_VERSION of 2 can carry either dictionary or prefix compression.

          Ted Yu added a comment -

          I noticed the size of sv4r25s8%3A60020.1331661889339.decompressed is different from that of sv4r25s8%3A60020.1331661889339

          Ted Yu added a comment -

          The sentence involving COMPRESSION_VERSION was in past tense but I don't see it in patch v23.

          Let me elaborate more on my comment @ 14/Mar/12 00:26
          As you described, we would use a new constant (COMPRESSION_VERSION) to represent the minimum version that supports dictionary compression.
          In my opinion, this version corresponds to the major version in my comment @ 13/Mar/12 01:37

          Say we later introduce prefix compression, we would introduce another constant representing the minimum version supporting prefix compression.

          I agree that both version and compression type should be checked. However, the order should be checking compression type followed by checking version.

          Regards

          stack added a comment -

          Here's my compressing, decompressing, compressing again, decompressing again, then recompressing a random log file from our front end:

          -rw-r--r--    1 stack  staff   64928728 Mar 13 20:43 sv4r25s8%3A60020.1331661889339
          -rwxrwxrwx    1 stack  staff   28540761 Mar 13 20:48 sv4r25s8%3A60020.1331661889339.compressed
          -rwxrwxrwx    1 stack  staff   28540761 Mar 13 20:58 sv4r25s8%3A60020.1331661889339.compressed.again
          -rwxrwxrwx    1 stack  staff   28540761 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.compressed.again.again
          -rwxrwxrwx    1 stack  staff   64945799 Mar 13 20:57 sv4r25s8%3A60020.1331661889339.decompressed
          -rwxrwxrwx    1 stack  staff   64945799 Mar 13 21:02 sv4r25s8%3A60020.1331661889339.decompressed.again
          

          It's 44% of the original size.

          stack added a comment -

          Address Ted's comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518270/4608v23.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.wal.TestLRUDictionary

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1180//console

          This message is automatically generated.

          stack added a comment -

          What would happen when we have a newer version for WAL_VERSION_KEY ?

          You mean VERSION? You mean when new feature added? This code will change. We'll likely have a constant for the version that introduces compression, i.e. version 1, and we'll use it here in this expression instead (I went ahead and added a COMPRESSION_VERSION, the version that introduced compression and will use this new define instead).

          Looks like the following check should suffice for isWALCompressionEnabled()...

          Nah. Verify we have sufficient global version first, then check for the type.

          Will fix other issues in next version of patch.

          Ted Yu added a comment -

          Minor comments:

          For findEntry():

          +  public short findEntry(byte[] data, int offset, int length) {
          +    short ret = backingStore.findIdx(data, offset, length);
          +    if (ret == -1) {
          +      addEntry(data, offset, length);
          +    }
          

          I think NOT_IN_DICTIONARY should be used in place of -1.

          There are a few stray whitespace characters, e.g. at the beginning of the second and third lines below:

          +    if (compressionContext == null) {
          +     Bytes.writeByteArray(out, this.encodedRegionName);
          +     Bytes.writeByteArray(out, this.tablename);
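The suggestion above to replace the bare -1 in findEntry() with a named constant can be sketched as follows. This is a toy stand-in, not the patch's LRUDictionary: it has no eviction, the class name `ToyDictionary` and its fields are invented, and returning the sentinel after adding the entry is an assumption about what follows the truncated snippet.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyDictionary {
    /** Named sentinel returned when the bytes have no dictionary entry yet. */
    static final short NOT_IN_DICTIONARY = -1;

    private final Map<String, Short> indexByValue = new HashMap<>();
    private final List<byte[]> valueByIndex = new ArrayList<>();

    // ISO-8859-1 maps every byte to exactly one char, so this key is lossless.
    private static String key(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }

    /** Appends the bytes as a new entry and returns its index (no duplicate check). */
    short addEntry(byte[] data, int offset, int length) {
        short idx = (short) valueByIndex.size();
        valueByIndex.add(Arrays.copyOfRange(data, offset, offset + length));
        indexByValue.put(key(data, offset, length), idx);
        return idx;
    }

    /** Returns the entry's index, or NOT_IN_DICTIONARY after adding a missing entry. */
    short findEntry(byte[] data, int offset, int length) {
        Short idx = indexByValue.get(key(data, offset, length));
        if (idx == null) {
            addEntry(data, offset, length);
            return NOT_IN_DICTIONARY;
        }
        return idx;
    }

    public static void main(String[] args) {
        ToyDictionary dict = new ToyDictionary();
        byte[] table = "usertable".getBytes(StandardCharsets.UTF_8);
        System.out.println("first lookup: " + dict.findEntry(table, 0, table.length));
        System.out.println("second lookup: " + dict.findEntry(table, 0, table.length));
    }
}
```

The named constant makes the miss path readable at call sites: the writer emits the raw bytes when it sees NOT_IN_DICTIONARY and the short index otherwise.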
          
          Ted Yu added a comment -

          In isWALCompressionEnabled():

          +    if (txt == null || Integer.parseInt(txt.toString()) < VERSION) return false;
          

          What would happen when we have a newer version for WAL_VERSION_KEY ?

          Looks like the following check should suffice for isWALCompressionEnabled():

          +    txt = metadata.get(WAL_COMPRESSION_TYPE_KEY);
          +    return txt != null && txt.equals(DICTIONARY_COMPRESSION_TYPE);
          
          stack added a comment -

          Renamed method enableCompression in all places to be setCompressionContext

          Made all instances of compression contexts have same name rather than a new name every time used.

          Cleaned up unused 'compression' data member flag or moved them local from being data members when only used by a single method.

          Removed define of TRUE and repeat of ENABLE_WAL_COMPRESSION key from SequenceFileLogReader. No longer needed.

          Rather than have the SequenceFile metadata-making code sprinkled over the reader and writer, do it all in the writer and have the reader use the writer's methods.

          Added a global WAL type as metadata.

          Added a compression type to metadata.

          Renamed method WALCompressionEnabled as isWALCompressionEnabled.

          Added some small tests to TestLRUDictionary and a new TestCompressor that taught me how this stuff works. Added documentation to methods where I was surprised; e.g. addEntry will happily add a new entry even though the dictionary already has an entry for that data.

          Miscellaneous cleanup.

          I ran this compression on one of our production logs and it halved its size. See below. I then decompressed and then recompressed and I got the same size back.

          -rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:47 sv4r25s8%3A60020.1331661889339.out.out.out
          -rwxrwxrwx   1 stack  staff  64945799 Mar 13 16:45 sv4r25s8%3A60020.1331661889339.out.out
          -rwxrwxrwx   1 stack  staff  28540761 Mar 13 16:44 sv4r25s8%3A60020.1331661889339.out
          -rw-r--r--   1 stack  staff  64928728 Mar 13 16:25 sv4r25s8%3A60020.1331661889339
          

          Will run more of our production logs through the compressor this evening to see if I can turn up bugs.
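The compress, decompress, recompress check described in this comment can be illustrated with a toy dictionary codec. Everything here is an assumption made for the sketch: the on-disk layout, the class name `RoundTripCheck`, and the use of plain strings in place of WAL entries. It only shows why a deterministic dictionary encoder should give identical bytes on the second compression pass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundTripCheck {
    /** Writes each name in full the first time and as a short index afterwards. */
    static byte[] compress(List<String> names) {
        Map<String, Short> dict = new HashMap<>();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(names.size());
            for (String name : names) {
                Short idx = dict.get(name);
                if (idx == null) {
                    out.writeBoolean(false);          // marker: literal follows
                    out.writeUTF(name);
                    dict.put(name, (short) dict.size());
                } else {
                    out.writeBoolean(true);           // marker: dictionary index follows
                    out.writeShort(idx);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the same dictionary while reading, mirroring compress(). */
    static List<String> decompress(byte[] data) {
        List<String> dict = new ArrayList<>();
        List<String> names = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                if (in.readBoolean()) {
                    names.add(dict.get(in.readShort()));
                } else {
                    String name = in.readUTF();
                    dict.add(name);
                    names.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("usertable", "usertable", "webtable", "usertable");
        byte[] first = compress(names);
        byte[] second = compress(decompress(first));
        System.out.println("names preserved: " + names.equals(decompress(first)));
        System.out.println("recompressed bytes identical: " + Arrays.equals(first, second));
    }
}
```

Comparing file sizes, as done in the listing above, is a coarser version of the byte-for-byte comparison in `main`.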

          Lars Hofhansl added a comment -

          @Stack: I brought HFile into this discussion, sorry about that.
          @Ted: The version you cite is for the HFile version not for the compression version, correct?
          @Li Pi: You make a good point. WAL_VERSION could imply the compression type. Could call it WAL_TYPE, that way we still have the flexibility to alter compression. We do not regularly change the HLog format, so that is reasonable.

          Ted Yu added a comment -

          Since Li Pi has done 90% of coding, I think this JIRA should bear his name at the time of integration.

          stack added a comment -

          PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression type + compression version.

          Ted, you misunderstood. The above was suggested name for a new compression type, a version two of prefix compression.

          Your bringing hfile compression versioning in here is an unnecessary complication, IMO. Compression will not have the variety here it does over in hfile (IMO).

          I think compression type versioning would allow us to perform migration with ease in the future.

          Not needed. We will have compression types and WAL file global versioning. That should be sufficient describing future evolutions, IMO.

          Ted Yu added a comment -

          From HFileBlock:

            int getMinorVersion() {
              return this.minorVersion;
            }
          

          From HFileReaderV2.java:

            private void validateMinorVersion(Path path, int minorVersion) {
              if (minorVersion < MIN_MINOR_VERSION ||
                  minorVersion > MAX_MINOR_VERSION) {
          

          I think compression type versioning would allow us to perform migration with ease in the future.

          PREFIX_COMPRESSION_V2, first cited by Stack, is a combination of compression type + compression version.

          Li Pi added a comment -

          Yo, sorry I can't quite work on this. Finals are finished this week, and once that happens, I'll be able to scram.

          There doesn't seem to be that much left - though I said that about 3 months ago. My bad! Feel free to do as you please, there's not much left on this, and I'm happy that work is getting done. I won't be offended at all if somebody else wants to try their hand at finishing this.

          My thoughts on it were this. WAL_VERSION is used to indicate compression type. This is pretty good, because enabling compression would immediately tell older versions that the version was wrong, while newer versions with compression disabled could function alongside older versions without support for compression.

          Also, I had my old benchmarks, and I was getting anywhere from a 20% increase to 40% increase on YCSB loads, depending on the testcase. This seemed pretty impressive to me. Not sure if a bug was introduced. I'll run a few more benchmarks later.

          stack added a comment -

          It is rare that I saw review comments in such tone: condescending.

          Don't be silly. Frustrated, yes. Condescending no.

          And the same comment was posted twice.

          Sorry about that. Made a mistake.

          Lars Hofhansl added a comment -

          Just my $0.02 here... I think having a compression type + compression version will be hard to grok for newcomers unfamiliar with this area, whereas having a single compression type field is clear. A new version of a compression algorithm is a new type (IMHO). We do not have compression versions for the HFiles, just compression types.

          I think with WAL_VERSION and compression type we have enough flexibility (HLogKey version is really unrelated as it is for other serialization as well).

          What do you think Ted?

          I'll do some testing as to what the compression ratio is for a few of our scenarios tomorrow.

          stack added a comment -

          Let me have a go at it since Li Pi can't finish it just yet.

          Ted Yu added a comment -

          From what can one conclude who owns the issue ? Assignee ?

          I do have an opinion on compression type versioning. I would wait for a concrete design to form.

          stack added a comment -

          @Ted I think you should resign ownership of this issue. You are just pushing its conclusion further out w/ your continual negotiation and what ifs.

          Ted Yu added a comment -

          Stop making this more complicated than it need be Ted.

          It is rare that I saw review comments in such tone: condescending.

          And the same comment was posted twice.

          Ted Yu added a comment -

          Having PREFIX_COMPRESSION_V2 in the future is equivalent to having compression type version.
          It may make compression checking verbose: I think checking against one compression type is better than comparing with every PREFIX_COMPRESSION_Vx.

          I agree with the observation about PE data.

          stack added a comment -

          Stop making this more complicated than it need be Ted.

          WAL_VERSION is global version on WAL log.

          Adding a type metadata field for compression makes sense. If none, presume uncompressed.

          You don't need a compression type version. If we change the format, we can do PREFIX_COMPRESSION_V2.

          HLogKeys are serialized independent of their container. Don't conflate their versioning w/ the suggested WAL log versioning.

          Regarding PE data: it is not amenable to compression. Its keys are very basic. It's likely not a good test for evaluating the viability of this feature.
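
The scheme in this comment can be sketched roughly as follows. This is a minimal illustration, not code from any patch attached to this issue: the metadata key strings, the class name WalMetadataSketch, and the use of a plain Map to stand in for the log file's metadata are all assumptions. Treating an absent version field as version zero follows a suggestion made elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class WalMetadataSketch {
    // Hypothetical metadata keys; the names used in the actual patch may differ.
    static final String WAL_VERSION_KEY = "WAL_VERSION";
    static final String COMPRESSION_TYPE_KEY = "COMPRESSION_TYPE";

    // One flat list of types. A changed format would become a new constant
    // (for example a PREFIX_COMPRESSION_V2) rather than a (type, version) pair.
    enum CompressionType { NONE, DICTIONARY }

    /** A missing version field is read as version zero. */
    static int walVersion(Map<String, String> metadata) {
        String v = metadata.get(WAL_VERSION_KEY);
        return v == null ? 0 : Integer.parseInt(v);
    }

    /** A missing compression field means the log is presumed uncompressed. */
    static CompressionType compressionType(Map<String, String> metadata) {
        String t = metadata.get(COMPRESSION_TYPE_KEY);
        return t == null ? CompressionType.NONE : CompressionType.valueOf(t);
    }

    public static void main(String[] args) {
        Map<String, String> oldLog = new HashMap<>();   // written before this feature
        Map<String, String> newLog = new HashMap<>();
        newLog.put(WAL_VERSION_KEY, "1");
        newLog.put(COMPRESSION_TYPE_KEY, "DICTIONARY");

        System.out.println(walVersion(oldLog) + " " + compressionType(oldLog));
        System.out.println(walVersion(newLog) + " " + compressionType(newLog));
    }
}
```

The two fields are read independently, so a log at version 1 may still carry no compression field and be treated as uncompressed.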

          Ted Yu added a comment -

          We're looking at several metadata fields for version:
          1. WAL_VERSION for HLog file
          2. compression type for HLog file
          3. compression major (minor) version
          4. HLogKey version (covered in latest patch)

          It would create some confusion w.r.t. the different combinations of the above 4

          Todd Lipcon added a comment -

          First, compression ratio is not good - at least for the data written by PE.

          I saw ~40% compression on a YCSB load. So some workloads may have good results whereas others didn't. Did you also re-run the test after fixing the bug? Maybe that skewed the results?

          Second, HLogKey persistence becomes dependent on the compression implementation. This would make plugging other compression techniques hard.

          I agree we should use a metadata field in the log to describe which compression mechanism is being used.

          Ted Yu added a comment -

          HLog version decision aside, my feeling about the current implementation is -0.5

          First, compression ratio is not good - at least for the data written by PE.

          Second, HLogKey persistence becomes dependent on the compression implementation. This would make plugging other compression techniques hard.

          Ted Yu added a comment -

          I think we may enhance WAL compression using dictionary in the future.
          So for DICTIONARY compression type, it is desirable to introduce versioning as well.

          I don't have strong opinion about WAL_VERSION actually.

          Lars Hofhansl added a comment -

          My question is: would HLog v2 be allowed not to compress Log entries ?

          I think the answer is yes. You're right that VERSION is orthogonal to COMPRESSION. I do agree with Stack that while we're adding metadata to HLog we should add a VERSION as well. We should add both VERSION and COMPRESSION metadata. (Maybe that's what you were saying anyway, if so feel free to ignore me).

          Ted Yu added a comment -

          Since WAL compression may be off for the new HLog file version, we would always consult compression type metadata when reading HLog file.
          WAL_VERSION is written but is not needed at time of reading HLog.

          stack added a comment -

          I can add WAL_VERSION as v2 in the metadata.

          Why not as version 1? The absence of WAL_VERSION can be version zero.

          My question is: would HLog v2 be allowed not to compress Log entries ?

          Yes. The compress flag would be 'off' (isn't that the default?)

          If desirable, we can discuss in more detail, face to face, on the 27th.

          Why wait till then? This is the last big one before we can release a 0.94.

          Ted Yu added a comment -

          For code specific review, please use https://reviews.apache.org/r/4185/ where there would be context.

          I can add WAL_VERSION as v2 in the metadata.
          My question is: would HLog v2 be allowed not to compress Log entries ?

          If desirable, we can discuss in more detail, face to face, on the 27th.

          stack added a comment -

          I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature.

          How does it get in if you don't add it?

          If you don't want to add it, just don't. I'm not going to +1 this patch though if it adds metadata about a new compression feature w/o introducing a general versioning on the WAL.

          Should the Compression class in wal package ...

          The compression class in wal is Compressor.java.

          I have trouble following your responses to my comments because they come in w/o context and because they are done piecemeal, which means I have to spend way more time than I should have to reviewing your stuff. I'd suggest you save up your comments and submit them in a lump rather than hit submit per comment; you'll use up less internet.

          Ted Yu added a comment -

          Uploaded v23 onto review board.
          After WAL version metadata design is finalized, will add that.

          Ted Yu added a comment -

          Should the Compression class in wal package ...

          I only see KeyValueCompression.java under wal package. Please elaborate which class should carry more comments.

          Ted Yu added a comment -

          I think WAL_VERSION metadata is orthogonal to compression type metadata and I would expect both to be present in new HLog files written with this feature.
          Say we define WAL_VERSION as v2 which has WAL compression capability. We still need to check compression type metadata before applying dictionary compression.
          In this regard adding WAL_VERSION seems to be redundant.

          stack added a comment -

          The tests do not have variety. I think we should add it here rather than wait for the variety to hit out in the field.

          If only compression would evolve, I think checking against compression type metadata would be adequate.

          The above begins with a conditional, "If...".

          stack added a comment -

          It's a regular pattern only. Perhaps this does some decent testing? TestWALReplayCompressed?

          Ted Yu added a comment -

          It's the test of a single entry only

          Please take a look at the following in test:

              for(int i = 1; i < Short.MAX_VALUE; i++){
                assertTrue(testee.findEntry(BigInteger.valueOf(i).toByteArray(), 0,
                    BigInteger.valueOf(i).toByteArray().length) == -1);
              }
          

          32766 entries of the dictionary are tested.

          If only compression would evolve, I think checking against compression type metadata would be adequate.

          stack added a comment -

          It's the test of a single entry only, which is not really exercising much.

          Introducing WAL_VERSION would imply that we may change HLog aspect other than compression in the future. Is there plan for the above ?

          I've not heard of any. Is that your argument for not adding a version? Because if there has been no discussion of change up to this, we wouldn't possibly need to change the format in the future?

          Ted Yu added a comment -

          try a paragraph of text going in and out

          LRUDictionary deals with byte array:

            public short findEntry(byte[] data, int offset, int length) {
          

          In this regard, piping text into the dictionary is functionally same as piping byte[] form of integer.

          stack added a comment -

          In TestLRUDictionary, we test a single entry in essence. We should try it w/ all kinds of rubbish... really long entries, empty entries, null entries.... similar entries... a dictionary for 32k worth of stuff..as we'll do in the wild. So I'd think?
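
The kind of inputs listed above could be exercised along these lines. This is only a sketch against a toy dictionary written for the example, not the LRUDictionary from the patch: ToyDictionary, its capacity of 2, and the rule that a miss returns -1 and records the entry are assumptions made for illustration. Null entries are left out because the toy class does not define a behaviour for them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class DictionaryEdgeCases {
    /** Toy bounded dictionary for illustration only; NOT the LRUDictionary from the patch. */
    static class ToyDictionary {
        private final int capacity;
        // Access-ordered map: iteration starts at the least recently used key.
        private final LinkedHashMap<ByteBuffer, Short> slots =
            new LinkedHashMap<>(16, 0.75f, true);

        ToyDictionary(int capacity) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            this.capacity = capacity;
        }

        /** Returns the slot of a known entry, or -1 after recording a new one (assumed behaviour). */
        short findEntry(byte[] data, int offset, int length) {
            ByteBuffer key = ByteBuffer.wrap(Arrays.copyOfRange(data, offset, offset + length));
            Short hit = slots.get(key);
            if (hit != null) {
                return hit;
            }
            short slot;
            if (slots.size() == capacity) {
                // Full: evict the least recently used entry and reuse its slot.
                Iterator<Map.Entry<ByteBuffer, Short>> it = slots.entrySet().iterator();
                slot = it.next().getValue();
                it.remove();
            } else {
                slot = (short) slots.size();
            }
            slots.put(key, slot);
            return -1;
        }
    }

    public static void main(String[] args) {
        ToyDictionary dict = new ToyDictionary(2);
        byte[] empty = new byte[0];                 // empty entry
        byte[] longEntry = new byte[64 * 1024];     // really long entry
        byte[] first = "region-0001".getBytes(StandardCharsets.UTF_8);
        byte[] second = "region-0002".getBytes(StandardCharsets.UTF_8); // similar entry

        System.out.println("empty, first time:  " + dict.findEntry(empty, 0, 0));
        System.out.println("empty, second time: " + dict.findEntry(empty, 0, 0));
        System.out.println("long entry:         " + dict.findEntry(longEntry, 0, longEntry.length));
        System.out.println("first key:          " + dict.findEntry(first, 0, first.length));
        System.out.println("similar key:        " + dict.findEntry(second, 0, second.length));
        System.out.println("first key again:    " + dict.findEntry(first, 0, first.length));
    }
}
```

A small capacity keeps eviction in play from the third distinct entry onward, which is the path a single-entry test never reaches.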

          A test for the new class KeyValueCompression would be good to have too.

          enableCompression is an odd name for this method. Should it be setCompressionContext since that is what it does (you pass null if no compression)... seems odd passing null to 'enableCompression'

          Should the Compression class in wal package have more javadoc comments explaining the kind of compression it does? Otherwise, it looks like a generic compressor class when in fact it's a one-trick pony?

          Should this method, WALCompressionEnabled, be isWALCompressionEnabled?

          I like your idea of versioning the WAL

          Patch is coming along nicely. Almost there.

          Ted Yu added a comment -

          Introducing WAL_VERSION would imply that we may change HLog aspect other than compression in the future.
          Is there plan for the above ?
          Having another compression type is nice but requires making HLogKey persistence pluggable.

          I think it would be better to introduce one meta entry instead of two.

          stack added a comment -

          Is HLog versioned? If not, perhaps instead of a HConstants.WAL_COMPRESSION_VER, add a WAL_VERSION metadata field. Then have another for compression type (NONE or this)?

          For TestLRUDictionary, please outline the combinations that should be added.

          Does it not look bare to you? I'd think that we'd try a paragraph of text going in and out... perhaps test multiple dictionaries in the one file?

          Ted Yu added a comment -

          For TestLRUDictionary, please outline the combinations that should be added.

          Ted Yu added a comment -

          I plan to introduce HConstants.WAL_COMPRESSION_VER and store it in Metadata of HLog file.
          I think it can replace HConstants.ENABLE_WAL_COMPRESSION: if there is no HConstants.WAL_COMPRESSION_VER in Metadata, WAL compression is turned off.

          Please comment.

          Ted Yu added a comment -

          One note: I am not sure how representative the sequential write of PE is.
          From above example, gain from HLog compression was 1%.
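
For reference, the figure can be reproduced from the file sizes in the round-trip listing posted in this thread, assuming that listing is the example referred to: 99406052 bytes for the compressed log and 100664533 bytes for its decompressed form. The class and method names below are made up for the illustration.

```java
import java.util.Locale;

public class CompressionGain {
    /** Percentage of bytes saved by compression, relative to the uncompressed size. */
    static double savedPercent(long uncompressedBytes, long compressedBytes) {
        return 100.0 * (uncompressedBytes - compressedBytes) / uncompressedBytes;
    }

    public static void main(String[] args) {
        // Sizes taken from the sea-lab-3.decomp and sea-lab-3.comp entries in the listing.
        double saved = savedPercent(100664533L, 99406052L);
        System.out.println(String.format(Locale.ROOT, "%.2f%% saved", saved));
    }
}
```

The difference is 1258481 bytes, which works out to roughly 1.25% of the uncompressed size.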

          Ted Yu added a comment -

          I cannot go to bed if the answer is still No
          With patch v22, I was able to perform decompression/compression round-trip.
          See the timestamp of the files below:

          -rwxrwxrwx   1 zhihyu  110088321   99406052 Mar  9 21:38 sea-lab-3.comp
          -rwxrwxrwx   1 zhihyu  110088321  100664533 Mar  9 21:36 sea-lab-3.decomp
          -rw-r--r--   1 zhihyu  110088321   99406052 Mar  9 21:18 sea-lab-3%2C60020%2C1331337114819.1331337244655
          

          The fix is the second line below:

                while ((e = in.next()) != null) {
                  if (compress) e.enableCompression(null);
          

          This is because Entry e would be carrying non-null context after the in.next() call if the input was compressed HLog.
          This context needs to be stripped before we pass the Entry to writer.
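
A simplified model of why the stripping matters is sketched below. It is not the Compressor tool itself: the Entry class here is a stand-in, and the assumption that the writer serializes an entry in compressed form whenever the entry carries a non-null context is made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class StripContextSketch {
    /** Hypothetical stand-in for a WAL entry; not the real HLog entry class. */
    static class Entry {
        final String payload;
        Object context; // non-null after the entry was read from a compressed log

        Entry(String payload, Object context) {
            this.payload = payload;
            this.context = context;
        }

        void enableCompression(Object context) {
            this.context = context;
        }
    }

    /** Assumed writer behaviour: write compressed if and only if the entry carries a context. */
    static String write(Entry e) {
        return (e.context != null ? "compressed:" : "plain:") + e.payload;
    }

    /** Copies entries to the output, stripping the reader's context when the input was compressed. */
    static List<String> transform(List<Entry> input, boolean inputCompressed) {
        List<String> out = new ArrayList<>();
        for (Entry e : input) {
            if (inputCompressed) {
                e.enableCompression(null); // mirrors the one-line fix quoted above
            }
            out.add(write(e));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> fromCompressedLog = new ArrayList<>();
        fromCompressedLog.add(new Entry("edit-1", new Object()));
        System.out.println(transform(fromCompressedLog, true)); // prints [plain:edit-1]
    }
}
```

Without the `enableCompression(null)` call, the entry in this model would still carry the reader's context and the "decompressed" output would be written in compressed form.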

          Patch v22 should be close to the state of checkin.

          stack added a comment -

          Does v21 fix the bad decompress that you found above testing with PE?

          Ted Yu added a comment -

          Simplified Compressor tool.
          We read compression status from input HLog.
          There is no need to pass -u or -c now.

          I tested new build on the HLog used @ 10/Mar/12 00:00 with the new syntax.

          I uploaded patch v21 onto review board.

          Ted Yu added a comment -

          I repeated manual decompression based on patch v20.
          Still got:

          12/03/09 15:58:30 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-3.comp, syncFs=true, hflush=true
          Exception in thread "main" java.io.IOException: sea-lab-3.decomp, entryStart=124, pos=1406386, end=98439940, edit=0
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
          	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
          	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:276)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:232)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:201)
          	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:91)
          	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:58)
          Caused by: java.io.IOException: //0 read 36 bytes, should read 22
          	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
          	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:230)
          	... 3 more
          
          Ted Yu added a comment -

          Uploaded patch v20 to review board.

          HLogKey uses keyContext to compress its region name and table name.
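
          As a rough illustration of how a shared context can shrink repeated region and table names, here is a minimal sketch. It is not the patch's LRUDictionary: the class DictionarySketch, its method names, and the grow-only (no eviction) behaviour are assumptions made for brevity. The one idea it is meant to show is that writer and reader each build an identical dictionary from the stream itself, so a repeated name costs only a two-byte index.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative grow-only dictionary; hypothetical, not HBase's LRUDictionary. */
final class DictionarySketch {
  /** Marker written when the bytes are not yet in the dictionary. */
  static final short NOT_IN_DICTIONARY = -1;

  private final Map<ByteBuffer, Short> indexOf = new HashMap<>();
  private final List<byte[]> entries = new ArrayList<>();

  /** Returns the index of data, or NOT_IN_DICTIONARY if it was never added. */
  short findEntry(byte[] data) {
    Short index = indexOf.get(ByteBuffer.wrap(data));
    return index == null ? NOT_IN_DICTIONARY : index;
  }

  /** Adds a private copy of data and returns its new index. */
  short addEntry(byte[] data) {
    if (entries.size() >= Short.MAX_VALUE) {
      throw new IllegalStateException("dictionary full; a real one would evict");
    }
    short index = (short) entries.size();
    byte[] copy = data.clone();
    entries.add(copy);
    indexOf.put(ByteBuffer.wrap(copy), index);
    return index;
  }

  byte[] getEntry(short index) {
    return entries.get(index);
  }

  /** Writer side: emit the index if known, else the marker plus the raw bytes. */
  static void write(ByteBuffer out, byte[] data, DictionarySketch dict) {
    short index = dict.findEntry(data);
    out.putShort(index);
    if (index == NOT_IN_DICTIONARY) {
      out.putInt(data.length);
      out.put(data);
      dict.addEntry(data);
    }
  }

  /** Reader side: mirror the writer so both dictionaries stay in step. */
  static byte[] read(ByteBuffer in, DictionarySketch dict) {
    short index = in.getShort();
    if (index != NOT_IN_DICTIONARY) {
      return dict.getEntry(index);
    }
    byte[] data = new byte[in.getInt()];
    in.get(data);
    dict.addEntry(data);
    return data;
  }
}
```

          Because the reader adds an entry exactly when the writer did, entries must be read in the order they were written, starting from the beginning of the file.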

          Ted Yu added a comment -

          Thanks for the reminder w.r.t. Metadata.

          In SequenceFileLogWriter.init(), we can pass Metadata, indicating whether WAL compression is enabled, to SequenceFile.Writer which then gets persisted.
          SequenceFile.Reader.getMetadata() would return WAL compression status.

          Then we don't need the following in SequenceFileLogReader.init():

              compression = conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false);
          

          I think the above is an important part of the review comments.

          I will also address adding a unit test, maybe in a later iteration.

          stack added a comment -

          > The above method allows computation to start at a specified offset, while the existing hashCode() doesn't have this parameter.

          It should at least have the same name as the other two methods that do the same thing (pity WritableComparator.hashBytes with a start offset doesn't exist).

          > Looking at SequenceFile.Sorter.cloneFileAttributes(), I don't see a convenient way of doing the above.

          When you create a writer on a SequenceFile, you can pass metadata: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Metadata.html

          > For HLogKey, can we designate a version of -2 to represent a compressed HLogKey? If the HLogKey isn't compressed, we write -1.

          I don't know what this is in response to.

          What about my other items?

          Ted Yu added a comment -
          +  public static int hashBytes(byte[] bytes, int offset, int length) {
          

          The above method allows computation to start at a specified offset, while the existing hashCode() doesn't have this parameter.

          The suggestion to record the compression flag as a sequence file attribute is really good.
          Looking at SequenceFile.Sorter.cloneFileAttributes(), I don't see a convenient way of doing the above.

          For HLogKey, can we designate a version of -2 to represent a compressed HLogKey? If the HLogKey isn't compressed, we write -1.
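
          To make the offset point concrete, here is a minimal sketch of a hash over a sub-range of a byte array. The class name HashSketch and the 31-based recurrence are assumptions for illustration; the body of the patch's hashBytes is not shown in this thread. The two-argument form is included only to show that it can be expressed as the offset form starting at 0.

```java
/** Illustrative only; hypothetical stand-in for the methods discussed above. */
final class HashSketch {
  /**
   * Hashes bytes[offset .. offset + length) with the common
   * hash = 31 * hash + b recurrence (assumed, not taken from the patch).
   */
  static int hashBytes(byte[] bytes, int offset, int length) {
    int hash = 1;
    for (int i = offset; i < offset + length; i++) {
      hash = (31 * hash) + bytes[i];
    }
    return hash;
  }

  /** Prefix-only variant: the same computation with the offset fixed at 0. */
  static int hashCode(byte[] b, int length) {
    return hashBytes(b, 0, length);
  }
}
```

          With the offset form, a field embedded in a larger buffer can be hashed in place, without first copying it into its own array.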

          stack added a comment -

          The TestLRUDictionary test looks like it could be fatter. Looks like you should be able to throw a bunch more combinations at it, and exercise the new BidirectionalLRUMap type better. Better to find the issues here in a unit test than....

          What's the difference between

          +  public static int hashBytes(byte[] bytes, int offset, int length) {
          

          and the existing

            public static int hashCode(final byte [] b, final int length) {
          

          They look to do the same thing? We should remove the new one if so.

          We will have a keyContext when we are deserializing? How does that work?

          So we compress at the individual entry level? Why not a file at a time? (Sorry if this has been explained earlier)

          Is this right in the WALReader?

          +    compression = conf.getBoolean(HConstants.ENABLE_WAL_COMPRESSION, false);
          

          How does that work if the WAL was written compressed but this flag is false? We break? Shouldn't this instead be keyed off the entries themselves? Should there be a sequence file attribute saying this is a compressed file?

          Do we foresee replication being able to use this facility? Shipping compressed entries seems like a natural fit.

          Good stuff.

          Ted Yu added a comment -

          I performed the above procedure again, this time using two config objects at the beginning of transformFile().
          I got the same result.

          Todd Lipcon added a comment -

          Good test case - maybe it can be turned into a functional test in the code?

          Ted Yu added a comment - - edited

          I issued the following command:

          bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 5
          

          After the above job finished, I saw this in region server log:

          2012-03-07 13:01:12,408 INFO  wal.SequenceFileLogWriter (SequenceFileLogWriter.java:init(91)) <<regionserver60020.logRoller>> - WAL compression enabled for hdfs://sea-lab-0:54310/hbase/.logs/sea-lab-5,60020,1331150872956/sea-lab-5%2C60020%2C1331150872956.1331154072399
          

          After copying the HLog to local, I issued:

          bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -u sea-lab-5%2C60020%2C1331150872956.1331154072399 sea-lab-5.decomp
          

          I got:

          -rwxr-xr-x 1 hduser hduser 119487372 2012-03-07 14:12 sea-lab-5.decomp
          -rw-r--r-- 1 hduser hduser 120660017 2012-03-07 14:11 sea-lab-5%2C60020%2C1331150872956.1331154072399
          

          When I issued the compression command, I saw:

          $ bin/hbase org.apache.hadoop.hbase.regionserver.wal.Compressor -c sea-lab-5.decomp sea-lab-5.comp
          12/03/07 14:14:17 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
          12/03/07 14:14:17 INFO wal.SequenceFileLogWriter: WAL compression enabled for sea-lab-5.comp
          12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: new createWriter -- HADOOP-6840 -- not available
          12/03/07 14:14:17 WARN util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
          12/03/07 14:14:17 WARN util.NativeCodeLoader: java.library.path=/apache/hbase/bin/../lib/native/Linux-amd64-64
          12/03/07 14:14:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          12/03/07 14:14:17 INFO compress.CodecPool: Got brand-new compressor [.deflate]
          12/03/07 14:14:17 DEBUG wal.SequenceFileLogWriter: Path=sea-lab-5.comp, syncFs=true, hflush=true
          Exception in thread "main" java.io.IOException: sea-lab-5.decomp, entryStart=124, pos=1406386, end=119487372, edit=0
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
          	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
          	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:275)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:231)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:200)
          	at org.apache.hadoop.hbase.regionserver.wal.Compressor.transformFile(Compressor.java:93)
          	at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:59)
          Caused by: java.io.IOException: //0 read 36 bytes, should read 22
          	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2118)
          	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2155)
          	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:229)
          	... 3 more
          
          Ted Yu added a comment -

          Patch v19 from review board.

          Ted Yu added a comment -

          Fix bug in checking sizeBytes in uncompressIntoArray()

          Ted Yu added a comment -

          Patch v17 from https://reviews.apache.org/r/4185/

          Ted Yu added a comment -

          I got permission from Pi to complete this feature since he is busy with course work.

          I created new review request:
          https://reviews.apache.org/r/4185/

          Lars Hofhansl added a comment -

          Marking for 0.94

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5597
          -----------------------------------------------------------

          It may be better if 4608v16.txt is uploaded here.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12170>

          Can we toggle this config param after in.init() ?
          This way we only create one Configuration

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12171>

          Should read 'uncompressed array'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12172>

          This assignment is not necessary.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12173>

          Should read '... start writing to'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12174>

          Should read 'the length of entry'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12175>

          Should we add a check for other sizeBytes values ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12176>

          wrap long line, please.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment12177>

          Remove extra empty line.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java
          <https://reviews.apache.org/r/2740/#comment12178>

          This sentence is in parentheses.
          People would think it applies to dictionary indexes.
          Strictly speaking, -1 is not an index.

          Better rephrase this sentence.

          • Ted


          Ted Yu added a comment -

          Patch v16 decrements HLogKey.VERSION

          Ted Yu added a comment -

          The reason we need to decrement HLogKey.VERSION is that HBASE-2195 (which introduced HLogKey.VERSION starting at -1) went into 0.92
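
          A minimal sketch of why the marker has to move to a new negative value follows. It assumes, as an illustration only, that a reader tells a version marker apart from a legacy length prefix by its sign, and it uses fixed-width ints rather than whatever encoding HLogKey really uses. KeyVersionSketch, LEGACY and the layout are hypothetical names, not the patch's code.

```java
import java.nio.ByteBuffer;

/** Illustrative only; hypothetical layout, not HLogKey's wire format. */
final class KeyVersionSketch {
  static final int LEGACY = 0;        // hypothetical label: key written with no marker
  static final int UNCOMPRESSED = -1; // marker already shipped in 0.92
  static final int COMPRESSED = -2;   // new marker for dictionary-compressed keys

  /** Writes [marker] [length] [regionName]; legacy keys omit the marker. */
  static ByteBuffer write(byte[] regionName, int version) {
    ByteBuffer out = ByteBuffer.allocate(8 + regionName.length);
    if (version != LEGACY) {
      out.putInt(version);
    }
    out.putInt(regionName.length);
    out.put(regionName);
    out.flip();
    return out;
  }

  /**
   * Peeks at the first int. A negative value can only be a marker, because a
   * length is never negative; the marker is consumed and returned. Otherwise
   * the buffer is left untouched, positioned at the legacy length prefix.
   */
  static int readVersion(ByteBuffer in) {
    int first = in.getInt(in.position());
    if (first < 0) {
      in.getInt();
      return first;
    }
    return LEGACY;
  }
}
```

          Under these assumptions, reusing -1 for compressed keys would make them indistinguishable from uncompressed keys already written by 0.92, which is the situation the decrement avoids.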

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5525
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/2740/#comment11943>

          HLogKey.VERSION should be decremented to -2.

          The if statement should be changed to:
          if (version == -1 || keyContext == null)

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/2740/#comment11944>

          The if statement should be changed to:
          if (version == -1 || keyContext == null)

          • Ted


          Ted Yu added a comment -

          Converted Li Pi's patch to format acceptable by Hadoop QA

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-03-01 09:58:44.801420)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Updated as per stack's review.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 17cb0e3
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Dictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java bd31ead
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java a11899c
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 37

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line37>

          >

          > Should this javadoc here in the class include the notes you made for Kannan where you describe how it all works? If not here, where else will doc. on how the Compressor works go?

          >

          > Maybe you should purge all mention of WAL from this class – e.g. WALDictionary – because it seems like it could be easily generalized (I suppose we can do that later).

          Included!

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 47

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line47>

          >

          > The way the usage is written, -u and -c are optional. You should fix that. Looks like they are required going by fact that args.length needs to be 3. Also, it looks like you take --help, the long form, or -u/-c the short forms. Either take all short forms or take both long and short form to be consistent.

          System.out.println("Exactly one of -u or -c must be specified"); should take care of the required thing.

          Help now takes both short and long forms. Everything else just takes short forms.
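
          The argument rules discussed above (exactly one of -u or -c, help in both short and long form) can be sketched as follows. This is only an illustration, not the Compressor class from the patch: the class name CompressorArgsSketch, the Mode enum, the -h spelling of the short help flag, and the assumption that the two remaining arguments are an input and an output path are all hypothetical.

```java
public class CompressorArgsSketch {
    /** Hypothetical modes for the command line tool. */
    enum Mode { COMPRESS, UNCOMPRESS, HELP }

    /**
     * Parses the arguments and returns the requested mode.
     * Assumes the layout "<-u | -c> <input> <output>", which is a guess
     * based on the remark that args.length needs to be 3.
     */
    static Mode parse(String[] args) {
        // Help is accepted in both short and long form.
        if (args.length == 1 && (args[0].equals("-h") || args[0].equals("--help"))) {
            return Mode.HELP;
        }
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: <-u | -c> <input> <output>; exactly one of -u or -c must be specified");
        }
        if (args[0].equals("-c")) {
            return Mode.COMPRESS;
        }
        if (args[0].equals("-u")) {
            return Mode.UNCOMPRESS;
        }
        throw new IllegalArgumentException("Exactly one of -u or -c must be specified");
    }
}
```

          Rejecting anything other than -u or -c in the first position makes the flag required rather than optional, which is the point raised in the review.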

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 66

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line66>

          >

          > Why is the tool called WALCompressor in the usage but the class I invoke is Compressor?

          It should probably be called Compressor.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 79

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line79>

          >

          > This does not need to be an HBaseConfiguration? There are no configs in hbase-site.xml that might affect what's going on here?

          Not really. All that matters is whether compression is on or off.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line108>

          >

          > Doc the '@return'

          fixed.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 141

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line141>

          >

          > Doc the return

          fixed.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1671

          > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1671>

          >

          > White space

          fixed.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1675

          > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1675>

          >

          > When is this called? Post construction? Should it be part of the constructor? What happens if it's called part way through the writing of a WAL? Will we start compressing a WAL in the middle?

          It's called when a log writer is created. We would start compressing a log in the middle if it happened to be called at that point, but that shouldn't happen.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 270

          > <https://reviews.apache.org/r/2740/diff/19/?file=78624#file78624line270>

          >

          > I don't follow what's going on here. What happens when len >= 0? Why is it < 0? What does that mean? What's v2 of HLogKey? What if keyContext is not null?

          HLogKey has two different formats. If len < 0, that means we're reading the old version of the HLog.

          keyContext is the compression context that holds the dictionaries used in compression. If it isn't null, compression is enabled.

          If len > 0, we're on version 1. We can't compress version 1, but the code for reading version 1 is still there for transitioning from earlier HLogs. Compression should never be enabled when we're reading version 1 HLogs, because there shouldn't be any version 1 HLogs.
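
          The idea of telling two key formats apart by the sign of the first integer can be sketched like this. It is a minimal illustration under stated assumptions, not the HLogKey code from the patch: the class VersionedKeySketch, the marker value -1, and the use of fixed-width ints instead of vints are all hypothetical, and the sketch assumes that a negative value is a version marker while a non-negative value is the length of an unversioned name. Which sign maps to which format in the actual patch is not settled by this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionedKeySketch {
    /** Hypothetical version marker; negative so it can never be mistaken for a length. */
    static final int VERSION = -1;

    /** Decoded key: version 0 means the unversioned format. */
    static final class Key {
        final int version;
        final byte[] name;
        Key(int version, byte[] name) { this.version = version; this.name = name; }
    }

    /** Writes the name, optionally preceded by the version marker. */
    static byte[] write(boolean versioned, byte[] name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (versioned) {
                out.writeInt(VERSION);
            }
            out.writeInt(name.length);
            out.write(name);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads either format: a negative first int is a version, otherwise it is the length. */
    static Key read(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            int version = 0;
            int len = in.readInt();
            if (len < 0) {
                version = len;       // what we just read was the version marker
                len = in.readInt();  // the real length follows
            }
            byte[] name = new byte[len];
            in.readFully(name);
            return new Key(version, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

          Because a length is never negative, a reader written this way can consume both formats without any out-of-band flag.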

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java, line 119

          > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line119>

          >

          > Class comment on what this is about?

          Just a tuple class for holding the various dictionaries used in compression.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java, line 141

          > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line141>

          >

          > Why do I clear this? Why not just throw it away? Does clearing make it so I can recycle this instance?

          Correct. We clear it so we can recycle this instance instead of having to create a new dictionary. Not sure if this makes a huge difference in terms of performance.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 29

          > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line29>

          >

          > Why would I ever let go of terms in the dictionary? Should you explain why in class comment?

          We let go of terms in the dictionary because we have only a finite amount of space, and a finite number of references with which to address terms in the dictionary.

          If we're using a 2 byte key, that limits our reference space to 65536 entries. We could use vints for entries into the dictionary instead, but then the dictionary could grow pretty huge.
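
          A bounded dictionary that evicts the least recently used term and reuses its index can be sketched as below. This is an illustration only, not the LRUDictionary class from the patch: the class name, method names, and the choice of LinkedHashMap in access order are assumptions. The sketch also reserves -1 as a "not found" value, so it caps the capacity at Short.MAX_VALUE rather than the full 65536 of an unsigned 2 byte key.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruDictionarySketch {
    /** Returned by findEntry when the term is absent (hypothetical sentinel). */
    static final short NOT_IN_DICTIONARY = -1;

    private final int capacity;
    // Access-ordered map: the first entry in iteration order is the least recently used.
    private final LinkedHashMap<ByteBuffer, Short> indexByTerm =
        new LinkedHashMap<ByteBuffer, Short>(16, 0.75f, true);
    private final byte[][] termByIndex;
    private short nextFree = 0;

    BoundedLruDictionarySketch(int capacity) {
        if (capacity <= 0 || capacity > Short.MAX_VALUE) {
            throw new IllegalArgumentException("capacity must fit in a positive short");
        }
        this.capacity = capacity;
        this.termByIndex = new byte[capacity][];
    }

    /** Looks up a term and marks it as recently used. */
    short findEntry(byte[] term) {
        Short idx = indexByTerm.get(ByteBuffer.wrap(term));
        return idx == null ? NOT_IN_DICTIONARY : idx;
    }

    /** Adds a term, evicting the least recently used one when full; returns its index. */
    short addEntry(byte[] term) {
        short existing = findEntry(term);
        if (existing != NOT_IN_DICTIONARY) {
            return existing;
        }
        short idx;
        if (indexByTerm.size() < capacity) {
            idx = nextFree++;
        } else {
            // Evict the eldest entry and reuse its index for the new term.
            Iterator<Map.Entry<ByteBuffer, Short>> it = indexByTerm.entrySet().iterator();
            idx = it.next().getValue();
            it.remove();
        }
        byte[] copy = term.clone();
        termByIndex[idx] = copy;
        indexByTerm.put(ByteBuffer.wrap(copy), idx);
        return idx;
    }

    /** Returns the term stored at the given index, or null if the slot is empty. */
    byte[] getEntry(short idx) {
        return termByIndex[idx];
    }

    /** Empties the dictionary so the same instance can be recycled. */
    void clear() {
        indexByTerm.clear();
        java.util.Arrays.fill(termByIndex, null);
        nextFree = 0;
    }
}
```

          Reusing the evicted index keeps every reference within the fixed key width. A reader can only resolve those references if it applies the same additions and evictions in the same order as the writer.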

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 64

          > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line64>

          >

          > Should this be static? Does it need reference to outer class?

          It doesn't need to reference the outer class. Made static.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 168

          > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line168>

          >

          > Class comment? Should this be static?

          made static.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 176

          > <https://reviews.apache.org/r/2740/diff/19/?file=78627#file78627line176>

          >

          > Why am I reading whether compression is on or off by looking at config? Why am I not looking at the head of the WAL file to figure out whether it's compressed, and then decompressing? Otherwise, if config is disabled but I'm fed a compressed file, do I just burp? See the white space added here.

          We just burp if compression is on and we get fed an uncompressed file. This should be easy to change though - on the read side.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 28

          > <https://reviews.apache.org/r/2740/diff/19/?file=78629#file78629line28>

          >

          > Should be just called Dictionary. It's in the wal package. No need for the redundant prefix?

          Sure. But we have WALActionsListener and a bunch of other things starting with WAL. I figured we can just have that as well.

          Renamed to Dictionary.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 38

          > <https://reviews.apache.org/r/2740/diff/19/?file=78634#file78634line38>

          >

          > This will run all the tests in TestWALReplay? Nice.

          Yup, that's exactly what it does.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5265
          -----------------------------------------------------------

          On 2012-02-22 03:46:12, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2012-02-22 03:46:12)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5265
          -----------------------------------------------------------

          This looks great. Some small comments below.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11488>

          Should this javadoc here in the class include the notes you made for Kannan where you describe how it all works? If not here, where else will doc. on how the Compressor works go?

          Maybe you should purge all mention of WAL from this class – e.g. WALDictionary – because it seems like it could be easily generalized (I suppose we can do that later).

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11489>

          The way the usage is written, -u and -c are optional. You should fix that. Looks like they are required going by fact that args.length needs to be 3. Also, it looks like you take --help, the long form, or -u/-c the short forms. Either take all short forms or take both long and short form to be consistent.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11490>

          Why is the tool called WALCompressor in the usage but the class I invoke is Compressor?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11491>

          This does not need to be an HBaseConfiguration? There are no configs in hbase-site.xml that might affect what's going on here?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11492>

          Doc the '@return'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11493>

          Doc the return

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2740/#comment11494>

          White space

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2740/#comment11495>

          When is this called? Post construction? Should it be part of the constructor? What happens if it's called part way through the writing of a WAL? Will we start compressing a WAL in the middle?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/2740/#comment11496>

          I don't follow what's going on here. What happens when len >= 0? Why is it < 0? What does that mean? What's v2 of HLogKey? What if keyContext is not null?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          <https://reviews.apache.org/r/2740/#comment11497>

          Class comment on what this is about?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          <https://reviews.apache.org/r/2740/#comment11498>

          Why do I clear this? Why not just throw it away? Does clearing make it so I can recycle this instance?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11499>

          Why would I ever let go of terms in the dictionary? Should you explain why in class comment?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11501>

          Should this be static? Does it need reference to outer class?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11502>

          Class comment? Should this be static?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment11503>

          Why am I reading whether compression is on or off by looking at config? Why am I not looking at the head of the WAL file to figure out whether it's compressed, and then decompressing? Otherwise, if config is disabled but I'm fed a compressed file, do I just burp? See the white space added here.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment11504>

          Should be just called Dictionary. It's in the wal package. No need for the redundant prefix?

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          <https://reviews.apache.org/r/2740/#comment11505>

          This will run all the tests in TestWALReplay? Nice.

          • Michael

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-21 23:30:35, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 61

          > <https://reviews.apache.org/r/2740/diff/18/?file=78498#file78498line61>

          >

          > This comment should also be placed at the beginning of compressFile().

          removed the comment, not necessary anymore.

          On 2012-02-21 23:30:35, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 88

          > <https://reviews.apache.org/r/2740/diff/18/?file=78498#file78498line88>

          >

          > Typo: should be output.getFileSystem(outconf)

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5254
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-22 03:46:12.923539)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          fixed typos

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5254
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11473>

          This comment should also be placed at the beginning of compressFile().

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11472>

          Typo: should be output.getFileSystem(outconf)

          • Ted

          On 2012-02-21 19:29:20, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2012-02-21 19:29:20)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          Li Pi added a comment -

          @Kannan - here's a quick overview of 4608:

          When writing the HLog, the writer checks a set of dictionaries for the key, cf, qualifier, tablename, and regionname. If an item is already in its dictionary, the writer emits the item's index instead of the item itself. If the item is not in the dictionary, it is written uncompressed and added to the dictionary.

          Reading from the HLog works in the opposite manner. When the reader encounters an uncompressed item, it adds it to the dictionary. When it encounters an index, it fetches the corresponding item from the dictionary.

          The dictionary itself is a simple LRU dictionary that, by default, uses 2 bytes (a short) per index. There is a separate dictionary for each field (e.g. one for tablenames, one for regionnames...).

          The dictionary only needs to be consistent: given the same items in the same order, it must always assign them the same indices and always evict in exactly the same fashion.

          This seems to work fairly well, and it noticeably cuts down our write sizes on the vast majority of workloads.
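The write/read scheme in the overview can be sketched roughly as follows. This is a minimal illustration, not the LRUDictionary or Compressor code from the patch: the class `DictionarySketch`, the `LruDict` type, the `LITERAL`/`INDEXED` marker bytes, and the on-disk layout are all assumptions made up for the example. The point it tries to show is that writer and reader stay in sync as long as both apply the same adds and the same "uses" in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of index-or-literal encoding with a deterministic LRU dictionary. */
public class DictionarySketch {
    static final short NOT_IN_DICTIONARY = -1;
    static final byte LITERAL = 0; // assumed marker: item follows uncompressed
    static final byte INDEXED = 1; // assumed marker: 2-byte dictionary index follows

    /** Fixed-capacity dictionary; when full, the least recently used slot is reused. */
    static final class LruDict {
        private final int capacity; // assumed to be > 0
        // Access-ordered map, so iteration starts at the least recently used entry.
        private final LinkedHashMap<ByteBuffer, Short> indexOf = new LinkedHashMap<>(16, 0.75f, true);
        private final byte[][] entries;

        LruDict(int capacity) { this.capacity = capacity; this.entries = new byte[capacity][]; }

        /** Returns the index of data (a hit counts as a use), or NOT_IN_DICTIONARY. */
        short find(byte[] data) {
            Short idx = indexOf.get(ByteBuffer.wrap(data));
            return idx == null ? NOT_IN_DICTIONARY : idx;
        }

        /** Adds data and returns its index, evicting the least recently used entry if full. */
        short add(byte[] data) {
            short idx;
            if (indexOf.size() < capacity) {
                idx = (short) indexOf.size();
            } else {
                Iterator<Map.Entry<ByteBuffer, Short>> it = indexOf.entrySet().iterator();
                idx = it.next().getValue();
                it.remove();
            }
            byte[] copy = data.clone();
            entries[idx] = copy;
            indexOf.put(ByteBuffer.wrap(copy), idx);
            return idx;
        }

        /** Returns the entry at idx and counts it as a use, mirroring a find() hit. */
        byte[] get(short idx) {
            byte[] data = entries[idx]; // no validation of idx in this sketch
            indexOf.get(ByteBuffer.wrap(data));
            return data;
        }
    }

    static byte[] encode(List<byte[]> items, LruDict dict) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (byte[] item : items) {
            short idx = dict.find(item);
            if (idx == NOT_IN_DICTIONARY) {
                out.writeByte(LITERAL);
                out.writeShort(item.length);
                out.write(item);
                dict.add(item);
            } else {
                out.writeByte(INDEXED);
                out.writeShort(idx);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<byte[]> decode(byte[] encoded, LruDict dict) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        List<byte[]> items = new ArrayList<>();
        while (in.available() > 0) {
            if (in.readByte() == LITERAL) {
                byte[] item = new byte[in.readUnsignedShort()];
                in.readFully(item);
                dict.add(item);
                items.add(item);
            } else {
                items.add(dict.get(in.readShort()));
            }
        }
        return items;
    }

    /** Encodes and decodes with two fresh dictionaries and reports whether the items survive. */
    static boolean roundTrips(int capacity, String... names) {
        try {
            List<byte[]> items = new ArrayList<>();
            for (String n : names) items.add(n.getBytes(StandardCharsets.UTF_8));
            List<byte[]> decoded = decode(encode(items, new LruDict(capacity)), new LruDict(capacity));
            if (decoded.size() != items.size()) return false;
            for (int i = 0; i < items.size(); i++) {
                if (!Arrays.equals(items.get(i), decoded.get(i))) return false;
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `DictionarySketch.roundTrips(2, "t1", "t2", "t1", "t3", "t2", "t1")` exercises a hit and several evictions with a two-entry dictionary. Using one `LruDict` per field, as the overview describes, would simply mean keeping several such instances side by side.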

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 96

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line96>

          >

          > Rather than using keyVal.getRow(), keyVal.getFamily(), and keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies and garbage here.

          Li Pi wrote:

          This is going to take a while, since I'm currently relying on the default array hashCode(). I'll need to use Bytes.hashCode and write a wrapper for insertion into the dictionary.

          fixed.

          • Li
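The hashing issue raised in this exchange can be illustrated with a small sketch. Java arrays inherit identity-based `hashCode()` and `equals()` from `Object`, so a `byte[]` used directly as a map key only matches the very same array instance. The `ByteRange` wrapper below is a hypothetical stand-in for whatever wrapper the patch ended up using; it hashes and compares the bytes in `buf[off, off + len)`, so a lookup can point into a larger buffer without first copying the slice out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: content-based map keys over (byte[] buf, int off, int len). */
public class ByteRangeKeySketch {

    /** A view over buf[off, off + len) with content-based equals() and hashCode(). */
    static final class ByteRange {
        final byte[] buf;
        final int off;
        final int len;
        private final int hash;

        ByteRange(byte[] buf, int off, int len) {
            this.buf = buf;
            this.off = off;
            this.len = len;
            int h = 1;
            for (int i = off; i < off + len; i++) {
                h = 31 * h + buf[i];
            }
            this.hash = h;
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof ByteRange)) return false;
            ByteRange other = (ByteRange) o;
            if (len != other.len) return false;
            for (int i = 0; i < len; i++) {
                if (buf[off + i] != other.buf[other.off + i]) return false;
            }
            return true;
        }
    }

    /** Looks up an equal-content but distinct array in a map keyed by raw byte[]. */
    static boolean rawArrayKeyHit() {
        Map<byte[], Short> dict = new HashMap<>();
        dict.put("cf".getBytes(StandardCharsets.US_ASCII), (short) 0);
        // Misses: array equality is reference equality.
        return dict.containsKey("cf".getBytes(StandardCharsets.US_ASCII));
    }

    /** Looks up a slice of a larger buffer in a map keyed by ByteRange. */
    static boolean rangeKeyHit() {
        Map<ByteRange, Short> dict = new HashMap<>();
        dict.put(new ByteRange("cf".getBytes(StandardCharsets.US_ASCII), 0, 2), (short) 0);
        byte[] kvBuffer = "row1cfqual".getBytes(StandardCharsets.US_ASCII);
        // Hits: bytes 4..5 of the buffer are "cf", and no copy of the slice is made.
        return dict.containsKey(new ByteRange(kvBuffer, 4, 2));
    }
}
```

One design caveat with this approach: a key that is stored in the map should own a private copy of its bytes, since a view over a reused buffer would silently change if the buffer is overwritten. Views over shared buffers are only safe for transient lookups.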

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-21 19:29:20.464648)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          addresses changes by reviewers above.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:29:54, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>
          >
          > this function requires that the whole log data fit in RAM - not a great assumption

          Li Pi wrote:
              old one. will do eventually...

          fixed.

          - Li
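
The RAM concern above can be illustrated with a generic streaming loop. This is a minimal sketch, not the Compressor code under review: the class `StreamingRecordCopy` and the length-prefixed record layout are assumptions made for illustration. It holds one record in memory at a time instead of the whole log.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only; not the Compressor class from the patch.
final class StreamingRecordCopy {
  /**
   * Copies records of the assumed form [4-byte length][payload] one at a
   * time and returns the number of records copied.
   */
  static int copy(InputStream rawIn, OutputStream rawOut) throws IOException {
    DataInputStream in = new DataInputStream(rawIn);
    DataOutputStream out = new DataOutputStream(rawOut);
    int records = 0;
    while (true) {
      int length;
      try {
        length = in.readInt();
      } catch (EOFException endOfInput) {
        // EOF while reading the length prefix is treated as end of input.
        break;
      }
      if (length < 0) {
        throw new IOException("corrupt record length: " + length);
      }
      // Only this single record is buffered; memory use is bounded by the
      // largest record, not by the size of the log.
      byte[] record = new byte[length];
      in.readFully(record);
      out.writeInt(length);
      out.write(record);
      records++;
    }
    out.flush();
    return records;
  }
}
```

A real tool would transform each record between the read and the write; the point of the sketch is only that nothing requires the full file to be resident.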

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------

          On 2012-02-21 19:29:20, Li Pi wrote:
          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-21 19:29:20)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-25 06:20:23, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 226
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line226>
          >
          > NOT_IN_DICTIONARY should be used here.

          fixed.

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4585
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 74
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line74>
          >
          > I think the better way of expressing this usage would be:
          >
          > WALCompressor [-u | -c] <input> <output>
          >
          > -u - uncompresses the input log
          > -c - compresses the output log
          >
          > Exactly one of -u or -c must be specified

          fixed

          - Li
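
The command line suggested above could be parsed along these lines. This is a minimal sketch under stated assumptions: the class `WalToolArgs`, its fields, and the `Mode` enum are hypothetical names chosen for illustration, and the code is not taken from the patch. It only enforces the rule quoted above, that exactly one of -u or -c is given, followed by an input and an output.

```java
// Hypothetical sketch of the suggested command line; the class and field
// names are illustrative and are not taken from the patch.
final class WalToolArgs {
  enum Mode { COMPRESS, UNCOMPRESS }

  static final String USAGE = "usage: WALCompressor [-u | -c] <input> <output>";

  final Mode mode;
  final String input;
  final String output;

  private WalToolArgs(Mode mode, String input, String output) {
    this.mode = mode;
    this.input = input;
    this.output = output;
  }

  /** Parses exactly three arguments: one mode flag, then input and output. */
  static WalToolArgs parse(String[] args) {
    // One flag plus two paths; anything else is a usage error.
    if (args.length != 3) {
      throw new IllegalArgumentException(USAGE);
    }
    final Mode mode;
    if ("-c".equals(args[0])) {
      mode = Mode.COMPRESS;
    } else if ("-u".equals(args[0])) {
      mode = Mode.UNCOMPRESS;
    } else {
      throw new IllegalArgumentException(USAGE);
    }
    return new WalToolArgs(mode, args[1], args[2]);
  }
}
```

Because the flag is a single required positional argument, passing both -u and -c fails the length check, so "exactly one" needs no separate test.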

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4853
          -----------------------------------------------------------

          On 2012-02-15 04:57:45, Li Pi wrote:
          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-15 04:57:45)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.
          >
          > I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.

          checked it out. looks like in YCSB workloads the 0x00 bytes are actually indexes pointing to the 0th entry of the dictionary.
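
The zero-byte measurement mentioned above was done with a C program that is not included in this thread. As a rough Java equivalent, here is a minimal sketch; the class name `ZeroByteCounter` is hypothetical. It counts 0x00 bytes in any stream, so, as the reply notes, the result is only an upper bound on removable padding: a zero byte may also be a legitimate dictionary index.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not the C program referred to in the review.
final class ZeroByteCounter {
  /** Returns how many bytes in the stream are equal to 0x00. */
  static long countZeroBytes(InputStream in) throws IOException {
    byte[] buffer = new byte[8192];
    long zeros = 0;
    int read;
    // Read in fixed-size chunks so the file never has to fit in memory.
    while ((read = in.read(buffer)) != -1) {
      for (int i = 0; i < read; i++) {
        if (buffer[i] == 0) {
          zeros++;
        }
      }
    }
    return zeros;
  }
}
```

Dividing the count by the file length gives the percentage figures of the kind quoted in the review.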

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 52
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line52>
          >
          > invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

          fixed.
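
The out-of-bounds error described above comes from reading `args[0]` before the length of `args` has been checked. The actual condition at line 52 is not shown in this thread, so the following is only a sketch with a hypothetical `--help` flag and a hypothetical class `UsageCheck`; it shows why the order of the || operands matters.

```java
// Hypothetical check; the real condition in Compressor.java is not shown
// in this thread.
final class UsageCheck {
  /** True when the usage message should be printed instead of running. */
  static boolean shouldPrintUsage(String[] args) {
    // The length test comes first: || short-circuits, so args[0] is only
    // evaluated once the array is known to be non-empty.
    return args.length == 0 || "--help".equals(args[0]);
  }
}
```

With the operands swapped, an empty argument array would throw ArrayIndexOutOfBoundsException before the length test ever ran.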

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 86-88
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line86>
          >
          > this code doesn't work properly. Here's what you want to do:
          >
          > Configuration conf = new Configuration();
          > FileSystem fs = path.getFileSystem(conf);

          fixed.

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4853
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-15 05:23:04, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>
          >
          > FileSystem has the following methods:
          >
          > /** Returns the configured filesystem implementation.*/
          > public static FileSystem get(Configuration conf) throws IOException {
          >
          > public static FileSystem get(URI uri, Configuration conf) throws IOException {
          >
          > I think the second get() should allow you to read HLog on hdfs

          Todd Lipcon wrote:
              see my earlier comment on this review: path.getFileSystem(conf) is what you want to use

          fixed. hopefully this should work.

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5113
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 01:33:09, Liyin Tang wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 230
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line230>
          >
          > Should the data be added back to the dict in this case?
          > dict.addEntry(data) ?

          This is taken care of during findEntry.
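
The reply above says that a lookup miss already adds the data to the dictionary. One way such a dictionary could behave is sketched below. This is not the LRUDictionary from the patch: the class `SketchLruDictionary`, its single-argument `findEntry`, and the value chosen for `NOT_IN_DICTIONARY` are assumptions made for illustration. On a miss, the entry is stored (evicting the least recently used one when full) and the sentinel is returned, so the caller never needs a separate addEntry call.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the LRUDictionary class from the patch.
final class SketchLruDictionary {
  static final short NOT_IN_DICTIONARY = -1; // assumed sentinel value

  private final int capacity;
  private final byte[][] entries;
  // Access-ordered map: iteration starts at the least recently used key.
  private final LinkedHashMap<ByteBuffer, Short> index =
      new LinkedHashMap<>(16, 0.75f, true);

  SketchLruDictionary(int capacity) {
    if (capacity < 1 || capacity > Short.MAX_VALUE) {
      throw new IllegalArgumentException("capacity out of range: " + capacity);
    }
    this.capacity = capacity;
    this.entries = new byte[capacity][];
  }

  /** Returns the slot of data, or NOT_IN_DICTIONARY after adding it. */
  short findEntry(byte[] data) {
    Short slot = index.get(ByteBuffer.wrap(data)); // a hit refreshes recency
    if (slot != null) {
      return slot;
    }
    addEntry(data);
    return NOT_IN_DICTIONARY;
  }

  /** Returns the bytes stored in the given slot. */
  byte[] getEntry(short slot) {
    return entries[slot];
  }

  private void addEntry(byte[] data) {
    short slot;
    if (index.size() < capacity) {
      slot = (short) index.size(); // slots fill in order until full
    } else {
      // Evict the least recently used entry and reuse its slot.
      Iterator<Map.Entry<ByteBuffer, Short>> eldest = index.entrySet().iterator();
      slot = eldest.next().getValue();
      eldest.remove();
    }
    byte[] copy = Arrays.copyOf(data, data.length);
    entries[slot] = copy;
    index.put(ByteBuffer.wrap(copy), slot);
  }
}
```

In this sketch a writer would emit the literal bytes whenever `findEntry` returns the sentinel and the slot number otherwise; a reader applying the same add-on-miss rule would keep its own dictionary in step.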

          On 2012-02-14 01:33:09, Liyin Tang wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 192
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line192>
          >
          > WritableUtils.getVIntSize could help you to decide how many bytes are needed for the entry. So you don't need to pass down sizeBytes in this function.

          This is part of the way HBase stores data uncompressed. It doesn't use a vInt.

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5066
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 02:29:24, Liyin Tang wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 42

          > <https://reviews.apache.org/r/2740/diff/16/?file=70705#file70705line42>

          >

          > Looks like calling findEntry() has a side effect, since it puts the data into the dictionary.

          >

          This is intentional. When we look for an entry, that means we intend to compress with it. If we don't find it, then it's inserted into the dictionary.
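
A minimal sketch of the lookup-inserts-on-miss behaviour described above. This is not the LRUDictionary code from the patch: the class name SketchLruDictionary, the LinkedHashMap-based eviction, and the slot-reuse policy are all assumptions made for illustration. Only the findEntry() name and the idea that a miss adds the data to the dictionary come from the thread.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical dictionary whose lookup also inserts on a miss. */
class SketchLruDictionary {
    static final short NOT_IN_DICTIONARY = -1;

    private final int capacity;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<ByteBuffer, Short> indexByEntry =
        new LinkedHashMap<>(16, 0.75f, true);

    SketchLruDictionary(int capacity) {
        if (capacity < 1 || capacity > Short.MAX_VALUE) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        this.capacity = capacity;
    }

    /**
     * Returns the slot of the entry if it is already known. On a miss the
     * entry is inserted (evicting the least recently used one when full)
     * and NOT_IN_DICTIONARY is returned, so the caller writes the raw bytes.
     */
    short findEntry(byte[] data, int offset, int length) {
        // Copy the range so later reuse of the caller's buffer cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(Arrays.copyOfRange(data, offset, offset + length));
        Short index = indexByEntry.get(key); // a hit also refreshes recency
        if (index != null) {
            return index;
        }
        short slot;
        if (indexByEntry.size() < capacity) {
            slot = (short) indexByEntry.size(); // next unused slot
        } else {
            // Evict the least recently used entry and reuse its slot.
            Iterator<Map.Entry<ByteBuffer, Short>> it = indexByEntry.entrySet().iterator();
            slot = it.next().getValue();
            it.remove();
        }
        indexByEntry.put(key, slot);
        return NOT_IN_DICTIONARY;
    }
}
```

Under this scheme a reader would have to replay the same inserts and evictions in the same order to keep its slots in sync with the writer, which is presumably why no separate addEntry() call is needed on the write path.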

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5068
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 37

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line37>

          >

          > I'd rename this class to KeyValueCompression or even KVCompression. Then rename readFields to just "read" – since this is just utility functions, not actually an instance of a compressed keyvalue.

          fixed. legacy name. <3 eclipse.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 207

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line207>

          >

          > *un*compressed value, right?

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 28

          > <https://reviews.apache.org/r/2740/diff/16/?file=70701#file70701line28>

          >

          > Since this is so simple, I'd move it to be a static inner class of KVCompression above

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 152

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line152>

          >

          > why is this split into two if/elses? looks like the top clauses can be combined, as can the bottom clauses

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 174

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line174>

          >

          > switch order of "in" and "offset" here.

          >

          > Perhaps clearer to name this as "uncompressIntoArray"?

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 44

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line44>

          >

          > I think we can merge this with the other class that just has static methods as well.

          Compressor contains static methods for general purpose compression. KeyValueCompression.java contains static methods for compressing the KeyValue type. Should I merge them?

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 185

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line185>

          >

          > worth a comment here to explain that the "status" byte actually has the high-order byte of the dictionary entry in the case that it's in the dictionary

          done
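
To illustrate the point about the "status" byte, here is a hedged sketch of one way such an encoding can work. It is not the Compressor code under review: the class and method names, the use of ByteBuffer, and the 4-byte length prefix for literals are assumptions. The only idea taken from the thread is that a single leading byte is either a "not in dictionary" marker or the high-order byte of a dictionary index.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoding in which the first byte doubles as status and high index byte. */
class StatusByteSketch {
    static final byte NOT_IN_DICTIONARY = -1; // 0xFF, never the high byte of a non-negative short

    /** Literal: marker byte, then a 4-byte length (an assumption), then the raw bytes. */
    static void writeLiteral(ByteBuffer out, byte[] data) {
        out.put(NOT_IN_DICTIONARY);
        out.putInt(data.length);
        out.put(data);
    }

    /** Reference: the 2-byte index, big-endian, so its high-order byte comes first. */
    static void writeReference(ByteBuffer out, short index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        out.putShort(index);
    }

    /**
     * Reads the status byte. Returns NOT_IN_DICTIONARY if a literal follows;
     * otherwise treats the status byte as the high-order byte of the index,
     * reads the low-order byte, and returns the reassembled index.
     */
    static int readIndex(ByteBuffer in) {
        byte status = in.get();
        if (status == NOT_IN_DICTIONARY) {
            return NOT_IN_DICTIONARY;
        }
        int low = in.get() & 0xFF;
        return ((status & 0xFF) << 8) | low;
    }

    /** Reads the length-prefixed body that follows a NOT_IN_DICTIONARY marker. */
    static byte[] readLiteralBody(ByteBuffer in) {
        byte[] data = new byte[in.getInt()];
        in.get(data);
        return data;
    }
}
```

Because valid indices are non-negative shorts, their high-order byte is always in the range 0 to 127 and cannot collide with the 0xFF marker, so a reference costs two bytes and a literal costs one extra byte.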

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 96

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line96>

          >

          > Rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.

          This is going to take a while, since I'm currently relying on the default array hashCode. I will need to use Bytes.hashCode and write a wrapper for insertion into the dictionary.
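
The wrapper idea mentioned above could look roughly like the following sketch. ByteRangeKey is a hypothetical name, not a class from the patch or from HBase, and the hash function is a plain 31-based polynomial rather than whatever Bytes.hashCode computes. It shows how a (buf, off, len) triple can be used as a map key by content, so that lookups need no copy.

```java
import java.util.Arrays;

/** Hypothetical map key that hashes and compares a slice of a byte[] by content. */
final class ByteRangeKey {
    private final byte[] buf;
    private final int off;
    private final int len;

    ByteRangeKey(byte[] buf, int off, int len) {
        if (off < 0 || len < 0 || off + len > buf.length) {
            throw new IndexOutOfBoundsException("bad range: off=" + off + ", len=" + len);
        }
        this.buf = buf;
        this.off = off;
        this.len = len;
    }

    /** Copies the slice so the key stays valid if the backing buffer is reused. */
    ByteRangeKey detach() {
        return new ByteRangeKey(Arrays.copyOfRange(buf, off, off + len), 0, len);
    }

    @Override
    public int hashCode() {
        // Content-based hash over the slice only, unlike the identity hash of byte[].
        int h = 1;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + buf[i];
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ByteRangeKey)) {
            return false;
        }
        ByteRangeKey other = (ByteRangeKey) o;
        if (len != other.len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (buf[off + i] != other.buf[other.off + i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a key like this, a lookup can wrap the KeyValue's backing array directly, and a copy via detach() is only needed on a miss, when the entry is actually stored in the dictionary.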

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>

          >

          > this function requires that the whole log data fit in RAM - not a great assumption

          old one. will do eventually...

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------


          Li Pi added a comment -

          Doing so right now. Will be done before weekend.

          Ted Yu added a comment -

          @Li:
          Do you have time to address Todd's and Liyin's comments?

          Thanks

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-15 05:23:04, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>

          >

          > FileSystem has the following methods:

          >

          > /** Returns the configured filesystem implementation.*/

          > public static FileSystem get(Configuration conf) throws IOException {

          >

          > public static FileSystem get(URI uri, Configuration conf) throws IOException {

          >

          > I think the second get() should allow you to read an HLog on HDFS.

          See my earlier comment on this review: path.getFileSystem(conf) is what you want to use.

          • Todd

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5113
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5113
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11203>

          FileSystem has the following methods:

          /** Returns the configured filesystem implementation.*/
          public static FileSystem get(Configuration conf) throws IOException {

          public static FileSystem get(URI uri, Configuration conf) throws IOException {

          I think the second get() should allow you to read an HLog on HDFS.

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-15 04:57:45.411924)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Fixed as per Ted Yu's review.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>

          >

          > I would expect different implementations to be instantiated based on the prefix of path.

          I figured people would only use this on their local machine. I guess the path can actually point to HDFS. Got any examples of how to do this easily?

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 116

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line116>

          >

          > Why do we instantiate Configuration again (there is already one @ line 113) ?

          Hmm. Good point. Waste of heap, but I wasn't really optimizing the command line tool. Fixed!

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 71

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line71>

          >

          > Should we verify that length is larger than pos ?

          I don't think it makes a difference.

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 169

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line169>

          >

          > Typo, should read 'to start reading from'.

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4585
          -----------------------------------------------------------

          On 2012-01-24 22:29:18, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2012-01-24 22:29:18)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5068
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11123>

          It looks like calling findEntry() has a side effect, since it puts the data into the dictionary.

          • Liyin
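The side effect described in this comment can be illustrated with a small sketch. This is not the patch's LRUDictionary: only the method names findEntry and addEntry come from the review, while the class name, the String keys, the NOT_IN_DICTIONARY constant, and the absence of eviction are assumptions made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the LRUDictionary from the patch.
final class SideEffectDictionary {
    // Assumed sentinel for "entry was not present".
    static final short NOT_IN_DICTIONARY = -1;

    private final Map<String, Short> indexByEntry = new HashMap<>();

    /**
     * Looks up an entry. On a miss the entry is stored as a side effect,
     * but the miss is still reported to the caller.
     */
    short findEntry(String data) {
        Short index = indexByEntry.get(data);
        if (index != null) {
            return index;
        }
        addEntry(data); // side effect: the lookup mutates the dictionary
        return NOT_IN_DICTIONARY;
    }

    /** Stores an entry and returns the index assigned to it. */
    short addEntry(String data) {
        short index = (short) indexByEntry.size();
        indexByEntry.put(data, index);
        return index;
    }

    int size() {
        return indexByEntry.size();
    }
}
```

With a lookup written this way, a caller that also calls addEntry after a miss would store the same data twice, so it matters which side owns the insertion.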


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5066
          -----------------------------------------------------------

          Nice patch and good job ! I have two questions inline and maybe I just misunderstood the code.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11122>

          WritableUtils.getVIntSize could help you decide how many bytes are needed for the entry, so you don't need to pass sizeBytes down into this function.
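The idea behind this suggestion is that the encoded size of a variable-length integer can be computed from the value itself. The sketch below uses a generic base-128 encoding (7 payload bits per byte) purely for illustration; it is an assumption and does not reproduce the byte layout that Hadoop's WritableUtils uses. The class and method names are hypothetical.

```java
// Hypothetical sketch of a generic base-128 varint size calculation.
// This is NOT Hadoop's vint format; it only shows that the size is
// derivable from the value.
final class VarIntSketch {
    /** Bytes needed to encode a non-negative value with 7 bits per byte. */
    static int size(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        int bytes = 1;
        // Each further byte carries another 7 bits of the value.
        while ((value >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }
}
```

Because the size follows from the value, a function that already receives the value does not need a separate size parameter.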

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11120>

          Should the data be added back to the dict in this case?
          dict.addEntry(data) ?

          • Liyin


          Li Pi added a comment -

          The compression uses 2 byte dictionary indices, so the first 255 entries should start off with 0x00. This might be causing it.
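A short sketch of why small dictionary indices produce 0x00 bytes. It assumes the index is written high byte first, which the comment implies but does not state; the class and method names are hypothetical.

```java
// Hypothetical sketch: a 2-byte dictionary index written high byte first
// (the byte order is an assumption).
final class DictionaryIndexBytes {
    static byte[] toTwoBytes(short index) {
        return new byte[] { (byte) (index >> 8), (byte) index };
    }

    public static void main(String[] args) {
        byte[] small = toTwoBytes((short) 200); // high byte is 0x00
        byte[] large = toTwoBytes((short) 300); // high byte is 0x01
        System.out.printf("200 -> %02x %02x%n", small[0], small[1]);
        System.out.printf("300 -> %02x %02x%n", large[0], large[1]);
    }
}
```

Any index below 256 fits in the low byte alone, so every reference to such an entry contributes one 0x00 byte to the log.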

          @Karthik, I'll try to get documentation out when I'm less busy. This quarter is pretty painful so far.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4853
          -----------------------------------------------------------

          I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.

          I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.
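The 0x00 count mentioned above came from a quick C program that is not included in this thread. A comparable check can be sketched in Java as follows; the class name and command-line shape are hypothetical, and the whole file is read into memory, which is acceptable for logs of the size discussed here (around 64MB).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of a zero-byte counter; not the program used in the review.
final class ZeroByteCounter {
    /** Counts the bytes equal to 0x00 in the given data. */
    static long countZeroBytes(byte[] data) {
        long zeros = 0;
        for (byte b : data) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZeroByteCounter <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.printf("%d of %d bytes are 0x00%n", countZeroBytes(data), data.length);
    }
}
```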

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10650>

          invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10651>

          I think the better way of expressing this usage would be:

          WALCompressor [-u | -c] <input> <output>

          -u - uncompresses the input log
          -c - compresses the input log

          Exactly one of -u or -c must be specified
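The two points above (check the argument count before touching args[0], and require exactly one of -u or -c) can be sketched together. This is a minimal illustration, not the code in Compressor.java; the class, field, and method names are hypothetical.

```java
// Hypothetical sketch of the argument handling suggested in the review.
final class WalCompressorArgs {
    final boolean compress; // true for -c, false for -u
    final String input;
    final String output;

    private WalCompressorArgs(boolean compress, String input, String output) {
        this.compress = compress;
        this.input = input;
        this.output = output;
    }

    /** Returns the parsed arguments, or null when the usage text should be printed. */
    static WalCompressorArgs parse(String[] args) {
        // The length check comes first: || short-circuits, so args[0] is
        // never read when the array is too short.
        if (args.length != 3 || !(args[0].equals("-c") || args[0].equals("-u"))) {
            return null;
        }
        return new WalCompressorArgs(args[0].equals("-c"), args[1], args[2]);
    }

    static String usage() {
        return "WALCompressor [-u | -c] <input> <output>";
    }
}
```

A main method would print usage() and exit when parse returns null, which covers the no-argument case without an exception.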

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10649>

          this code doesn't work properly. Here's what you want to do:

          Configuration conf = new Configuration();
          FileSystem fs = path.getFileSystem(conf);

          • Todd


          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/#review4853 ----------------------------------------------------------- I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there's still a number of places where we could use variable length integers instead of non-variable length. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think. I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed. src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java < https://reviews.apache.org/r/2740/#comment10650 > invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java < https://reviews.apache.org/r/2740/#comment10651 > I think the better way of expressing this usage would be: WALCompressor [-u | -c] <input> <output> -u - uncompresses the input log -c - compresses the output log Exactly one of -u or -c must be specified src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java < https://reviews.apache.org/r/2740/#comment10649 > this code doesn't work properly. 
Here's what you want to do: Configuration conf = new Configuration(); FileSystem fs = path.getFileSystem(conf); Todd On 2012-01-24 22:29:18, Li Pi wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2740/ ----------------------------------------------------------- (Updated 2012-01-24 22:29:18) Review request for hbase, Eli Collins and Todd Lipcon. Summary ------- HLog compression. Has unit tests and a command line tool for compressing/decompressing. This addresses bug HBase-4608. https://issues.apache.org/jira/browse/HBase-4608 Diffs ----- src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8 src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0 src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION Diff: https://reviews.apache.org/r/2740/diff Testing ------- Thanks, Li
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4852
          -----------------------------------------------------------

          I tried to use the command line tool to compress an HLog written by 0.92 and got the following:

          Exception in thread "main" java.lang.NullPointerException
          at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192)
          at org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104)
          at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64)

          Also, if you use the command line tool with no arguments, it should print its help (right now it prints an IndexOutOfBoundsException).
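A minimal sketch of the argument check this comment asks for, assuming the tool takes a mode flag plus an input and an output path (the usage string is the one proposed in the follow-up review; the class and method names here are hypothetical and not taken from the patch). The point is only that the length test has to come first, so that `args[0]` is never read on an empty array:

```java
// Hypothetical helper; names are illustrative and do not come from the patch.
final class CompressorArgs {
    static final String USAGE = "usage: WALCompressor [-u | -c] <input> <output>";

    /** Returns null when the arguments are acceptable, otherwise the usage text to print. */
    static String validate(String[] args) {
        // Test the length first: || short-circuits, so args[0] is only read
        // once we know the array has the expected three elements.
        if (args.length != 3 || !(args[0].equals("-u") || args[0].equals("-c"))) {
            return USAGE;
        }
        return null;
    }
}
```

A `main` method could call `validate(args)` up front, print the returned text, and exit instead of letting an exception escape.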

          I'll try again with an hlog written by trunk - I'm guessing the hlog serialization version might have changed or something.

          • Todd

          On 2012-01-24 22:29:18, Li Pi wrote:

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-24 22:29:18)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Kannan Muthukkaruppan added a comment -

          Li: Is there a writeup/description of the scheme that this patch is implementing? If not, would you mind giving a quick overview. Thanks much.

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:50:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>

          >

          > If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.

          > This implies that readFile() would be called multiple times for a single file.

          That's beside the point. Using a queue here is just silly. Reading a file should probably be a different interface altogether rather than writing to a queue, i.e. it should be a pull interface, not a push.
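To make the pull-versus-push distinction concrete, here is a rough sketch of what a pull-style reader could look like. Everything in it is hypothetical: `EntryReader`, `over` and `countEntries` are illustrative names, and plain strings stand in for log entries. It is not the interface the patch ended up with.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a pull interface; not the patch's actual reader.
final class PullReaderSketch {
    /** Pull-style reader: the caller asks for the next entry; null means end of log. */
    interface EntryReader {
        String next();
    }

    /** Stand-in for a file-backed reader; a real one would decode entries lazily from a stream. */
    static EntryReader over(List<String> entries) {
        final Iterator<String> it = entries.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** The consumer drives the loop, so only one entry needs to be held at a time. */
    static int countEntries(EntryReader reader) {
        int n = 0;
        while (reader.next() != null) {
            n++;
        }
        return n;
    }
}
```

With this shape the caller decides when to read the next entry, so no intermediate queue is needed and the whole log does not have to be buffered.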

          I also mentioned to Li offline that it would make sense to add a metadata header to the HLog sequencefiles which indicates that they're compressed. In that case, this code could just use the existing log reader code and log writer code, but vary the output between compressed/uncompressed using the configuration flag.

          • Todd

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4736
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4736
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10469>

          If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.
          This implies that readFile() would be called multiple times for a single file.
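For reference, a small sketch of the timed `offer` call linked above. `ArrayBlockingQueue.offer(E, long, TimeUnit)` returns `false` if no space became available within the timeout; the wrapper class, the method name `tryEnqueue`, and the 10 ms wait are assumptions made for illustration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative wrapper; the class name and the timeout value are assumptions.
final class OfferTimeoutSketch {
    /** Tries to enqueue; returns false if the queue stayed full for the whole wait. */
    static boolean tryEnqueue(ArrayBlockingQueue<String> queue, String entry) {
        try {
            return queue.offer(entry, 10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the entry as not enqueued.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller that sees `false` knows the queue is full and has to decide whether to retry, drop the entry, or stop reading.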

          • Ted

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------

          Only got about halfway through. Will continue to look soon. Overall looking pretty good!

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10459>

          I'd rename this class to KeyValueCompression or even KVCompression. Then rename readFields to just "read" – since this is just utility functions, not actually an instance of a compressed keyvalue.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10460>

          Rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.
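A sketch of the (byte[] buf, int off, int len) style suggested here, under the assumption that each field is written as a two-byte length followed by its bytes. The class and method names are hypothetical and the wire format is only for illustration; it is not the format used by the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; the length-prefixed layout is an assumption.
final class SliceWriteSketch {
    /** Writes a length-prefixed slice straight from the backing array, with no intermediate copy. */
    static void writeSlice(DataOutputStream out, byte[] buf, int off, int len)
            throws IOException {
        out.writeShort(len);
        out.write(buf, off, len);
    }

    /** Convenience wrapper used to show the resulting bytes. */
    static byte[] encode(byte[] buf, int off, int len) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeSlice(out, buf, off, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The caller passes the KeyValue's backing array plus an offset and length for the field it wants, so no per-field `byte[]` is allocated just to be written and discarded.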

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          <https://reviews.apache.org/r/2740/#comment10461>

          Since this is so simple, I'd move it to be a static inner class of KVCompression above

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10462>

          I think we can merge this with the other class that just has static methods as well.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10463>

          this function requires that the whole log data fit in RAM - not a great assumption

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10464>

          why is this split into two if/elses? looks like the top clauses can be combined, as can the bottom clauses

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10465>

          switch order of "in" and "offset" here.

          Perhaps clearer to name this as "uncompressIntoArray"?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10467>

          worth a comment here to explain that the "status" byte actually has the high-order byte of the dictionary entry in the case that it's in the dictionary
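The double role of the "status" byte can be illustrated with a short sketch. It assumes a sentinel of -1 for NOT_IN_DICTIONARY and non-negative two-byte dictionary indices, so that the high-order byte of a valid index can never collide with the sentinel; both assumptions, and all names below, are for illustration rather than a description of the patch.

```java
// Hypothetical sketch; the sentinel value and the index range are assumptions.
final class StatusByteSketch {
    static final byte NOT_IN_DICTIONARY = -1; // assumed sentinel

    /** Encodes a dictionary hit as two bytes: high-order byte first, then low-order byte. */
    static byte[] encodeIndex(short idx) {
        // For idx in 0..32767 the high-order byte is 0..127, never the sentinel 0xFF.
        return new byte[] { (byte) (idx >> 8), (byte) idx };
    }

    /** Returns the dictionary index, or -1 if the status byte marks a literal value. */
    static int decodeIndex(byte status, byte next) {
        if (status == NOT_IN_DICTIONARY) {
            return -1; // literal data follows; "next" is not part of an index
        }
        // Otherwise the status byte doubles as the high-order byte of the index.
        return ((status & 0xff) << 8) | (next & 0xff);
    }
}
```

Reading therefore needs only one byte to decide between the two cases: the sentinel means a literal follows, anything else is already half of the index.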

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10466>

          *un*compressed value, right?

          • Todd

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4585
          -----------------------------------------------------------

          Nice work.
          Will try out the Compressor tool.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10215>

          Should we verify that length is larger than pos ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10216>

          I would expect different implementations to be instantiated based on the prefix of path.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10217>

          Why do we instantiate Configuration again (there is already one @ line 113) ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10218>

          Typo, should read 'to start reading from'.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10219>

          NOT_IN_DICTIONARY should be used here.

          • Ted

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-24 22:29:18.791094)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f