Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Adds a custom dictionary-based compression on WAL. Off by default. To enable, set hbase.regionserver.wal.enablecompression to true in hbase-site.xml.
      Note that replication is currently broken when WAL compression is enabled.
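As a minimal sketch of the setting named in the release note, the property would go inside the `<configuration>` element of hbase-site.xml. The property name and value are taken from the release note; the surrounding layout is the standard Hadoop-style property format.

```xml
<!-- hbase-site.xml: turn on dictionary-based WAL compression (off by default) -->
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
```

Per the release note, replication does not work while this is enabled, so it is probably best left off on clusters that replicate.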

      Description

      The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. The current plan is to use a dictionary to compress the table name, region id, cf name, and possibly other bits of repeated data. The HLog format may also be changed in other ways to produce a smaller HLog.

      1. 4608v1.txt
        11 kB
        Li Pi
      2. 4608v13.txt
        39 kB
        Li Pi
      3. 4608v13.txt
        39 kB
        Li Pi
      4. 4608v14.txt
        40 kB
        Li Pi
      5. 4608v15.txt
        39 kB
        Ted Yu
      6. 4608v16.txt
        39 kB
        Ted Yu
      7. 4608v17.txt
        39 kB
        Ted Yu
      8. 4608v18.txt
        39 kB
        Ted Yu
      9. 4608-v19.txt
        41 kB
        Ted Yu
      10. 4608-v20.txt
        42 kB
        Ted Yu
      11. 4608-v22.txt
        42 kB
        Ted Yu
      12. 4608v23.txt
        51 kB
        stack
      13. 4608v24.txt
        52 kB
        stack
      14. 4608v25.txt
        52 kB
        stack
      15. 4608v27.txt
        52 kB
        stack
      16. 4608v29.txt
        56 kB
        stack
      17. 4608v30.txt
        57 kB
        stack
      18. 4608v5.txt
        33 kB
        Li Pi
      19. 4608v6.txt
        32 kB
        Li Pi
      20. 4608v7.txt
        32 kB
        Li Pi
      21. 4608v8fixed.txt
        37 kB
        Li Pi
      22. hbase-4608-v28.txt
        56 kB
        stack
      23. hbase-4608-v28.txt
        56 kB
        Todd Lipcon
      24. hbase-4608-v28-delta.txt
        25 kB
        Todd Lipcon

        Issue Links

          Activity

          Andrew Purtell added a comment -

          The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes.

          A different and complementary avenue of attack for this issue is HDFS-1783.

          stack added a comment -

          Just to be clear, when we talk of compression, we are not talking about gzip or the like? Such compressors compress in chunks – e.g. 32k – with dictionary as preface. If a machine crashes before it flushes the current chunk, you may lose up to the last 32k of edits. This is not the type of compression that is being worked on here? Thanks.

          Jonathan Gray added a comment -

          I think the idea is a custom compression where we can do stuff like start the HLog with a dictionary of some known repetitive stuff. It's very similar to the delta encoding work.

          Li Pi added a comment -

          A form of custom compression. The ability to recover the uncompressed
          HLog no matter when the machine crashes is a requirement.


          Todd Lipcon added a comment -

          One quick sketch of how this might work:

          interface CompressionDictionary {
            public byte[] getEntry(int idx);
            public int findEntry(byte[] data);
            public int addEntry(byte[] data);
          }
          

          while writing:
          start each HLog with an empty CompressionDictionary:

          void writeString(byte[] data) {
            int dictIdx = dict.findEntry(data);
            if (dictIdx == -1) {
              // not in dict
              writeByte(0x00);
              WritableUtils.writeString(data); // current implementation
            } else {
              writeInt((1 << 31) | dictIdx);
            }
          }
          

          while reading:

          byte[] readString(in) {
            in.mark();
            byte firstbyte = in.read();
            if ((firstbyte & 0x80) != 0) {
              // high bit set: this is a 4-byte dictionary reference
              in.reset();
              int dictidx = in.readInt() & ~(1 << 31);
              return dict.getEntry(dictidx);
            } else {
              // literal: 0x00 marker followed by the string itself
              assert firstbyte == 0;
              byte[] ret = WritableUtils.readString();
              dict.addEntry(ret);
              return ret;
            }
          }
          

          then the dictionary could be implemented as a fixed size associative hash... maybe a cuckoo hash or something exotic (they're on my mind since reading the SILT paper last week)
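One possible shape for such a fixed-size dictionary is sketched below. It is a plain direct-mapped table (one slot per hash bucket, collisions overwrite), not a cuckoo hash, and the class name `DirectMappedDictionary` and the capacity parameter are assumptions made for illustration. Only the three-method interface comes from the sketch above.

```java
import java.util.Arrays;

/** The interface from the sketch above. */
interface CompressionDictionary {
  byte[] getEntry(int idx);
  int findEntry(byte[] data);
  int addEntry(byte[] data);
}

/** Hypothetical fixed-size dictionary: each value maps to exactly one slot. */
class DirectMappedDictionary implements CompressionDictionary {
  private final byte[][] slots;

  DirectMappedDictionary(int capacity) {
    slots = new byte[capacity][];
  }

  private int slotFor(byte[] data) {
    // floorMod keeps the index non-negative even for negative hash codes
    return Math.floorMod(Arrays.hashCode(data), slots.length);
  }

  @Override
  public byte[] getEntry(int idx) {
    return slots[idx];
  }

  @Override
  public int findEntry(byte[] data) {
    int idx = slotFor(data);
    // -1 means "not in dict", matching the writer sketch above
    return Arrays.equals(slots[idx], data) ? idx : -1;
  }

  @Override
  public int addEntry(byte[] data) {
    int idx = slotFor(data);
    slots[idx] = data.clone(); // a colliding older entry is simply overwritten
    return idx;
  }
}
```

Because the slot depends only on the bytes being added, a reader that replays the same sequence of literals should evict and overwrite entries in the same order as the writer, so the two dictionaries stay in step without the dictionary ever being stored in the log.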

          Todd Lipcon added a comment -

          oops, on the write side you'd also add it to the dict after writing the literal.
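Putting the sketch and this correction together, a self-contained version might look like the following. This is only an illustration under stated assumptions: the class `WalFieldCodec`, its helpers `encodeAll` and `decodeAll`, the unbounded list-backed dictionary, and the length-prefixed literal format are all hypothetical stand-ins, not the code from any attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical codec for one repeated WAL field (e.g. a table or cf name). */
class WalFieldCodec {
  private final List<byte[]> entries = new ArrayList<>();
  private final Map<String, Integer> index = new HashMap<>();

  private static String key(byte[] data) {
    return new String(data, StandardCharsets.ISO_8859_1); // lossless byte-to-char mapping
  }

  private void add(byte[] data) {
    index.put(key(data), entries.size());
    entries.add(data.clone());
  }

  void write(DataOutputStream out, byte[] data) throws IOException {
    Integer idx = index.get(key(data));
    if (idx == null) {
      out.writeByte(0x00);           // marker: a literal follows
      out.writeInt(data.length);
      out.write(data);
      add(data);                     // the step added in the follow-up comment
    } else {
      out.writeInt((1 << 31) | idx); // high bit set: dictionary reference
    }
  }

  byte[] read(DataInputStream in) throws IOException {
    in.mark(4);
    byte first = in.readByte();
    if ((first & 0x80) != 0) {
      in.reset();                    // re-read the marker byte as part of the int
      return entries.get(in.readInt() & ~(1 << 31));
    }
    byte[] data = new byte[in.readInt()];
    in.readFully(data);
    add(data);                       // reader rebuilds the same dictionary
    return data;
  }

  /** Encodes the fields with a fresh dictionary, as at the start of a new log. */
  static byte[] encodeAll(String... fields) {
    try {
      WalFieldCodec writer = new WalFieldCodec();
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      for (String f : fields) {
        writer.write(out, f.getBytes(StandardCharsets.UTF_8));
      }
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Decodes count fields, again starting from an empty dictionary. */
  static List<String> decodeAll(byte[] bytes, int count) {
    try {
      WalFieldCodec reader = new WalFieldCodec();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      List<String> fields = new ArrayList<>();
      for (int i = 0; i < count; i++) {
        fields.add(new String(reader.read(in), StandardCharsets.UTF_8));
      }
      return fields;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For example, `WalFieldCodec.decodeAll(WalFieldCodec.encodeAll("t1", "cf", "t1", "cf"), 4)` should return the original four fields. The first occurrence of each value is written as a literal and each repeat as a 4-byte reference. The reader never sees a stored dictionary; it reconstructs one from the literals as it goes.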

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-11-07 10:27:33.538403)

          Review request for Eli Collins and Todd Lipcon.

          Summary (updated)
          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Li Pi added a comment -

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          Todd Lipcon added a comment -

          I haven't looked at the patch yet, but it would be great if you could build a tool to go along with this, for testing, that compresses/decompresses logs. E.g.:

          bin/hbase org.apache.hadoop.hbase.....HLogTool -compress /path/to/hlog /path/to/hlog.compressed
          bin/hbase org.apache.hadoop.hbase.....HLogTool -uncompress /path/to/hlog.compressed /path/to/hlog
          .. or something like that.

          Then real users could see what kind of compression ratio they could expect (and it serves as a decent test that compress/uncompress yields the original file)

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-11-07 23:12:37.111204)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3093
          -----------------------------------------------------------

          Cool stuff.

          I am probably just missing something... But when is the dictionary itself stored? Don't we need it to read out the logs again?

          Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.
          On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?

          (As I said, I am probably missing something).

          See minor comments inline.

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <https://reviews.apache.org/r/2740/#comment6899>

          This is functionally the same as before, but less readable. I don't think this leads to much performance improvement.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment6900>

          I think we leave out the line with the year now.
          Lots of leading whitespace and weird indentation in this file.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment6901>

          passing 0 here? I might be missing something, but looking down at readCompressed that looks wrong.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          <https://reviews.apache.org/r/2740/#comment6902>

          Could we have a no-op compressor instead?

          - Lars


          jiraposter@reviews.apache.org added a comment -

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:
          > Cool stuff.
          >
          > I am probably just missing something... But when is the dictionary itself stored? Don't we need it to read out the logs again?
          >
          > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.
          > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?
          >
          > (As I said, I am probably missing something).
          >
          > See minor comments inline.

          You aren't missing anything! That's exactly how it works.

          Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:
          > src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1088
          > <https://reviews.apache.org/r/2740/diff/1/?file=56620#file56620line1088>
          >
          > This is functionally the same as before, but less readable. I don't think this leads to much performance improvement.

          Good point, I can get rid of this.

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 2
          > <https://reviews.apache.org/r/2740/diff/1/?file=56621#file56621line2>
          >
          > I think we leave out the line with the year now.
          > Lots of leading whitespace and weird indentation in this file.

          I need to fix my Eclipse autoformatter. Will take care of this and the formatting bugs.

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 62
          > <https://reviews.apache.org/r/2740/diff/1/?file=56621#file56621line62>
          >
          > passing 0 here? I might be missing something, but looking down at readCompressed that looks wrong.

          We pass a 0 because we don't encode the length of the qualifier. I don't know why we don't, but that's how KeyValue does it.

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157
          > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line157>
          >
          > Could we have a no-op compressor instead?

          No-op compressor? As in one that does nothing?

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3093
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:

          > Cool stuff.

          >

          > I am probably just missing something... But when is the dictionary itself stored? Don't we need to read out the logs again?

          >

          > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.

          > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?

          >

          > (As I said, I am probably missing something).

          >

          > See minor comments inline.

          Li Pi wrote:

          You aren't missing anything! That's exactly how it works.

          Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.
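
          The write and read paths described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: SketchDictionary, SketchCodec, the LITERAL/REFERENCE tag bytes and the use of Strings with 4-byte lengths and indexes are all assumptions made to keep the example short. The point it shows is that the reader adds every literal it sees to its own fresh dictionary, so indexes line up with the writer's without the dictionary ever being stored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary; not the SimpleDictionary or WALDictionary from the patch. */
class SketchDictionary {
  private final Map<String, Integer> indexByEntry = new HashMap<>();
  private final List<String> entryByIndex = new ArrayList<>();

  /** Returns the index of an entry, or -1 if it has not been seen yet. */
  int find(String entry) {
    Integer idx = indexByEntry.get(entry);
    return idx == null ? -1 : idx;
  }

  /** Adds a new entry and returns the index assigned to it. */
  int add(String entry) {
    int idx = entryByIndex.size();
    entryByIndex.add(entry);
    indexByEntry.put(entry, idx);
    return idx;
  }

  String get(int idx) {
    return entryByIndex.get(idx);
  }
}

/** Hypothetical wire format: a tag byte, then either an index or a length-prefixed literal. */
class SketchCodec {
  static final byte LITERAL = 0;
  static final byte REFERENCE = 1;

  static void write(DataOutputStream out, SketchDictionary dict, String value) throws IOException {
    int idx = dict.find(value);
    if (idx >= 0) {
      // Seen before: write only the index.
      out.writeByte(REFERENCE);
      out.writeInt(idx);
    } else {
      // Not in the dictionary yet: add it, then write the full value.
      dict.add(value);
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      out.writeByte(LITERAL);
      out.writeInt(bytes.length);
      out.write(bytes);
    }
  }

  static String read(DataInputStream in, SketchDictionary dict) throws IOException {
    if (in.readByte() == REFERENCE) {
      return dict.get(in.readInt());
    }
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    String value = new String(bytes, StandardCharsets.UTF_8);
    dict.add(value); // Mirror the writer so that later references resolve.
    return value;
  }
}

class WalDictionarySketchDemo {
  public static void main(String[] args) throws IOException {
    String[] values = {"usertable", "cf1", "usertable", "cf1"};
    SketchDictionary writeDict = new SketchDictionary();
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    for (String v : values) {
      SketchCodec.write(out, writeDict, v);
    }
    out.flush();

    // The reader starts with an empty dictionary, just like the writer did.
    SketchDictionary readDict = new SketchDictionary();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    for (String expected : values) {
      System.out.println(expected.equals(SketchCodec.read(in, readDict)));
    }
  }
}
```

          If the line that adds literals to the dictionary on read is missing, the first REFERENCE tag fails to resolve, which is the gap the next reply points at.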

          OK... What I cannot find, then, is the code that builds the dictionary during read.

          Also as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say, we garble the beginning of a WAL file, then the entire file will be useless to us... I don't think that is a big deal, though. As the WAL entries are variable length this is mostly true even today.

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157

          > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line157>

          >

          > Could we have a no-op compressor instead?

          Li Pi wrote:

          No-op compressor? As in one that does nothing?

          Yep... So compression will never be null, and we can save if-statements (and make the code more readable).

          • Lars

          jiraposter@reviews.apache.org added a comment -

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:

          > Cool stuff.

          >

          > I am probably just missing something... But when is the dictionary itself stored? Don't we need to read out the logs again?

          >

          > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.

          > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?

          >

          > (As I said, I am probably missing something).

          >

          > See minor comments inline.

          Li Pi wrote:

          You aren't missing anything! That's exactly how it works.

          Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.

          Lars Hofhansl wrote:

          OK... What I cannot find, then, is the code that builds the dictionary during read.

          Also as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say, we garble the beginning of a WAL file, then the entire file will be useless to us... I don't think that is a big deal, though. As the WAL entries are variable length this is mostly true even today.

          Oops, somehow I deleted that line. There are comments for it. Added it back in.

          //if this isn't in the dictionary, we need to add to the dictionary.

          As for the more general concern: HBase won't return a write to the client until the WALEdit write is completely done. So aborting midway won't be an issue - and even if we abort midway, we can recover everything that's been written so far.

          As for the beginning of the file getting garbled: true, but we'd lose some information with or without compression. With compression we lose more information, but that's the nature of compression. Fully recovering a partially garbled WAL is impossible no matter what approach we use. Either way, it's not a contingency the WAL is built to handle - a partial recovery after all WAL replicas have been corrupted.

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 157

          > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line157>

          >

          > Could we have a no-op compressor instead?

          Li Pi wrote:

          No-op compressor? As in one that does nothing?

          Lars Hofhansl wrote:

          Yep... So compression will never be null, and we can save if-statements (and make the code more readable).

          Sure. I should probably define a compressor interface as well.
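
          A minimal sketch of what such an interface plus a no-op implementation might look like. The names SketchCompressor, NoOpCompressor and SketchWalWriter, and the byte[]-in, byte[]-out signatures, are hypothetical and not taken from the patch; the sketch only illustrates the idea that the caller always holds a non-null compressor and therefore needs no null checks.

```java
import java.util.Objects;

/** Hypothetical interface; method names and signatures are assumptions for illustration. */
interface SketchCompressor {
  byte[] compress(byte[] raw);

  byte[] uncompress(byte[] stored);
}

/** Does nothing: hands the bytes back unchanged in both directions. */
enum NoOpCompressor implements SketchCompressor {
  INSTANCE;

  @Override
  public byte[] compress(byte[] raw) {
    return raw;
  }

  @Override
  public byte[] uncompress(byte[] stored) {
    return stored;
  }
}

/** Hypothetical caller: the compressor field is never null, so no branching is needed. */
class SketchWalWriter {
  private final SketchCompressor compressor;

  /** Compression disabled: fall back to the no-op implementation. */
  SketchWalWriter() {
    this(NoOpCompressor.INSTANCE);
  }

  SketchWalWriter(SketchCompressor compressor) {
    this.compressor = Objects.requireNonNull(compressor, "compressor");
  }

  byte[] encode(byte[] edit) {
    // No "if (compressor != null)" here; the no-op case is just another implementation.
    return compressor.compress(edit);
  }
}
```

          With this shape, turning compression on or off becomes a matter of which implementation is passed to the constructor.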

          • Li

          jiraposter@reviews.apache.org added a comment -

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:

          > Cool stuff.

          >

          > I am probably just missing something... But when is the dictionary itself stored? Don't we need to read out the logs again?

          >

          > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.

          > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?

          >

          > (As I said, I am probably missing something).

          >

          > See minor comments inline.

          Li Pi wrote:

          You aren't missing anything! That's exactly how it works.

          Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.

          Lars Hofhansl wrote:

          OK... What I cannot find, then, is the code that builds the dictionary during read.

          Also as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say, we garble the beginning of a WAL file, then the entire file will be useless to us... I don't think that is a big deal, though. As the WAL entries are variable length this is mostly true even today.

          Li Pi wrote:

          Oops, somehow I deleted that line. There are comments for it. Added it back in.

          //if this isn't in the dictionary, we need to add to the dictionary.

          As for the more general concern: HBase won't return a write to the client until the WALEdit write is completely done. So aborting midway won't be an issue - and even if we abort midway, we can recover everything that's been written so far.

          As for the beginning of the file getting garbled: true, but we'd lose some information with or without compression. With compression we lose more information, but that's the nature of compression. Fully recovering a partially garbled WAL is impossible no matter what approach we use. Either way, it's not a contingency the WAL is built to handle - a partial recovery after all WAL replicas have been corrupted.

          Well, in the non-compressed WAL case, we can re-sync to a SequenceFile "SYNC" marker and continue reading from there in the face of arbitrary corruption.

          Perhaps the compression mechanism should have some kind of "maximum lookback" - i.e., when a dictionary is being built, keep the file offset where each dictionary word was used. Then, when deciding to use a dict reference vs a literal, if curOffset - lastUsedOffset > MAX_LOOKBACK_THRESHOLD, we re-write the entry as a literal. This would bound the size of unrecoverable WAL portions while still providing good compression (similar to what we have today).
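
          One possible reading of the lookback idea, as a rough sketch. LookbackPolicy and its method are hypothetical names, and the sketch assumes that the offset tracked per word is the offset where it was last written as a literal (if offsets of references were tracked instead, a frequently used word would never be re-written). Only the reference-vs-literal decision is shown, not the file format.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: decides whether a word may be written as a dictionary reference. */
class LookbackPolicy {
  private final long maxLookback;
  // Assumption: offset at which each word was most recently written out in full.
  private final Map<String, Long> lastLiteralOffset = new HashMap<>();

  LookbackPolicy(long maxLookback) {
    this.maxLookback = maxLookback;
  }

  /**
   * Returns true if the writer may emit a reference at curOffset. Returns false if the
   * word is new or its last literal is too far back; in that case the word is recorded
   * as having been written as a literal at curOffset.
   */
  boolean useReference(String word, long curOffset) {
    Long last = lastLiteralOffset.get(word);
    if (last != null && curOffset - last <= maxLookback) {
      return true;
    }
    lastLiteralOffset.put(word, curOffset);
    return false;
  }
}

class LookbackPolicyDemo {
  public static void main(String[] args) {
    LookbackPolicy policy = new LookbackPolicy(100);
    long[] offsets = {0, 50, 101, 150};
    for (long offset : offsets) {
      String form = policy.useReference("cf1", offset) ? "reference" : "literal";
      System.out.println("offset " + offset + ": " + form);
    }
  }
}
```

          Under this reading, a reader that resumes after a corrupt region would only have to skip ahead until each word it needs has appeared as a literal again, which is at most the threshold distance for words still in use.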

          • Todd

          jiraposter@reviews.apache.org added a comment -

          On 2011-11-07 23:39:59, Lars Hofhansl wrote:

          > Cool stuff.

          >

          > I am probably just missing something... But when is the dictionary itself stored? Don't we need to read out the logs again?

          >

          > Just so I understand: We build up the dictionary as we go along. In the beginning most things won't be in the dictionary, we write them out and add them to the dict, and from that time on when we encounter them again we just write the index.

          > On the read we could also build up the dict as we go along, because when values weren't in the dictionary they were written into the file, so we can recreate the dictionary as we read. Right?

          >

          > (As I said, I am probably missing something).

          >

          > See minor comments inline.

          Li Pi wrote:

          You aren't missing anything! That's exactly how it works.

          Each WAL starts off with a brand new shiny dictionary. We build up the dictionary as we write, and when we read, we start off with a shiny new dictionary again. The dictionary is recreated upon read.
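
The write/read symmetry described above can be sketched roughly as follows. This is a minimal illustration, not the SimpleDictionary or WALDictionary code from the patch: the class name WalDictionarySketch, the LITERAL/REFERENCE marker bytes, and the use of strings instead of byte arrays are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the dictionary classes from the patch under review. */
public class WalDictionarySketch {
    private static final byte LITERAL = 0;   // assumed marker: the full value follows
    private static final byte REFERENCE = 1; // assumed marker: a dictionary index follows

    /** Writer side: emit a literal the first time a value is seen, its index afterwards. */
    public static byte[] encode(List<String> values) {
        Map<String, Integer> dict = new HashMap<>(); // brand new dictionary per log
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String value : values) {
                Integer index = dict.get(value);
                if (index == null) {
                    // Not in the dictionary yet: write it out and remember it.
                    dict.put(value, dict.size());
                    out.writeByte(LITERAL);
                    out.writeUTF(value);
                } else {
                    out.writeByte(REFERENCE);
                    out.writeInt(index);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reader side: rebuild the same dictionary from the literals found in the stream. */
    public static List<String> decode(byte[] data) {
        List<String> dict = new ArrayList<>(); // the reader also starts empty
        List<String> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                if (in.readByte() == LITERAL) {
                    String value = in.readUTF();
                    dict.add(value); // same insertion order as the writer
                    values.add(value);
                } else {
                    values.add(dict.get(in.readInt()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

Because the reader assigns indexes in the same order as the writer, nothing has to be stored up front; the cost is that every reference depends on a literal earlier in the same file.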

          Lars Hofhansl wrote:

          OK... What I cannot find, then, is the code that builds the dictionary during the read.

          Also as a general concern... We write these WAL logs (in part) for redundancy. Compression is the opposite of redundancy... So say, we garble the beginning of a WAL file, then the entire file will be useless to us... I don't think that is a big deal, though. As the WAL entries are variable length this is mostly true even today.

          Li Pi wrote:

          Oops, somehow I deleted that line. There are comments for it. Added it back in.

          //if this isn't in the dictionary, we need to add to the dictionary.

          As for the more general concern: HBase won't return a write to the client until the WALEdit write is completely done. So aborting midway won't be an issue - and even if we abort midway, we can recover everything that's been written so far.

          As for the beginning of the file getting garbled: true, but we'd lose some information with or without compression. With compression we lose more information, but that's the nature of compression. Recovering a partially garbled WAL fully is impossible no matter what approach we use. Either way, it's not a contingency the WAL is built to handle - a partial recovery after all WAL replicas have been corrupted.

          Todd Lipcon wrote:

          Well, in the non-compressed WAL case, we can re-sync to a SequenceFile "SYNC" marker and continue reading from there in the face of arbitrary corruption.

          Perhaps the compression mechanism should have some kind of "maximum lookback" - i.e. when a dictionary is being built, keep the file offset where each dictionary word was used. Then, when deciding to use a dict reference vs a literal, if the curOffset - lastUsedOffset > MAX_LOOKBACK_THRESHOLD, we re-write the entry. This would bound the size of unrecoverable WAL portions while still providing good compression (similar to what we have today).

          That makes sense. Maybe file a separate jira and use this one to get the compression in?

          • Lars
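
One possible reading of the "maximum lookback" proposal above, sketched under stated assumptions: the class LookbackDictionarySketch and its Action enum are hypothetical names, and the sketch tracks the offset of the last literal write (rather than the last use of any kind) so that no reference ever points further back than the threshold.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the "maximum lookback" idea; all names are illustrative. */
public class LookbackDictionarySketch {
    /** What the writer should emit for a value at the current offset. */
    public enum Action { WRITE_LITERAL, WRITE_REFERENCE }

    private final long maxLookback;
    // Value -> file offset at which it was last written out as a literal.
    private final Map<String, Long> lastLiteralOffset = new HashMap<>();

    public LookbackDictionarySketch(long maxLookback) {
        this.maxLookback = maxLookback;
    }

    /** Decide between a dictionary reference and a fresh literal. */
    public Action choose(String value, long curOffset) {
        Long last = lastLiteralOffset.get(value);
        if (last == null || curOffset - last > maxLookback) {
            // Unknown, or the literal is too far back: re-write the entry.
            lastLiteralOffset.put(value, curOffset);
            return Action.WRITE_LITERAL;
        }
        return Action.WRITE_REFERENCE;
    }
}
```

With this rule a reader that loses part of the file only needs to skip forward at most maxLookback bytes before every live value has been re-written as a literal. The reader would still need a way to accept a literal for a value it already knows, which this sketch does not cover.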

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3093
          -----------------------------------------------------------

          On 2011-11-07 23:12:37, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2011-11-07 23:12:37)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java e68e486

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3134
          -----------------------------------------------------------

          Overall, an interesting idea, especially for the counter workload case.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment6948>

          Dictionaries will probably get a little more complex (threshold size, different types of dictionaries), so you would also need to write code to persist dictionary state at the beginning of the HLog.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment6946>

          You're planning to write the code to cap the dictionary at a threshold size, correct? This should probably be user-configurable, with a setting of 0 disabling compression.

          • Nicolas
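
A size threshold of the kind asked about above could look roughly like this. It is only a sketch: BoundedDictionarySketch, lookupOrAdd and maxEntries are hypothetical names, and the choice to stop adding entries once the cap is reached (instead of evicting) is an assumption made to keep the writer and reader trivially in step.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a size-capped dictionary; not code from the patch. */
public class BoundedDictionarySketch {
    private final int maxEntries; // assumed to come from user configuration
    private final Map<String, Integer> indexByValue = new HashMap<>();

    public BoundedDictionarySketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /**
     * Returns the index of an already-known value, or -1 when the caller
     * must write the value as a literal. New values are only remembered
     * while the dictionary is below its cap.
     */
    public int lookupOrAdd(String value) {
        Integer index = indexByValue.get(value);
        if (index != null) {
            return index;
        }
        if (indexByValue.size() < maxEntries) {
            indexByValue.put(value, indexByValue.size());
        }
        return -1;
    }
}
```

With maxEntries set to 0 nothing is ever remembered, so every value is written as a literal, which matches the "0 disables compression" behaviour suggested in the review.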


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3136
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          <https://reviews.apache.org/r/2740/#comment6949>

          Should compression be added to the HLogKey as well, to compress regionName and table? It seems like the biggest wins will come from table + region + family, which all have user-bounded values. It might even make sense to keep these values in a different dictionary from row and column qualifier, which can be unbounded and might accidentally dominate the dictionary.

          • Nicolas


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3142
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment6953>

          The entry returned may be null, right?

          • Ted


          jiraposter@reviews.apache.org added a comment -

          On 2011-11-09 20:00:52, Nicolas Spiegelberg wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, lines 122-124

          > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line122>

          >

          > Should compression be added to the HLogKey as well, to compress regionName and table? It seems like the biggest wins will come from table + region + family, which all have user-bounded values. It might even make sense to keep these values in a different dictionary from row and column qualifier, which can be unbounded and might accidentally dominate the dictionary.

          Yes. I'm adding compression for regionname, table, and family as well. For this kind of simple 1-way associative dictionary, it's likely that those two factors will end up dominating, but other more complex dictionaries can be used, perhaps with more interesting eviction strategies.

          I do agree using multiple dictionaries is a simple strategy.

          • Li
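
The multiple-dictionary strategy discussed above might be sketched like this, assuming one independent dictionary per field type so that unbounded rows and qualifiers cannot crowd out table, region and family names. PerFieldDictionariesSketch, the Field enum and lookupOrAdd are hypothetical names, not identifiers from the patch.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one independent dictionary per kind of field. */
public class PerFieldDictionariesSketch {
    /** Field kinds named in the review; the enum itself is an assumption. */
    public enum Field { TABLE, REGION, FAMILY, QUALIFIER, ROW }

    private final Map<Field, Map<String, Integer>> dictionaries = new EnumMap<>(Field.class);

    public PerFieldDictionariesSketch() {
        for (Field field : Field.values()) {
            dictionaries.put(field, new HashMap<>());
        }
    }

    /**
     * Returns the index of a known value within its own field's dictionary,
     * or -1 when the value is new there and must be written as a literal.
     */
    public int lookupOrAdd(Field field, String value) {
        Map<String, Integer> dict = dictionaries.get(field);
        Integer index = dict.get(value);
        if (index != null) {
            return index;
        }
        dict.put(value, dict.size());
        return -1;
    }
}
```

Keeping the index spaces separate also means each field's references can stay small even when another field sees many distinct values.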

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3136
          -----------------------------------------------------------


          Kannan Muthukkaruppan added a comment -

          Li wrote: <<< The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. >>>

          Compression potentially adds some time, but then, yes, you save somewhere else in the amount of work DFS has to do. I am curious what kind of improvement you are seeing with your changes. Without "sync" (deferred log flushing) the win might be even greater. Could you share some numbers with and without "sync"?

          jiraposter@reviews.apache.org added a comment -

          On 2011-11-09 20:00:52, Nicolas Spiegelberg wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, lines 122-124

          > <https://reviews.apache.org/r/2740/diff/1/?file=56624#file56624line122>

          >

          > Should compression be added to the HLogKey as well, to compress regionName and table? It seems like the biggest wins will come from table + region + family, which all have user-bounded values. It might even make sense to keep these values in a different dictionary from row and column qualifier, which can be unbounded and might accidentally dominate the dictionary.

          Li Pi wrote:

          Yes. I'm adding compression for regionname, table, and family as well. For this kind of simple 1-way associative dictionary, it's likely that those two factors will end up dominating, but other more complex dictionaries can be used, perhaps with more interesting eviction strategies.

          I do agree using multiple dictionaries is a simple strategy.

          FYI I already compress CF name along with column qualifier. Are regionname and table stored as part of the row key?

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review3136
          -----------------------------------------------------------


          Lars Hofhansl added a comment -

          Are you still working on this, Li?
          I think this is an important feature to get into HBase, especially if we want to do log archival for backups and point-in-time (PIT) restores.

          Li Pi added a comment -

          Yup. Just finished finals. So I have time again.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-12-23 06:00:24.065183)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Some new things, for WALCompress.

          I've modified TestWALReplay to test compression; this is a quick hack to get effective test cases. I'll build my own subset of tests later.

          Integration is done, including config, but it doesn't all work yet. It worked before I tried compressing HLogKeys. SequenceFile seems to try to read them out of order, causing it to hit empty dictionary entries. I'm not sure what to do about this. Any advice?

          If you only compress KeyValues/WALEdits, it works fine.

          Summary
          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable it. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4098
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9217>

          Apache headers go here.

          • Li


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4100
          -----------------------------------------------------------

          Maybe just do this for WALEdits/KeyValues for now and tackle HLogKey later.
          Looks like hash collisions in SimpleDictionary could be nasty.

          Other than that mostly whitespace.

          Cool stuff.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9223>

          Should remove the year line.
          Also some extra whitespace in this file.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/2740/#comment9237>

          Bunch of whitespace in here.
          As said above, maybe do HLogKey in a separate jira.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment9236>

          bunch of whitespace in here.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment9234>

          whitespace

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment9235>

          I know this is not done yet, but this needs to be a fully qualified config name.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment9233>

          LOG.debug?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          <https://reviews.apache.org/r/2740/#comment9232>

          Hardcoding SimpleDictionary here?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9230>

          year...

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9229>

          What if you have a hash collision?
          You now overwrite the old value that just happens to have the same hash code. Is that OK?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9231>

          Here too; what happens for hash collisions?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment9228>

          Year... And trailing whitespace in here.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          <https://reviews.apache.org/r/2740/#comment9225>

          bunch of extra leading whitespace in this file

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          <https://reviews.apache.org/r/2740/#comment9226>

          Would sure be nice if we had a KeyValue interface and the implementations would just do the right thing.

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
          <https://reviews.apache.org/r/2740/#comment9227>

          I assume you'll test with and without compression.

          • Lars


          jiraposter@reviews.apache.org added a comment -

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 73

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line73>

          >

          > What if you have a hash collision?

          > You now overwrite the old value that just happens to have the same hash code. Is that OK?

          I overwrite the old value. As long as we do it for both reads and writes, that's okay! (The state of the dictionary must be consistent.)

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 82

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line82>

          >

          > Here too; what happens for hash collisions?

          The old value would have been evicted by the latest value.

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java, line 84

          > <https://reviews.apache.org/r/2740/diff/2/?file=65775#file65775line84>

          >

          > I assume you'll test with and without compression.

          I'm going to write better tests; this is just a hackish way to make it work.

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 130

          > <https://reviews.apache.org/r/2740/diff/2/?file=65774#file65774line130>

          >

          > Would sure be nice if we had a KeyValue interface and the implementations would just do the right thing.

          I didn't want to create a new KeyValue, or rather modify the existing one, hence the CompressedKeyValue thing.

          I can refactor this.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4100
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 73

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line73>

          >

          > What if you have a hash collision?

          > You now overwrite the old value that just happens to have the same hash code. Is that OK?

          Li Pi wrote:

          I overwrite the old value. As long as we do it for both reads and writes, that's okay! (The state of the dictionary must be consistent.)

          I see, because read and write would do that in the same order.
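The exchange above can be illustrated with a minimal sketch. It assumes a fixed number of slots indexed by a content hash; the names `SlotDictionary`, `add`, `get` and `contains` are hypothetical and are not taken from the patch. Because the writer and the reader apply the same insertions in the same order, a colliding entry evicts the older one on both sides, and the two dictionaries stay identical.

```java
import java.util.Arrays;

// Hypothetical sketch, not the SimpleDictionary from the patch.
class SlotDictionary {
  private final byte[][] slots;

  SlotDictionary(int size) {
    this.slots = new byte[size][];
  }

  // The slot is derived from the content hash, so different byte strings may collide.
  static int slotFor(byte[] data, int size) {
    return Math.floorMod(Arrays.hashCode(data), size);
  }

  // Stores the entry; on a collision the previous occupant is simply overwritten.
  int add(byte[] data) {
    int slot = slotFor(data, slots.length);
    slots[slot] = data.clone();
    return slot;
  }

  byte[] get(int slot) {
    return slots[slot];
  }

  // True only if the slot currently holds exactly these bytes.
  boolean contains(byte[] data) {
    return Arrays.equals(slots[slotFor(data, slots.length)], data);
  }
}
```

A writer would emit the slot index when `contains` is true and the raw bytes otherwise, calling `add` in the second case; a reader mirrors this by calling `add` whenever it sees raw bytes.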

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 130

          > <https://reviews.apache.org/r/2740/diff/2/?file=65774#file65774line130>

          >

          > Would sure be nice if we had a KeyValue interface and the implementations would just do the right thing.

          Li Pi wrote:

          Didn't want to create a new KeyValue, or modify it, rather - thus the CompressedKeyValue thing.

          I can refactor this.

          That was just a general comment. I've often thought how nice our life would be if KeyValue were just an interface rather than a concrete class. Fixing that would be a huge PITA... Different jira.

          • Lars

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4100
          -----------------------------------------------------------

          On 2011-12-23 06:00:24, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2011-12-23 06:00:24)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment, I'll come up with something cooler sooner. Let me know how this looks.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2011-12-23 06:34:53, Lars Hofhansl wrote:

          > Maybe just do this for WALEdits/KeyValues for now and tackle HLogKey later.

          > Looks like hash collisions in SimpleDictionary could be nasty.

          >

          > Other than that mostly whitespace.

          >

          > Cool stuff.

          Just did another test; it looks like SequenceFile doesn't actually do it out of order. There's another bug making HLogKey break.

          I'll figure it out later, probably after Christmas.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4100
          -----------------------------------------------------------


          Li Pi added a comment -

          Okay. I'm confused.

          I disabled compression, went back to trunk, and changed these lines of code in HLogKey

          System.out.println("Writing region: " + this.encodedRegionName.hashCode());
          Bytes.writeByteArray(out, this.encodedRegionName);
          System.out.println("Writing table: " + this.tablename.hashCode());
          Bytes.writeByteArray(out, this.tablename);

          And

          in.readFully(this.encodedRegionName);
          System.out.println("Reading region: " + this.encodedRegionName.hashCode());
          this.tablename = Bytes.readByteArray(in);
          System.out.println("Reading table: " + this.tablename.hashCode());

          Then I ran the "replay after partial flush" test.

          Got this as output

          PositionWritten 124
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 319
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 514
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 709
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 904
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 1099
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 1294
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 1489
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 1684
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 1879
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 2074
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 2289
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 2484
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 2679
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 2874
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 3069
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 3264
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 3459
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 3654
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 3849
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 4044
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 4239
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 4454
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 4649
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 4844
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 5039
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 5234
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 5429
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 5624
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 5819
          Writing region: 1251181435
          Writing table: 446506621
          PositionWritten 124
          Writing region: 736259394
          Writing table: 510860944
          PositionWritten 319
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 514
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 709
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 904
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 1099
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 1294
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 1489
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 1684
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 1879
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 2074
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 2289
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 2484
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 2679
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 2874
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 3069
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 3264
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 3459
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 3654
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 3849
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 4044
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 4239
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 4454
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 4649
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 4844
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 5039
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 5234
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 5429
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 5624
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 5819
          Writing region: 1336786910
          Writing table: 403681456
          PositionWritten 6014
          Writing region: 1336786910
          Writing table: 403681456

          followed by

          PositionRead 124
          Reading region: 1037916733
          Reading table: 256866950
          PositionRead 319
          Reading region: 720698180
          Reading table: 966542180
          PositionRead 514
          Reading region: 1108113352
          Reading table: 1082920280
          PositionRead 709
          Reading region: 717237635
          Reading table: 787220834
          PositionRead 904
          Reading region: 173807871
          Reading table: 611127977
          PositionRead 1099
          Reading region: 1961109485
          Reading table: 788100239
          PositionRead 1294
          Reading region: 2069065824
          Reading table: 586608097
          PositionRead 1489
          Reading region: 24862902
          Reading table: 1258966396
          PositionRead 1684
          Reading region: 291843681
          Reading table: 164096819
          PositionRead 1879
          Reading region: 606234185
          Reading table: 1315525927
          PositionRead 2074
          Reading region: 1700109224
          Reading table: 1465804433
          PositionRead 2289
          Reading region: 1990190694
          Reading table: 2077192033
          PositionRead 2484
          Reading region: 1872332999
          Reading table: 1222834702
          PositionRead 2679
          Reading region: 764334724
          Reading table: 2074013561
          PositionRead 2874
          Reading region: 2138845270
          Reading table: 843685757
          PositionRead 3069
          Reading region: 2139480405
          Reading table: 780981467
          PositionRead 3264
          Reading region: 535465405
          Reading table: 1610580905
          PositionRead 3459
          Reading region: 1899900
          Reading table: 1866848242
          PositionRead 3654
          Reading region: 1382320624
          Reading table: 1184634322
          PositionRead 3849
          Reading region: 828158517
          Reading table: 1018679012
          PositionRead 4044
          Reading region: 1198520800
          Reading table: 142476740
          PositionRead 4239
          Reading region: 162302775
          Reading table: 518507735
          PositionRead 4454
          Reading region: 70862619
          Reading table: 1282097095
          PositionRead 4649
          Reading region: 354961667
          Reading table: 131165903
          PositionRead 4844
          Reading region: 1187109899
          Reading table: 1632991863
          PositionRead 5039
          Reading region: 853232781
          Reading table: 1535039248
          PositionRead 5234
          Reading region: 1683589725
          Reading table: 847975203
          PositionRead 5429
          Reading region: 1217755329
          Reading table: 1294658593
          PositionRead 5624
          Reading region: 1022661147
          Reading table: 1554270688
          PositionRead 5819
          Reading region: 636371108
          Reading table: 1020650096
          PositionRead 6014
          Reading region: 2114274883
          Reading table: 206051672
          PositionRead 6176

          It doesn't seem like we're reading what we wrote.
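One possible explanation, offered as an assumption rather than something the thread confirms: `encodedRegionName` and `tablename` are passed to `Bytes.writeByteArray`, so they appear to be `byte[]`, and in Java `hashCode()` on an array is the identity hash inherited from `Object`, not a hash of the contents. Two arrays holding equal bytes will normally print different values, so the debug output above cannot show whether the bytes round-trip. A sketch of a content-based comparison using `java.util.Arrays` (the class name `ArrayHashDemo` is made up for illustration):

```java
import java.util.Arrays;

class ArrayHashDemo {
  // Content-based hash: arrays with equal bytes always produce the same value.
  static int contentHash(byte[] b) {
    return Arrays.hashCode(b);
  }

  public static void main(String[] args) {
    byte[] written = {'t', 'a', 'b', 'l', 'e'};
    byte[] read = written.clone(); // same bytes, different array object

    // Identity hashes belong to the objects, so these will usually differ.
    System.out.println("identity hashes equal: " + (written.hashCode() == read.hashCode()));
    // Content hashes and Arrays.equals depend only on the bytes.
    System.out.println("content hashes equal:  " + (contentHash(written) == contentHash(read)));
    System.out.println("contents equal:        " + Arrays.equals(written, read));
  }
}
```

Printing `Arrays.hashCode(...)` (or the bytes themselves) in the debug statements would make the written and read values directly comparable.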

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4121
          -----------------------------------------------------------

          good start.

          A general design thing: rather than using these static readCompressed/writeCompressed methods, we can introduce an interface something like:

          interface WALCompression {
            public int encodeKeyValue(KeyValue kv, byte[] out, int offset);
            ...
          }

          and then have the current non-compressed code path just be the default implementation of WALCompression – and add a configuration which specifies the class to use as the implementor of this interface. We can also store the class name in the WAL metadata so that you can read compressed HLogs even if you are writing non-compressed ones (useful for replication if one cluster uses compression and the other doesn't, for example).
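The design suggested above could look roughly like the following sketch. Only `WALCompression` and `encodeKeyValue` come from the comment; `NoWALCompression` and `WALCompressionLoader` are hypothetical names, and a raw `byte[]` stands in for `KeyValue` so that the example is self-contained.

```java
// Hypothetical sketch of a pluggable WAL compression interface.
interface WALCompression {
  // Encodes kv into out starting at offset; returns the number of bytes written.
  int encodeKeyValue(byte[] kv, byte[] out, int offset);
}

// Default implementation: the existing non-compressed path, i.e. a plain copy.
class NoWALCompression implements WALCompression {
  public int encodeKeyValue(byte[] kv, byte[] out, int offset) {
    System.arraycopy(kv, 0, out, offset, kv.length);
    return kv.length;
  }
}

class WALCompressionLoader {
  // Instantiates the implementation named by a configuration value or by WAL metadata.
  static WALCompression load(String className) {
    try {
      return Class.forName(className)
          .asSubclass(WALCompression.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load WALCompression: " + className, e);
    }
  }
}
```

Storing the class name alongside the log means a reader can pick the matching implementation regardless of what the local writer is configured to use.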

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9296>

          no need to wrap lines

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9295>

          this should be // comments inside the function, rather than javadoc style comments above

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9299>

          we should probably use vints here - most keys and many values are <100bytes long, so we could store the lengths in 1 byte instead of the 4 used here
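To make the size argument concrete, here is a minimal sketch of a variable-length integer encoding (7 payload bits per byte, high bit set when more bytes follow). This is an illustration only and is not the byte layout used by Hadoop's `WritableUtils.writeVInt`; the class name `VarInt` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Illustrative variable-length int; NOT Hadoop's on-disk vint format.
class VarInt {
  // Encodes a non-negative value using 1 to 5 bytes.
  static byte[] encode(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("lengths must be non-negative");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (value >= 0x80) {
      out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
      value >>>= 7;
    }
    out.write(value); // final byte, continuation bit clear
    return out.toByteArray();
  }

  // Decodes a value produced by encode().
  static int decode(byte[] buf) {
    int result = 0;
    int shift = 0;
    for (byte b : buf) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    return result;
  }
}
```

With this scheme any length below 128 takes a single byte, compared with the 4 bytes of a fixed-width int.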

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9298>

          extra space

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9301>

          extra word "designed"?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9302>

          example should use arguments like "-u compressed-hlog uncompressed-hlog" rather than "filename" twice

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9303>

          check args.length first and print help if it's not got 3 args

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9304>

          should be an 'else if' – and have a final 'else' clause that gives usage

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9305>

          TODO: need to change this config key to match our others

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9306>

          this assumes the whole log's content fits in memory, which shouldn't be necessary... why not loop reading one record from reader and writing one to writer?
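
The record-at-a-time loop suggested here could look roughly like the sketch below. RecordReader and RecordWriter are hypothetical stand-ins for the log reader and writer, and the convention that next() returns null at the end of the log is an assumption of this example.

```java
class StreamingCopySketch {
  // Hypothetical minimal shapes standing in for the real log reader and writer.
  interface RecordReader {
    byte[] next(); // assumed to return null once the log is exhausted
  }

  interface RecordWriter {
    void append(byte[] record);
  }

  // Copies one record at a time, so memory use is bounded by the
  // largest single record rather than by the size of the whole log.
  static long copy(RecordReader reader, RecordWriter writer) {
    long count = 0;
    byte[] record;
    while ((record = reader.next()) != null) {
      writer.append(record);
      count++;
    }
    return count;
  }
}
```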

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9307>

          should have a finally { in.close(); } probably

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9308>

          should go in finally clause. Also use IOUtils.closeStream as long as "out" implements Closeable (I think it does?)
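
A hedged sketch of the close-in-finally pattern requested in these two comments. CloseSketch, closeQuietly and transform are hypothetical names; closeQuietly only imitates the behaviour of a quiet-close helper such as Hadoop's IOUtils.closeStream and applies only to types that implement java.io.Closeable.

```java
import java.io.Closeable;
import java.io.IOException;

class CloseSketch {
  // Closes c if it is non-null and swallows any IOException from close().
  static void closeQuietly(Closeable c) {
    if (c == null) {
      return;
    }
    try {
      c.close();
    } catch (IOException e) {
      // Deliberately ignored: nothing useful can be done at this point.
    }
  }

  // Runs body, then closes both resources even if body throws.
  static void transform(Closeable in, Closeable out, Runnable body) {
    try {
      body.run();
    } finally {
      closeQuietly(in);
      closeQuietly(out);
    }
  }
}
```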

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9309>

          why not combine this with the if/else above?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9312>

          most of this byte is wasted - we're only using 2 of the 6 bits... and I think we could actually get rid of EMPTY as well.

          If we limit the dictionaries to 32k entries, then we could use the following:

          if bit 0 == 0: dictionary reference
            bits 1 through 15: the dictionary index
          if bit 0 == 1: new value
            start a varint encoding in this byte

          but let's leave this as is for now just to get the rest of the code-level issues cleaned up
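
For reference, the dictionary-reference half of that layout might be sketched as follows. This assumes "bit 0" means the most significant bit of the first byte and that a reference always occupies two bytes; the varint layout for new values is left open, since the comment does not pin it down. DictRefSketch is a hypothetical name.

```java
class DictRefSketch {
  static final int MAX_ENTRIES = 1 << 15; // 32k entries fit in 15 bits

  // Encodes a dictionary index as two bytes whose top bit is 0.
  static byte[] encodeRef(int index) {
    if (index < 0 || index >= MAX_ENTRIES) {
      throw new IllegalArgumentException("index out of range: " + index);
    }
    return new byte[] {(byte) (index >>> 8), (byte) index};
  }

  // Top bit clear marks a dictionary reference; top bit set marks a new value.
  static boolean isRef(byte first) {
    return (first & 0x80) == 0;
  }

  // Rebuilds the 15-bit index from the two bytes of a reference.
  static int decodeRef(byte first, byte second) {
    return ((first & 0x7F) << 8) | (second & 0xFF);
  }
}
```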

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9310>

          rather than this, why not use varints here so you don't have to specify up front what the size is?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9311>

          use constant

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2740/#comment9313>

          since we have several methods that take all these parameters, and we might want to change the compression scheme in the future, I think it makes sense to introduce a class WALCompressionContext with getters for each of the dictionaries

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/2740/#comment9314>

          indentation

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment9315>

          again the Context object here would make things a little cleaner to integrate:

          • you can drop "compression" boolean and just check "if (compressionContext != null)"
          • you only add one integration point to the existing code instead of lots of new member vars
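
A rough sketch of the context object suggested for HLog.java and for this reader. Everything beyond the name WALCompressionContext is an assumption: SketchDictionary stands in for the patch's dictionary type, the three dictionaries follow the table name, region and column family fields named in the issue description, and SketchLogReader is a hypothetical reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patch's dictionary type.
class SketchDictionary {
  private final List<byte[]> entries = new ArrayList<>();

  // Adds a value and returns the index assigned to it.
  int add(byte[] value) {
    entries.add(value);
    return entries.size() - 1;
  }

  byte[] get(int index) {
    return entries.get(index);
  }

  void clear() {
    entries.clear();
  }
}

// One object carrying every dictionary, so methods take a single extra parameter.
class WALCompressionContext {
  private final SketchDictionary tableDict = new SketchDictionary();
  private final SketchDictionary regionDict = new SketchDictionary();
  private final SketchDictionary familyDict = new SketchDictionary();

  SketchDictionary getTableDict() { return tableDict; }
  SketchDictionary getRegionDict() { return regionDict; }
  SketchDictionary getFamilyDict() { return familyDict; }

  void clear() {
    tableDict.clear();
    regionDict.clear();
    familyDict.clear();
  }
}

class SketchLogReader {
  // A null context means the log is not compressed, replacing a separate boolean flag.
  private final WALCompressionContext compressionContext;

  SketchLogReader(WALCompressionContext compressionContext) {
    this.compressionContext = compressionContext;
  }

  boolean isCompressed() {
    return compressionContext != null;
  }
}
```

Changing the compression scheme later would then touch only the context class, not every method signature that currently takes the dictionaries one by one.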

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9316>

          this should be all caps – but also probably something from the configuration

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9317>

          private final

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9318>

          LOG.isDebugEnabled – or maybe this should even be TRACE level

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9319>

          hashCode() on a byte[] is identity-based - you should use Bytes.hashCode()

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9320>

          equals is identity based here... should use Bytes.equals()

          Also Bytes.equals I believe handles nulls, so you can collapse two of these three clauses together
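
The identity problem raised in these two comments can be reproduced with the JDK alone. The sketch below uses java.util.Arrays in place of HBase's Bytes helpers so that it stays self-contained; ByteArrayKeySketch and Key are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ByteArrayKeySketch {
  // Wraps a byte[] so that equals/hashCode compare contents rather than identity.
  static final class Key {
    private final byte[] bytes;

    Key(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }
  }

  // Looks up an equal-content but distinct array in a map keyed by raw byte[].
  static boolean rawLookupHits() {
    Map<byte[], Integer> dict = new HashMap<>();
    dict.put("cf1".getBytes(StandardCharsets.UTF_8), 0);
    return dict.containsKey("cf1".getBytes(StandardCharsets.UTF_8));
  }

  // Same lookup, but through the content-comparing wrapper.
  static boolean wrappedLookupHits() {
    Map<Key, Integer> dict = new HashMap<>();
    dict.put(new Key("cf1".getBytes(StandardCharsets.UTF_8)), 0);
    return dict.containsKey(new Key("cf1".getBytes(StandardCharsets.UTF_8)));
  }
}
```

The raw lookup misses because arrays inherit identity-based equals and hashCode from Object, so a dictionary keyed that way never recognises a repeated value and nothing gets compressed.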

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9321>

          I'd call this clear()

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment9322>

          does it have to be public?

          • Todd

          On 2011-12-23 06:00:24, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2011-12-23 06:00:24)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4125
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9323>

          public class SequenceFileLogWriter implements HLog.Writer {
          And
          public interface Writer {

          There is no Closeable mentioned above although Writer has close() method.

          • Ted

          jiraposter@reviews.apache.org added a comment -

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 161

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line161>

          >

          > use constant

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 48

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line48>

          >

          > LOG.isDebugEnabled – or maybe this should even be TRACE level

          removed this completely, not needed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 34

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line34>

          >

          > private final

          removed completely.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 32

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line32>

          >

          > this should be all caps – but also probably something from the configuration

          changed

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 23

          > <https://reviews.apache.org/r/2740/diff/2/?file=65773#file65773line23>

          >

          > does it have to be public?

          now default.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 57

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line57>

          >

          > hashCode() on a byte[] is identity-based - you should use Bytes.hashCode()

          yup. i just figured this out. cost me a ton of pain. was wondering why things weren't compressing the way they should.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, lines 82-85

          > <https://reviews.apache.org/r/2740/diff/2/?file=65769#file65769line82>

          >

          > indentation

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, lines 144-150

          > <https://reviews.apache.org/r/2740/diff/2/?file=65770#file65770line144>

          >

          > again the Context object here would make things a little cleaner to integrate:

          > - you can drop "compression" boolean and just check "if (compressionContext != null)"

          > - you only add one integration point to the existing code instead of lots of new member vars

          will do in a refactoring pass.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 90

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line90>

          >

          > I'd call this clear()

          done.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 64

          > <https://reviews.apache.org/r/2740/diff/2/?file=65772#file65772line64>

          >

          > equals is identity based here... should use Bytes.equals()

          >

          > Also Bytes.equals I believe handles nulls, so you can collapse two of these three clauses together

          also just figured this out.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1655

          > <https://reviews.apache.org/r/2740/diff/2/?file=65768#file65768line1655>

          >

          > since we have several methods that take all these parameters, and we might want to change the compression scheme in the future, I think it makes sense to introduce a class WALCompressionContext with getters for each of the dictionaries

          Will make a compression context during refactoring.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, lines 57-58

          > <https://reviews.apache.org/r/2740/diff/2/?file=65766#file65766line57>

          >

          > we should probably use vints here - most keys and many values are <100bytes long, so we could store the lengths in 1 byte instead of the 4 used here

          Will do. I didn't bother compressing the size values in KeyValue. Should do that as well - squeeze out extra space.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 70

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line70>

          >

          > should have a finally { in.close(); } probably

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 28

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line28>

          >

          > extra word "designed"?

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 33

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line33>

          >

          > example should use arguments like "-u compressed-hlog uncompressed-hlog" rather than "filename" twice

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 37

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line37>

          >

          > check args.length first and print help if it's not got 3 args

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 43-45

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line43>

          >

          > should be an 'else if' – and have a final 'else' clause that gives usage

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 60

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line60>

          >

          > TODO: need to change this config key to match our others

          fixed.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 66-69

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line66>

          >

          > this assumes the whole log's content fits in memory, which shouldn't be necessary... why not loop reading one record from reader and writing one to writer?

          will do in optimization pass.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 90

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line90>

          >

          > should go in finally clause. Also use IOUtils.closeStream as long as "out" implements Closeable (I think it does?)

          done.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 114-116

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line114>

          >

          > why not combine this with the if/else above?

          because we need to write our size.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 133

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line133>

          >

          > most of this byte is wasted - we're only using 2 of the 6 bits... and I think we could actually get rid of EMPTY as well.

          >

          > If we limit the dictionaries to 32k entries, then we could use the following:

          >

          > If bit 0 == 0: dictionary reference

          > bits 1 through 15: the dictionary index

          > if bit 0 == 1: new value

          > start a varint encoding in this byte

          >

          > but let's leave this as is for now just to get the rest of the code-level issues cleaned up

          will do optimisation pass next.

          On 2011-12-27 17:42:31, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 153-159

          > <https://reviews.apache.org/r/2740/diff/2/?file=65767#file65767line153>

          >

          > rather than this, why not use varints here so you don't have to specify up front what the size is?

          This is how KeyValue stores the length of its stuff. Didn't want to change that. will do during optimisation pass.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4121
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-12-29 04:38:25.385999)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Added tests. Fixed code issues as mentioned by Todd.

          Summary
          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Li Pi added a comment -

          This should be a good time to mention that, at this point, the patch is working.

          There is still some refactoring to do to make it prettier, and room for
          optimization, but please test out the compressor with a realistic
          load and see how much improvement it gains.

          Compressor.java contains a command line compression tool that you can
          use. Just run it against an HLog and compare the size of the output
          with the original.

          On Wed, Dec 28, 2011 at 8:38 PM, jiraposter@reviews.apache.org

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-12-31 00:20:40.770066)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          WritableContext makes things cleaner. Some space optimizations to make compression even more efficient.

          Summary
          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-12-31 02:06:00.510532)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Fixed a failing test.

          Summary
          -------

          Here's what I have so far. Things are written, and "should work". I need to rework the test cases to test this, and put something in the config file to enable/disable. Obviously this isn't ready for commit at the moment, but I can get those two things done pretty quickly.

          Obviously the dictionary is incredibly simple at the moment; I'll come up with something cooler soon. Let me know how this looks.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Li Pi added a comment -

          Yup. good time to do it.

          On Fri, Dec 30, 2011 at 4:35 PM, Zhihong Yu (Commented) (JIRA)

          Ted Yu added a comment -

          @Li:
          Please use '--no-prefix' to generate diff.
          Otherwise Hadoop QA won't be able to apply your patch.

          Li Pi added a comment -

          no prefix patch

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2011-12-31 20:19:11.951711)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary (updated)
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Ted Yu added a comment -

          @Li:
          Hadoop QA is taking a vacation. See https://builds.apache.org/job/PreCommit-HBASE-Build/

          I ran patch v5 on Linux and didn't observe any notable issues. But then you have a new patch.
          Please try to run your latest patch through the test suite.

          Lars Hofhansl added a comment -

          @Li: How big do you expect the in-memory dictionary to grow?
          I was wondering if the reading or writing process could give the compressor hints about when would be a good time to reset the dictionary (for example, when a memstore flush entry is found).
          The compressor can choose to ignore the hints and use some internal logic, or reset the dictionary when hinted.

          Li Pi added a comment -

          max size = 64k * around 100-200 bytes. Really not that big. Less than 100 megabytes.
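
As a rough check of the estimate above, the bound can be worked out directly. This is only a sketch: it reads "64k" as 65,536 dictionary entries, which is an assumption, and the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of the patch: bounds the dictionary's memory use.
class DictionarySizeEstimate {
  /** Upper bound in bytes for a dictionary of `entries` entries of `bytesPerEntry` bytes each. */
  static long maxBytes(long entries, long bytesPerEntry) {
    return entries * bytesPerEntry;
  }

  public static void main(String[] args) {
    long entries = 64L * 1024; // "64k" read as 65,536 entries (assumption)
    System.out.println(maxBytes(entries, 100)); // low end of the 100-200 byte range
    System.out.println(maxBytes(entries, 200)); // high end of the 100-200 byte range
  }
}
```

Under these assumptions the bound comes out at roughly 6.5 to 13 million bytes, which is consistent with the "less than 100 megabytes" figure.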

          Li Pi added a comment -

          I was thinking of replacing the 1-way associative cache with a 127-entry LRU dictionary. That should allow us to save a few bytes, and also be far more efficient with our eviction strategy.
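
To illustrate the eviction problem being described, here is a minimal sketch of a 1-way associative (direct-mapped) dictionary. It is not the SimpleDictionary code from the patch; the class name, the use of String entries, and the hash-to-slot rule are assumptions chosen for illustration.

```java
// Hypothetical direct-mapped dictionary: every entry has exactly one candidate slot.
class DirectMappedDictionarySketch {
  private final String[] slots;

  DirectMappedDictionarySketch(int size) {
    this.slots = new String[size];
  }

  /** The single slot an entry may occupy, derived from its hash code. */
  private int slotFor(String entry) {
    return Math.floorMod(entry.hashCode(), slots.length);
  }

  /** Returns the slot holding the entry, or -1 if it is not (or no longer) present. */
  int find(String entry) {
    int slot = slotFor(entry);
    return entry.equals(slots[slot]) ? slot : -1;
  }

  /** Stores the entry, silently evicting whatever occupied its slot before. */
  int add(String entry) {
    int slot = slotFor(entry);
    slots[slot] = entry;
    return slot;
  }
}
```

Because each entry has exactly one possible slot, two frequently used entries that hash to the same slot keep evicting each other, however much free space exists elsewhere. An LRU policy avoids that by evicting whichever entry was used least recently.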

          Ted Yu added a comment -

          How about using Guava's MapMaker ?
          From SingleSizeCache.java:

              backingMap = new MapMaker().maximumSize(numBlocks - 1)
                  .evictionListener(listener).makeMap();
          
          Li Pi added a comment -

          Guava's MapMaker doesn't guarantee consistent eviction. You'd want to use either two LinkedHashMaps or your own LRU-style system.
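
A home-grown LRU dictionary along these lines could look like the sketch below. It uses one access-ordered LinkedHashMap plus an index array rather than two LinkedHashMaps, and all names are hypothetical; it is not the LRUDictionary from the patch. The point is that eviction depends only on the sequence of calls, so a writer and a reader that replay the same sequence stay in agreement about which slot holds which entry.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fixed-capacity dictionary with deterministic LRU eviction.
class LruDictionarySketch {
  private final int capacity;
  private final String[] slots;                       // slot -> entry
  private final LinkedHashMap<String, Integer> index; // entry -> slot, kept in access order

  LruDictionarySketch(int capacity) {
    this.capacity = capacity;
    this.slots = new String[capacity];
    this.index = new LinkedHashMap<>(16, 0.75f, true); // true = order by access
  }

  /** Returns the slot of a known entry (marking it most recently used) or -1. */
  int find(String entry) {
    Integer slot = index.get(entry);
    return slot == null ? -1 : slot;
  }

  /** Adds an entry, reusing the least recently used slot once the dictionary is full. */
  int add(String entry) {
    int existing = find(entry);
    if (existing >= 0) {
      return existing;
    }
    int slot;
    if (index.size() < capacity) {
      slot = index.size(); // slots fill up in order 0, 1, 2, ...
    } else {
      Iterator<Map.Entry<String, Integer>> eldest = index.entrySet().iterator();
      slot = eldest.next().getValue(); // first entry in access order is the LRU one
      eldest.remove();
    }
    slots[slot] = entry;
    index.put(entry, slot);
    return slot;
  }

  /** Looks an entry up by slot and marks it most recently used, mirroring find(). */
  String get(int slot) {
    String entry = slots[slot];
    if (entry != null) {
      index.get(entry);
    }
    return entry;
  }
}
```

A writer would call find() and fall back to add() on a miss; a reader would call get() for an index and add() for a literal entry, so both sides touch entries in the same order.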

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4172
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/HConstants.java
          <https://reviews.apache.org/r/2740/#comment9402>

          This name may refer to the compression algorithm.
          I think the word 'enable' should be part of the name.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9403>

          No year needed.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9404>

          This javadoc should be combined with above block.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment9405>

          Should read 'Compresses and ...'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          <https://reviews.apache.org/r/2740/#comment9407>

          Add license, please.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          <https://reviews.apache.org/r/2740/#comment9406>

          Add javadoc for this class, please.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9408>

          License, please.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9409>

          Add javadoc for the parameters, please.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2740/#comment9410>

          Should there be disableCompression ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
          <https://reviews.apache.org/r/2740/#comment9411>

          Remove this year line, please.

          • Ted

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-06 00:01:44.856233)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Added an LRU dictionary. Should be more efficient than a 1-way associative cache.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSimpleDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-07 01:25:20.762498)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Addressed Ted Yu's review comments. Also switched SimpleDictionary to LRUDictionary, which has a much smarter eviction algorithm.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1655
          > <https://reviews.apache.org/r/2740/diff/5/?file=66009#file66009line1655>
          >
          > Should there be disableCompression ?

          Compression is always enabled if it is set in the config. Otherwise the decompressor won't know whether to try to decompress the log or not.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java, line 2
          > <https://reviews.apache.org/r/2740/diff/5/?file=66013#file66013line2>
          >
          > Remove this year line, please.

          Done.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 137
          > <https://reviews.apache.org/r/2740/diff/5/?file=66008#file66008line137>
          >
          > Add javadoc for the parameters, please.

          Added.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 1
          > <https://reviews.apache.org/r/2740/diff/5/?file=66008#file66008line1>
          >
          > License, please.

          Done.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 579
          > <https://reviews.apache.org/r/2740/diff/5/?file=66005#file66005line579>
          >
          > This name may refer to the compression algorithm.
          > I think the word 'enable' should be part of the name.

          Fixed.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 2
          > <https://reviews.apache.org/r/2740/diff/5/?file=66006#file66006line2>
          >
          > No year needed.

          Fixed.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 51
          > <https://reviews.apache.org/r/2740/diff/5/?file=66006#file66006line51>
          >
          > This javadoc should be combined with above block.

          Fixed.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 87
          > <https://reviews.apache.org/r/2740/diff/5/?file=66006#file66006line87>
          >
          > Should read 'Compresses and ...'

          Fixed.

          On 2012-01-02 23:39:38, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 1
          > <https://reviews.apache.org/r/2740/diff/5/?file=66007#file66007line1>
          >
          > Add license, please.

          Fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4172
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-07 03:13:33.237858)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Ted Yu added a comment -

          From https://builds.apache.org/job/PreCommit-HBASE-Build/693//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLog/testAppendClose/:

          java.net.BindException: Problem binding to localhost/127.0.0.1:50150 : Address already in use
          	at org.apache.hadoop.ipc.Server.bind(Server.java:227)
          

          Strange. Was the above caused by parallel test case execution?

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4232
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9566>

          Add javadoc please.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9567>

          Please give this config parameter better name.
          How about 'hbase.regionserver.wal.compressed' ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9568>

          Would this be able to hold large number of HLog.Entry's in memory ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment9569>

          Since short is signed, how do I know that the return value would be positive ?
          e.g. (short)0xFE00 == -512

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment9570>

          I suggest naming this class Node.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          <https://reviews.apache.org/r/2740/#comment9573>

          Is compressed a better name ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
          <https://reviews.apache.org/r/2740/#comment9574>

          White space makes indentation look weird.

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment9575>

          Please add Apache license.

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment9576>

          Add short javadoc and test category, please.

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          <https://reviews.apache.org/r/2740/#comment9572>

          Please remove year.

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          <https://reviews.apache.org/r/2740/#comment9571>

          You will add the real test, right ?

          Also, missing test category.

          - Ted


          jiraposter@reviews.apache.org added a comment -

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 35
          > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line35>
          >
          > Add javadoc please.

          Done.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 84
          > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line84>
          >
          > Please give this config parameter better name.
          > How about 'hbase.regionserver.wal.compressed' ?

          Done.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 91
          > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line91>
          >
          > Would this be able to hold large number of HLog.Entry's in memory ?

          An HLog is at most 400 MB, should be okay?

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 229
          > <https://reviews.apache.org/r/2740/diff/8/?file=67060#file67060line229>
          >
          > Since short is signed, how do I know that the return value would be positive ?
          > e.g. (short)0xFE00 == -512

          If the high bit is set (we read that), i.e. the value is negative, then we do something else, because it's not part of the dictionary. Added an assert anyway.
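
To illustrate the exchange above, here is a minimal Java sketch. It assumes, as the reply suggests, that a negative short (high bit set) signals data that is not in the dictionary, while non-negative values are dictionary indices. The class name, method names, and the NOT_IN_DICTIONARY constant are hypothetical and are not taken from the Compressor.java under review.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not the code from the patch. */
final class ShortFlagSketch {
  /** Assumed marker meaning "not part of the dictionary". */
  static final short NOT_IN_DICTIONARY = -1;

  /** A short with the high bit set is negative, so it cannot be a valid index. */
  static boolean isDictionaryIndex(short value) {
    return value >= 0;
  }

  /** Widens a short to an int without sign extension, giving 0..65535. */
  static int toUnsigned(short value) {
    return value & 0xFFFF;
  }

  /** Writes a short into two bytes and reads it back; the sign is preserved. */
  static short roundTrip(short value) {
    return ByteBuffer.allocate(2).putShort(value).getShort(0);
  }
}
```

With this convention the reviewer's example, (short)0xFE00, is rejected as an index because it is negative, and masking with 0xFFFF recovers the unsigned value if one is ever needed.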

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 166
          > <https://reviews.apache.org/r/2740/diff/8/?file=67063#file67063line166>
          >
          > I suggest naming this class Node.

          Done.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 74
          > <https://reviews.apache.org/r/2740/diff/8/?file=67067#file67067line74>
          >
          > Is compressed a better name ?

          Done.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java, line 127
          > <https://reviews.apache.org/r/2740/diff/8/?file=67067#file67067line127>
          >
          > White space makes indentation look weird.

          fixed.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java, line 1
          > <https://reviews.apache.org/r/2740/diff/8/?file=67068#file67068line1>
          >
          > Please add Apache license.

          fixed.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java, line 12
          > <https://reviews.apache.org/r/2740/diff/8/?file=67068#file67068line12>
          >
          > Add short javadoc and test category, please.

          test category - small?

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 2
          > <https://reviews.apache.org/r/2740/diff/8/?file=67070#file67070line2>
          >
          > Please remove year.

          Done.

          On 2012-01-07 05:18:45, Ted Yu wrote:
          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 29
          > <https://reviews.apache.org/r/2740/diff/8/?file=67070#file67070line29>
          >
          > You will add the real test, right ?
          >
          > Also, missing test category.

          This is actually a really good test. If testWALReplay works after compression is enabled, then the compression/decompression is working. This is the real test.

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4232
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-10 02:34:06.162265)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Ted Yu added a comment -

          I got the following on my MacBook for 4608v9.txt:

          testReplayEditsWrittenViaHRegion(org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed)  Time elapsed: 2.009 sec  <<< FAILURE!
          java.lang.AssertionError
            at org.junit.Assert.fail(Assert.java:92)
            at org.junit.Assert.assertTrue(Assert.java:43)
            at org.junit.Assert.assertTrue(Assert.java:54)
            at org.apache.hadoop.hbase.regionserver.wal.TestWALReplay.testReplayEditsWrittenViaHRegion(TestWALReplay.java:289)
          
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-13 00:58:40.183584)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Fixed the failing test. Added a few new ones to detect LRU dictionary failures.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li
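
For readers unfamiliar with the idea behind the LRU dictionary mentioned in the change note above, the following is a rough Java sketch of one way such a structure could work: repeated byte strings get a short index, and when the dictionary is full the least recently used entry is evicted and its index reused. This is an assumption-laden illustration, not the LRUDictionary.java from the patch; the class name, method names, and the NOT_IN_DICTIONARY constant are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a bounded dictionary with LRU eviction. */
final class LruDictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<ByteBuffer, Short> indexByEntry =
      new LinkedHashMap<>(16, 0.75f, true);
  private final byte[][] entryByIndex;

  LruDictionarySketch(int capacity) {
    if (capacity <= 0 || capacity > Short.MAX_VALUE) {
      throw new IllegalArgumentException("capacity must be in 1.." + Short.MAX_VALUE);
    }
    this.capacity = capacity;
    this.entryByIndex = new byte[capacity][];
  }

  /** Returns the index of data, or NOT_IN_DICTIONARY. A hit marks the entry as recently used. */
  short findEntry(byte[] data) {
    Short index = indexByEntry.get(ByteBuffer.wrap(data));
    return index == null ? NOT_IN_DICTIONARY : index;
  }

  /** Adds data (evicting the least recently used entry when full) and returns its index. */
  short addEntry(byte[] data) {
    short existing = findEntry(data);
    if (existing != NOT_IN_DICTIONARY) {
      return existing;
    }
    short index;
    if (indexByEntry.size() < capacity) {
      index = (short) indexByEntry.size(); // indices are handed out densely
    } else {
      Iterator<Map.Entry<ByteBuffer, Short>> eldest = indexByEntry.entrySet().iterator();
      index = eldest.next().getValue();    // reuse the evicted entry's index
      eldest.remove();
    }
    byte[] copy = data.clone();
    entryByIndex[index] = copy;
    indexByEntry.put(ByteBuffer.wrap(copy), index);
    return index;
  }

  /** Returns a copy of the entry at index and marks it as recently used. */
  byte[] getEntry(short index) {
    if (index < 0 || index >= capacity || entryByIndex[index] == null) {
      throw new IllegalArgumentException("no entry at index " + index);
    }
    byte[] entry = entryByIndex[index];
    indexByEntry.get(ByteBuffer.wrap(entry)); // touch to update the LRU order
    return entry.clone();
  }
}
```

In a scheme like this, the writer and the reader would each need to apply the same sequence of lookups and additions so that their dictionaries evict the same entries and stay in step; a mismatch there is the kind of failure a dictionary test would be expected to catch.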

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-13 01:34:31.569679)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Fixed a bug in the dictionary that caused another test to fail. Passes the small tests now. Running the medium tests.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-13 01:37:35.790343)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          removed debug printf.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Ted Yu added a comment -

          I tried to run TestWALReplayCompressed:

          Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplayCompressed
          Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.008 sec
          
          Results :
          
          Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
          
          [INFO] 
          [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
          [INFO] Tests are skipped.
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 3:50.838s
          

          Looks like the ShutdownHooks took a long time to finish:

          "main" prio=5 tid=104000800 nid=0x100601000 in Object.wait() [100600000]
             java.lang.Thread.State: WAITING (on object monitor)
          	at java.lang.Object.wait(Native Method)
          	at java.lang.Thread.join(Thread.java:1210)
          	- locked <78e887470> (a org.apache.hadoop.fs.FileSystem$ClientFinalizer)
          	at java.lang.Thread.join(Thread.java:1263)
          	at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
          	at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
          	at java.lang.Shutdown.runHooks(Shutdown.java:79)
          	at java.lang.Shutdown.sequence(Shutdown.java:123)
          	at java.lang.Shutdown.exit(Shutdown.java:168)
          	- locked <7faf9d288> (a java.lang.Class for java.lang.Shutdown)
          	at java.lang.Runtime.exit(Runtime.java:90)
          	at java.lang.System.exit(System.java:921)
          	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:73)
          
          Li Pi added a comment -

          Are the shutdown hooks slower than TestWALReplay without compression?

          On Thu, Jan 19, 2012 at 4:12 PM, Zhihong Yu (Commented) (JIRA)

          Ted Yu added a comment -

          Similar result:

          Running org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
          Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.865 sec
          
          Results :
          
          Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
          
          [INFO] 
          [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
          [INFO] Tests are skipped.
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 3:46.105s
          
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4508
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10102>

          '/less' should be removed.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10103>

          javadoc needs update.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10104>

          Either remove the word 'a' or change it into 'an'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10105>

          Please change ourKV to keyval or something similar.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10106>

          Update javadoc to match the context parameter.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10107>

          I think adding 'the effect of compression would be good' at the end would make the sentence more easily understandable.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment10112>

          Remove whitespace.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment10113>

          This javadoc is more suitable for the init() method.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment10114>

          Please include e in new IOE.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
          <https://reviews.apache.org/r/2740/#comment10111>

          Please include e in the new IOE.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment10108>

          Please remove year.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment10109>

          Please put this line at the end of line 34.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment10110>

          'ad' should be 'add'

          • Ted


          Lars Hofhansl added a comment -

          It occurred to me yesterday that we should clear the dictionaries after each successful memstore flush...?
          Otherwise we might have to go further back in the log than necessary in order to replay.

          I realize memstore flushes are per region, whereas the WAL is per region server; still, it seems prudent to reset the dictionary after each flush. Thoughts?
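
          A minimal sketch of the reset idea, under simplifying assumptions: the class name ResettableDictionarySketch, the string keys, and the "#index" token format are hypothetical and are not the LRUDictionary or WALDictionary from the patch. It only illustrates that once the writer clears its dictionary, the next occurrence of a value is written as a literal again, so a reader starting from that point does not need earlier log entries to decode it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy dictionary, not the one in the HBASE-4608 patch.
// Assumes values never start with '#', which is reserved for index tokens.
public class ResettableDictionarySketch {
  private final Map<String, Integer> indexOf = new HashMap<>();
  private final List<String> entries = new ArrayList<>();

  // Writer side: emit the literal on first sight, "#<index>" afterwards.
  String encode(String value) {
    Integer idx = indexOf.get(value);
    if (idx != null) {
      return "#" + idx;
    }
    indexOf.put(value, entries.size());
    entries.add(value);
    return value;
  }

  // Reader side: mirror the writer by learning literals in the same order.
  String decode(String token) {
    if (token.startsWith("#")) {
      return entries.get(Integer.parseInt(token.substring(1)));
    }
    indexOf.put(token, entries.size());
    entries.add(token);
    return token;
  }

  // Hypothetical reset point, e.g. after a successful flush.
  void clear() {
    indexOf.clear();
    entries.clear();
  }

  public static void main(String[] args) {
    ResettableDictionarySketch writer = new ResettableDictionarySketch();
    System.out.println(writer.encode("usertable")); // literal
    System.out.println(writer.encode("usertable")); // index token
    writer.clear();
    System.out.println(writer.encode("usertable")); // literal again
  }
}
```

          The trade-off is that every reset costs some compression, since repeated values must be written in full once more. Because the WAL is shared by all regions on a region server, resetting on every per-region flush may happen often.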

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 34
          > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line34>
          >
          > '/less' should be removed.

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 42
          > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line42>
          >
          > javadoc needs update.

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 43
          > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line43>
          >
          > Either remove the word 'a' or change it into 'an'

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 78
          > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line78>
          >
          > Please change ourKV to keyval or something similar.

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 82
          > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line82>
          >
          > Update javadoc to match the context parameter.

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 94
          > <https://reviews.apache.org/r/2740/diff/12/?file=68704#file68704line94>
          >
          > I think adding 'the effect of compression would be good' at the end would make the sentence more easily understandable.

          fixed

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 60
          > <https://reviews.apache.org/r/2740/diff/12/?file=68710#file68710line60>
          >
          > Remove whitespace.

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 154
          > <https://reviews.apache.org/r/2740/diff/12/?file=68710#file68710line154>
          >
          > This javadoc is more suitable for the init() method.

          fixed.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 186
          > <https://reviews.apache.org/r/2740/diff/12/?file=68710#file68710line186>
          >
          > Please include e in new IOE.

          fixed. I assume you mean store it as the cause.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java, line 93
          > <https://reviews.apache.org/r/2740/diff/12/?file=68711#file68711line93>
          >
          > Please include e in the new IOE.

          fixed above.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 2
          > <https://reviews.apache.org/r/2740/diff/12/?file=68712#file68712line2>
          >
          > Please remove year.

          fixed above.

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 35
          > <https://reviews.apache.org/r/2740/diff/12/?file=68712#file68712line35>
          >
          > Please put this line at the end of line 34.

          fixed

          On 2012-01-20 22:56:07, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 53
          > <https://reviews.apache.org/r/2740/diff/12/?file=68712#file68712line53>
          >
          > 'ad' should be 'add'

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4508
          -----------------------------------------------------------
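
          For readers following the "include e in new IOE" exchange in this review: a minimal sketch of what storing the caught exception as the cause looks like. The class and method names (IoeCauseSketch, initCompression) and the message text are hypothetical, not code from the patch; only the java.io.IOException(String, Throwable) constructor is standard.

```java
import java.io.IOException;

// Illustrative only: shows exception chaining, not the actual reader/writer code.
public class IoeCauseSketch {

  // Hypothetical stand-in for an initialization step that can fail.
  static void initCompression() throws IOException {
    try {
      throw new IllegalStateException("could not build compression context");
    } catch (Exception e) {
      // Passing e as the second argument keeps the original stack trace
      // reachable through getCause() instead of discarding it.
      throw new IOException("Failed to initialize compression", e);
    }
  }

  public static void main(String[] args) {
    try {
      initCompression();
    } catch (IOException ioe) {
      System.out.println(ioe.getMessage() + " <- " + ioe.getCause().getMessage());
    }
  }
}
```

          Without the cause, a failure in the reader or writer would surface only as the wrapper's message, which makes the underlying problem harder to diagnose from logs.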


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-24 09:00:37.768707)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          CHANGES.txt 1d7238e
          bin/hbase 350abef
          bin/hbase-daemon.sh 5c42ac1
          dev-support/findHangingTest.sh PRE-CREATION
          pom.xml 6566a1c
          src/docbkx/book.xml c67ca06
          src/docbkx/configuration.xml 7fd90e7
          src/docbkx/ops_mgt.xml f93c9f2
          src/docbkx/performance.xml e61248f
          src/docbkx/preface.xml 10fa755
          src/docbkx/troubleshooting.xml 0b7c93a
          src/docbkx/upgrading.xml c0642f5
          src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon 24caabd
          src/main/java/org/apache/hadoop/hbase/HBaseConfiguration.java 0477be8
          src/main/java/org/apache/hadoop/hbase/HConstants.java 5120a3c
          src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 8ec5042
          src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 6cdeec1
          src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/client/Delete.java 51bbc63
          src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8cd9bd0
          src/main/java/org/apache/hadoop/hbase/client/HConnection.java 0e78d96
          src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 852a810
          src/main/java/org/apache/hadoop/hbase/client/HTable.java 839d79b
          src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 0bc9577
          src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 4135e55
          src/main/java/org/apache/hadoop/hbase/client/RowMutation.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 9b568e3
          src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java 0d4a9e4
          src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java ba3414d
          src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java f25ba11
          src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java b47423c
          src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 9002a0f
          src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java 3ad6cd5
          src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 07ddbca
          src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4327a44
          src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java 39c73f5
          src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java bd574b2
          src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormat.java 3dcbf74
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java e6f8a6e
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cb2f084
          src/main/java/org/apache/hadoop/hbase/master/LoadBalancerFactory.java 89685bb
          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 3938fa7
          src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 9de1784
          src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 667a8b1
          src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 2dfc3e7
          src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 2dd497b
          src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java 493dcdb
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java fb4ec05
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 3917d40
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionThriftServer.java 18b6c13
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java c840e7c
          src/main/java/org/apache/hadoop/hbase/regionserver/OperationStatus.java b6f7456
          src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 7cee17c
          src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java 41f5dff
          src/main/java/org/apache/hadoop/hbase/regionserver/Store.java b928731
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java bd6f70d
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java e8e95ed
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java a25ca32
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRootHandler.java fa38ad6
          src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 490694c
          src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java 97dd8e6
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 43bfba0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java 7fe0ae5
          src/main/java/org/apache/hadoop/hbase/rest/MultiRowResource.java 2ba6a0d
          src/main/java/org/apache/hadoop/hbase/rest/RowResource.java dade6a8
          src/main/java/org/apache/hadoop/hbase/rest/TableResource.java cc719bc
          src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java f93c81d
          src/main/java/org/apache/hadoop/hbase/rest/transform/Base64.java f991121
          src/main/java/org/apache/hadoop/hbase/rest/transform/NullTransform.java 8492cc6
          src/main/java/org/apache/hadoop/hbase/rest/transform/Transform.java 9f33bab
          src/main/java/org/apache/hadoop/hbase/thrift/HThreadedSelectorServerArgs.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java 690a57f
          src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 3fa5d41
          src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/thrift/generated/AlreadyExists.java 0479e31
          src/main/java/org/apache/hadoop/hbase/thrift/generated/BatchMutation.java be902c9
          src/main/java/org/apache/hadoop/hbase/thrift/generated/ColumnDescriptor.java 04b42fe
          src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java 9e31c61
          src/main/java/org/apache/hadoop/hbase/thrift/generated/IOError.java 778e869
          src/main/java/org/apache/hadoop/hbase/thrift/generated/IllegalArgument.java 9ae5340
          src/main/java/org/apache/hadoop/hbase/thrift/generated/Mutation.java 7aa9bcd
          src/main/java/org/apache/hadoop/hbase/thrift/generated/TCell.java ed420d4
          src/main/java/org/apache/hadoop/hbase/thrift/generated/TRegionInfo.java 161dedc
          src/main/java/org/apache/hadoop/hbase/thrift/generated/TRowResult.java 0f31e5e
          src/main/java/org/apache/hadoop/hbase/thrift/generated/TScan.java 3b894db
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumn.java 3e116e7
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java 8390015
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java 424a87b
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDelete.java 68b4f8e
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDeleteType.java 2abdee0
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TGet.java b1a1a12
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java 272a4a5
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIOError.java 283d430
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIllegalArgument.java 254fbe5
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIncrement.java 3cc82e9
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TPut.java 97ab5dc
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TResult.java 73c8340
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TScan.java d76c355
          src/main/java/org/apache/hadoop/hbase/thrift2/generated/TTimeRange.java ad9fdc7
          src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java 11dfbef
          src/main/java/org/apache/hadoop/hbase/util/Threads.java 6f81b62
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java 9b83840
          src/main/resources/hbase-webapps/static/favicon.ico PRE-CREATION
          src/main/resources/hbase-webapps/static/hbase_logo.png 03fa793
          src/site/resources/images/favicon.ico 161bcf7
          src/site/resources/images/hbase_logo.png 03fa793
          src/site/resources/images/hbase_logo.svg PRE-CREATION
          src/site/resources/images/hbase_logo_med.gif 36d3e3c
          src/site/resources/images/hbase_small.gif 3275765
          src/site/xdoc/index.xml 9157d6a
          src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java dada051
          src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 80d69b4
          src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java c1a077f
          src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java bb077d0
          src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java ab80020
          src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggregateProtocol.java 0d38ac9
          src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 5b64895
          src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java 5e3e994
          src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan.java 46e1bee
          src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java cc0f30f
          src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java c359f4b
          src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a348f0c
          src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 32ad7e8
          src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java 42db18b
          src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java 0a34371
          src/test/java/org/apache/hadoop/hbase/regionserver/TestRSStatusServlet.java 64e61bb
          src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java 2d87567
          src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java 853a35f
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 6e89cc4
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/rest/TestTransform.java 2e2ba4c
          src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java 12247d0
          src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java 477141f
          src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 0b45ac1

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          Li Pi added a comment -

          @Lars

          Unless we know exactly when the dictionary is flushed, we can't rebuild the original HLog, can we?

          Lars Hofhansl added a comment -

          The flush will place a special WAL entry. See HLog.completeCacheFlush(...).
          The compressor could take this as a flag to reset the dictionary.
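
          The reset idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name, the FLUSH_MARKER string, and the "L:"/"R:" token format are all hypothetical and are not taken from the 4608 patch. It shows why a reader could start decoding at a marker if both writer and reader clear the dictionary there.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; names and token format are not from the 4608 patch. */
class ResettableDictionarySketch {
  /** Stand-in for the special entry written when a cache flush completes. */
  static final String FLUSH_MARKER = "<flush-marker>";

  /** Encodes values as "L:<value>" on first use and "R:<index>" afterwards. */
  static List<String> encode(List<String> values) {
    Map<String, Integer> dict = new HashMap<>();
    List<String> tokens = new ArrayList<>();
    for (String value : values) {
      if (value.equals(FLUSH_MARKER)) {
        dict.clear();              // writer forgets everything at the marker
        tokens.add(FLUSH_MARKER);  // the marker itself is always written in full
        continue;
      }
      Integer index = dict.get(value);
      if (index == null) {
        dict.put(value, dict.size());
        tokens.add("L:" + value);
      } else {
        tokens.add("R:" + index);
      }
    }
    return tokens;
  }

  /** Decodes tokens, starting at a position where the dictionary is empty. */
  static List<String> decode(List<String> tokens) {
    List<String> dict = new ArrayList<>();
    List<String> values = new ArrayList<>();
    for (String token : tokens) {
      if (token.equals(FLUSH_MARKER)) {
        dict.clear();              // reader resets at the same point as the writer
        values.add(FLUSH_MARKER);
      } else if (token.startsWith("L:")) {
        String value = token.substring(2);
        dict.add(value);           // list position doubles as the dictionary index
        values.add(value);
      } else {
        values.add(dict.get(Integer.parseInt(token.substring(2))));
      }
    }
    return values;
  }
}
```

          In this sketch, decoding can begin at the start of the log or at any marker, because the dictionary is empty at both points. The cost is that every value has to be written once in full again after each reset.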

          Todd Lipcon added a comment -

          Why reset on flush? Seems to me we need to reset on log roll, but not flush.

          Lars Hofhansl added a comment -

          On recovery we'd always have to scan the entire log from the beginning. Maybe that's not a big deal, because log size is limited?

          Todd Lipcon added a comment -

          Don't we already have to scan the entire log from the beginning on recovery? Log splitting splits entire segments, afaik. Am I forgetting about some index structure or something?

          Lars Hofhansl added a comment -

          You know more about that than I do.
          I'm saying that we do not need to scan the entire log, especially if we add some custom log replaying tools (for example, replaying for a region).
          If we're not careful now, we shut ourselves out of future optimizations.
          Might not be a big deal, as the logs are rolled anyway and that naturally limits the number of WALEdits we have to scan back through to find a dictionary.

          Nicolas Spiegelberg added a comment -

          I think, if we want to avoid scanning the entire log and seek as an optimization, we should put more effort into rolling logs at a lower size threshold and having log GC be size-based and get rid of (or greatly raise) the file-count-based pressure.

          In production, log replay (after distributed log splitting) has been dominated by IO for us. We normally don't max out CPU. Anything we can do to minimize IO size at the expense of CPU would be beneficial.

          As an aside, do we currently compress the output of our log split? Having the output of the resulting per-region logs be in LZO or GZ format will decrease our replay time, perhaps more than this optimization will. That said, this feature is very useful; I just want to make sure that we're not missing less cool but potentially more beneficial optimizations.

          Todd Lipcon added a comment -

          Nope, we don't currently compress the log-split output. Good idea, Nicolas. We can use both compression mechanisms there - LZO/Snappy on top of the dictionary compression should be very good. The dictionary compression alone will be a big improvement there, though, since we'll save len(region key) bytes per edit guaranteed.
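
          As a rough illustration of where the per-edit saving comes from, here is a minimal sketch. It assumes a made-up wire format (one tag byte, then either a length-prefixed literal or a 2-byte index), and the class and method names are hypothetical. It is not the format or the LRUDictionary from the patch, and it has no eviction.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the classes or format from the 4608 patch. */
class DictionarySketch {
  private static final byte LITERAL = 0;   // 2-byte length and full bytes follow
  private static final byte REFERENCE = 1; // 2-byte dictionary index follows

  // Unbounded map for brevity; a real dictionary would need a size limit.
  private final Map<String, Short> indexByValue = new HashMap<>();

  /** Writes value as a literal the first time and as a short index afterwards. */
  void write(DataOutputStream out, String value) throws IOException {
    Short index = indexByValue.get(value);
    if (index == null) {
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      out.writeByte(LITERAL);
      out.writeShort(bytes.length);
      out.write(bytes);
      indexByValue.put(value, (short) indexByValue.size());
    } else {
      out.writeByte(REFERENCE);
      out.writeShort(index);
    }
  }

  /** Returns the encoded size when the same region name is written once per edit. */
  static int encodedSize(String regionName, int edits) {
    DictionarySketch dict = new DictionarySketch();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    try {
      for (int i = 0; i < edits; i++) {
        dict.write(out, regionName);
      }
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory stream
    }
    return buffer.size();
  }
}
```

          With a 40-byte region name, 1000 edits take 43 + 999 * 3 = 3040 bytes in this sketch, against 42000 bytes if each edit carried the name with a 2-byte length prefix. A general-purpose codec such as LZO or Snappy could still be applied on top of the result.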

          Li Pi added a comment -

          I need to run a test against LZO or GZ. I wouldn't be surprised if 4608 is more efficient on some inputs - it's very well tailored for certain kinds of data.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-24 22:26:21.830142)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          last diff was against the wrong (non-trunk) branch.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java c92cc02

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-24 22:27:32.723446)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8
          src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java c92cc02
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-01-24 22:29:18.791094)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4585
          -----------------------------------------------------------

          Nice work.
          Will try out the Compressor tool.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10215>

          Should we verify that length is larger than pos ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10216>

          I would expect different implementations to be instantiated based on the prefix of path.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10217>

          Why do we instantiate Configuration again (there is already one @ line 113) ?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10218>

          Typo, should read 'to start reading from'.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10219>

          NOT_IN_DICTIONARY should be used here.

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------

          Only got about halfway through. Will continue to look soon. Overall looking pretty good!

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10459>

          I'd rename this class to KeyValueCompression or even KVCompression. Then rename readFields to just "read" – since this is just utility functions, not actually an instance of a compressed keyvalue.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
          <https://reviews.apache.org/r/2740/#comment10460>

          Rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.
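A rough illustration of the (byte[] buf, int off, int len) shape suggested here, using plain java.io rather than the HBase classes. `SliceWriteSketch`, `writeSlice`, and the length-prefixed layout are hypothetical; the point is only that each field is written straight out of the shared backing array with no per-field copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; not the writeCompressed method from the patch.
final class SliceWriteSketch {
  /** Writes len bytes of buf starting at off, prefixed by a 4-byte length, without copying. */
  static void writeSlice(DataOutputStream out, byte[] buf, int off, int len) throws IOException {
    if (off < 0 || len < 0 || len > buf.length - off) {
      throw new IndexOutOfBoundsException("off=" + off + " len=" + len);
    }
    out.writeInt(len);
    out.write(buf, off, len);   // reads directly from the backing array
  }

  /** Encodes several {offset, length} slices of one backing array, in order. */
  static byte[] encode(byte[] backing, int[][] slices) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int[] slice : slices) {
        writeSlice(out, backing, slice[0], slice[1]);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With row, family, and qualifier laid out back to back in one array, the caller passes three offset/length pairs instead of allocating three temporary arrays per KeyValue.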

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java
          <https://reviews.apache.org/r/2740/#comment10461>

          Since this is so simple, I'd move it to be a static inner class of KVCompression above

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10462>

          I think we can merge this with the other class that just has static methods as well.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10463>

          this function requires that the whole log data fit in RAM - not a great assumption

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10464>

          why is this split into two if/elses? looks like the top clauses can be combined, as can the bottom clauses

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10465>

          switch order of "in" and "offset" here.

          Perhaps clearer to name this as "uncompressIntoArray"?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10467>

          worth a comment here to explain that the "status" byte actually has the high-order byte of the dictionary entry in the case that it's in the dictionary
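For readers following along, a small sketch of what such a comment would be describing. This is an assumed layout, not the patch's exact format: a first byte equal to NOT_IN_DICTIONARY (taken here to be -1) means a literal follows; any other value is treated as the high-order byte of a 2-byte dictionary index, with the low-order byte read next.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and wire layout are assumptions for illustration.
final class StatusByteSketch {
  static final byte NOT_IN_DICTIONARY = -1;
  private final List<byte[]> entries = new ArrayList<>();

  /** Reads one value, either a literal (which is then remembered) or a dictionary reference. */
  byte[] read(DataInputStream in) throws IOException {
    byte status = in.readByte();
    if (status == NOT_IN_DICTIONARY) {
      int len = in.readInt();
      if (len < 0) {
        throw new IOException("negative length: " + len);
      }
      byte[] value = new byte[len];
      in.readFully(value);
      entries.add(value);       // index = order of first sighting
      return value;
    }
    // Not a marker: the status byte doubles as the high-order byte of the index.
    int index = ((status & 0xFF) << 8) | (in.readByte() & 0xFF);
    return entries.get(index);
  }

  /** Decodes count values from encoded and joins them with commas. */
  static String decodeAll(byte[] encoded, int count) {
    try {
      StatusByteSketch dict = new StatusByteSketch();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < count; i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append(new String(dict.read(in), StandardCharsets.US_ASCII));
      }
      return sb.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The scheme only works if valid indices never have 0xFF as their high byte, which is why the reuse of the status byte deserves an explicit comment in the real code.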

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10466>

          *un*compressed value, right?

          • Todd


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4736
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10469>

          If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.
          This implies that readFile() would be called multiple times for a single file.
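A small standalone sketch of the timed offer mentioned above: `ArrayBlockingQueue.offer(E, long, TimeUnit)` returns false if no space becomes available before the timeout, which is how a producer can tell the queue is full. The class name, capacity, and timeout are arbitrary values for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of detecting a full bounded queue via a timed offer.
final class TimedOfferSketch {
  /** Offers two items to a capacity-1 queue and returns whether each was accepted. */
  static boolean[] offerIntoFullQueue() {
    ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    try {
      boolean first = queue.offer("edit-1", 10, TimeUnit.MILLISECONDS);
      // Nothing drains the queue, so this offer times out and reports false.
      boolean second = queue.offer("edit-2", 10, TimeUnit.MILLISECONDS);
      return new boolean[] {first, second};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();   // preserve the interrupt status
      throw new IllegalStateException(e);
    }
  }
}
```

A caller that sees false would have to stop, let the consumer drain, and resume reading later, which is the "called multiple times for a single file" consequence noted above.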

          • Ted


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:50:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>

          >

          > If we use http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ArrayBlockingQueue.html#offer%28E,%20long,%20java.util.concurrent.TimeUnit%29, we should be able to tell that the queue is full.

          > This implies that readFile() would be called multiple times for a single file.

          That's beside the point. Using a queue here is just silly. Reading a file should probably be a different interface altogether rather than writing to a queue; i.e., it should be a pull interface, not a push.

          I also mentioned to Li offline that it would make sense to add a metadata header to the HLog sequencefiles which indicates that they're compressed. In that case, this code could just use the existing log reader code and log writer code, but vary the output between compressed/uncompressed using the configuration flag.

          • Todd
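
          The pull-versus-push point can be sketched roughly as follows. This is only an illustration: EntrySource, EntrySink and PullCopy are hypothetical names, entries are plain strings rather than HLog entries, and returning null at end of log is an assumed convention, not something taken from the patch.

```java
// Hypothetical pull-style interfaces; none of these names come from the patch.
interface EntrySource {
    /** Returns the next entry, or null once the log is exhausted (assumed convention). */
    String next();
}

interface EntrySink {
    void append(String entry);
}

final class PullCopy {
    /** The caller drives the loop and pulls one entry at a time; no queue is involved. */
    static int copy(EntrySource in, EntrySink out) {
        int copied = 0;
        for (String entry = in.next(); entry != null; entry = in.next()) {
            out.append(entry);
            copied++;
        }
        return copied;
    }
}
```

          With a shape like this the reader never has to ask whether a queue is full, because nothing is produced until the consumer asks for it.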

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4736
          -----------------------------------------------------------

          On 2012-01-24 22:29:18, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2012-01-24 22:29:18)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          Kannan Muthukkaruppan added a comment -

          Li: Is there a writeup/description of the scheme that this patch is implementing? If not, would you mind giving a quick overview? Thanks much.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4852
          -----------------------------------------------------------

          I tried to use the command line tool to compress an HLog written by 0.92 and got the following:

          Exception in thread "main" java.lang.NullPointerException
          at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.close(SequenceFileLogReader.java:192)
          at org.apache.hadoop.hbase.regionserver.wal.Compressor.readFile(Compressor.java:104)
          at org.apache.hadoop.hbase.regionserver.wal.Compressor.main(Compressor.java:64)

          Also, if you use the command line tool with no arguments, it should print its help (right now it prints an IndexOutOfBoundsException).

          I'll try again with an hlog written by trunk - I'm guessing the hlog serialization version might have changed or something.

          • Todd

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4853
          -----------------------------------------------------------

          I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there are still a number of places where we could use variable-length integers instead of fixed-length ones. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.

          I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64M down to 25M) with another 4.2MB of 0x00 bytes which could probably be removed.
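
          The 0x00 count described above is easy to reproduce. The counter used for the review was a C program that is not shown in this thread, so the following is only a minimal Java sketch of the same idea; ZeroByteCounter is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: counts 0x00 bytes as a rough measure of fixed-width padding. */
final class ZeroByteCounter {
    /** Counts zero bytes in the first len bytes of buf. */
    static long countZeros(byte[] buf, int len) {
        long zeros = 0;
        for (int i = 0; i < len; i++) {
            if (buf[i] == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    /** Streams the input in chunks so large logs need not fit in memory. */
    static long countZeros(InputStream in) throws IOException {
        long zeros = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            zeros += countZeros(buf, n);
        }
        return zeros;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZeroByteCounter <file>");
            System.exit(1);
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println(countZeros(in));
        }
    }
}
```

          Dividing the result by the file size gives the kind of percentage quoted above.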

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10650>

          invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10651>

          I think the better way of expressing this usage would be:

          WALCompressor [-u | -c] <input> <output>

          -u - uncompresses the input log
          -c - compresses the input log

          Exactly one of -u or -c must be specified
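
          A minimal sketch of argument handling along these lines, assuming the tool takes exactly three arguments. CompressorArgs and parseCompressFlag are hypothetical names and are not taken from Compressor.java.

```java
/** Hypothetical argument check for the command line tool. */
final class CompressorArgs {
    static final String USAGE = "usage: WALCompressor [-u | -c] <input> <output>";

    /** Returns true for -c (compress) and false for -u (uncompress); throws on bad usage. */
    static boolean parseCompressFlag(String[] args) {
        // The length test comes first: || short-circuits, so args[0] is
        // never read when the tool is started with no arguments.
        if (args.length != 3 || !(args[0].equals("-u") || args[0].equals("-c"))) {
            throw new IllegalArgumentException(USAGE);
        }
        return args[0].equals("-c");
    }
}
```

          A main method would catch the IllegalArgumentException, print its message and exit with a non-zero status instead of letting an index exception escape.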

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment10649>

          this code doesn't work properly. Here's what you want to do:

          Configuration conf = new Configuration();
          FileSystem fs = path.getFileSystem(conf);

          • Todd

          Li Pi added a comment -

          The compression uses 2-byte dictionary indices, so the first 256 entries (indices 0 through 255) should start off with 0x00. This might be causing it.

          @Karthik, I'll try to get documentation out when I'm less busy. This quarter is pretty painful so far.
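
          To illustrate the two-byte index point: any index below 256 has a zero high byte when written in big-endian order. The sketch below assumes the index is written the way java.io.DataOutput.writeShort writes a short (high byte first); whether the patch actually writes it that way is an assumption, and IndexBytes is a hypothetical name.

```java
/** Hypothetical illustration of how a two-byte dictionary index is laid out. */
final class IndexBytes {
    /** Big-endian two-byte form, matching the byte order of DataOutput.writeShort. */
    static byte[] toTwoBytes(short index) {
        return new byte[] {(byte) (index >>> 8), (byte) index};
    }

    /** True when the first byte on the wire would be 0x00. */
    static boolean hasZeroHighByte(short index) {
        return toTwoBytes(index)[0] == 0;
    }
}
```

          A variable-length index would use a single byte for small values, which is one way these zero bytes could be avoided.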

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5066
          -----------------------------------------------------------

          Nice patch and good job! I have two questions inline; maybe I just misunderstood the code.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11122>

          WritableUtils.getVIntSize could help you decide how many bytes are needed for the entry, so you don't need to pass sizeBytes down into this function.
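
          For reference, here is a JDK-only sketch of the size calculation that suggestion relies on. It mirrors my understanding of Hadoop's variable-length encoding (values from -112 to 127 take one byte, anything else takes one length byte plus the significant data bytes). VIntSizeSketch is a hypothetical name, and the real code should call WritableUtils.getVIntSize rather than copy this.

```java
/** Hypothetical stand-in for the size computation; check against Hadoop's WritableUtils. */
final class VIntSizeSketch {
    /** Number of bytes the value would occupy under the assumed encoding. */
    static int vintSize(long value) {
        if (value >= -112 && value <= 127) {
            return 1; // small values are stored directly in a single byte
        }
        if (value < 0) {
            value ^= -1L; // one's complement, so the magnitude bits can be counted
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
        return (dataBits + 7) / 8 + 1; // data bytes plus one leading length byte
    }
}
```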

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11120>

          Should the data be added back to the dict in this case?
          dict.addEntry(data) ?

          • Liyin

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5068
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11123>

          It looks like calling findEntry() has a side effect, since it puts the data into the dictionary.

          • Liyin
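
          The two dictionary questions above (whether the reader should add missed data back, and the side effect in findEntry()) can be illustrated with a toy model. Everything here is hypothetical: SketchDictionary is an append-only stand-in with no LRU eviction, values are strings, and the "=" and "#" token prefixes are invented for the example. It only shows why a writer-side lookup that inserts on a miss needs a matching addEntry on the reader side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical append-only dictionary; not the LRUDictionary from the patch. */
final class SketchDictionary {
    static final short NOT_IN_DICTIONARY = -1;
    private final Map<String, Short> indexByValue = new HashMap<>();
    private final List<String> valueByIndex = new ArrayList<>();

    /** Lookup with a side effect: a miss registers the value before returning. */
    short findEntry(String value) {
        Short index = indexByValue.get(value);
        if (index != null) {
            return index;
        }
        addEntry(value);
        return NOT_IN_DICTIONARY;
    }

    short addEntry(String value) {
        short index = (short) valueByIndex.size();
        indexByValue.put(value, index);
        valueByIndex.add(value);
        return index;
    }

    String getEntry(short index) {
        return valueByIndex.get(index);
    }
}

final class DictionarySyncDemo {
    /** Writer side: emits "=value" on a miss and "#index" on a hit. */
    static List<String> encode(List<String> values, SketchDictionary dict) {
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            short index = dict.findEntry(value);
            tokens.add(index == SketchDictionary.NOT_IN_DICTIONARY ? "=" + value : "#" + index);
        }
        return tokens;
    }

    /** Reader side: must add every literal, or later indices resolve to the wrong entry. */
    static List<String> decode(List<String> tokens, SketchDictionary dict) {
        List<String> values = new ArrayList<>();
        for (String token : tokens) {
            if (token.charAt(0) == '=') {
                String value = token.substring(1);
                dict.addEntry(value); // mirrors the insert done by the writer's findEntry
                values.add(value);
            } else {
                values.add(dict.getEntry(Short.parseShort(token.substring(1))));
            }
        }
        return values;
    }
}
```

          If the reader skipped the addEntry call, the first "#0" token would have nothing to resolve against, which is the failure the first question is probing for.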

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>

          >

          > I would expect different implementations to be instantiated based on the prefix of path.

          I figured people would only use this on their local machine. I guess the path can actually point to HDFS. Got any examples of how to do this easily?

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 116

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line116>

          >

          > Why do we instantiate Configuration again (there is already one @ line 113) ?

          Hmm. Good point. Waste of heap, but I wasn't really optimizing the command line tool. Fixed!

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 71

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line71>

          >

          > Should we verify that length is larger than pos ?

          I don't think it makes a difference.

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 169

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line169>

          >

          > Typo, should read 'to start reading from'.

          fixed.

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4585
          -----------------------------------------------------------

          On 2012-01-24 22:29:18, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2012-01-24 22:29:18)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 8370ef8

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-15 04:57:45.411924)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          Fixed as per Ted Yu's review.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)
          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5113
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11203>

          FileSystem has the following methods:

          /** Returns the configured filesystem implementation.*/
          public static FileSystem get(Configuration conf) throws IOException {

          public static FileSystem get(URI uri, Configuration conf) throws IOException {

          I think the second get() should allow you to read an HLog on HDFS.

          - Ted


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-15 05:23:04, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>

          >

          > FileSystem has the following methods:

          >

          > /** Returns the configured filesystem implementation.*/

          > public static FileSystem get(Configuration conf) throws IOException {

          >

          > public static FileSystem get(URI uri, Configuration conf) throws IOException {

          >

          > I think the second get() should allow you to read HLog on hdfs

          See my earlier comment on this review: path.getFileSystem(conf) is what you want to use.

          - Todd

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5113
          -----------------------------------------------------------


          Ted Yu added a comment -

          @Li:
          Do you have time to address Todd's and Liyin's comments?

          Thanks

          Li Pi added a comment -

          Doing so right now. Will be done before weekend.

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 37

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line37>

          >

          > I'd rename this class to KeyValueCompression or even KVCompression. Then rename readFields to just "read" – since this is just utility functions, not actually an instance of a compressed keyvalue.

          fixed. legacy name. <3 eclipse.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 207

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line207>

          >

          > *un*compressed value, right?

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java, line 28

          > <https://reviews.apache.org/r/2740/diff/16/?file=70701#file70701line28>

          >

          > Since this is so simple, I'd move it to be a static inner class of KVCompression above

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 152

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line152>

          >

          > why is this split into two if/elses? looks like the top clauses can be combined, as can the bottom clauses

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 174

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line174>

          >

          > switch order of "in" and "offset" here.

          >

          > Perhaps clearer to name this as "uncompressIntoArray"?

          fixed.

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 44

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line44>

          >

          > I think we can merge this with the other class that just has static methods as well.

          Compressor contains static methods for general purpose compression. KeyValueCompression.java contains static methods for compressing the KeyValue type. Should I merge them?

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 185

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line185>

          >

          > worth a comment here to explain that the "status" byte actually has the high-order byte of the dictionary entry in the case that it's in the dictionary

          done
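
For readers following the thread, the "status byte" point above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the patch: the class name StatusByteSketch, the marker value -1, the 4-byte length prefix for literals, and the plain array standing in for the dictionary are all hypothetical. The idea shown is that a non-negative two-byte index never starts with 0xFF, so its high-order byte can double as the status byte.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a status byte that doubles as the index's high-order byte. */
final class StatusByteSketch {
  /** Assumed marker: the entry is not in the dictionary and literal bytes follow. */
  static final byte NOT_IN_DICTIONARY = -1;

  /** Writes a dictionary hit as a two-byte index, high-order byte first. */
  static void writeIndex(ByteBuffer out, short index) {
    if (index < 0) {
      // A negative index could start with 0xFF and be mistaken for the marker.
      throw new IllegalArgumentException("index must be non-negative");
    }
    out.putShort(index); // ByteBuffer is big-endian by default
  }

  /** Writes a dictionary miss as marker, length, then the raw bytes. */
  static void writeLiteral(ByteBuffer out, byte[] data) {
    out.put(NOT_IN_DICTIONARY);
    out.putInt(data.length);
    out.put(data);
  }

  /** Reads one entry, resolving dictionary hits against dict. */
  static byte[] read(ByteBuffer in, byte[][] dict) {
    byte status = in.get();
    if (status == NOT_IN_DICTIONARY) {
      byte[] data = new byte[in.getInt()];
      in.get(data);
      return data;
    }
    // status is the high-order byte of the index; the low-order byte follows.
    int index = ((status & 0xFF) << 8) | (in.get() & 0xFF);
    return dict[index];
  }

  public static void main(String[] args) {
    byte[][] dict = new byte[512][];
    dict[300] = new byte[] {'r', 'o', 'w'};
    ByteBuffer buf = ByteBuffer.allocate(64);
    writeIndex(buf, (short) 300);
    writeLiteral(buf, new byte[] {'c', 'f'});
    buf.flip();
    System.out.println(new String(read(buf, dict)));
    System.out.println(new String(read(buf, dict)));
  }
}
```

The cost of this layout is that only 15 bits of the index are usable, which is why the sketch rejects negative indexes.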

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 96

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line96>

          >

          > Rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.

          This is going to take a while, since I'm currently relying on the default array hashCode(), which is identity-based. I'll need to use Bytes.hashCode and a wrapper for insertion into the dictionary.
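
A rough sketch of what such a wrapper might look like, assuming nothing about the eventual patch: the names ByteRangeKeySketch, Key, and findOrAdd are hypothetical, the hash is a simple 31-based polynomial rather than Bytes.hashCode, and a plain HashMap stands in for the LRU dictionary (no eviction). It shows content-based hashing over (buf, off, len) so that lookups need no copy, with a copy made only when an entry is inserted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: content-hashed byte ranges as dictionary keys. */
final class ByteRangeKeySketch {

  /** A view over buf[off, off + len) with content-based equals and hashCode. */
  static final class Key {
    final byte[] buf;
    final int off;
    final int len;
    final int hash;

    Key(byte[] buf, int off, int len) {
      this.buf = buf;
      this.off = off;
      this.len = len;
      int h = 1;
      for (int i = off; i < off + len; i++) {
        h = 31 * h + buf[i]; // hashes the contents, unlike byte[].hashCode()
      }
      this.hash = h;
    }

    @Override
    public int hashCode() {
      return hash;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      if (k.len != len) {
        return false;
      }
      for (int i = 0; i < len; i++) {
        if (buf[off + i] != k.buf[k.off + i]) {
          return false;
        }
      }
      return true;
    }
  }

  /** Returns the existing index, or -1 after adding the range as a new entry. */
  static short findOrAdd(Map<Key, Short> dict, byte[] buf, int off, int len) {
    Short idx = dict.get(new Key(buf, off, len)); // lookup without copying
    if (idx != null) {
      return idx;
    }
    // Copy only on a miss so the stored key does not alias the caller's buffer.
    byte[] owned = Arrays.copyOfRange(buf, off, off + len);
    dict.put(new Key(owned, 0, len), (short) dict.size());
    return -1;
  }

  public static void main(String[] args) {
    Map<Key, Short> dict = new HashMap<>();
    byte[] kv = {'x', 'c', 'f', '1', 'y'};
    System.out.println(findOrAdd(dict, kv, 1, 3));
    System.out.println(findOrAdd(dict, new byte[] {'c', 'f', '1'}, 0, 3));
  }
}
```

Note that findOrAdd mutates the dictionary on a miss, so callers on the read path and the write path have to apply it in the same order to keep their dictionaries in sync.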

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>

          >

          > this function requires that the whole log data fit in RAM - not a great assumption

          old one. will do eventually...

          - Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 02:29:24, Liyin Tang wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 42
          > <https://reviews.apache.org/r/2740/diff/16/?file=70705#file70705line42>
          >
          > Looks like there is a side effect to calling findEntry(), since it puts the data into the dictionary.

          This is intentional. When we look for an entry, that means we intend to compress with it. If we don't find it, then it's inserted into the dictionary.
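
For illustration only, the find-or-insert behaviour described in this reply could look roughly like the sketch below. It is not the LRUDictionary from the patch: the class name `SketchDictionary`, the `NOT_IN_DICTIONARY` sentinel, and the `HashMap` backing store are assumptions made for the example, and LRU eviction is left out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical find-or-insert dictionary; NOT the LRUDictionary from the patch. */
final class SketchDictionary {
  /** Assumed sentinel meaning "the value was not in the dictionary before this call". */
  static final short NOT_IN_DICTIONARY = -1;

  // ByteBuffer compares and hashes by content, unlike a raw byte[].
  private final Map<ByteBuffer, Short> indexByValue = new HashMap<>();

  /**
   * Returns the index of the value if it is already known. On a miss the value
   * is added as a side effect, so the next lookup of the same bytes is a hit.
   */
  short findEntry(byte[] data, int offset, int length) {
    // Copy the slice so later changes to the caller's array cannot alter the key.
    ByteBuffer key = ByteBuffer.wrap(Arrays.copyOfRange(data, offset, offset + length));
    Short index = indexByValue.get(key);
    if (index != null) {
      return index;
    }
    // Eviction is omitted in this sketch; it simply stops adding once full.
    if (indexByValue.size() < Short.MAX_VALUE) {
      indexByValue.put(key, (short) indexByValue.size());
    }
    return NOT_IN_DICTIONARY;
  }

  int size() {
    return indexByValue.size();
  }

  public static void main(String[] args) {
    SketchDictionary dict = new SketchDictionary();
    byte[] family = {'c', 'f', '1'};
    System.out.println(dict.findEntry(family, 0, family.length)); // miss, inserts the value
    System.out.println(dict.findEntry(family, 0, family.length)); // hit, returns its index
  }
}
```

Folding the insert into the lookup means the writer never has to call a separate add method after a miss, which is also why the reader side has to add every literal value it sees in the same order.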

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5068
          -----------------------------------------------------------

          On 2012-02-15 04:57:45, Li Pi wrote:

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-15 04:57:45)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary
          -------
          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs
          -----
          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressionContext.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java e46a7a0
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,
          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 01:33:09, Liyin Tang wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 230
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line230>
          >
          > Should the data be added back to the dict in this case?
          > dict.addEntry(data) ?

          This is taken care of during findEntry.

          On 2012-02-14 01:33:09, Liyin Tang wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 192
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line192>
          >
          > WritableUtils.getVIntSize could help you decide how many bytes are needed for the entry, so you don't need to pass sizeBytes down into this function.

          This is part of the way HBase stores data uncompressed. It doesn't use a vInt.
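
To make the trade-off in this exchange concrete, here is a standalone sketch that compares a fixed 4-byte length field with the size a Hadoop-style variable-length integer would need. The sizing rule is re-implemented here as an assumption about how `WritableUtils.getVIntSize` behaves (one byte for values from -112 to 127, otherwise one length byte plus the data bytes); the class name `VIntSizeSketch` is hypothetical and nothing below is taken from the patch.

```java
/** Hypothetical helper; mirrors the assumed Hadoop vInt sizing rule for illustration. */
final class VIntSizeSketch {
  /** Bytes needed to store the value as a Hadoop-style variable-length long (assumed rule). */
  static int vIntSize(long value) {
    if (value >= -112 && value <= 127) {
      return 1; // small values fit in the single header byte
    }
    if (value < 0) {
      value = ~value; // one's complement, so the magnitude bits can be counted
    }
    int dataBits = Long.SIZE - Long.numberOfLeadingZeros(value);
    return (dataBits + 7) / 8 + 1; // data bytes plus one length byte
  }

  public static void main(String[] args) {
    long[] lengths = {10, 127, 128, 70000};
    for (long length : lengths) {
      System.out.println(length + ": fixed int = " + Integer.BYTES
          + " bytes, vInt = " + vIntSize(length) + " bytes");
    }
  }
}
```

A variable-length encoding only saves space when most lengths are small. A format that already stores fixed-width lengths has to keep reading them at that width, which is the constraint the reply points to.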

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5066
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-15 05:23:04, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 112
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line112>
          >
          > FileSystem has the following methods:
          >
          > /** Returns the configured filesystem implementation.*/
          > public static FileSystem get(Configuration conf) throws IOException {
          >
          > public static FileSystem get(URI uri, Configuration conf) throws IOException {
          >
          > I think the second get() should allow you to read an HLog on HDFS

          Todd Lipcon wrote:
          see my earlier comment on this review: path.getFileSystem(conf) is what you want to use

          fixed. hopefully this should work.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5113
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > I tried the compression tool on a log created by YCSB in "load" mode with the standard dataset. Since the values are fairly large here (100 bytes) it didn't get a huge compression ratio - from about 64MB down to 52MB (~20%). But still not bad. I looked at the resulting data using xxd and it looks like there are still a number of places where we could use variable-length integers instead of fixed-length ones. I wrote a quick C program to count the number of 0x00 bytes in the log and found about 3MB worth (~5%). Since the actual table data is all human-readable text in this case, all of the 0x00s should be able to be compressed away, I think.
          >
          > I also tested on a YCSB workload where each row has 1000 columns of 4 bytes each (similar to an indexing workload) and the compression ratio was 60% (64MB down to 25MB) with another 4.2MB of 0x00 bytes which could probably be removed.

          checked it out. looks like in YCSB workloads the 0x00 bytes are actually indexes pointing to the 0th entry of the dictionary.
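
The C program used for the zero-byte count is not included in the thread. A rough Java equivalent of that kind of check is sketched below; the class name `ZeroByteCounter` and its methods are hypothetical, and it reads a local file only (it does not go through the Hadoop FileSystem API).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper that counts 0x00 bytes, e.g. in a WAL file copied to local disk. */
final class ZeroByteCounter {
  /** Counts the zero bytes in the first {@code length} bytes of the buffer. */
  static long countZeroBytes(byte[] buffer, int length) {
    long zeros = 0;
    for (int i = 0; i < length; i++) {
      if (buffer[i] == 0) {
        zeros++;
      }
    }
    return zeros;
  }

  /** Streams the input in chunks so the whole file never has to fit in memory. */
  static long countZeroBytes(InputStream in) throws IOException {
    long zeros = 0;
    byte[] buffer = new byte[8192];
    int read;
    while ((read = in.read(buffer)) != -1) {
      zeros += countZeroBytes(buffer, read);
    }
    return zeros;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: ZeroByteCounter <local-file>");
      return;
    }
    try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
      System.out.println(countZeroBytes(in) + " zero bytes");
    }
  }
}
```

As the reply notes, a raw count like this cannot tell padding in fixed-width integers apart from legitimate dictionary index 0, so it gives an upper bound on what could be squeezed out, not an exact figure.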

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 52
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line52>
          >
          > invert the order of these || clauses - otherwise you get an out-of-bounds just running the tool with no arguments

          fixed.
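
The point about clause order can be shown with a small sketch. The exact condition in Compressor.java is not quoted in the thread, so the two-argument requirement and the `--help` flag below are assumptions, as is the class name `ArgCheckSketch`.

```java
/** Hypothetical argument check illustrating why the length test must come first. */
final class ArgCheckSketch {
  static boolean shouldPrintHelp(String[] args) {
    // || short-circuits: args[0] is only read when args.length == 2,
    // so an empty argument list can never index out of bounds.
    return args.length != 2 || args[0].equals("--help");
  }

  public static void main(String[] args) {
    if (shouldPrintHelp(args)) {
      System.err.println("usage: <input> <output>");
      return;
    }
    System.out.println("input=" + args[0] + " output=" + args[1]);
  }
}
```

With the clauses the other way round, `args[0].equals(...)` would run first and throw ArrayIndexOutOfBoundsException when the tool is started with no arguments.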

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, lines 86-88
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line86>
          >
          > this code doesn't work properly. Here's what you want to do:
          >
          > Configuration conf = new Configuration();
          > FileSystem fs = path.getFileSystem(conf);

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4853
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-02-07 02:58:00, Todd Lipcon wrote:
          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 74
          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line74>
          >
          > I think the better way of expressing this usage would be:
          >
          > WALCompressor [-u | -c] <input> <output>
          >
          > -u - uncompresses the input log
          > -c - compresses the input log
          >
          > Exactly one of -u or -c must be specified

          fixed
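
A minimal sketch of how the suggested command line could be parsed is given below. It only illustrates the "exactly one of -u or -c" rule from the review comment; the class name `WalCompressorArgsSketch` and its fields are hypothetical, and this is not the argument handling that ended up in the patch.

```java
/** Hypothetical parser for: WALCompressor [-u | -c] <input> <output> */
final class WalCompressorArgsSketch {
  enum Mode { COMPRESS, UNCOMPRESS }

  static final String USAGE =
      "usage: WALCompressor [-u | -c] <input> <output>\n"
      + "  -u  uncompress the log\n"
      + "  -c  compress the log\n"
      + "Exactly one of -u or -c must be specified";

  final Mode mode;
  final String input;
  final String output;

  private WalCompressorArgsSketch(Mode mode, String input, String output) {
    this.mode = mode;
    this.input = input;
    this.output = output;
  }

  /** Requires one mode flag followed by the input and output paths. */
  static WalCompressorArgsSketch parse(String[] args) {
    if (args.length != 3) {
      throw new IllegalArgumentException(USAGE);
    }
    final Mode mode;
    if ("-c".equals(args[0])) {
      mode = Mode.COMPRESS;
    } else if ("-u".equals(args[0])) {
      mode = Mode.UNCOMPRESS;
    } else {
      throw new IllegalArgumentException(USAGE);
    }
    return new WalCompressorArgsSketch(mode, args[1], args[2]);
  }

  public static void main(String[] args) {
    try {
      WalCompressorArgsSketch parsed = parse(args);
      System.out.println(parsed.mode + " " + parsed.input + " -> " + parsed.output);
    } catch (IllegalArgumentException e) {
      System.err.println(e.getMessage());
    }
  }
}
```

Because the flag occupies a fixed first position, passing both flags or neither simply fails the three-argument check or the flag match, so no separate mutual-exclusion logic is needed.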

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4853
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-01-25 06:20:23, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 226

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line226>

          >

          > NOT_IN_DICTIONARY should be used here.

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4585
          -----------------------------------------------------------

          On 2012-02-21 19:29:20, Li Pi wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2740/

          -----------------------------------------------------------

          (Updated 2012-02-21 19:29:20)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Summary

          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.

          https://issues.apache.org/jira/browse/HBase-4608

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing

          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 100

          > <https://reviews.apache.org/r/2740/diff/16/?file=70702#file70702line100>

          >

          > this function requires that the whole log data fit in RAM - not a great assumption

          Li Pi wrote:

          old one. will do eventually...

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-21 19:29:20.464648)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          addresses changes by reviewers above.

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 763fe89
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-01 02:29:54, Todd Lipcon wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java, line 96

          > <https://reviews.apache.org/r/2740/diff/16/?file=70700#file70700line96>

          >

          > rather than using keyVal.getRow(), keyVal.getFamily(), keyVal.getQualifier(), you should use the versions of those functions that just return offsets and lengths (e.g. getKeyOffset, getKeyLength). Then expand the writeCompressed API to take (byte[] buf, int off, int len). Otherwise you're making needless copies/garbage here.

          Li Pi wrote:

          This is gonna take a while. Since I'm currently relying on default Array.HashCode. Will need to use Bytes.HashCode and do a wrapper for insertion into the dictionary.

          fixed.
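
To illustrate the point discussed above, here is a minimal sketch of a key wrapper that hashes and compares a (byte[] buf, int off, int len) range by content, so a dictionary lookup needs no copy. The class name ByteRangeKey and its methods are hypothetical and written in plain Java; this is not the wrapper used in the patch.

```java
import java.util.Arrays;

/** Hypothetical wrapper: content-based equals/hashCode over a slice of a byte array. */
final class ByteRangeKey {
  private final byte[] buf;
  private final int off;
  private final int len;

  ByteRangeKey(byte[] buf, int off, int len) {
    if (off < 0 || len < 0 || off + len > buf.length) {
      throw new IndexOutOfBoundsException("range outside buffer");
    }
    this.buf = buf;
    this.off = off;
    this.len = len;
  }

  /** Returns a key that owns its bytes; use this before storing a key long-term. */
  ByteRangeKey copy() {
    return new ByteRangeKey(Arrays.copyOfRange(buf, off, off + len), 0, len);
  }

  @Override
  public int hashCode() {
    // Hash only the bytes in the range, so equal content gives equal hashes
    // regardless of which backing array or offset holds it.
    int h = 1;
    for (int i = off; i < off + len; i++) {
      h = 31 * h + buf[i];
    }
    return h;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ByteRangeKey)) {
      return false;
    }
    ByteRangeKey other = (ByteRangeKey) o;
    if (len != other.len) {
      return false;
    }
    for (int i = 0; i < len; i++) {
      if (buf[off + i] != other.buf[other.off + i]) {
        return false;
      }
    }
    return true;
  }
}
```

A lookup can wrap the KeyValue's backing array directly, for example new ByteRangeKey(kvBytes, familyOffset, familyLength), and only call copy() when the entry is actually inserted, since the backing buffer may be reused by the caller.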

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review4732
          -----------------------------------------------------------


          Li Pi added a comment -

          @Kannan - here's a quick overview of 4608:

          When writing the HLog, the writer checks a set of dictionaries for the key, cf, qualifier, tablename, and regionname. If an item is already in its dictionary, the writer writes the item's index instead of the item itself. If the item is not in the dictionary, it is written uncompressed and added to the dictionary.

          Reading from the HLog works in the opposite manner. When the reader encounters an uncompressed item, it adds it to the dictionary. When it encounters an index, it just fetches the item from the dictionary.

          The dictionary itself is a simple LRU dictionary that, by default, uses 2 bytes (a short) per index. There is a separate dictionary for each field (e.g. one for tablenames, one for regionnames...).

          The dictionary only needs to be consistent: given the same items in the same order, it must always assign them the same indices and always evict in exactly the same fashion.

          This seems to work fairly well - and noticeably cuts down our write sizes on the vast majority of workloads.
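
The scheme described above can be sketched roughly as follows. This is an illustration only, not the LRUDictionary or Compressor code from the patch: the class name LruDictionarySketch, the one-byte RAW/INDEXED tags, the 4-byte length prefix, and the use of ByteBuffer as the stream are all assumptions made to keep the example self-contained. What it does show is the consistency requirement: writer and reader each keep their own dictionary, perform the same sequence of adds and touches, and therefore agree on every index and every eviction.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a deterministic LRU dictionary with 2-byte indices. */
final class LruDictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;
  // Assumed one-byte tags that say what follows in the stream.
  static final byte RAW = 0;
  static final byte INDEXED = 1;

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<ByteBuffer, Short> indexByValue =
      new LinkedHashMap<>(16, 0.75f, true);
  private final byte[][] valueByIndex;

  LruDictionarySketch(int capacity) {
    if (capacity < 1 || capacity > Short.MAX_VALUE + 1) {
      throw new IllegalArgumentException("capacity must fit in a non-negative short index");
    }
    this.capacity = capacity;
    this.valueByIndex = new byte[capacity][];
  }

  /** Returns the index of value, or NOT_IN_DICTIONARY. A hit refreshes recency. */
  short findEntry(byte[] value) {
    Short idx = indexByValue.get(ByteBuffer.wrap(value));
    return idx == null ? NOT_IN_DICTIONARY : idx;
  }

  /** Adds a value not yet present; when full, reuses the index of the evicted LRU entry. */
  short addEntry(byte[] value) {
    short idx;
    if (indexByValue.size() < capacity) {
      idx = (short) indexByValue.size();
    } else {
      Iterator<Map.Entry<ByteBuffer, Short>> it = indexByValue.entrySet().iterator();
      idx = it.next().getValue();
      it.remove();
    }
    byte[] copy = value.clone();
    valueByIndex[idx] = copy;
    indexByValue.put(ByteBuffer.wrap(copy), idx);
    return idx;
  }

  /** Resolves an index and refreshes recency, mirroring a writer-side hit. */
  byte[] getEntry(short idx) {
    byte[] value = valueByIndex[idx];
    indexByValue.get(ByteBuffer.wrap(value)); // touch only; result unused
    return value;
  }

  /** Writer: emit the index on a hit, otherwise the raw bytes, then remember them. */
  static void write(ByteBuffer out, byte[] value, LruDictionarySketch dict) {
    short idx = dict.findEntry(value);
    if (idx == NOT_IN_DICTIONARY) {
      out.put(RAW).putInt(value.length).put(value);
      dict.addEntry(value);
    } else {
      out.put(INDEXED).putShort(idx);
    }
  }

  /** Reader: the mirror image of write, applied to its own dictionary. */
  static byte[] read(ByteBuffer in, LruDictionarySketch dict) {
    if (in.get() == RAW) {
      byte[] value = new byte[in.getInt()];
      in.get(value);
      dict.addEntry(value);
      return value;
    }
    return dict.getEntry(in.getShort());
  }
}
```

In this sketch a repeated field costs 3 bytes (tag plus short) instead of its full length, and one such dictionary would be kept per field, as described above. Note that the reader must touch an entry on every index lookup; if it did not, its eviction order would drift from the writer's and later indices would resolve to the wrong items.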

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5254
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11473>

          This comment should also be placed at the beginning of compressFile().

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11472>

          Typo: should be output.getFileSystem(outconf)

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/
          -----------------------------------------------------------

          (Updated 2012-02-22 03:46:12.923539)

          Review request for hbase, Eli Collins and Todd Lipcon.

          Changes
          -------

          fixed typos

          Summary
          -------

          HLog compression. Has unit tests and a command line tool for compressing/decompressing.

          This addresses bug HBase-4608.
          https://issues.apache.org/jira/browse/HBase-4608

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HConstants.java 35339b6
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java c945a99
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java ead9a3b
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLRUDictionary.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 23d27fd
          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2740/diff

          Testing
          -------

          Thanks,

          Li

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-21 23:30:35, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 61

          > <https://reviews.apache.org/r/2740/diff/18/?file=78498#file78498line61>

          >

          > This comment should also be placed at the beginning of compressFile().

          Removed the comment; it is not necessary anymore.

          On 2012-02-21 23:30:35, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 88

          > <https://reviews.apache.org/r/2740/diff/18/?file=78498#file78498line88>

          >

          > Typo: should be output.getFileSystem(outconf)

          fixed.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5254
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5265
          -----------------------------------------------------------

          This looks great. Some small comments below.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11488>

          Should this javadoc here in the class include the notes you made for Kannan where you describe how it all works? If not here, where else will doc. on how the Compressor works go?

          Maybe you should purge all mention of WAL from this class – e.g. WALDictionary – because it seems like it could be easily generalized (I suppose we can do that later).

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11489>

          The way the usage is written, -u and -c are optional. You should fix that. They look required, going by the fact that args.length needs to be 3. Also, it looks like you take --help, the long form, but only the short forms -u/-c. Either take all short forms or take both long and short forms, to be consistent.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11490>

          Why is the tool called WALCompressor in the usage but the class I invoke is Compressor?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11491>

          This does not need to be an HBaseConfiguration? There are no configs in hbase-site.xml that might affect what's going on here?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11492>

          Doc the '@return'

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
          <https://reviews.apache.org/r/2740/#comment11493>

          Doc the return

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2740/#comment11494>

          White space

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
          <https://reviews.apache.org/r/2740/#comment11495>

          When is this called? Post construction? Should it be part of the constructor? What happens if it's called part way through the writing of a WAL? Will we start compressing a WAL in the middle?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
          <https://reviews.apache.org/r/2740/#comment11496>

          I don't follow what's going on here. What happens when len >= 0? Why is it < 0? What does that mean? What's v2 of HLogKey? What if keyContext is not null?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          <https://reviews.apache.org/r/2740/#comment11497>

          Class comment on what this is about?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java
          <https://reviews.apache.org/r/2740/#comment11498>

          Why do I clear this? Why not just throw it away? Does clearing make it so I can recycle this instance?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11499>

          Why would I ever let go of terms in the dictionary? Should you explain why in class comment?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11501>

          Should this be static? Does it need reference to outer class?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java
          <https://reviews.apache.org/r/2740/#comment11502>

          Class comment? Should this be static?

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
          <https://reviews.apache.org/r/2740/#comment11503>

          Why am I reading whether compression is on or off by looking at config? Why am I not looking at the head of the WAL file to figure out whether it's compressed, and then decompressing? Otherwise, if config is disabled but I'm fed a compressed file, do I just burp? See the white space added here.

          src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
          <https://reviews.apache.org/r/2740/#comment11504>

          Should be just called Dictionary. It's in the wal package. No need for the redundant prefix?

          src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java
          <https://reviews.apache.org/r/2740/#comment11505>

          This will run all the tests in TestWALReplay? Nice.

          • Michael

          jiraposter@reviews.apache.org added a comment -

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 37

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line37>

          >

          > Should this javadoc here in the class include the notes you made for Kannan where you describe how it all works? If not here, where else will doc. on how the Compressor works go?

          >

          > Maybe you should purge all mention of WAL from this class – e.g. WALDictionary – because it seems like it could be easily generalized (I suppose we can do that later).

          Included!

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 47

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line47>

          >

          > The way the usage is written, -u and -c are optional. You should fix that. They look required, going by the fact that args.length needs to be 3. Also, it looks like you take --help, the long form, but only the short forms -u/-c. Either take all short forms or take both long and short forms, to be consistent.

          System.out.println("Exactly one of -u or -c must be specified"); should take care of the required thing.

          Help now takes both short and long forms. Everything else just takes short forms.
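
          The argument rules discussed above (exactly three arguments, one of -u or -c required, help accepted in short and long form) could be checked roughly as follows. This is a minimal sketch, not the code from the patch: the class name CompressorArgs, the Mode enum, the -h spelling of the short help flag, the reading of -c as compress and -u as uncompress, and the usage text are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical argument check for a compress/uncompress tool; not the patch's code. */
final class CompressorArgs {
  enum Mode { HELP, COMPRESS, UNCOMPRESS, INVALID }

  /** Classifies the command line. Help takes -h or --help; the modes take short flags only. */
  static Mode parse(String[] args) {
    List<String> list = Arrays.asList(args);
    if (list.contains("-h") || list.contains("--help")) {
      return Mode.HELP;
    }
    // One mode flag plus an input path and an output path: exactly three arguments.
    if (args.length != 3) {
      return Mode.INVALID;
    }
    if (args[0].equals("-c")) {
      return Mode.COMPRESS;   // assumed meaning of -c
    }
    if (args[0].equals("-u")) {
      return Mode.UNCOMPRESS; // assumed meaning of -u
    }
    return Mode.INVALID;
  }

  /** Usage text that marks the mode flag as required rather than optional. */
  static String usage() {
    return "usage: Compressor (-u | -c) <input> <output>\n"
        + "Exactly one of -u or -c must be specified";
  }
}
```

          Writing the usage as (-u | -c) instead of [-u | -c] is what signals to the reader that the flag is mandatory.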

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 66

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line66>

          >

          > Why is the tool called WALCompressor in the usage but the class I invoke is Compressor?

          It should probably be called Compressor.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 79

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line79>

          >

          > This does not need to be an HBaseConfiguration? There are no configs in hbase-site.xml that might affect what's going on here?

          Not really. All that matters is whether compression is on or off.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 108

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line108>

          >

          > Doc the '@return'

          fixed.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java, line 141

          > <https://reviews.apache.org/r/2740/diff/19/?file=78622#file78622line141>

          >

          > Doc the return

          fixed.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1671

          > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1671>

          >

          > White space

          fixed.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java, line 1675

          > <https://reviews.apache.org/r/2740/diff/19/?file=78623#file78623line1675>

          >

          > When is this called? Post construction? Should it be part of the constructor? What happens if it's called part way through the writing of a WAL? Will we start compressing a WAL in the middle?

          It's called when a log writer is created. We would start compressing a log in the middle if we happened to call it at that time, but that shouldn't happen.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java, line 270

          > <https://reviews.apache.org/r/2740/diff/19/?file=78624#file78624line270>

          >

          > I don't follow what's going on here. What happens when len >= 0? Why is it < 0? What does that mean? What's v2 of HLogKey? What if keyContext is not null?

          HLogKey has two different formats. If len < 0, the value we just read is a version marker rather than a length, which means we're reading the newer, versioned format (v2).

          keyContext is the compression context that holds the dictionaries used in compression. If it isn't null, that means compression is enabled.

          If len >= 0, we're on version 1. We can't compress version 1, but the code for reading version 1 is still in there, for transitioning from earlier HLogs. Compression should never be enabled if we're reading in version 1 HLogs, because there shouldn't be any version 1 HLogs.
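
          The version check described above can be illustrated with a small stand-alone sketch. It assumes that a non-negative leading value is the name length in the older layout and that a negative value marks the newer layout, with the real length following. The class VersionedKeySketch, the VERSION_2 constant, and the use of fixed-width ints instead of vints are all simplifications invented for this example; it does not reproduce HLogKey or its compression path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: the first int is either a length (old layout) or a negative marker. */
final class VersionedKeySketch {
  // Hypothetical marker. A negative value can never be mistaken for a length.
  static final int VERSION_2 = -1;

  /** Serializes a name in the old layout (length, bytes) or the new one (marker, length, bytes). */
  static byte[] write(String name, boolean versioned) {
    byte[] raw = name.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      if (versioned) {
        out.writeInt(VERSION_2);
      }
      out.writeInt(raw.length);
      out.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads either layout and reports which one was found, as "v1:<name>" or "v2:<name>". */
  static String read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int len = in.readInt();
      String layout = "v1";
      if (len < 0) {
        // What we just read was the version marker; the length comes next.
        layout = "v2";
        len = in.readInt();
      }
      byte[] raw = new byte[len];
      in.readFully(raw);
      return layout + ":" + new String(raw, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

          The point of the trick is that old readers' data stays readable: any file written before versioning starts with a non-negative length, so no separate format flag is needed.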

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java, line 119

          > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line119>

          >

          > Class comment on what this is about?

          Just a tuple class for holding the various dictionaries used in compression.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/KeyValueCompression.java, line 141

          > <https://reviews.apache.org/r/2740/diff/19/?file=78625#file78625line141>

          >

          > Why do I clear this? Why not just throw it away? Does clearing make it so I can recycle this instance?

          Correct. We clear it so we can recycle this instance instead of having to create a new dictionary. Not sure if this makes a huge difference in terms of performance.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 29

          > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line29>

          >

          > Why would I ever let go of terms in the dictionary? Should you explain why in class comment?

          We let go of terms in the dictionary because we have only a finite amount of space, and a finite ability to reference terms in the dictionary.

          With a 2-byte key, our reference space is limited to 65536 entries. We could use vints for entries into the dictionary instead, but then the dictionary could grow pretty huge.
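
          To make the space argument concrete, here is a rough sketch of a bounded dictionary that hands out indices small enough for a 2-byte reference and evicts the least recently used term when full, reusing its index. It is not the LRUDictionary from the patch: the class name, the use of String terms instead of byte arrays, and the LinkedHashMap-based bookkeeping are assumptions chosen to keep the example short.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a bounded term dictionary with least-recently-used eviction. */
final class BoundedDictionarySketch {
  static final int NOT_IN_DICTIONARY = -1;
  static final int MAX_CAPACITY = 1 << 16; // 65536 indices fit in a 2-byte key

  private final int capacity;
  // Access-ordered map: iteration starts at the least recently used term.
  private final LinkedHashMap<String, Integer> indexByTerm =
      new LinkedHashMap<>(16, 0.75f, true);
  private final String[] termByIndex;

  BoundedDictionarySketch(int capacity) {
    if (capacity < 1 || capacity > MAX_CAPACITY) {
      throw new IllegalArgumentException("capacity must be in [1, " + MAX_CAPACITY + "]");
    }
    this.capacity = capacity;
    this.termByIndex = new String[capacity];
  }

  /** Returns the index of the term, or NOT_IN_DICTIONARY; a hit marks the term as recently used. */
  int findEntry(String term) {
    Integer index = indexByTerm.get(term);
    return index == null ? NOT_IN_DICTIONARY : index;
  }

  /** Adds the term and returns its index, evicting the least recently used term when full. */
  int addEntry(String term) {
    Integer existing = indexByTerm.get(term);
    if (existing != null) {
      return existing;
    }
    int index;
    if (indexByTerm.size() < capacity) {
      index = indexByTerm.size(); // nothing evicted yet, so indices are dense
    } else {
      Map.Entry<String, Integer> eldest = indexByTerm.entrySet().iterator().next();
      index = eldest.getValue();
      indexByTerm.remove(eldest.getKey());
    }
    indexByTerm.put(term, index);
    termByIndex[index] = term;
    return index;
  }

  /** Returns the term currently stored at the index, or null if the slot is unused. */
  String getEntry(int index) {
    return termByIndex[index];
  }

  /** Empties the dictionary so the same instance can be reused. */
  void clear() {
    indexByTerm.clear();
    Arrays.fill(termByIndex, null);
  }
}
```

          In a scheme like this, the writer and the reader have to apply the same sequence of lookups and additions, otherwise their index assignments drift apart after the first eviction.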

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 64

          > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line64>

          >

          > Should this be static? Does it need reference to outer class?

          It doesn't need to reference the outer class. Made static.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/LRUDictionary.java, line 168

          > <https://reviews.apache.org/r/2740/diff/19/?file=78626#file78626line168>

          >

          > Class comment? Should this be static?

          made static.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java, line 176

          > <https://reviews.apache.org/r/2740/diff/19/?file=78627#file78627line176>

          >

          > Why am I reading whether compression is on or off by looking at config? Why am I not looking at the head of the WAL file to figure out whether it's compressed, and then decompressing? Otherwise, if config is disabled but I'm fed a compressed file, do I just burp? See the white space added here.

          We just burp if compression is on and we get fed an uncompressed file. This should be easy to change though - on the read side.
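
          One way the read side could decide for itself, as suggested in the question, is for the writer to record a flag at the head of the file and for the reader to consult that flag instead of the configuration. The sketch below shows the idea only. The LogHeaderSketch class, the magic number, and the one-byte flag are hypothetical and do not describe the SequenceFile-based format the patch actually writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative self-describing header: the file, not the config, says whether it is compressed. */
final class LogHeaderSketch {
  private static final int MAGIC = 0x57414C31; // hypothetical tag, "WAL1" in ASCII

  /** Writer side: records the compression setting that was in force when the file was created. */
  static byte[] writeHeader(boolean compressed) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(MAGIC);
      out.writeBoolean(compressed);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reader side: decides from the header alone whether entries need decompressing. */
  static boolean isCompressed(byte[] header) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(header))) {
      if (in.readInt() != MAGIC) {
        throw new IllegalArgumentException("not a recognized log header");
      }
      return in.readBoolean();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

          With a header like this, a reader running with compression disabled in its config could still open a compressed file correctly, and the reverse.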

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java, line 28

          > <https://reviews.apache.org/r/2740/diff/19/?file=78629#file78629line28>

          >

          > Should be just called Dictionary. It's in the wal package. No need for the redundant prefix?

          Sure. But we have WALActionsListener and a bunch of other things starting with WAL. I figured we can just have that as well.

          Renamed to Dictionary.

          On 2012-02-22 05:11:37, Michael Stack wrote:

          > src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplayCompressed.java, line 38

          > <https://reviews.apache.org/r/2740/diff/19/?file=78634#file78634line38>

          >

          > This will run all the tests in TestWALReplay? Nice.

          Yup. That's exactly what it does.

          • Li

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2740/#review5265
          -----------------------------------------------------------
