HBase / HBASE-3691

Add compressor support for 'snappy', google's compressor

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.7, 0.92.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added support for Google's Snappy compression codec.

      Description

      http://code.google.com/p/snappy/ is Apache licensed.

      Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

      Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as "Zippy" in some presentations and the likes.)

      Let's get it in.

      1. 3691-addendum.txt
        0.7 kB
        stack
      2. hbase-snappy-0.90.6.patch
        5 kB
        Chris Waterson
      3. hbase-snappy-3691-trunk.patch
        1 kB
        Nicholas Telford
      4. hbase-snappy-3691-trunk-002.patch
        1 kB
        Nicholas Telford
      5. hbase-snappy-3691-trunk-003.patch
        6 kB
        Nichole Treadway
      6. hbase-snappy-3691-trunk-004.patch
        5 kB
        Nicholas Telford

          Activity

          Jonathan Gray added a comment -

          It's slightly faster than LZO for both compression and decompression (LZO at 169/434 MB/sec vs. Snappy at 250/500 MB/sec).

          I'm unsure of the difference in compression ratios but we can ship with it, yay

          Nicholas Telford added a comment -

          As far as I can tell, this is all that's required in HBase to add support for Snappy. Since it's an optional runtime dependency, and we can guarantee the class name (SnappyCodec) and its interface (CompressionCodec), we're not actually blocked by the addition of the CompressionCodec itself (HADOOP-7206).

          I've tested this against the preliminary support for Snappy in HADOOP-7206. As far as I can tell, they're simply waiting on some licensing constraints to be resolved (which doesn't affect this patch).
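
The "optional runtime dependency" idea can be illustrated with a minimal sketch that looks a codec class up by name and degrades gracefully when it is missing. This is not the code from the patch: the helper `OptionalCodecLoader` is hypothetical, and the fully qualified name below assumes that SnappyCodec lives in Hadoop's usual compression package.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not taken from the HBase patch.
final class OptionalCodecLoader {
    // Assumption: SnappyCodec sits in Hadoop's standard compression package.
    static final String SNAPPY_CODEC_CLASS = "org.apache.hadoop.io.compress.SnappyCodec";

    /** Returns the class if it is on the classpath, or empty if it is not. */
    static Optional<Class<?>> findClass(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            return Optional.of(Class.forName(className, false,
                    OptionalCodecLoader.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            // The dependency is optional, so a missing class is not an error here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("snappy codec present: "
                + findClass(SNAPPY_CODEC_CLASS).isPresent());
        System.out.println("java.lang.String present: "
                + findClass("java.lang.String").isPresent());
    }
}
```

Because the lookup goes through a string, code written this way compiles without the codec on the build classpath, which is the property the comment above relies on.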

          Nicholas Telford added a comment -

          The patch itself.

          Nicholas Telford added a comment -

          Seems I'd accidentally based the patch against 0.90.2, not trunk.

          Re-based against trunk.

          stack added a comment -

          Nicholas: Any chance of a bit of doc on what you did to make it work? Add a sentence or two here and I'll add it to http://hbase.apache.org/book.html#compression on commit. Good stuff.

          Nichole Treadway added a comment -

          Thanks for the patch...I made a few additional changes in HColumnDescriptor, and I updated the test files to include snappy.

          I noticed there are places in the hbase.avro classes where snappy support would need to be added in. Is it ok to add these changes in the patch, or do the avro classes need to be auto-generated somehow?

          Nichole Treadway added a comment -

          Accidentally selected wrong license option.

          Nicholas Telford added a comment -

          Thanks Nichole, without your patch to HColumnDescriptor it wasn't possible to use snappy. I'd only tested it using CompressionTest, which I see now is not a complete enough test: it only tests that compression on an HFile works, not that Column Families can use it.

          One thing that does concern me: it seems as though in your patch the Algorithm implementation for SNAPPY has moved places in the enum. From the comments it sounds like it should be added as the last implementation to avoid breaking HFiles compressed with the other implementations. This looks like it may just be a merge glitch when you first applied my patch.

          Using Nichole's patch, the steps to getting Snappy working are currently:

          1. Install hadoop-snappy using these instructions: http://code.google.com/p/hadoop-snappy/
          2. You need to ensure the hadoop-snappy libs (incl. the native libs) are in the HBase classpath. Unless there are any other recommendations, I just symlinked the libs from HADOOP_HOME/lib to HBASE_HOME/lib. This needs to be done on all HBase nodes, as with LZO.
          3. Use CompressionTest to verify snappy support is enabled and the libs can be loaded:

            $ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy

          4. Create a column family with snappy compression and verify it:

            $ hbase shell
            > create 't1', { NAME => 'cf1', COMPRESSION => 'snappy' }
            > describe 't1'

          In the output of the "describe" command, you need to ensure it lists "COMPRESSION => 'snappy'"
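
Step 2 above (making the hadoop-snappy libs visible to HBase by symlinking) can be sketched in Java as follows. This is only an illustration under assumptions: the class `LinkSnappyLibs` and the `*snappy*` glob are hypothetical, the actual file names depend on the hadoop-snappy build, and native libraries in a platform subdirectory would need the same treatment with their own paths.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a shell one-liner with `ln -s` does the same job.
final class LinkSnappyLibs {
    /** Symlinks every entry of srcDir matching glob into dstDir; returns the links created. */
    static List<Path> linkMatching(Path srcDir, Path dstDir, String glob) throws IOException {
        List<Path> created = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir, glob)) {
            for (Path entry : entries) {
                Path link = dstDir.resolve(entry.getFileName());
                // Skip names that already exist so re-running is harmless.
                if (Files.notExists(link, LinkOption.NOFOLLOW_LINKS)) {
                    Files.createSymbolicLink(link, entry.toAbsolutePath());
                    created.add(link);
                }
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LinkSnappyLibs <HADOOP_HOME/lib> <HBASE_HOME/lib>");
            return;
        }
        // Assumption: the relevant files have "snappy" somewhere in their names.
        for (Path link : linkMatching(Paths.get(args[0]), Paths.get(args[1]), "*snappy*")) {
            System.out.println("linked " + link);
        }
    }
}
```

As the steps note, this would have to be run on every HBase node, and CompressionTest remains the way to confirm that the libs actually load.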

          Nicholas Telford added a comment -

          Moved Compression.Algorithm.SNAPPY to end of enum to retain backwards compatibility with existing HFiles.

          Otherwise, patch is same as 003
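
Why the position in the enum matters can be shown with a small sketch. It assumes that files record the algorithm by its position in the enum (its ordinal); the enums below are hypothetical stand-ins, not the real Compression.Algorithm.

```java
// Illustration only: hypothetical enums standing in for Compression.Algorithm.
final class EnumOrderDemo {
    // The algorithms as an older release might have declared them.
    enum Before { LZO, GZ, NONE }
    // SNAPPY appended at the end: existing constants keep their ordinals.
    enum Appended { LZO, GZ, NONE, SNAPPY }
    // SNAPPY inserted in the middle: NONE shifts from ordinal 2 to 3.
    enum Inserted { LZO, GZ, SNAPPY, NONE }

    /** Decodes a stored ordinal against the given enum type. */
    static <E extends Enum<E>> E decode(Class<E> type, int storedOrdinal) {
        return type.getEnumConstants()[storedOrdinal];
    }

    public static void main(String[] args) {
        int stored = Before.NONE.ordinal(); // value written by the older release
        System.out.println(decode(Appended.class, stored)); // prints NONE
        System.out.println(decode(Inserted.class, stored)); // prints SNAPPY
    }
}
```

With the appended layout an old file is still read as NONE, while the inserted layout silently reinterprets it as SNAPPY.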

          stack added a comment -

          Committed to TRUNK. Thanks Nicholas and Nichole for the patches. Nicholas, I added your howto above to the compression appendix of the book. Thanks.

          Hudson added a comment -

          Integrated in HBase-TRUNK #1930 (See https://builds.apache.org/hudson/job/HBase-TRUNK/1930/)

          John Heitmann added a comment -

          In the new instructions this:

          COMPRESSION => 'snappy'

          should be this:

          COMPRESSION => 'SNAPPY'

          stack added a comment -

          Thanks John. I fixed it in the book. (We should fix case sensitivity for compressor names too.)
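
One way the case sensitivity could be addressed is to normalize the name before the lookup. The sketch below assumes the name is resolved with an exact-match call such as `Enum.valueOf`; the `CompressionNames` class and its `Algorithm` enum are hypothetical stand-ins, not HBase code.

```java
import java.util.Locale;

// Illustration only: a hypothetical stand-in for the real algorithm enum.
final class CompressionNames {
    enum Algorithm { LZO, GZ, NONE, SNAPPY }

    /** Case-insensitive lookup: normalizes the name before the exact-match Enum.valueOf. */
    static Algorithm parse(String name) {
        // Locale.ROOT avoids locale-specific upper-casing surprises (e.g. Turkish dotless i).
        return Algorithm.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(parse("snappy")); // prints SNAPPY
        try {
            Algorithm.valueOf("snappy"); // exact match fails for lower case
        } catch (IllegalArgumentException e) {
            System.out.println("exact-match lookup rejects lower case");
        }
    }
}
```

Unknown names still raise IllegalArgumentException, so typos are not silently accepted.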

          Chris Waterson added a comment -

          What is the likelihood that this could be back-ported to the 0.90.x branch?

          stack added a comment -

          @Chris Have you tried the patch on 0.90? Does it work for you?

          Chris Waterson added a comment -

          Yes, I have. I've applied hbase-snappy-0.90.5.patch and it seems to be working on a small (but heavily loaded) HBase 0.90.6 cluster.

          Chris Waterson added a comment -

          Urg, "I've applied hbase-snappy-0.90.6.patch..."

          stack added a comment -

          Applied to 0.90 branch.

          Ted Yu added a comment -

          This test failure might be related:

          Running org.apache.hadoop.hbase.util.TestCompressionTest
          Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec <<< FAILURE!
          
          stack added a comment -

          Removed the snappy test; the plumbing is not usually in place. (Thanks for noticing, Ted.)

          stack added a comment -

          Applied addendum to 0.90 branch.


            People

            • Assignee:
              Unassigned
              Reporter:
              stack
            • Votes:
              4
              Watchers:
              18
