Hadoop Common / HADOOP-6349

Implement FastLZCodec for fastlz/lzo algorithm

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io
    • Labels: None
    • Tags: compression lzo fastlz io

      Description

      Per HADOOP-4874, FastLZ is a good (speed, license) alternative to LZO.

      Attachments

      1. hadoop-6349-4.patch
        107 kB
        Nicholas Carlini
      2. hadoop-6349-3.patch
        109 kB
        Nicholas Carlini
      3. hadoop-6349-2.patch
        72 kB
        Nicholas Carlini
      4. hadoop-6349-1.patch
        73 kB
        Eli Collins
      5. TestCodecPerformance.java
        8 kB
        Todd Lipcon
      6. HADOOP-6349-TestFastLZCodec.patch
        8 kB
        William Kinney
      7. HADOOP-6349.patch
        58 kB
        William Kinney
      8. TestCodecPerformance.java
        9 kB
        William Kinney
      9. testCodecPerfResults.tsv
        6 kB
        William Kinney

        Issue Links

          Activity

          Ismael Juma added a comment -

          Thanks, that's good to know.

          Todd Lipcon added a comment -

          I think this has largely been superseded by the recent work on Snappy.

          Ismael Juma added a comment -

          What is the status of this? Is anyone still working on it?

          Nicholas Carlini added a comment -

          Fixed the buffering issues. There is still work to do here, though. When compress() is called with an output array that is big enough, the compressor should compress directly into that array rather than compressing into a temporary buffer and then copying into the array given. The alternative, handling small output arrays by having the compressor keep its state and resume compression from the middle, seems like more work than it's worth for very little gain. Compressing directly would, however, mean that the buffer size used by the CompressorStream needs to be around 64k, so that most of the time bytes can be written straight into it. (The same applies to decompression.)

          Fixed a bug where it was possible for the end of stream mark to show up in the wrong place.

          Fixed another bug in the compressor where calling write(int) on the output stream n times would add 47*n bytes to the output stream. Each time write() is called, so is compress(), which emits 26 bytes for a header block and 16 bytes of header, followed by a single byte of uncompressed data. Fixed this by adding another case: if the buffer holds fewer than 2^16 bytes (the default block size), needsInput() returns true, so nothing is compressed yet.
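
          A minimal sketch of that needsInput() rule, assuming a 64 KB block size (the class and field names below are invented, not taken from the patch):

          // Illustrative only: buffer input until a whole block is available.
          class BlockBufferSketch {
            private static final int DEFAULT_BLOCK_SIZE = 1 << 16;   // 64 KB uncompressed block

            private final byte[] inputBuffer = new byte[DEFAULT_BLOCK_SIZE];
            private int inputLength = 0;        // bytes currently buffered
            private boolean finishCalled = false;

            /** Ask for more input until a whole block is buffered (or finish() was called),
             *  so compress() is not invoked, and a header emitted, for every one-byte write(). */
            public boolean needsInput() {
              return !finishCalled && inputLength < DEFAULT_BLOCK_SIZE;
            }

            public void setInput(byte[] b, int off, int len) {
              int toCopy = Math.min(len, DEFAULT_BLOCK_SIZE - inputLength);
              System.arraycopy(b, off, inputBuffer, inputLength, toCopy);
              inputLength += toCopy;
            }

            public void finish() {
              finishCalled = true;
            }
          }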

          Changed TestCodecPerformance so that it calls write() several times instead of writing everything at once; it now calls write() with a random length until it runs out of input.
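
          For illustration, the chunked-write loop might look roughly like this (a sketch, not the actual TestCodecPerformance code; the method and its parameters are assumed):

          import java.io.IOException;
          import java.io.OutputStream;
          import java.util.Random;

          // Sketch: feed the compression stream in randomly sized chunks rather than one big write().
          class ChunkedWriter {
            static void writeInRandomChunks(OutputStream out, byte[] data, long seed) throws IOException {
              Random rand = new Random(seed);
              int off = 0;
              while (off < data.length) {
                int len = Math.min(1 + rand.nextInt(64 * 1024), data.length - off);
                out.write(data, off, len);
                off += len;
              }
              out.close();
            }
          }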

          Got rid of the uses of BigInteger ...

          Removed the moved code ... no idea why it was ever there.

          Made abbreviations in comments real words.

          FASTLZ_STRICT_ALIGN mode doesn't even work when set to false. Deleted.

          Setting FASTLZ_SAFE to false gives no performance increase; all the flag controls is a few if statements (none of which contain loops). Deleted.

          At some point, the header blocks (those with block ID 1) should be removed. They serve no purpose and only remain from the port of 6pack.c. The header could even be made smaller by removing the ID, since all blocks will now have an ID of 17, and the 'options' field needs only one bit to indicate compressed versus incompressible (stored) data.

          Nicholas Carlini added a comment -

          I believe I've fixed the buffering problems with very few changes to the compressor and decompressor. I'll need to verify that these changes didn't break anything else, but it seems like a simple issue to fix.

          Greg Roelofs added a comment -

          OK, with respect to the 20100730 patch, the biggest issue by far is the buffering, both compression and decompression, as we've discussed in person:

          • since FastLZ is block-based with an uncompressed block size of 64 KB, it should never require a buffer larger than 64 KB for either input or output, and certainly never the entirety of the file
            • it sounds like a (relatively) quick fix is possible, but otherwise it might be worth looking at SplittableCompressionCodec and/or BlockCompressorStream, per comments below
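
          For the quick-fix route, a rough sketch of what a BlockCompressorStream-based output stream for this codec could look like (the class name and the overhead estimate are assumptions, not part of the attached patch):

          import java.io.IOException;
          import java.io.OutputStream;
          import org.apache.hadoop.io.compress.BlockCompressorStream;
          import org.apache.hadoop.io.compress.CompressionOutputStream;
          import org.apache.hadoop.io.compress.Compressor;

          // Sketch: wrap the FastLZ compressor in a BlockCompressorStream so that at most one
          // 64 KB block is ever buffered on the Java side. The overhead term is only a guess
          // at FastLZ's worst-case expansion, not a value taken from the patch.
          class FastLZStreamsSketch {
            private static final int BLOCK_SIZE = 64 * 1024;

            static CompressionOutputStream createOutputStream(OutputStream out, Compressor compressor)
                throws IOException {
              int compressionOverhead = BLOCK_SIZE / 16 + 64;   // worst-case growth estimate
              return new BlockCompressorStream(out, compressor, BLOCK_SIZE, compressionOverhead);
            }
          }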

          The other items are mostly style-related:

          • move imported code to src/contrib, per offline discussion with Owen O'Malley
          • boilerplate should probably mention original MIT license
          • no src/java/core-default.xml change? (add to io.compression.codecs)
          • [cosmetic: nuke commented-out System.currentTimeMillis(), debug printfs, etc.]
          • [cosmetic: strip all trailing blanks on new, non-3rdparty/imported (i.e., non-src/contrib) files]
          • prefer to drop "J" prefix on fastlz/*.java (consistent with C, other codecs)
          • add package.html
            • 64 KB block size
            • mainly level 2 with fallback to level 1 for partial blocks
            • weird triple-self-inclusion of fastlz.c
            • possible future speed/etc. improvements:
              • use existing Adler-32 in java.util.zip or zlib or wherever?
              • parallel Adler-32 (and CRC-32) capability in zlib, pigz
          • FastLZCodec.java
            • originally was going to suggest that it should extend DefaultCodec (like GzipCodec) rather than copy it, but as a block-based codec, Splittable*Codec is better: we'll file separate JIRA for splittable support later (also/alternatively, per Arun Murthy, BlockCompressorStream?)
          • JFastLZ.java
            • DEFAULT_ARCHIVE_EXTENSION not used
            • readU32() BigInteger version should be nuked in favor of (fixed) readU32Fast() (not used much in any case, but...ewww)
            • why isn't all of "Moved." block (line 686) commented out/deleted?
            • invert "Never runs b/c" block so true path is first
              • does !FASTLZ_STRICT_ALIGN option even work? if not, nuke altogether
            • no need for double parens on while condition (line 768)
          • JFastLZCompressor.java
            • lose empty else in static block
            • "v. low" -> "very low", "b/c" -> "because", etc. (in general, avoid abbreviations in comments; English isn't everyone's primary language)
            • had hoped b/iBuff arraycopy ("so CompressorStream doesn't muck w/ our data") was unnecessary, but doesn't look like it: b[] comes straight from user via write() call
            • "XXX" ?
            • growOutputBuffer() used for apparently tiny deltas: should simply double size (unless strong statistical or theoretical arg why small increments are rare) [this is subsumed by whole fix-the-buffering issue above]
            • use "L" for long-constant appendage: lowercase looks too much like "1"
            • [cosmetic: need spaces on outside of for-loop and if-conditional parens]
            • why isn't "if (totalCompressed == 26)" block (line 356) deleted?
            • "enought"
            • looks like native checksum call is wasted if incompressible data: should be moved down to other half of "if (!nativeLoaded)" conditional (line 416) (more symmetric that way, too)
            • [cosmetic: prefer variables at top of class (SAFE_PACK)]
          • JFastLZDecompressor.java
            • [cosmetic: wrap long lines (native methods)]
            • lose empty else in static block
            • "v." -> "very", etc. (as above)
            • DEFAULT_BLOCK_SIZE hardcoded at 64K ... configured value completely ignored (OK, probably related to "native" FastLZ block size...never mind)

          Btw, this one is clearly much, much simpler than the LZMA patch, so if it comes down to one or the other, please concentrate on LZMA. If necessary, I (or someone) can pick this up and finish it without much trouble, I think.

          Nicholas Carlini added a comment -

          Another patch. There is still debug code scattered about (commented out), as I might need to put it to use at some point. This code isn't tested as well as the last patch.

          Adds support for native compression/decompression. Native compression is 230% faster than Java; native decompression is 70% faster than Java.

          Somewhat-large redesign of the compressor. Compression is now fifty times faster when compressing around 64 MB. The compressor used to keep in memory all of the input it had previously processed, and arraycopy it to a new array every time it needed more space; so, compressing 64 MB of data while calling write() every 64 KB, it would end up copying ~32 GB through memory (this is how it was for my test case). Compressing 128 MB of data with a write() every 1 KB would have copied 8.8 TB through memory.

          Also modified the compressor to include an end-of-stream marker, so the decompressor can set itself to "finished" and the stream can return -1. The end-of-stream mark is indicated by setting the four unused bytes after the input size to high in a last chunk of length 0. That way, any decompressor which does not support the end-of-stream marker will never read those bytes; it will just decompress an empty block and not notice anything is wrong.
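
          A rough sketch of that end-of-stream convention (the field order and widths below are invented for illustration; only the "zero-length chunk with the four unused bytes set high" part comes from the description above):

          import java.io.DataOutputStream;
          import java.io.IOException;

          // Sketch of the end-of-stream marker; the real chunk header layout comes from the 6pack format.
          class EndOfStreamSketch {
            static void writeEndOfStream(DataOutputStream out) throws IOException {
              out.writeByte(17);           // block id (all data blocks carry id 17)
              out.writeByte(0);            // options
              out.writeInt(0);             // compressed chunk length: a zero-length chunk
              out.writeInt(0);             // uncompressed input size
              out.writeInt(0xFFFFFFFF);    // the four "unused" bytes set high => end of stream
            }

            static boolean isEndOfStream(int chunkLength, int reservedWord) {
              // A decompressor that ignores the reserved word simply decompresses an empty block.
              return chunkLength == 0 && reservedWord == 0xFFFFFFFF;
            }
          }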

          Adds another method to TestCodecPerformance which has it load a (relatively small) input file into memory and generate 64 MB of data to compress from it. (It does this by taking random substrings of 16 to 128 bytes at random offsets until it has 64 MB.) It then directly compresses the 64 MB from memory to memory and times that. These times seem more representative than timing the compression of "key %d value %d" data or of random data. Right now this mode is enabled by calling it with the -input flag.
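
          The data-generation step might look roughly like this (a sketch; it assumes the seed file is longer than 128 bytes, and the real method name may differ):

          import java.util.Random;

          // Sketch: build 64 MB of test data from random 16-128 byte substrings of a smaller seed file.
          class TestDataGenerator {
            static byte[] generate(byte[] seedFile, long seed) {
              final int target = 64 * 1024 * 1024;
              byte[] out = new byte[target];
              Random rand = new Random(seed);
              int filled = 0;
              while (filled < target) {
                int len = 16 + rand.nextInt(113);              // 16..128 bytes
                int off = rand.nextInt(seedFile.length - len); // random offset into the seed file
                len = Math.min(len, target - filled);
                System.arraycopy(seedFile, off, out, filled, len);
                filled += len;
              }
              return out;
            }
          }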

          Ported the Adler32 code to C; it is used when the native libraries are loaded.

          Added a constant in the compressor that allows incompressible data to be copied over byte for byte instead. This decreases the speed of the compressor by ~10% since it results in another memcpy, but it can more than double the speed of decompression.
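
          The per-block decision described above amounts to roughly the following (the constant and helper names are invented):

          // Sketch of the raw-block fallback: if FastLZ did not shrink the block, store it verbatim
          // so the decompressor can copy it back instead of running the decoder.
          class IncompressibleBlockSketch {
            static final boolean STORE_INCOMPRESSIBLE_BLOCKS_RAW = true;

            /** Returns the number of bytes placed in 'out'; negative means a raw block in this sketch. */
            static int packBlock(byte[] in, int inLen, byte[] out) {
              int compressedLen = fastlzCompress(in, inLen, out);   // placeholder for the real compressor
              if (STORE_INCOMPRESSIBLE_BLOCKS_RAW && compressedLen >= inLen) {
                // The extra arraycopy costs ~10% on compression, but the decompressor can then
                // copy the block back verbatim, which can more than double decompression speed.
                System.arraycopy(in, 0, out, 0, inLen);
                return -inLen;
              }
              return compressedLen;
            }

            private static int fastlzCompress(byte[] in, int inLen, byte[] out) {
              throw new UnsupportedOperationException("placeholder for the actual FastLZ call");
            }
          }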

          Here's what the new part of TestCodecPerformance reports when given a log file. For comparison, DefaultCodec gets the size down to 11% and BZip2Codec down to 8%.

          Previous patch:
          10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total decompressed size: 640 MB.
          10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total compressed size: 177 MB (27% of original).
          10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total compression time: 381868 ms (1716 KBps).
          10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total decompression time: 5051 ms (126 MBps).

          Current patch:
          Native C:
          10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total decompressed size: 640 MB.
          10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total compressed size: 177 MB (27% of original).
          10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total compression time: 3314 ms (193 MBps).
          10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total decompression time: 2861 ms (223 MBps).

          Current patch:
          Pure Java:
          10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total decompressed size: 640 MB.
          10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total compressed size: 177 MB (27% of original).
          10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total compression time: 7891 ms (81 MBps).
          10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total decompression time: 5077 ms (126 MBps).

          Nicholas Carlini added a comment -

          Attached an updated patch.

          Fixed the checksum mismatch. It was possible for the decompressor to run out of input after reading the header bytes but not notice it if the block ID was 1. So if there were fewer than 26 bytes in the input (but more than 16) and the block ID was 1, it wouldn't notice and would just use whatever happened to be in the buffer at the time.

          Fixed a bug in the decompressor where it would incorrectly indicate it was finished if, at the end of decompressing a block, there was no more input left and decompress() was then called again (TestCodec seeds 1333275328, 2011623221, -1402938700 or -1990280158; generate 50,000 records). Actually, the decompressor never returns finished now: the only time it should return true is when it somehow knows the end of the stream has been reached, and it doesn't know that; it only guesses that if it has read all the bytes it currently has then it's done, which is not necessarily the case.
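
          For illustration, the resulting behavior could be sketched as follows (the field names are invented; this is not the patch itself):

          // Sketch: without an end-of-stream marker the decompressor cannot know it is done,
          // so finished() stays false and callers rely on getRemaining() / needsInput().
          class DecompressorStateSketch {
            private int inputLength;    // compressed bytes handed to setInput()
            private int inputPosition;  // compressed bytes consumed so far

            /** Without an end-of-stream marker the decompressor cannot know the stream is done. */
            public synchronized boolean finished() {
              return false;
            }

            /** Compressed bytes set via setInput() but not yet consumed. */
            public synchronized int getRemaining() {
              return inputLength - inputPosition;
            }
          }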

          Implemented getRemaining().

          Removed iOff from both the compressor and decompressor. It was initialized to zero from the start and was only ever modified after that by setting it to 0.

          Modified TestCodec to accept a seed as an argument.

          Removed the rest of the carriage returns.

          I will be adding a native version over the next few days and will upload that patch when it's done.

          Eli Collins added a comment -

          Here's some performance data I collected a while back. I compressed and decompressed a 135 MB file (roughly an HDFS block) of JSON data using the default command-line utility for each codec, compiled with -Wall -O3 -fomit-frame-pointer on a Nehalem-based system. I used an in-memory file system so the times were stable, and took the best of 5 runs. These should be taken with a grain of salt since the same command-line utility was not used for each codec. According to [1], Zippy is 22% faster than LZO; FastLZ is also 22% faster than LZO in the data below, so the ballpark performance is reasonable. We could optimize the native version further; I'm curious what the overhead of calling out to JNI is.

          codec       fastlz(1)  fastlz(2)  quicklz(1)  quicklz(2)  quicklz(3)  lzo (default)  lzo (-9)  lzf
          size        70M        63M        58M         49M         48M         61M            47M       69M
          compress    1.092s     1.113s     0.913s      1.409s      5.343s      1.414s         20.495s   1.110s
          decompress  0.649s     0.639s     0.697s      0.699s      0.486s      0.630s         0.665s    0.729s

          1. http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff

          Eli Collins added a comment -

          Patch attached, this was as far as I got before getting distracted.

          • Rebased on trunk. Cleaned up carriage returns and indentation; removed most gratuitous uses of "this".
          • Fixed some bugs in TestCodecPerformance.java.
          • Got JFastLZCompressor working, at least for the seeds that were breaking (the finished() method was incorrect).
          • Made some progress on JFastLZDecompressor, but found a seed (see the one used in TestCodecPerformance) that causes a checksum mismatch, and getRemaining() needs to be implemented.

          Nicholas Carlini added a comment -

          Eli - did you make an updated patch? If you haven't, that's okay – I can rebase it on trunk.

          Eli Collins added a comment -

          I don't think so. I pinged William a while back, picked up the patch and fixed a bug or two, but haven't had the cycles recently; I'll try to rebase on trunk and post soon. Would be awesome to have you pick this up.

          Nicholas Carlini added a comment -

          Is anyone currently working on this? I'm currently working on adding LZMA compression (HADOOP-6837), but after I finish that, if no one else has picked this up, I'll work on it.

          Todd Lipcon added a comment -

          The old TestCodecPerformance depended on RandomDatum, which no longer exists. This new version just uses java.util.Random. However, I'm getting the following exception:

          Exception in thread "main" java.io.IOException: write beyond end of stream
          at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:61)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
          at java.io.DataOutputStream.write(DataOutputStream.java:90)
          at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
          at org.apache.hadoop.io.compress.TestCodecPerformance.runCodecPerformance(TestCodecPerformance.java:161)
          at org.apache.hadoop.io.compress.TestCodecPerformance.runCodecPerformances(TestCodecPerformance.java:91)
          at org.apache.hadoop.io.compress.TestCodecPerformance.main(TestCodecPerformance.java:271)

          (The other codecs work fine, so I don't think it's a harness issue; I've just commented out the others for the moment.)
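
          For context, that exception comes from a guard in CompressorStream.write() which rejects writes once the Compressor reports finished(), roughly paraphrased below (not the exact Hadoop source); a compressor whose finished() returns true prematurely fails exactly there:

          import java.io.IOException;
          import java.io.OutputStream;
          import org.apache.hadoop.io.compress.Compressor;

          // Paraphrase of the check that produces "write beyond end of stream".
          class WriteBeyondEndSketch extends OutputStream {
            private final Compressor compressor;

            WriteBeyondEndSketch(Compressor compressor) {
              this.compressor = compressor;
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
              if (compressor.finished()) {
                throw new IOException("write beyond end of stream");
              }
              // ... feed the compressor and drain compressed output (omitted) ...
            }

            @Override
            public void write(int b) throws IOException {
              write(new byte[] { (byte) b }, 0, 1);
            }
          }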

          William Kinney added a comment -

          Todd found my issue: no -O flag for gcc. Results:

          File 1 - 152MB text:
          Pack:
          java/jfastlz: 2.616 s
          c/fastlz: 0.973 s
          Unpack:
          java/jfastlz: 1.710 s
          c/fastlz: 0.901 s

          File 2 - 511MB binary:
          Pack:
          java/jfastlz: 3.221 s
          c/fastlz: 1.644 s
          Unpack:
          java/jfastlz: 3.611 s
          c/fastlz: 2.121 s

          Todd Lipcon added a comment -

          Wait, Java is faster than C? Something seems wrong here... not that Java can't be fast, but I've never seen it outperform C for stuff like this.

          William Kinney added a comment -

          Lazy test of 5 iterations (discarding the max and min values), using the `time` command and averaging the 'real' value:

          File 1 - 152MB text:
          Pack:
          java/jfastlz: 2.4998 s
          c/fastlz: 2.4432 s
          Unpack:
          java/jfastlz: 1.5800 s
          c/fastlz: 1.8510 s

          File 2 - 511MB binary:
          Pack:
          java/jfastlz: 3.1514 s
          c/fastlz: 4.3208 s
          Unpack:
          java/jfastlz: 3.0174 s
          c/fastlz: 4.3452 s

          JVM: Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)
          gcc: gcc (GCC) 4.3.2 20081105 (Red Hat 4.3.2-7)
          Kernel: 2.6.27.24-170.2.68.fc10.x86_64
          1 x E5405, 8GB RAM, 10kRPM SAS RAID 5 (~70MB/s hdparm -t)

          The tests of course include file I/O in the timed results, so this is not a strict comparison of pack/unpack performance.

          Matei Zaharia added a comment -

          Out of curiosity, William, do you know how the pure-Java FastLZ's performance compares to the C version's?

          William Kinney added a comment -

          Fixed the issue and uploaded a new patch: HADOOP-6349.patch. It was incorrectly reading past the allowed input length during decompression.

          HADOOP-6349-TestFastLZCodec.patch - tests FastLZCodec with the two known bug-producing seeds Todd mentioned, in addition to testing 2k random seeds and data lengths (10k to 110k). I ran this with 50k iterations without issue.

          Todd Lipcon added a comment -

          There seems to be a bug in the code somewhere - I set count = 100000 in the TestCodec class (and commented out all the other codecs but fastlz) and it now crashes with a heap space error. If I run the test with -Xmx4g, it crashes with an EOFException. The seed -400483164 reproduces this reliably. Another seed that produces a different stack trace is -717812827.

          We should have a "slow" version of TestCodec which iterates through a few thousand different random seeds and random data lengths to ensure that the codec is robust.

          William Kinney added a comment -

          Patch contains new files FastLZCodec, JFastLZ, JFastLZCompressor and JFastLZDecompressor. Also contains a modification to TestCodec, to add FastLZCodec to the test suite.

          JFastLZ is a pure Java port of FastLZ.

          testCodecPerfResults.tsv is a TSV result from a simple performance test showing the speed and pack ratio of FastLZ compared to the other codecs. TestCodecPerformance.java is the code that generated the results.

          Hong Tang added a comment -

          Jira related to this: HADOOP-4874.

          Doug Cutting added a comment -

          The other codec to consider is LZF, since there's also a pure-Java implementation:

          http://oldhome.schmorp.de/marc/liblzf.html

          http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/


            People

             • Assignee: Unassigned
             • Reporter: William Kinney
             • Votes: 9
             • Watchers: 37
