Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It looks like the lzo bindings are infected by lzo's GPL and must be removed from Hadoop.

      1. h4874.patch
        117 kB
        Owen O'Malley

        Issue Links

          Activity

          Doug Cutting added a comment -

          Should we file an issue with http://issues.apache.org/jira/browse/LEGAL to double-check this?

          We might move the lzo codec to a Sourceforge project, under GPL, so that folks can still get it.

          Also, we can replace lzo with something like http://www.fastlz.org/.

          Owen O'Malley added a comment -

          This patch removes lzo codec.

          Arun C Murthy added a comment -

          +1 (sigh!)

          Owen O'Malley added a comment -

          I just committed this.

          Owen O'Malley added a comment -

          Based on the benchmarks done by the QuickLZ guys at http://www.quicklz.com/, it looks like fastlz, which has a usable MIT license, or liblzf, which has a BSD license, may be the best replacement for lzo. (QuickLZ claims to be faster than either, but it is GPL too.)

          Times to compress and decompress 1 GB using the QuickLZ benchmark numbers:
          quicklz (gpl): 3.8 + 3.5 = 7.3 secs; 47.9%
          lzf (bsd): 5.8 + 2.9 = 8.7 secs; 51.9%
          fastlz (mit): 6.3 + 2.6 = 8.9 secs; 50.7%
          lzo (gpl): 6.6 + 2.5 = 9.1 secs; 48.3%
          zlib: 23.2 + 6.6 = 29.8 secs; 37.6%

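The quoted times can be turned into rough throughput figures. A quick sketch of the arithmetic, using only the numbers from the comment above (this is not a re-run of the benchmark):

```python
# Convert the quoted benchmark times (seconds to compress + decompress 1 GB)
# into approximate MB/s throughput. Times and compressed-size percentages are
# copied from the comment above.
GB_MB = 1024  # MB in 1 GB

codecs = {
    # name: (compress_s, decompress_s, compressed_size_pct)
    "quicklz (gpl)": (3.8, 3.5, 47.9),
    "lzf (bsd)":     (5.8, 2.9, 51.9),
    "fastlz (mit)":  (6.3, 2.6, 50.7),
    "lzo (gpl)":     (6.6, 2.5, 48.3),
    "zlib":          (23.2, 6.6, 37.6),
}

for name, (c, d, pct) in codecs.items():
    print(f"{name:14s} compress {GB_MB / c:6.0f} MB/s, "
          f"decompress {GB_MB / d:6.0f} MB/s, size {pct}% of original")
```

Worth noting from this view: lzo's weakness in the table is compression speed, while its decompression throughput (~410 MB/s here) is the fastest of the group, which is what the follow-up comments weigh against license concerns.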
          Doug Cutting added a comment -

          The fastlz guy has benchmarks showing he's faster decompressing than lzf.

          http://www.fastlz.org/lzf.htm

          YMMV, but either looks fine. If we could find something with a command-line executable that is already distributed with Linux, that might be a tiebreaker, but I don't see any such. Or if we could find a Java implementation of either.

          There's a java LZF at:

          http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/

          This is under EPL and MPL, both category B in http://www.apache.org/legal/3party.html.

          I can't find a java implementation of fastlz, but we could probably write one if we wanted. There's not much code there. So I guess this tilts things in favor of lzf?

          Hudson added a comment -

          Integrated in Hadoop-trunk #698 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/698/ )
          Hong Tang added a comment -

          Besides speed, other factors may also matter, such as compression ratio, decompression speed, memory footprint, etc.

          BTW, are lzf and fastlz also block based (as LZO) or stream based (as GZIP)?

          Doug Cutting added a comment -

          > BTW, are lzf and fastlz also block based (as LZO) or stream based (as GZIP)?

          Dunno. There's not much code to them, so it should be easy to find out. Does it matter much? We block things in the container file format anyway.

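The point about blocking in the container format is easy to illustrate: a container can turn any codec into a block-based one by compressing fixed-size chunks independently and length-prefixing each. A minimal sketch using Python's zlib as a stand-in for whichever codec is chosen (the real Hadoop container formats differ in detail):

```python
import struct
import zlib

def write_blocks(data: bytes, block_size: int = 64 * 1024) -> bytes:
    """Frame data as length-prefixed, independently compressed blocks,
    the way a container file can block any codec."""
    out = bytearray()
    for off in range(0, len(data), block_size):
        block = zlib.compress(data[off:off + block_size])
        out += struct.pack(">I", len(block))  # 4-byte big-endian length prefix
        out += block
    return bytes(out)

def read_blocks(framed: bytes) -> bytes:
    """Decode length-prefixed blocks back into the original byte stream."""
    out = bytearray()
    pos = 0
    while pos < len(framed):
        (n,) = struct.unpack_from(">I", framed, pos)
        pos += 4
        out += zlib.decompress(framed[pos:pos + n])
        pos += n
    return bytes(out)
```

Because each block decompresses independently, readers can seek to a block boundary without decoding everything before it, which is why the codec's own block/stream orientation matters less once the container does the blocking.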
          Tatu Saloranta added a comment -

          I know this issue is closed, but I was wondering if anyone might be interested in a Java version of fastlz. I read through the C code, and it seems simple enough to convert to Java. I am thinking of trying to do that for other purposes (on-the-fly xml/json compression); but if there was interest from others, it could be a reusable component.

          Arun C Murthy added a comment -

          Tatu - please open a new jira for fastlz and attach your patch there... thanks!

          Tatu Saloranta added a comment -

          Thanks, will do.

          Tatu Saloranta added a comment -

          Actually, I only now had time to spend on this, and ended up testing LZF (http://oldhome.schmorp.de/marc/liblzf.html), ported by the H2 team (http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/).
          It turns out LZF is pretty good at speed, although one has to be careful to choose good buffer sizes and hash table size, and ideally reuse buffers too if possible. If so, it can be a bit faster on decompression, and a lot faster on compression.
          Numbers I saw (this is just initial testing) indicated up to twice as fast compression, and maybe 30% faster decompression.
          Compression ratio is not as good; whereas gzip would give ratios of 81/93/97% (for content sizes of 2k/20k/200k), LZF would give 66/72/72% (i.e. compresses down to 34/28/28% of original). Which is still pretty good, of course.
          These were with JSON data.

          LZF is a block-based algorithm just like the others, including gzip, and is about as easy to wrap in input/output streams.

          I hope to find time to wrap the existing code into a bit better packaging (wrt buffer reuse and other optimizations). If so, it could be a reusable component. That may take some time, but in the meantime, the source link above allows others to try out the code as well if they want to.

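The two conventions for quoting "compression ratio" in the comment above are easy to mix up: the percentages given are space saved, not output size. The conversion is trivial but worth making explicit:

```python
# The LZF figures quoted above are "% space saved" for 2k/20k/200k inputs.
# Output size as a percentage of the original is just the complement.
saved_pct = [66, 72, 72]
size_pct = [100 - s for s in saved_pct]
print(size_pct)  # matches the "34/28/28% of original" stated in the comment
```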
          Arun C Murthy added a comment -

          Tatu, we'd really appreciate if you could open a jira for LZF and contribute a patch... thanks!

          Tatu Saloranta added a comment -

          Ok, I created HADOOP-6389 specifically for LZF.


            People

            • Assignee:
              Owen O'Malley
              Reporter:
              Owen O'Malley
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved:
