Issue Details

Key: HADOOP-4874
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Owen O'Malley
Reporter: Owen O'Malley
Votes: 0
Watchers: 11

Hadoop Common

Remove bindings to lzo

Created: 16/Dec/08 12:34 AM   Updated: Yesterday 10:31 PM
Component/s: io
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
h4874.patch (licensed for inclusion in ASF works), 2008-12-17 12:09 AM, Owen O'Malley, 117 kB
Issue Links:
Reference

Hadoop Flags: Reviewed
Resolution Date: 17/Dec/08 06:15 AM


Description
It looks like the lzo bindings are infected by lzo's GPL and must be removed from Hadoop.

Doug Cutting added a comment - 16/Dec/08 01:08 AM
Should we file an issue with http://issues.apache.org/jira/browse/LEGAL to double-check this?

We might move the lzo codec to a SourceForge project, under GPL, so that folks can still get it.

Also, we can replace lzo with something like http://www.fastlz.org/.


Owen O'Malley made changes - 17/Dec/08 12:08 AM
Field Original Value New Value
Link This issue is related to HADOOP-4887 [ HADOOP-4887 ]
Owen O'Malley added a comment - 17/Dec/08 12:09 AM
This patch removes the lzo codec.
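With this patch, LzoCodec, LzopCodec and LzoTextInputFormat are deleted (see the file lists below), so jobs or configurations that name them will no longer find those classes on a 0.20 classpath. A minimal sketch, assuming you only want to probe the classpath before choosing a codec; `CodecCheck` and `isAvailable` are hypothetical helpers, not part of Hadoop:

```java
/** Hypothetical helper: probes whether a class can be loaded, without initializing it. */
class CodecCheck {
    static boolean isAvailable(String className) {
        try {
            // initialize = false: only check that the class can be located and loaded
            Class.forName(className, false, CodecCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lzo = "org.apache.hadoop.io.compress.LzoCodec";
        System.out.println(lzo + " available: " + isAvailable(lzo));
    }
}
```

Loading without initialization keeps the probe cheap and avoids running static initializers (for example, native-library loading) just to answer a yes/no question.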

Owen O'Malley made changes - 17/Dec/08 12:09 AM
Attachment h4874.patch [ 12396257 ]
Arun C Murthy added a comment - 17/Dec/08 12:13 AM
+1 (sigh!) :-)

Owen O'Malley made changes - 17/Dec/08 12:23 AM
Hadoop Flags [Reviewed]
Status Open [ 1 ] Patch Available [ 10002 ]
Repository Revision Date User Message
ASF #727294 Wed Dec 17 06:13:53 UTC 2008 omalley HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
Files Changed
DEL /hadoop/core/trunk/src/native/src/org/apache/hadoop/io/compress/lzo
MODIFY /hadoop/core/trunk/src/native/lib/Makefile.in
MODIFY /hadoop/core/trunk/src/native/src/org/apache/hadoop/io/compress/zlib/Makefile.in
MODIFY /hadoop/core/trunk/src/docs/src/documentation/content/xdocs/native_libraries.xml
MODIFY /hadoop/core/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/io/compress/TestCodec.java
MODIFY /hadoop/core/trunk/src/native/configure.ac
DEL /hadoop/core/trunk/src/test/org/apache/hadoop/mapred/TestLzoTextInputFormat.java
MODIFY /hadoop/core/trunk/src/native/configure
MODIFY /hadoop/core/trunk/src/native/aclocal.m4
MODIFY /hadoop/core/trunk/src/native/Makefile.in
DEL /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/LzoCodec.java
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/io/TestSequenceFile.java
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/io/FileBench.java
MODIFY /hadoop/core/trunk/src/native/Makefile.am
DEL /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/lzo
MODIFY /hadoop/core/trunk/CHANGES.txt
DEL /hadoop/core/trunk/src/core/org/apache/hadoop/io/compress/LzopCodec.java
DEL /hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/LzoTextInputFormat.java
MODIFY /hadoop/core/trunk/src/native/config.h.in
MODIFY /hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml

Owen O'Malley added a comment - 17/Dec/08 06:15 AM
I just committed this.

Owen O'Malley made changes - 17/Dec/08 06:15 AM
Resolution Fixed [ 1 ]
Status Patch Available [ 10002 ] Resolved [ 5 ]
Repository Revision Date User Message
ASF #727434 Wed Dec 17 16:50:53 UTC 2008 omalley HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
Files Changed
MODIFY /hadoop/core/branches/branch-0.20/src/native/src/org/apache/hadoop/io/compress/zlib/Makefile.in
MODIFY /hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/native_libraries.xml
DEL /hadoop/core/branches/branch-0.20/src/native/src/org/apache/hadoop/io/compress/lzo
MODIFY /hadoop/core/branches/branch-0.20/src/native/lib/Makefile.in
MODIFY /hadoop/core/branches/branch-0.20/src/test/org/apache/hadoop/io/compress/TestCodec.java
MODIFY /hadoop/core/branches/branch-0.20/src/native/configure.ac
DEL /hadoop/core/branches/branch-0.20/src/test/org/apache/hadoop/mapred/TestLzoTextInputFormat.java
MODIFY /hadoop/core/branches/branch-0.20/src/native/aclocal.m4
MODIFY /hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
MODIFY /hadoop/core/branches/branch-0.20/src/native/configure
MODIFY /hadoop/core/branches/branch-0.20/src/test/org/apache/hadoop/io/TestSequenceFile.java
MODIFY /hadoop/core/branches/branch-0.20/src/native/Makefile.in
DEL /hadoop/core/branches/branch-0.20/src/core/org/apache/hadoop/io/compress/LzoCodec.java
MODIFY /hadoop/core/branches/branch-0.20/src/test/org/apache/hadoop/io/FileBench.java
MODIFY /hadoop/core/branches/branch-0.20/CHANGES.txt
DEL /hadoop/core/branches/branch-0.20/src/core/org/apache/hadoop/io/compress/LzopCodec.java
DEL /hadoop/core/branches/branch-0.20/src/mapred/org/apache/hadoop/mapred/LzoTextInputFormat.java
MODIFY /hadoop/core/branches/branch-0.20/src/native/Makefile.am
DEL /hadoop/core/branches/branch-0.20/src/core/org/apache/hadoop/io/compress/lzo
MODIFY /hadoop/core/branches/branch-0.20/src/docs/src/documentation/content/xdocs/site.xml
MODIFY /hadoop/core/branches/branch-0.20/src/native/config.h.in

Owen O'Malley added a comment - 17/Dec/08 05:46 PM
Based on the benchmarks done by the QuickLZ guys at http://www.quicklz.com/, it looks like fastlz, which has a usable MIT license, or liblzf, which has a BSD license, may be the best replacement for lzo. (QuickLZ claims to be faster than either, but it is GPL too.)

Times to compress and decompress 1 GB using the QuickLZ benchmark numbers (compress + decompress = total; compressed size as a percentage of the original):

codec (license)   compress + decompress = total   compressed size
quicklz (GPL)     3.8 + 3.5 = 7.3 secs            47.9%
lzf (BSD)         5.8 + 2.9 = 8.7 secs            51.9%
fastlz (MIT)      6.3 + 2.6 = 8.9 secs            50.7%
lzo (GPL)         6.6 + 2.5 = 9.1 secs            48.3%
zlib              23.2 + 6.6 = 29.8 secs          37.6%
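For a per-direction view of the same figures, the totals and rough throughputs can be recomputed as below. This is only arithmetic on the numbers quoted above; it assumes the 1 GB input size applies to both directions and that the percentage is compressed size over original size, so 100 minus it is the space saved. `BenchmarkMath` is a hypothetical name.

```java
/** Recomputes totals and rough throughput from the 1 GB benchmark figures quoted above. */
class BenchmarkMath {
    static final String[] NAMES = {"quicklz (GPL)", "lzf (BSD)", "fastlz (MIT)", "lzo (GPL)", "zlib"};
    static final double[] COMPRESS_SECS = {3.8, 5.8, 6.3, 6.6, 23.2};
    static final double[] DECOMPRESS_SECS = {3.5, 2.9, 2.6, 2.5, 6.6};
    static final double[] RATIO_PCT = {47.9, 51.9, 50.7, 48.3, 37.6}; // assumed: compressed / original

    /** Total wall time for one compress plus one decompress pass. */
    static double totalSecs(int i) {
        return COMPRESS_SECS[i] + DECOMPRESS_SECS[i];
    }

    /** Throughput in GB/s for a single pass over the 1 GB input. */
    static double gbPerSec(double secs) {
        return 1.0 / secs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-14s total %4.1f s, compress %.2f GB/s, decompress %.2f GB/s, saves %.1f%%%n",
                    NAMES[i], totalSecs(i), gbPerSec(COMPRESS_SECS[i]),
                    gbPerSec(DECOMPRESS_SECS[i]), 100 - RATIO_PCT[i]);
        }
    }
}
```

Split by direction, lzo's 2.5 s decompression is the fastest of the five, while its compression is slower than quicklz, lzf and fastlz.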


Doug Cutting added a comment - 17/Dec/08 06:31 PM
The fastlz guy has benchmarks showing he's faster decompressing than lzf.

http://www.fastlz.org/lzf.htm

YMMV, but either looks fine. If we could find one that has a command-line executable already distributed with Linux, that might be a tiebreaker, but I don't see any such. A Java implementation of either could also be a tiebreaker.

There's a java LZF at:

http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/

This is under EPL and MPL, both category B in http://www.apache.org/legal/3party.html.

I can't find a java implementation of fastlz, but we could probably write one if we wanted. There's not much code there. So I guess this tilts things in favor of lzf?


Hudson added a comment - 22/Dec/08 03:15 PM

Tsz Wo (Nicholas), SZE made changes - 06/Jan/09 07:18 PM
Link This issue is related to HADOOP-4949 [ HADOOP-4949 ]
Hong Tang added a comment - 01/Feb/09 04:46 AM
Besides speed, other factors may also matter, such as compression ratio, decompression speed, memory footprint, etc.

BTW, are lzf and fastlz also block based (as LZO) or stream based (as GZIP)?


Doug Cutting added a comment - 02/Feb/09 07:07 PM
> BTW, are lzf and fastlz also block based (as LZO) or stream based (as GZIP)?

Dunno. There's not much code to them, so it should be easy to find out. Does it matter much? We block things in the container file format anyway.


Nigel Daley made changes - 23/Apr/09 07:17 PM
Status Resolved [ 5 ] Closed [ 6 ]
Tatu Saloranta added a comment - 08/May/09 11:33 PM
I know this issue is closed, but I was wondering if anyone might be interested in a Java version of fastlz. I read through the C code, and it seems simple enough to convert to Java. I am thinking of doing that for other purposes (on-the-fly XML/JSON compression), but if others are interested, it could become a reusable component.

Arun C Murthy added a comment - 08/May/09 11:48 PM
Tatu - please open a new jira for fastlz and attach your patch there... thanks!

Tatu Saloranta added a comment - 09/May/09 05:09 AM
Thanks, will do.

William Kinney made changes - 30/Oct/09 10:32 PM
Link This issue is blocked by HADOOP-6349 [ HADOOP-6349 ]
William Kinney made changes - 30/Oct/09 10:33 PM
Link This issue is blocked by HADOOP-6349 [ HADOOP-6349 ]
William Kinney made changes - 30/Oct/09 10:34 PM
Link This issue relates to HADOOP-6349 [ HADOOP-6349 ]
Tatu Saloranta added a comment - 24/Nov/09 07:35 AM
Actually, I only now had time to spend on this, and ended up testing LZF (http://oldhome.schmorp.de/marc/liblzf.html), as ported by the H2 team (http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/).
It turns out LZF is pretty fast, although one has to be careful to choose good buffer and hash table sizes, and ideally reuse buffers where possible. If so, it can be a bit faster on decompression, and a lot faster on compression.
The numbers I saw (this is just initial testing) indicated up to twice as fast compression, and maybe 30% faster decompression.
The compression ratio is not as good: whereas gzip gave ratios of 81/93/97% (for content sizes of 2k/20k/200k), LZF gave 66/72/72% (i.e., it compresses down to 34/28/28% of the original), which is still pretty good, of course.
These results were with JSON data.
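The buffer and hash-table tuning described above is easier to see in code. Below is a minimal LZF-style block codec written for illustration only: it is not liblzf or the H2 port, its token layout is assumed to follow liblzf (a control byte below 32 starts a literal run; otherwise it encodes a back-reference with a 13-bit offset), and `LzfSketch`, `HASH_BITS` and the method names are hypothetical. The hash table is allocated once per instance and reused across `compress` calls, which is the kind of reuse that avoids per-block allocation.

```java
import java.util.Arrays;

/**
 * Illustrative LZF-style block codec (greedy encoder, reusable hash table).
 * Assumed token layout: ctrl < 32 -> literal run of ctrl+1 bytes; otherwise a
 * back-reference with 3-bit length (extra byte if 7) and 13-bit offset.
 * Not thread-safe: each instance owns one hash table.
 */
class LzfSketch {
    private static final int HASH_BITS = 14;                 // assumed: 16K-entry hash table
    private static final int MAX_LITERAL = 32;               // longest literal run per control byte
    private static final int MAX_OFF = 1 << 13;              // 8192-byte look-back window
    private static final int MAX_REF = (1 << 8) + (1 << 3);  // 264: longest back-reference

    private final int[] table = new int[1 << HASH_BITS];     // reused across compress() calls

    private static int hash(byte[] in, int p) {
        int v = ((in[p] & 0xff) << 16) | ((in[p + 1] & 0xff) << 8) | (in[p + 2] & 0xff);
        return (v * 0x9E3779B1) >>> (32 - HASH_BITS);        // multiplicative hash of 3 bytes
    }

    byte[] compress(byte[] in) {
        Arrays.fill(table, -1);
        // Worst case: one control byte per 32 literals, plus one.
        byte[] out = new byte[in.length + in.length / MAX_LITERAL + 1];
        int ip = 0, op = 0, litStart = 0;
        while (ip + 2 < in.length) {
            int h = hash(in, ip);
            int ref = table[h];
            table[h] = ip;
            int off = ip - ref - 1;
            if (ref >= 0 && off < MAX_OFF
                    && in[ref] == in[ip] && in[ref + 1] == in[ip + 1] && in[ref + 2] == in[ip + 2]) {
                int len = 3;
                int maxLen = Math.min(MAX_REF, in.length - ip);
                while (len < maxLen && in[ref + len] == in[ip + len]) len++;
                op = writeLiterals(in, litStart, ip, out, op);
                int l = len - 2;                              // stored length is (match length - 2)
                if (l < 7) {
                    out[op++] = (byte) ((l << 5) | (off >>> 8));
                } else {
                    out[op++] = (byte) ((7 << 5) | (off >>> 8));
                    out[op++] = (byte) (l - 7);
                }
                out[op++] = (byte) off;                       // low 8 bits of the offset
                ip += len;
                litStart = ip;
            } else {
                ip++;
            }
        }
        op = writeLiterals(in, litStart, in.length, out, op);
        return Arrays.copyOf(out, op);
    }

    private static int writeLiterals(byte[] in, int from, int to, byte[] out, int op) {
        while (from < to) {
            int run = Math.min(MAX_LITERAL, to - from);
            out[op++] = (byte) (run - 1);                     // control byte < 32: literal run
            System.arraycopy(in, from, out, op, run);
            op += run;
            from += run;
        }
        return op;
    }

    /** Decodes a buffer produced by compress(); the caller supplies the original length. */
    static byte[] decompress(byte[] in, int originalLength) {
        byte[] out = new byte[originalLength];
        int ip = 0, op = 0;
        while (ip < in.length) {
            int ctrl = in[ip++] & 0xff;
            if (ctrl < 32) {
                int run = ctrl + 1;
                System.arraycopy(in, ip, out, op, run);
                ip += run;
                op += run;
            } else {
                int len = ctrl >>> 5;
                if (len == 7) len += in[ip++] & 0xff;
                int ref = op - ((ctrl & 0x1f) << 8) - (in[ip++] & 0xff) - 1;
                len += 2;
                // Byte-by-byte copy so overlapping references (e.g. runs) expand correctly.
                for (int i = 0; i < len; i++) out[op++] = out[ref++];
            }
        }
        return out;
    }
}
```

A larger `HASH_BITS` finds more matches but makes the per-call table reset more expensive, and the greedy encoder skips hashing positions inside a match, trading some ratio for speed. Byte-for-byte compatibility with real liblzf output is not guaranteed.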

LZF is a block-based algorithm, just like all the others including gzip, and is about as easy to wrap in input/output streams.
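Wrapping a block compressor in a stream mostly means buffering input into fixed-size blocks and framing each compressed block. A minimal sketch, assuming a simple hypothetical frame layout of [raw length][compressed length][compressed bytes]; since the JDK has no LZF, `java.util.zip.Deflater` stands in as the block compressor, and `BlockOutputStream`/`readAll` are illustrative names, not a Hadoop or H2 API.

```java
import java.io.*;
import java.util.zip.*;

/**
 * Sketch: turns a block compressor into an OutputStream by buffering fixed-size blocks.
 * Deflater stands in for the block compressor; each frame is
 * [raw length:int][compressed length:int][compressed bytes].
 */
class BlockOutputStream extends FilterOutputStream {
    private final byte[] block;
    private int used = 0;
    private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    private final DataOutputStream data;

    BlockOutputStream(OutputStream out, int blockSize) {
        super(out);
        this.data = new DataOutputStream(out);
        this.block = new byte[blockSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (used == block.length) writeBlock();
        block[used++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            if (used == block.length) writeBlock();
            int n = Math.min(len, block.length - used);
            System.arraycopy(b, off, block, used, n);
            used += n;
            off += n;
            len -= n;
        }
    }

    private void writeBlock() throws IOException {
        if (used == 0) return;
        deflater.reset();                       // reuse one Deflater for every block
        deflater.setInput(block, 0, used);
        deflater.finish();
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            packed.write(chunk, 0, n);
        }
        data.writeInt(used);
        data.writeInt(packed.size());
        packed.writeTo(data);
        used = 0;
    }

    @Override
    public void flush() throws IOException {
        writeBlock();                           // a flush ends the current (possibly short) block
        data.flush();
    }

    @Override
    public void close() throws IOException {
        writeBlock();
        deflater.end();
        super.close();                          // flushes (no-op block) and closes the sink
    }

    /** Reads every frame written by BlockOutputStream and returns the concatenated raw bytes. */
    static byte[] readAll(InputStream in) throws IOException, DataFormatException {
        DataInputStream data = new DataInputStream(in);
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        Inflater inflater = new Inflater();
        try {
            while (true) {
                int rawLen;
                try {
                    rawLen = data.readInt();
                } catch (EOFException endOfStream) {
                    break;                      // clean end: no more frames
                }
                byte[] packed = new byte[data.readInt()];
                data.readFully(packed);
                inflater.reset();
                inflater.setInput(packed);
                byte[] raw = new byte[rawLen];
                int got = 0;
                while (got < rawLen) {
                    int n = inflater.inflate(raw, got, rawLen - got);
                    if (n == 0) throw new IOException("truncated or corrupt block");
                    got += n;
                }
                result.write(raw, 0, rawLen);
            }
        } finally {
            inflater.end();
        }
        return result.toByteArray();
    }
}
```

Storing the raw length up front lets the reader allocate each output block exactly, and because blocks are compressed independently, a reader could skip whole frames using the compressed length alone.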

I hope to find time to wrap the existing code in somewhat better packaging (w.r.t. buffer reuse and other optimizations), so that it could be a reusable component. That may take some time, but in the meantime, the source link above lets others try out the code if they want to.


Arun C Murthy added a comment - 24/Nov/09 07:50 AM
Tatu, we'd really appreciate it if you could open a jira for LZF and contribute a patch... thanks!

Tatu Saloranta added a comment - 24/Nov/09 10:31 PM
Ok, I created HADOOP-6389 specifically for LZF.