Owen O'Malley made changes - 17/Dec/08 12:08 AM
This patch removes lzo codec.
Owen O'Malley made changes - 17/Dec/08 12:09 AM
Owen O'Malley made changes - 17/Dec/08 12:23 AM
Owen O'Malley made changes - 17/Dec/08 06:15 AM
Based on the benchmarks done by the QuickLz guys at http://www.quicklz.com/
Times to compress and decompress 1 GB using the quicklz benchmark numbers:
The fastlz guy has benchmarks showing he's faster decompressing than lzf.
YMMV, but either looks fine. If we could find something with a command-line executable that is already distributed with Linux, that might be a tiebreaker, but I don't see any such. Or if we could find a Java implementation of either.
There's a Java LZF at: http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/ This is under EPL and MPL, both category B in http://www.apache.org/legal/3party.html
I can't find a Java implementation of fastlz, but we could probably write one if we wanted. There's not much code there. So I guess this tilts things in favor of lzf?
Integrated in Hadoop-trunk #698 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/698/)
Tsz Wo (Nicholas), SZE made changes - 06/Jan/09 07:18 PM
> BTW, are lzf and fastlz also block based (as LZO) or stream based (as GZIP)?
Dunno. There's not much code to them, so it should be easy to find out. Does it matter much? We block things in the container file format anyway.
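To illustrate the point about blocking in the container format: a minimal sketch, assuming a hypothetical framing in which each block is compressed on its own and written with two length prefixes. `java.util.zip.Deflater` stands in for whichever codec is chosen, and the class name, block size, and header layout are illustrative only, not Hadoop's actual file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class BlockFraming {
    // Hypothetical block size; a real container format would pick its own.
    static final int BLOCK_SIZE = 64 * 1024;

    // Compress each block independently and prefix it with
    // (uncompressed length, compressed length).
    static byte[] writeBlocks(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        byte[] scratch = new byte[8 * 1024]; // reused for every block
        for (int off = 0; off < data.length; off += BLOCK_SIZE) {
            int len = Math.min(BLOCK_SIZE, data.length - off);
            deflater.reset(); // no state carries over between blocks
            deflater.setInput(data, off, len);
            deflater.finish();
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(scratch);
                block.write(scratch, 0, n);
            }
            out.writeInt(len);
            out.writeInt(block.size());
            block.writeTo(out);
        }
        deflater.end();
        out.flush();
        return bytes.toByteArray();
    }

    // Read the blocks back; each one can be decoded without its neighbours.
    static byte[] readBlocks(byte[] framed) throws IOException, DataFormatException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        Inflater inflater = new Inflater();
        while (in.available() > 0) {
            int rawLen = in.readInt();
            int compLen = in.readInt();
            byte[] comp = new byte[compLen];
            in.readFully(comp);
            byte[] raw = new byte[rawLen];
            inflater.reset();
            inflater.setInput(comp);
            int got = 0;
            while (got < rawLen) {
                int n = inflater.inflate(raw, got, rawLen - got);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    break; // no more output available for this block
                }
                got += n;
            }
            if (got != rawLen) {
                throw new IOException("truncated block");
            }
            result.write(raw, 0, got);
        }
        inflater.end();
        return result.toByteArray();
    }
}
```

With framing like this, whether the underlying codec is block based or stream based matters less, because the container already decides where a reader can start decoding.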
Nigel Daley made changes - 23/Apr/09 07:17 PM
I know this issue is closed, but I was wondering if anyone might be interested in a Java version of fastlz. I read through the C code, and it seems simple enough to convert easily.
Tatu - please open a new jira for fastlz and attach your patch there... thanks!
William Kinney made changes - 30/Oct/09 10:32 PM
William Kinney made changes - 30/Oct/09 10:33 PM
William Kinney made changes - 30/Oct/09 10:34 PM
Actually, I only now had time to spend on this, and ended up testing LZF (http://oldhome.schmorp.de/marc/liblzf.html).
Turns out LZF is pretty good at speed, although one has to be careful with choosing good buffer sizes and hash table size, and ideally reuse buffers too if possible. If so, it can be a bit faster on decompression, and a lot faster on compression. Numbers I saw (this is just initial testing) indicated up to twice as fast compression, and maybe 30% faster decompression.
Compression ratio is not as good: whereas gzip would give ratios of 81/93/97% (for content sizes of 2k/20k/200k), LZF would give 66/72/72% (i.e. it compresses down to 34/28/28% of the original). Which is still pretty good of course. These are with JSON data.
LZF is a block-based algorithm just like all the others, including gzip, and is about as easy to wrap in input/output streams.
I hope to find time to actually wrap the existing code into a bit better packaging (wrt buffer reuse and other optimizations). If so, it could be a reusable component. That may take some time, but in the meantime, the source link above allows others to try out the code as well if they want to.
Tatu, we'd really appreciate it if you could open a jira for LZF and contribute a patch... thanks!
Ok, I created HADOOP-6389 specifically for LZF.
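For anyone wanting to reproduce ratio figures like the ones quoted in the LZF test comment above, here is a minimal sketch of the arithmetic plus a reused output buffer. It assumes `java.util.zip.Deflater` (zlib framing around the same DEFLATE algorithm gzip uses) as a stand-in; the class and method names are hypothetical and no LZF implementation is involved.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

final class RatioCheck {
    // One compressor and one output buffer, reused across calls.
    private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    private byte[] out = new byte[4096];

    // Returns the compressed size of the input in bytes.
    int compressedSize(byte[] input) {
        deflater.reset();
        deflater.setInput(input);
        deflater.finish();
        int total = 0;
        while (!deflater.finished()) {
            if (total == out.length) {
                out = Arrays.copyOf(out, out.length * 2); // grow only when needed
            }
            total += deflater.deflate(out, total, out.length - total);
        }
        return total;
    }

    // "Ratio" in the sense used above: percent of the original removed.
    static double savedPercent(int original, int compressed) {
        return 100.0 * (original - compressed) / original;
    }

    // Percent of the original size that remains after compression.
    static double remainingPercent(int original, int compressed) {
        return 100.0 * compressed / original;
    }
}
```

The two percentages always sum to 100, which is how a 66% ratio corresponds to compressing down to 34% of the original.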
We might move the lzo codec to a Sourceforge project, under GPL, so that folks can still get it.
Also, we can replace lzo with something like http://www.fastlz.org/