Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: 2.0.4-alpha, 3.0.0-alpha1
None
Environment:
[german@localhost lz4-read-only]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Stepping: 10
CPU MHz: 2667.000
BogoMIPS: 5319.82
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 2048K
NUMA node0 CPU(s): 0-3
[german@localhost lz4-read-only]$ uname -r
2.6.32-358.14.1.el6.x86_64
Description
While analyzing the compression performance of different Hadoop codecs, I noticed that the LZ4 code was taken from revision 43 (r43) of https://code.google.com/p/lz4/. The latest version is r98, and updating to it may bring additional performance benefits.
We may want to involve the original LZ4 author, Yann Collet, in these discussions, as the current LZ4 code includes additional algorithms and parameters.
To start the investigation, I ran preliminary experiments with the Silesia corpus. Throughput appears to improve for both compression and decompression in the latest release compared with r43 (I have not done enough analysis to conclude anything statistically, but the results look good).
Here is the raw output using LZ4 from r43 with a SUBSET of the Silesia corpus (http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia):
File: silesia/dickens
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Compressed 10192446 bytes into 6433123 bytes ==> 63.12%
Done in 0.07 s ==> 138.86 MB/s
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Successfully decoded 10192446 bytes
Done in 0.02 s ==> 486.01 MB/s

File: silesia/mozilla
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Compressed 51220480 bytes into 26379814 bytes ==> 51.50%
Done in 0.25 s ==> 195.39 MB/s
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Successfully decoded 51220480 bytes
Done in 0.12 s ==> 407.06 MB/s

File: silesia/mr
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Compressed 9970564 bytes into 5669268 bytes ==> 56.86%
Done in 0.04 s ==> 237.72 MB/s
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Successfully decoded 9970564 bytes
Done in 0.02 s ==> 475.43 MB/s

File: silesia/nci
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Compressed 33553445 bytes into 5880292 bytes ==> 17.53%
Done in 0.08 s ==> 399.99 MB/s
*** Compression CLI using LZ4 algorithm , by Yann Collet (Jul 29 2013) ***
Successfully decoded 33553445 bytes
Done in 0.06 s ==> 533.32 MB/s
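For anyone cross-checking the r43 numbers: the reported MB/s values appear consistent with bytes / seconds / 2^20, and the percentage is compressed size over original size. The sketch below just redoes that arithmetic on the figures quoted above. The function names are hypothetical helpers, not part of LZ4 or Hadoop, and the 2^20 divisor is an inference from the numbers rather than something confirmed from the LZ4 source.

```python
# Assumption: the CLI reports "MB" as 2**20 bytes (inferred from the figures).
MIB = 1024 * 1024


def throughput_mb_s(num_bytes, seconds):
    """Throughput in MB/s (1 MB = 2**20 bytes) for num_bytes processed in seconds."""
    return num_bytes / seconds / MIB


def ratio_percent(original_bytes, compressed_bytes):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_bytes / original_bytes


# (file, original bytes, compressed bytes, compression seconds) from the r43 output
r43_runs = [
    ("dickens", 10192446, 6433123, 0.07),
    ("mozilla", 51220480, 26379814, 0.25),
    ("mr", 9970564, 5669268, 0.04),
    ("nci", 33553445, 5880292, 0.08),
]

for name, original, compressed, seconds in r43_runs:
    print(
        f"{name:8s} {ratio_percent(original, compressed):6.2f}% "
        f"{throughput_mb_s(original, seconds):7.2f} MB/s"
    )
```

Note that the elapsed times are only printed to two decimal places, so short runs like these carry a sizeable measurement error; the throughput figures should be read as rough indicators.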
And here is the raw output of LZ4 from the latest release, r98:
File: silesia/dickens
*** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
Loading silesia/dickens...
1-LZ4_compress : 10192446 -> 6434313 (63.13%), 172.3 MB/s
LZ4_decompress_fast : 10192446 -> 676.0 MB/s

File: silesia/mozilla
*** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
Loading silesia/mozilla...
1-LZ4_compress : 51220480 -> 26382113 (51.51%), 281.7 MB/s
LZ4_decompress_fast : 51220480 -> 1003.1 MB/s

File: silesia/mr
*** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
Loading silesia/mr...
1-LZ4_compress : 9970564 -> 5669255 (56.86%), 268.3 MB/s
LZ4_decompress_fast : 9970564 -> 788.7 MB/s

File: silesia/nci
*** Full LZ4 speed analyzer , by Yann Collet (Jul 29 2013) ***
Loading silesia/nci...
1-LZ4_compress : 33553445 -> 5883923 (17.54%), 584.9 MB/s
LZ4_decompress_fast : 33553445 -> 1208.3 MB/s
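To make the two sets of raw output easier to compare, here is a small sketch that computes the r98/r43 throughput ratio per file from the figures quoted in this issue. All values are taken to be in MB/s, and the dictionary and function names are hypothetical. The ratios are only indicative: the r43 run used the compression CLI while the r98 run used the "Full LZ4 speed analyzer", so the two measurements were not taken with the same harness, and no statistical analysis has been done.

```python
# Throughput in MB/s copied from the raw output in this issue:
# file -> (compression, decompression)
r43 = {
    "dickens": (138.86, 486.01),
    "mozilla": (195.39, 407.06),
    "mr": (237.72, 475.43),
    "nci": (399.99, 533.32),
}
r98 = {
    "dickens": (172.3, 676.0),
    "mozilla": (281.7, 1003.1),
    "mr": (268.3, 788.7),
    "nci": (584.9, 1208.3),
}


def speedups(old, new):
    """Return {file: (compression speedup, decompression speedup)} as new/old ratios."""
    return {
        name: (new[name][0] / old[name][0], new[name][1] / old[name][1])
        for name in old
    }


result = speedups(r43, r98)
for name, (comp, decomp) in result.items():
    print(f"{name:8s} compress x{comp:.2f}  decompress x{decomp:.2f}")
```

On these four files every ratio comes out above 1, which matches the impression that r98 is faster, but a like-for-like benchmark (same tool, repeated runs) would be needed before drawing conclusions.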
Attachments
Issue Links
- duplicates HADOOP-9319 Update bundled lz4 source to latest version (Closed)