Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Version: Impala 3.1.0
Description
Currently we set the Deflater mode to BEST_COMPRESSION by default.
    public static byte[] deflateCompress(byte[] input) {
      if (input == null) return null;
      ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
      // TODO: Benchmark other compression levels.
      DeflaterOutputStream stream =
          new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
In some experiments, we noticed that the fastest compression mode (BEST_SPEED) is ~8x faster than BEST_COMPRESSION, with only a ~4 percentage point compression ratio penalty (the compressed output is ~9% larger).
Here are some results from a real-world table with 3000 partitions that have incremental stats.
Compression mode | Time taken for serialization (seconds) | OutputBytes size (MB) |
Gzip best compression | 92 | 194 |
Gzip fastest compression | 11 | 212 |
Gzip default compression | 57 | 195 |
No compression | 5 | 452 |
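
A comparison like the one above could be reproduced with a small harness along these lines. This is only a sketch: the class name DeflateLevelBenchmark, the level parameter on deflateCompress(), and the synthetic sampleData() input are all assumptions made for illustration, not the existing Impala code, and synthetic data will not reproduce the timings or sizes measured on the real table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class DeflateLevelBenchmark {
  // Hypothetical variant of deflateCompress() that takes the Deflater level
  // as a parameter instead of hard-coding BEST_COMPRESSION.
  static byte[] deflateCompress(byte[] input, int level) throws IOException {
    if (input == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    Deflater deflater = new Deflater(level);
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } finally {
      // A caller-supplied Deflater is not released by the stream, so end it here.
      deflater.end();
    }
    return bos.toByteArray();
  }

  // Synthetic, compressible input (assumption: a stand-in for serialized stats).
  static byte[] sampleData(int size) {
    byte[] alphabet = "0123456789abcdef".getBytes();
    Random random = new Random(42);
    byte[] data = new byte[size];
    for (int i = 0; i < size; i++) {
      data[i] = alphabet[random.nextInt(alphabet.length)];
    }
    return data;
  }

  public static void main(String[] args) throws IOException {
    byte[] input = sampleData(16 * 1024 * 1024);
    int[] levels = {
      Deflater.BEST_COMPRESSION, Deflater.BEST_SPEED,
      Deflater.DEFAULT_COMPRESSION, Deflater.NO_COMPRESSION
    };
    String[] names = {"best compression", "fastest compression",
                      "default compression", "no compression"};
    for (int i = 0; i < levels.length; i++) {
      long start = System.nanoTime();
      byte[] output = deflateCompress(input, levels[i]);
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      System.out.printf("%-20s | %6d ms | %10d bytes%n",
          names[i], elapsedMs, output.length);
    }
  }
}
```

A single timed call per level is a rough measurement; for numbers worth acting on, the input should be the real serialized partition stats and each level should be run several times after a warm-up pass.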