Details
Type: Improvement
Status: Open
Priority: Low
Resolution: Unresolved
Description
A common question from users of compression is "which block size should I use?". Until we figure out how to auto-tune the block size (or use something like zstd dictionary training), it might be useful to ship a tool similar to the one aweisberg created (gist mirror) for CASSANDRA-13241. Users could point it at an existing sstable, and it would output the expected ratios for that sstable re-compressed with either different block sizes or a different compressor altogether. For example, maybe something like:
$ /cassandra/tools/bin/sstable-compression-estimate <foo>
Compressor | Chunk Size | Ratio | Read Speed | Off-Heap Memory |
----------------------------------------------------------------
LZ4        | 4096       | 0.54  | 0.2 ms     | 100kb           |
LZ4        | 8192       | 0.46  | 0.3 ms     | 50kb            |
LZ4        | 16384      | 0.42  | 0.3 ms     | 24kb            |
LZ4        | 32768      | 0.38  | 0.4 ms     | 12kb            |
LZ4        | 65536      | 0.35  | 0.8 ms     | 6kb             |
----------------------------------------------------------------
Zstd       | 4096       | 0.40  | 0.3 ms     | 100kb           |
Zstd       | 8192       | 0.34  | 0.4 ms     | 50kb            |
Zstd       | 16384      | 0.25  | 0.5 ms     | 24kb            |
...
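To make the idea concrete, here is a rough sketch of the ratio column only, not a proposal for the actual tool. It splits a buffer of uncompressed bytes into fixed-size chunks, compresses each chunk independently, and reports compressed size divided by uncompressed size. Python's standard-library zlib stands in for LZ4/Zstd purely so the sketch has no dependencies. The names estimate_ratios and format_table are hypothetical, and reading and decompressing a real sstable is assumed to happen elsewhere.

```python
import zlib

# Chunk sizes taken from the example table above.
DEFAULT_CHUNK_SIZES = (4096, 8192, 16384, 32768, 65536)


def estimate_ratios(data: bytes, chunk_sizes=DEFAULT_CHUNK_SIZES) -> dict:
    """Return {chunk_size: compressed_bytes / uncompressed_bytes}.

    Each chunk is compressed on its own, which is what makes small
    chunks compress worse: the compressor sees less context per call.
    zlib is an assumption here, standing in for LZ4 or Zstd.
    """
    if not data:
        raise ValueError("no data to compress")
    ratios = {}
    for size in chunk_sizes:
        compressed = 0
        for start in range(0, len(data), size):
            compressed += len(zlib.compress(data[start:start + size]))
        ratios[size] = compressed / len(data)
    return ratios


def format_table(name: str, ratios: dict) -> str:
    """Render the ratios in roughly the layout of the example output."""
    lines = ["Compressor | Chunk Size | Ratio |"]
    for size, ratio in sorted(ratios.items()):
        lines.append(f"{name:<10} | {size:<10} | {ratio:.2f}  |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Synthetic, repetitive input; a real tool would feed in the
    # uncompressed contents of an sstable instead.
    sample = b"user_id,event,timestamp\n" * 50000
    print(format_table("zlib", estimate_ratios(sample)))
```

The Read Speed and Off-Heap Memory columns are left out of the sketch; they would need timing of chunk decompression and knowledge of the per-chunk offset bookkeeping, neither of which is modeled here.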