Parquet / PARQUET-2132

Support Quantile Compression q_compress column codec

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Quantile Compression (https://github.com/mwlon/quantile-compression) is a recent but stable compression algorithm for numerical sequences. Given the same compression time, it averages a 35%+ higher compression ratio than the next-best codec (zstd). Its decompression speed is fairly fast, close to that of zstd. Compared to Parquet's built-in PFor-like integer compression algorithm, it achieves a much higher compression ratio at a slower speed. Adding q_compress as a column codec for all numerical columns could substantially reduce the size of most Parquet files.
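To make the "35%+ higher compression ratio" figure concrete, here is a small worked calculation. It assumes that "compression ratio" means uncompressed size divided by compressed size. The column size and the baseline ratio of 4.0 are hypothetical values chosen only for illustration, not measurements of zstd or q_compress.

```python
def relative_size(ratio_gain: float) -> float:
    """Compressed size relative to a baseline codec's output.

    Assumes ratio = uncompressed_size / compressed_size, so a codec whose
    ratio is (1 + ratio_gain) times the baseline ratio produces output that
    is 1 / (1 + ratio_gain) times the baseline output size.
    """
    return 1.0 / (1.0 + ratio_gain)


# Hypothetical column: 1,000,000 int64 values.
uncompressed = 1_000_000 * 8          # 8,000,000 bytes

# Illustrative baseline ratio only; not a benchmark result.
baseline_ratio = 4.0
baseline_size = uncompressed / baseline_ratio

# A 35% higher compression ratio than the baseline.
improved_ratio = baseline_ratio * 1.35
improved_size = uncompressed / improved_ratio

# Fraction of bytes saved relative to the baseline codec's output.
saving = 1.0 - improved_size / baseline_size

print(f"baseline size: {baseline_size:,.0f} bytes")
print(f"improved size: {improved_size:,.0f} bytes")
print(f"saving vs baseline: {saving:.1%}")  # prints "saving vs baseline: 25.9%"
```

Under that definition, a 35% higher ratio corresponds to output roughly 26% smaller than the baseline codec's, independent of the baseline ratio chosen.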

      q_compress is implemented in Rust, which interoperates well with C++ and can run in the JVM via JNI (e.g. https://github.com/pancake-db/pancake-scala-client).

          People

            Assignee: Unassigned
            Reporter: Martin Loncaric (mwlon)