Apache Arrow / ARROW-6216

[C++] Allow user to select the compression level

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: C++

    Description

      Arrow currently fixes the ZSTD compression level at 1, the lowest of the compressor's standard (positive) levels. This gives very high compression speed at the expense of compression ratio.

      The user should be able to select the compression level, because both speed and ratio depend on the data being compressed.

      The proposed solution is to expose the knob via an environment variable such as ARROW_ZSTD_COMPRESSION_LEVEL.
      Example:
      export ARROW_ZSTD_COMPRESSION_LEVEL=10
      ./my_parquet_app
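A minimal sketch of how the proposed environment variable could be read, assuming a fallback to the current level of 1 when the variable is unset or invalid. `ParseCompressionLevel` and `ZstdLevelFromEnv` are hypothetical helper names for illustration and are not part of the Arrow API; the accepted range of 1 to 22 is likewise an assumption based on ZSTD's standard levels.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not Arrow API): parse a compression level from text.
// Returns `fallback` when the text is missing, is not a plain integer,
// or lies outside [min_level, max_level].
int ParseCompressionLevel(const char* text, int fallback, int min_level,
                          int max_level) {
  if (text == nullptr || *text == '\0') return fallback;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  // Reject overflow and trailing garbage such as "5x".
  if (errno != 0 || *end != '\0') return fallback;
  if (value < min_level || value > max_level) return fallback;
  return static_cast<int>(value);
}

// Hypothetical helper: read the level from the proposed environment variable.
// Assumes a default of 1 and ZSTD's standard level range of 1..22.
int ZstdLevelFromEnv() {
  return ParseCompressionLevel(std::getenv("ARROW_ZSTD_COMPRESSION_LEVEL"),
                               /*fallback=*/1, /*min_level=*/1,
                               /*max_level=*/22);
}
```

With `export ARROW_ZSTD_COMPRESSION_LEVEL=10` in effect, `ZstdLevelFromEnv()` would return 10; with the variable unset or set to something like `fast`, it would return the fallback of 1. Falling back silently keeps existing applications working unchanged, though a real implementation might prefer to log a warning on invalid input.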

      Here is a test run with compression levels of 1, 2 and 5:
      Level   Time (s)   Size (MB)
      1       13.02      181
      2       13.10      177
      5       19.44      148
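To make the trade-off in the table explicit, the sketch below expresses each run relative to the level 1 baseline. The numbers are taken directly from the table; `Run`, `PercentChange` and `PrintTradeoffs` are illustrative names only. Relative to level 1, level 5 takes roughly 49% longer and produces output about 18% smaller, while level 2 costs under 1% more time for about 2% less size.

```cpp
#include <cstdio>

// One row of the benchmark table above.
struct Run {
  int level;
  double seconds;
  double megabytes;
};

// Signed percentage change of `value` relative to `baseline`.
double PercentChange(double baseline, double value) {
  return 100.0 * (value - baseline) / baseline;
}

// Prints the time and size of each run relative to the level 1 run.
void PrintTradeoffs() {
  const Run runs[] = {{1, 13.02, 181.0}, {2, 13.10, 177.0}, {5, 19.44, 148.0}};
  const Run& base = runs[0];
  for (const Run& r : runs) {
    std::printf("level %d: time %+.1f%%, size %+.1f%%\n", r.level,
                PercentChange(base.seconds, r.seconds),
                PercentChange(base.megabytes, r.megabytes));
  }
}
```

Calling `PrintTradeoffs()` from `main` prints one line per level. These figures describe a single data set, which is the point of the request: the best level depends on the data.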

            People

              Martin Radev (martinradev)
                Time Tracking

                  Estimated: 2h
                  Remaining: 0h
                  Logged: 13h 40m